Recent theoretical results show that transformers cannot express sequential reasoning problems over long inputs, intuitively because their computational *depth* is bounded. In this work, we investigate the expressive power of transformers with padding tokens, comparing it to chain-of-thought methods. Our results give a precise theoretical understanding of how padding and looping, two ways to dynamically expand the computational resources of a transformer at inference time, increase the expressive power of transformers.

Exact Expressive Power of Transformers with Padding. CoRR abs/2505.18948 (2025).
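To make the two mechanisms concrete, here is a minimal sketch, assuming a standard PyTorch encoder as a stand-in for the transformers analyzed in the paper; the dimensions, variable names, and the zero-vector padding embedding are illustrative assumptions, not the paper's construction. Padding appends extra blank positions so the model gets more parallel computation at the same depth, while looping reapplies the same fixed-depth stack to increase effective depth without adding parameters.

```python
# Illustrative sketch only: padding vs. looping as two ways to add
# inference-time compute to a fixed transformer. All names and shapes
# are assumptions for this example.
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 64, 4, 2
layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

x = torch.randn(1, 10, d_model)            # original input: 10 tokens

# Padding: append blank tokens so the model has more positions, and hence
# more parallel computation, while its depth stays fixed.
num_pad = 6
pad = torch.zeros(1, num_pad, d_model)     # hypothetical padding embedding
padded_out = encoder(torch.cat([x, pad], dim=1))

# Looping: feed the output back through the same fixed-depth stack T times,
# increasing effective depth without adding parameters.
T = 4
h = x
for _ in range(T):
    h = encoder(h)
looped_out = h

print(padded_out.shape, looped_out.shape)  # (1, 16, 64) and (1, 10, 64)
```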