The Information-Theoretic Limits of Context Windows
There are fundamental limits on how much information a fixed-width attention mechanism can extract from n tokens. Here's the math from Shannon's channel capacity to attention bounds.
2 posts found
There are fundamental limits on how much information a fixed-width attention mechanism can extract from n tokens. Here's the math from Shannon's channel capacity to attention bounds.
Context compaction is a lossy compression problem. Rate-distortion theory gives the theoretical lower bound on how much conversation history can be compressed.