Softmax forces attention weights to sum to 1, so as context grows the same attention budget is split across more tokens, and each relevant token receives a smaller share. This dilution is one mathematical reason model accuracy degrades on long inputs.
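A minimal sketch of the dilution effect, using made-up attention scores (the logits here are hypothetical, not from any real model): one "relevant" token keeps a fixed high score while the number of low-score distractor tokens grows, and its softmax weight shrinks accordingly.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability, then normalize so weights sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One relevant token with a fixed score of 4.0, plus n distractors scored 0.0.
# As n grows, the relevant token's share of attention shrinks even though
# its own score never changes.
for n in (10, 100, 1000):
    scores = [4.0] + [0.0] * n
    weight = softmax(scores)[0]
    print(f"n={n:4d} distractors -> relevant-token weight = {weight:.3f}")
```

With these particular numbers the relevant token's weight falls from roughly 0.85 at 10 distractors to about 0.05 at 1000, even though nothing about the token itself changed.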