Several research approaches attack the quadratic bottleneck of self-attention, whose cost grows with the square of the sequence length: Longformer, Reformer, Linformer, and linear attention. Here's the math behind each one.