What Is FlashAttention? A Complete Guide to Tri Dao’s GPU-Memory-Aware Attention Algorithm, FlashAttention-3, and How It Compares to PagedAttention
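FlashAttention is Tri Dao's GPU-memory-aware attention algorithm. It delivers a 2-4x speedup over standard attention implementations and makes long-context training practical. FlashAttention-3 reaches up to 75% of the H100's theoretical peak FLOPS. This guide covers the architecture, usage patterns, and a comparison with PagedAttention.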