FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
https://arxiv.org/pdf/2205.14135.pdf

FlashAttention: Fast Transformer Training with Long Sequences
https://hazyresearch.stanford.edu/blog/2023-01-12-flashattention-long-sequences

Review-Log 2023.04.16