Review-Log

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

hellcat 2023. 4. 16. 10:20

FlashAttention: Fast Transformer Training with Long Sequences
https://hazyresearch.stanford.edu/blog/2023-01-12-flashattention-long-sequences

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (paper)
https://arxiv.org/pdf/2205.14135.pdf
