FlashAttention: Fast Transformer Training with Long Sequences
https://hazyresearch.stanford.edu/blog/2023-01-12-flashattention-long-sequences
https://arxiv.org/pdf/2205.14135.pdf