Efficient Streaming Language Models with Attention Sinks
Ring Attention with Blockwise Transformers for Near-Infinite Context
HyperAttention: Long-context Attention in Near-Linear Time
Flash-Decoding for long context inference
Efficient Memory Management for Large Language Model Serving with PagedAttention