Generating Long Sequences with Sparse Transformers (https://arxiv.org/abs/1904.10509)
Longformer: The Long-Document Transformer (https://arxiv.org/abs/2004.05150v2)
Paper Summary #8 - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
FlashDecoding++: Faster Large Language Model Inference on GPUs
7 Ways To Speed Up Inference of Your Hosted LLMs
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Open Source Society University
Multi-Model RAG Stack
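
The FlashAttention summary above centers on one idea: compute exact attention in tiles, carrying running softmax statistics, so the full N x N score matrix never materializes in slow GPU memory. Below is a minimal NumPy sketch of that online-softmax trick for intuition only; the function name, block size, and single-query shape are my own simplifications, not the paper's fused CUDA kernel.

```python
# Illustrative sketch of FlashAttention-style tiled attention with an
# online softmax. Real FlashAttention fuses this loop into one kernel
# so intermediate scores stay in SRAM; shapes here are simplified.
import numpy as np

def tiled_attention(q, K, V, block=64):
    """Exact attention for one query q against K, V, visiting keys and
    values block by block while updating running softmax statistics."""
    d = q.shape[0]
    m = -np.inf          # running max of scores (numerical stability)
    l = 0.0              # running sum of exp(score - m)
    acc = np.zeros(d)    # running un-normalized weighted sum of values
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = Kb @ q / np.sqrt(d)       # scores for this tile only
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)     # rescale earlier accumulators
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ Vb
        m = m_new
    return acc / l

# Sanity check against standard (fully materialized) softmax attention.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(256, 8)), rng.normal(size=(256, 8))
s = K @ q / np.sqrt(8)
w = np.exp(s - s.max()); w /= w.sum()
assert np.allclose(tiled_attention(q, K, V), w @ V)
```

The point of the rescaling step is that earlier tiles were normalized against a stale maximum; multiplying the accumulators by exp(m - m_new) corrects them, which is why the result is exact rather than approximate attention.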