- Language Modeling Is Compression
- Building RAG-based LLM Applications for Production (Part 1)
- 10 Ways to Improve the Performance of Retrieval Augmented Generation Systems
- Memory bandwidth constraints imply economies of scale in AI inference
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints