How to make LLMs go fast: https://vgel.me/posts/faster-inference/
Table of Contents
- Why is simple inference so slow?
- Hardware
- Batching
- Shrinking model weights
- KV caching
- Speculative Decoding
- Training time optimizations
- Conclusion
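The linked post explains each technique in depth. As a flavor of one item on that list, here is a minimal, illustrative sketch of KV caching, not code from the post: a toy single-head attention layer where random weights stand in for trained parameters, and each decode step projects only the newest token while reusing cached keys and values. All names and shapes here are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class ToyAttentionWithKVCache:
    """Toy single-head self-attention that caches past keys/values,
    so each decode step only projects the single newest token."""

    def __init__(self, d_model, rng):
        # Random projections stand in for trained weight matrices.
        self.wq = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.wk = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        self.k_cache = []  # one (d_model,) key vector per past token
        self.v_cache = []  # one (d_model,) value vector per past token

    def step(self, x):
        # x: embedding of the newest token only, shape (d_model,).
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        self.k_cache.append(k)  # keys/values for past tokens are never recomputed
        self.v_cache.append(v)
        K = np.stack(self.k_cache)        # (t, d_model)
        V = np.stack(self.v_cache)        # (t, d_model)
        scores = K @ q / np.sqrt(len(q))  # attend over all cached positions
        return softmax(scores) @ V        # (d_model,) attention output

rng = np.random.default_rng(0)
attn = ToyAttentionWithKVCache(d_model=16, rng=rng)
for t in range(5):
    out = attn.step(rng.standard_normal(16))
print("output for 5th token:", out[:4], "...")
```

The attention itself still reads all t cached positions, but the per-token key/value projections (and, in a real transformer, every layer's work on past tokens) happen only once, instead of being redone from scratch at each generation step.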