https://huggingface.co/blog/whisper-speculative-decoding
https://huggingface.co/blog/moe#serving-techniques
https://www.baseten.co/blog/llm-transformer-inference-guide/