https://towardsdatascience.com/decoding-strategies-in-large-language-models-9733a8f70539
Decoding Strategies in Large Language Models
A Guide to Text Generation From Beam Search to Nucleus Sampling
towardsdatascience.com
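The article above walks from beam search to nucleus (top-p) sampling. As a reminder of the core idea, here is a minimal pure-Python sketch (function name and toy logits are illustrative, not from the article): keep the smallest set of highest-probability tokens whose cumulative mass reaches p, renormalize, and sample within that set.

```python
import math
import random

def nucleus_sample(logits, p=0.9, seed=0):
    """Top-p (nucleus) sampling over raw logits.

    Keeps the smallest set of tokens whose cumulative probability
    reaches p, then samples a token id from that renormalized set.
    """
    rng = random.Random(seed)
    # Softmax with max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort token ids by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    cum, nucleus = 0.0, []
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= p:
            break
    # Renormalize within the nucleus and sample.
    z = sum(probs[i] for i in nucleus)
    r = rng.random() * z
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]
```

With a sharply peaked distribution and a small p, the nucleus collapses to the single most likely token, which is why top-p degrades gracefully toward greedy decoding on confident steps.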
Transformers Optimization: Part 1 - KV Cache
Understanding Llama2: KV Cache, Grouped Query Attention, Rotary Embedding and More
What’s in Llama2: Grouped Query Attention, Rotary Embedding, KV Cache, Root Mean Square Normalization
ai.plainenglish.io
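The two articles above describe the KV cache: during autoregressive decoding, the key/value projections of past tokens are stored and reused, so each new step computes only one new K/V row instead of reprojecting the whole prefix. A minimal NumPy sketch of that append-and-attend pattern (class and function names are illustrative, not from the articles):

```python
import numpy as np

def attention(q, K, V):
    """Scaled dot-product attention for a single query vector q
    against key matrix K (n, d) and value matrix V (n, d)."""
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())  # stable softmax
    w /= w.sum()
    return w @ V

class KVCache:
    """Append-only key/value cache for one attention head.

    Each decoding step appends one K/V row; attending with the new
    query then reuses all previously stored rows unchanged.
    """
    def __init__(self, d):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def append(self, k, v):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

    def attend(self, q):
        return attention(q, self.K, self.V)
```

Attending through the cache after appending rows one at a time gives the same result as recomputing attention over the full K/V matrices; the saving is that the per-token projection work for the prefix is never repeated.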