Harmonizing Multi-GPUs: Efficient Scaling of LLM Inference
In the Fast Lane! Speculative Decoding - 10x Larger Model, No Extra Cost
Large Language Models - the hardware connection
Fast and Expressive LLM Inference with RadixAttention and SGLang