Full Stack Optimization of Transformer Inference: a Survey
How Long Can Open-Source LLMs Truly Promise on Context Length?
Efficiently Scaling Transformer Inference
Dissecting Batching Effects in GPT Inference
Accelerating transformer inference on my RTX 4090
GPU Performance Background User's Guide
Code Llama: Open Foundation Models for Code
Math-Bound VS Memory-Bound Operations
Transformer Inference Arithmetic
Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking
GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2
Understanding QLoRA & LoRA: Fine-tuning of LLMs