From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
NExT-GPT: Any-to-Any Multimodal LLM
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Optimizing your LLM in production
Do Machine Learning Models Memorize or Generalize?
In the long (context) run
Building RAG-based LLM Applications for Production (Part 1)
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
YaRN: Efficient Context Window Extension of Large Language Models