From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
NExT-GPT: Any-to-Any Multimodal LLM
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Optimizing your LLM in production
Do Machine Learning Models Memorize or Generalize?
By Adam Pearce, Asma Ghandeharioun, Nada Hussein, Nithum Thain, Martin Wattenberg and Lucas Dixon, August 2023. In 2021, researchers made a striking discovery while training a series of tiny models on toy tasks. They found a set of models that suddenly flip…
pair.withgoogle.com
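The "toy tasks" in the PAIR write-up are small algorithmic problems such as modular arithmetic. As a rough, hypothetical sketch of that setup (the modulus, split ratio, and seed here are illustrative choices, not the study's exact values):

```python
# Hypothetical sketch of the kind of toy task used in grokking studies:
# a modular-addition table split into train/test. Training a small model
# on the train split for a very long time is where the sudden
# memorize-to-generalize flip has been reported.
import random

P = 97  # modulus; small enough that the full table fits in memory

# Every (a, b) pair labeled with (a + b) mod P.
pairs = [((a, b), (a + b) % P) for a in range(P) for b in range(P)]

random.seed(0)
random.shuffle(pairs)
split = int(0.3 * len(pairs))  # train on 30% of the table, hold out the rest
train, test = pairs[:split], pairs[split:]

print(f"{len(train)} train examples, {len(test)} held-out examples")
```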
In the long (context) run | Harm de Vries
It's not the quadratic attention; it's the lack of long pre-training data
www.harmdevries.com
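De Vries's point invites a simple sanity check: before blaming attention's quadratic cost, measure how many pre-training documents are actually long. A hedged sketch of such a check follows; the whitespace "tokenizer", bucket boundaries, and file layout are placeholder assumptions, not the post's methodology:

```python
# Sketch: histogram of document lengths in a text corpus, to see how much
# genuinely long pre-training data exists at each context length.
from pathlib import Path

def length_histogram(paths, buckets=(512, 2048, 8192, 32768)):
    counts = {b: 0 for b in buckets}
    counts["longer"] = 0
    for p in paths:
        # Crude token count: whitespace split (a real check would use the
        # model's tokenizer).
        n = len(Path(p).read_text(errors="ignore").split())
        for b in buckets:
            if n <= b:
                counts[b] += 1
                break
        else:
            counts["longer"] += 1
    return counts

# Example (assumed layout): print(length_histogram(Path("corpus/").glob("*.txt")))
```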
Building RAG-based LLM Applications for Production (Part 1)
In this guide, we will learn how to develop and productionize a retrieval augmented generation (RAG) based LLM application, with a focus on scale, evaluation and routing.
www.anyscale.com
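For orientation, the core of any such RAG application is a retrieve-then-generate loop. The following is a minimal self-contained sketch of that loop, not the Anyscale implementation: the bag-of-words scoring stands in for a real embedding index, and the assembled prompt would be sent to whatever LLM endpoint you use.

```python
# Minimal sketch of the retrieve-then-generate loop behind a RAG app.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": token counts. Real systems use a dense embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Stuff retrieved context into the prompt for the generator LLM.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Ray Serve lets you scale Python services across a cluster.",
    "YaRN extends a model's context window with modified RoPE scaling.",
    "GQA shares key/value heads across groups of query heads.",
]
print(build_prompt("How does GQA reduce KV cache size?", docs))
```

A production version swaps `embed` for a dense embedding model and a vector store, and adds the scaling, evaluation, and routing layers the guide focuses on.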
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
YaRN: Efficient Context Window Extension of Large Language Models
Other posts in the 'Daily-Trend-Review' category
2023/10/06: long context LLMs
2023/09/27: Speed up Inference
2023/09/21: language modeling = compression
2023/09/18: Textbooks Are All You Need, etc.
2023/09/10: LLM economics