Daily-Trend-Review

2023/09/24: Optimizing LLMs for Productization

hellcat 2023. 9. 24. 22:40

From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting

NExT-GPT: Any-to-Any Multimodal LLM

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Optimizing your LLM in production

Do Machine Learning Models Memorize or Generalize?
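Of the links above, Chain of Density is a pure prompting pattern, so it can be sketched directly. The template below is a paraphrase of the paper's idea (iteratively fold missing entities into a summary while holding its length fixed), not the authors' exact prompt; the function name is made up for illustration.

```python
def chain_of_density_prompt(article: str, steps: int = 5) -> str:
    """Paraphrased Chain-of-Density-style instruction: start with a sparse
    summary, then repeatedly rewrite it to include missing entities without
    letting it grow longer."""
    return (
        f"Article: {article}\n\n"
        "Write an initial sparse summary of the article (4-5 sentences).\n"
        f"Then repeat {steps - 1} times: identify 1-3 informative entities "
        "from the article that are missing from the previous summary, and "
        "rewrite the summary at the same length to include them, compressing "
        "existing wording to make room.\n"
        "Output every summary, one per line."
    )
```

The whole string goes to the model as a single instruction; each iteration trades verbosity for entity density.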

 

Do Machine Learning Models Memorize or Generalize?

By Adam Pearce, Asma Ghandeharioun, Nada Hussein, Nithum Thain, Martin Wattenberg, and Lucas Dixon (August 2023). In 2021, researchers made a striking discovery while training a series of tiny models on toy tasks: they found a set of models that suddenly flipped from memorizing their training data to generalizing on unseen inputs.

pair.withgoogle.com

In the long (context) run | Harm de Vries

It's not the quadratic attention; it's the lack of long pre-training data

www.harmdevries.com

Building RAG-based LLM Applications for Production (Part 1)

In this guide, we will learn how to develop and productionize a retrieval augmented generation (RAG) based LLM application, with a focus on scale, evaluation and routing.

www.anyscale.com
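The Anyscale guide covers the full pipeline; the core retrieval step can be sketched in a few lines. Everything below is a toy stand-in (a bag-of-words "embedding" instead of a real embedding model, hypothetical function names), just to show the embed → retrieve → prompt flow:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector standing in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank document chunks by similarity to the query, keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Stuff the retrieved chunks into the context of an LLM prompt.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

A production version swaps in a real embedding model and a vector store, which is where the guide's points about scale, evaluation, and routing come in.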

Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

YaRN: Efficient Context Window Extension of Large Language Models
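Of today's links, GQA describes the most mechanical idea: let several query heads share one key/value head, interpolating between multi-head and multi-query attention. Below is a minimal NumPy sketch of the grouped-attention forward pass (the paper's checkpoint-conversion recipe is not shown); shapes and names are illustrative:

```python
import numpy as np

def grouped_query_attention(x, Wq, Wk, Wv, n_heads, n_kv_heads):
    # x: (seq, d_model); n_heads query heads share n_kv_heads K/V heads.
    seq, d_model = x.shape
    d_head = d_model // n_heads
    group = n_heads // n_kv_heads  # query heads per shared K/V head

    q = (x @ Wq).reshape(seq, n_heads, d_head)
    k = (x @ Wk).reshape(seq, n_kv_heads, d_head)
    v = (x @ Wv).reshape(seq, n_kv_heads, d_head)

    out = np.empty_like(q)
    for h in range(n_heads):
        kv = h // group  # which shared K/V head this query head reads
        scores = q[:, h] @ k[:, kv].T / np.sqrt(d_head)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
        w /= w.sum(axis=-1, keepdims=True)
        out[:, h] = w @ v[:, kv]
    return out.reshape(seq, d_model)
```

With n_kv_heads = 1 this reduces to multi-query attention; with n_kv_heads = n_heads it is ordinary multi-head attention, so the K/V cache shrinks by a factor of n_heads / n_kv_heads.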

 
