Data Engineering for Scaling Language Models to 128K Context
Are All Large Language Models Really in 1.58 Bits?
Dive deep into changes to the Transformer architecture and learn how researchers have found a 4x-or-greater speedup in LLM inference.
learning-exhaust.hashnode.dev
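The "1.58 bits" refers to each weight taking one of three values, {-1, 0, +1}, since log2(3) ≈ 1.58. As a rough illustration, here is a minimal PyTorch sketch of the absmean ternary quantization described in the BitNet b1.58 paper (my own sketch, not the authors' code):

```python
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Map a weight tensor to {-1, 0, +1} via absmean scaling,
    following the scheme described in the BitNet b1.58 paper."""
    scale = w.abs().mean().clamp(min=eps)    # per-tensor scale: mean |W|
    w_q = (w / scale).round().clamp(-1, 1)   # round, then clip to ternary
    return w_q * scale                       # simulated-quantized weights
```

With ternary weights, the multiplications inside each linear layer reduce to additions and subtractions, which is where the claimed inference speedup comes from.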
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have heterogeneous pricing structures…
arxiv.org
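One of FrugalGPT's core strategies is the LLM cascade: query cheap models first and escalate to expensive ones only when a scorer is not confident in the answer. A minimal sketch of that control flow, assuming placeholder `models`, `score`, and `threshold` (the paper trains a small scorer and tunes per-model thresholds):

```python
from typing import Callable

def llm_cascade(
    prompt: str,
    models: list[Callable[[str], str]],  # ordered cheapest to most expensive
    score: Callable[[str, str], float],  # confidence in (prompt, answer)
    threshold: float = 0.8,
) -> str:
    """Return the first answer whose confidence clears the threshold,
    paying for pricier models only when the cheap ones fall short."""
    answer = ""
    for model in models:
        answer = model(prompt)
        if score(prompt, answer) >= threshold:
            return answer
    return answer  # worst case: the most expensive model's answer
```

The cost savings come from most queries never reaching the expensive API.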
4-bit LLM Quantization with GPTQ
mlabonne.github.io
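For context, the naive baseline that GPTQ improves on is round-to-nearest (RTN) quantization with one scale and zero-point per output channel; GPTQ additionally uses second-order (Hessian) information to compensate for rounding error column by column. A simplified RTN sketch (my own, not GPTQ itself):

```python
import torch

def rtn_quantize_4bit(w: torch.Tensor):
    """Asymmetric 4-bit round-to-nearest with one scale and
    zero-point per output channel (row) of the weight matrix."""
    qmin, qmax = 0, 15                                 # 4-bit unsigned range
    w_min = w.min(dim=1, keepdim=True).values
    w_max = w.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / (qmax - qmin)
    zero = (-w_min / scale).round()
    q = (w / scale + zero).round().clamp(qmin, qmax)   # integer codes
    return q, scale, zero

def rtn_dequantize(q, scale, zero):
    return (q - zero) * scale                          # back to float
```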
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
OpenELM: An Efficient Language Model Family with Open Training and Inference Framework
Will infinite context windows kill LLM fine-tuning and RAG?
LLMs with infinite context windows are making it easier to create proof-of-concepts and prototypes. But scale still requires careful engineering.
bdtechtalks.com
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Transformers have emerged as the backbone of large language models (LLMs). However, generation remains inefficient due to the need to store in memory a cache of key-value representations for past tokens, whose size scales linearly with the input sequence length…
arxiv.org
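The linear growth is easy to verify with a back-of-the-envelope formula: the cache holds one key and one value vector per token, per layer, per head. A quick calculation, assuming a Llama-2-7B-like configuration (32 layers, 32 KV heads, head dimension 128, fp16):

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, dtype_bytes: int = 2) -> int:
    """KV cache size = 2 (K and V) x layers x heads x head_dim
    x tokens x batch x bytes per element."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

size = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)
print(f"{size / 2**30:.1f} GiB")  # 2.0 GiB at 4k tokens, 4.0 GiB at 8k
```

Compressing this per-token cache during decoding is exactly the term DMC attacks.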