Are all LLMs really 1.58 bits? Inference at 4x the speed or more?
Dive deep into changes to the Transformer architecture to learn how researchers have discovered a huge speedup in LLM inference.
learning-exhaust.hashnode.dev
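The "1.58 bits" in the title refers to ternary weights in {-1, 0, +1} (log2(3) ≈ 1.58 bits per weight); the speedup comes from replacing floating-point multiplies in the matmuls with additions and subtractions. As a rough illustration only, here is a minimal absmean-style ternary quantization sketch in Python; the function names are hypothetical and the article's (and the underlying 1.58-bit work's) exact recipe may differ.

```python
import numpy as np

def ternary_quantize(W: np.ndarray):
    """Quantize weights to {-1, 0, +1} with one per-tensor scale (absmean style).
    Illustrative sketch only, not the article's exact algorithm."""
    scale = float(np.mean(np.abs(W))) + 1e-8
    W_q = np.clip(np.round(W / scale), -1, 1).astype(np.int8)
    return W_q, scale

def ternary_matmul(x: np.ndarray, W_q: np.ndarray, scale: float) -> np.ndarray:
    """With ternary weights the matmul needs only adds/subtracts of activations,
    which is where the claimed inference speedup comes from."""
    return (x @ W_q) * scale

# Tiny usage example: compare against the full-precision product.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8)).astype(np.float32)
x = rng.normal(size=(2, 16)).astype(np.float32)
W_q, s = ternary_quantize(W)
print(np.abs(x @ W - ternary_matmul(x, W_q, s)).mean())  # rough approximation error
```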
Latent Space | swyx & Alessio | Substack
The AI Engineer newsletter + Top 10 US Tech podcast. Exploring AI UX, Agents, Devtools, Infra, Open Source Models. See https://latent.space/about for highlights from Chris Lattner, Andrej Karpathy, George Hotz, Simon Willison, Emad Mostaque, et al!
www.latent.space
Comparison of Models: Quality, Performance & Price Analysis
https://artificialanalysis.ai/models
Comparison of AI Models across Quality, Performance, Price | Artificial Analysis
Comparison and analysis of AI models across key metrics including quality, price, performance and speed (throughput tokens per second & latency), context window & others.
artificialanalysis.ai
https://www.perplexity.ai/hub/blog/turbocharging-llama-2-70b-with-nvidia-h100
Turbocharging Llama 2 70B with NVIDIA H100
The journey of accelerated LLM inference
www.perplexity.ai
https://medium.com/@plienhar/llm-inference-series-4-kv-caching-a-deeper-look-4ba9a77746c8
LLM Inference Series: 4. KV caching, a deeper look
In this post, we will look at how big the KV cache, a common optimization for LLM inference, can grow and at common mitigation strategies.
medium.com
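As a back-of-the-envelope companion to that post: the cache stores one K and one V tensor per layer, so its size grows linearly with context length and batch size. A minimal sketch follows; the Llama-2-70B-like shapes are assumptions for illustration, not figures taken from the post.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size: 2 tensors (K and V) per layer, each of shape
    [batch_size, num_kv_heads, seq_len, head_dim], stored at bytes_per_elem (fp16 = 2)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Assumed Llama-2-70B-like shapes: 80 layers, 8 KV heads (GQA), head_dim 128, fp16.
print(kv_cache_bytes(80, 8, 128, seq_len=4096, batch_size=1) / 2**30)  # ~1.25 GiB per sequence
```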
https://bea.stollnitz.com/blog/gpt-transformer/
Bea Stollnitz - The Transformer architecture of GPT models
Learn Azure ML and machine learning with Bea Stollnitz.
bea.stollnitz.com
Smarter, not Bigger — 1 Million token context is not all you need!
Open-source models on the test bench to verify long-context information extraction. Are they really that good?
medium.com
MEGALODON: Efficient LLM Pretraining and Inference with Unlimited Context Length
https://arxiv.org/pdf/2404.08801.pdf