Daily-Trend-Review

24/04/16: Are All Large Language Models Really in 1.58 Bits?

hellcat 2024. 4. 16. 21:12

https://learning-exhaust.hashnode.dev/are-all-large-language-models-really-in-158-bits?ref=twitter-share

 

Are all LLMs really 1.58 bits? Inference at 4x the speed or more?

Dive deep into changes to the Transformer architecture and learn how researchers discovered a huge speedup in LLM inference.

learning-exhaust.hashnode.dev
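A quick aside on where the 1.58 comes from: BitNet b1.58 constrains every weight to the ternary set {-1, 0, 1}, and log2(3) ≈ 1.58 bits of information per weight. Ternary weights also turn the multiply-accumulates in a matmul into plain additions and subtractions, which is where the speedup claims come from. A minimal NumPy sketch of absmean ternarization in the spirit of the b1.58 paper (scaling and activation-quantization details are simplified here):

```python
import numpy as np

# Why "1.58 bits": a ternary weight carries log2(3) bits of information.
print(np.log2(3))  # ~1.585

def ternarize(W: np.ndarray) -> tuple[np.ndarray, float]:
    """Absmean quantization in the spirit of BitNet b1.58:
    scale by the mean absolute value, then round-clip to {-1, 0, 1}."""
    gamma = np.abs(W).mean() + 1e-8
    W_t = np.clip(np.round(W / gamma), -1, 1)
    return W_t, gamma

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
W_t, gamma = ternarize(W)

x = rng.normal(size=8)
# With ternary weights, W_t @ x needs no multiplications:
# each output is just a signed sum of selected inputs.
y = gamma * (W_t @ x)
print(np.unique(W_t))  # typically [-1.  0.  1.]
```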

https://www.latent.space/

 

Latent Space | swyx & Alessio | Substack

The AI Engineer newsletter + Top 10 US Tech podcast. Exploring AI UX, Agents, Devtools, Infra, Open Source Models. See https://latent.space/about for highlights from Chris Lattner, Andrej Karpathy, George Hotz, Simon Willison, Emad Mostaque, et al!

www.latent.space

Comparison of Models: Quality, Performance & Price Analysis

https://artificialanalysis.ai/models

 

Comparison of AI Models across Quality, Performance, Price | Artificial Analysis

Comparison and analysis of AI models across key metrics including quality, price, performance and speed (throughput in tokens per second & latency), context window, and others.

artificialanalysis.ai
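For context on the speed metrics the site tracks: latency is usually reported as time to first token (TTFT), and throughput as output tokens per second during generation. A small illustrative helper; the timestamp names are assumptions, not any particular client's API:

```python
def stream_metrics(t_request: float, t_first_token: float,
                   t_done: float, n_output_tokens: int) -> dict:
    """Latency as time to first token (TTFT); throughput as
    output tokens per second over the generation phase."""
    ttft = t_first_token - t_request
    throughput = n_output_tokens / (t_done - t_first_token)
    return {"ttft_s": ttft, "tokens_per_s": throughput}

# Example: 0.4 s to first token, then 200 tokens over 2.5 s.
print(stream_metrics(0.0, 0.4, 2.9, 200))  # ~80 tok/s
```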

https://www.perplexity.ai/hub/blog/turbocharging-llama-2-70b-with-nvidia-h100

 

Turbocharging Llama 2 70B with NVIDIA H100

The journey of accelerated LLM inference

www.perplexity.ai

https://medium.com/@plienhar/llm-inference-series-4-kv-caching-a-deeper-look-4ba9a77746c8

 

LLM Inference Series: 4. KV caching, a deeper look

In this post, we will look at how big the KV cache, a common optimization for LLM inference, can grow and at common mitigation strategies.

medium.com
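To make "how big it can grow" concrete, a back-of-the-envelope sketch: the cache stores one key and one value vector per layer, per KV head, per position. Using Llama 2 70B's published shapes (80 layers, 8 KV heads under grouped-query attention, head dimension 128) and an fp16 cache:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    # 2x for keys AND values, cached at every layer for every position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Llama 2 70B: 80 layers, 8 KV heads (GQA), head_dim 128, fp16 cache.
print(kv_cache_bytes(80, 8, 128, seq_len=1, batch=1) / 1024)        # ~320 KiB per token
print(kv_cache_bytes(80, 8, 128, seq_len=4096, batch=1) / 2**30)    # ~1.25 GiB per sequence
```

At roughly 320 KiB per token, a single 4K-token sequence already occupies about 1.25 GiB, which is why mitigations like GQA, attention windowing, and cache quantization matter at scale.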

https://bea.stollnitz.com/blog/gpt-transformer/

 

Bea Stollnitz - The Transformer architecture of GPT models

Learn Azure ML and machine learning with Bea Stollnitz.

bea.stollnitz.com
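The heart of the GPT architecture the post walks through is masked (causal) self-attention. A minimal single-head NumPy sketch:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask,
    as used in GPT-style decoder blocks. x: (seq_len, d_model)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)            # (seq, seq)
    mask = np.triu(np.ones_like(scores), k=1)     # hide future positions
    scores = np.where(mask == 1, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (seq, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)  # (5, 8)
```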

https://medium.com/@fabio.matricardi/smarter-not-bigger-1-million-token-context-is-not-all-you-need-e9832ba9df66

 

Smarter, not Bigger — 1 Million token context is not all you need!

Open-source models on the test bench to verify long-context information extraction. Are they really that good?

medium.com
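Tests of this kind are usually "needle in a haystack" probes: bury one fact at varying depths in long filler text and check whether the model retrieves it. A hypothetical harness sketch; `generate` is a placeholder for whatever inference client you use, and the filler/needle strings are invented:

```python
# Hypothetical needle-in-a-haystack probe; `generate` stands in
# for any prompt-in, text-out inference call.
FILLER = "The sky was a uniform gray that day. " * 5000   # long haystack
NEEDLE = "The secret passcode is 7481."

def build_prompt(depth: float) -> str:
    """Insert the needle at a relative depth in [0, 1] of the context."""
    cut = int(len(FILLER) * depth)
    context = FILLER[:cut] + " " + NEEDLE + " " + FILLER[cut:]
    return context + "\n\nQuestion: What is the secret passcode?"

def run_probe(generate, depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict:
    return {d: "7481" in generate(build_prompt(d)) for d in depths}
```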

MEGALODON: Efficient LLM Pretraining and Inference with Unlimited Context Length

https://arxiv.org/pdf/2404.08801.pdf
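MEGALODON inherits MEGA's moving-average-gated attention and extends its multi-dimensional damped EMA into the complex domain (CEMA). As a rough sketch of just the underlying recurrence, here is the real-valued damped EMA from MEGA; the complex extension, chunking, and attention gating are all omitted:

```python
import numpy as np

def damped_ema(u, alpha, delta):
    """MEGA-style damped EMA, applied elementwise per dimension:
    h_t = alpha * u_t + (1 - alpha * delta) * h_{t-1}.
    u: (seq_len, d); alpha, delta: (d,) with entries in (0, 1)."""
    h = np.zeros(u.shape[-1])
    out = []
    for u_t in u:
        h = alpha * u_t + (1.0 - alpha * delta) * h
        out.append(h)
    return np.stack(out)

rng = np.random.default_rng(0)
u = rng.normal(size=(10, 4))
alpha = rng.uniform(0.1, 0.9, size=4)
delta = rng.uniform(0.1, 0.9, size=4)
print(damped_ema(u, alpha, delta).shape)  # (10, 4)
```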

 
