Daily-Trend-Review

24/04/13: LLM Cost vs. Performance

hellcat 2024. 4. 13. 10:44

https://twitter.com/virattt/status/1778828787951546382

 

virat (@virattt) on X

Friday is LLM battle day. I added DBRX to the financial metrics challenge. Overall, very impressed with DBRX. Main takeaways: • correctly calculated metrics • ranked top 4 fastest models • competitive pricing DBRX was +50% cheaper and +100% faster than…


 

https://colab.research.google.com/gist/virattt/f290fa4ec878f006ca1264899645182a/exploring-llm-pricing-cost.ipynb#scrollTo=dm_zP0j7J6av

 

llm-pricing-cost-quality.ipynb

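As a rough sanity check on the kind of comparison the notebook runs, here is a minimal sketch of per-request LLM cost arithmetic. The model names and per-million-token prices below are placeholder assumptions, not the actual rates from the notebook or the tweet.

```python
# Minimal sketch of per-request LLM API cost arithmetic.
# Prices are hypothetical placeholders (USD per 1M tokens),
# not the rates used in the linked notebook.
PRICES = {
    "model-a": {"input": 0.50, "output": 1.50},
    "model-b": {"input": 1.00, "output": 3.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: prompt and completion tokens are billed at separate rates."""
    p = PRICES[model]
    return input_tokens / 1e6 * p["input"] + output_tokens / 1e6 * p["output"]

# e.g. a 2,000-token prompt that yields a 500-token answer
for name in PRICES:
    print(name, f"${request_cost(name, 2_000, 500):.6f}")
```

The notebook pairs this kind of cost number with quality (did the model compute the financial metric correctly?) and speed, which is presumably how a claim like "+50% cheaper and +100% faster" gets grounded.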

 

 

https://coconut-mode.com/posts/ring-attention/

 

Ring Attention Explained | Coconut Mode

Near infinite context window for language models.


https://twitter.com/bonniesjli/status/1778846068588814486

 

Bonnie Li (@bonniesjli) on X

How do LLMs scale to million token context window? Ring Attention is a nice trick to parallelize long sequence across devices and rotate them in a ring with zero overhead scaling. In our new blog, we cover the tricks behind this magic. It looks like this…

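To make the "rotate key/value blocks in a ring" idea concrete, below is a minimal single-process NumPy simulation (my own simplification, not code from the blog post): each "device" keeps its own query block, key/value blocks arrive one ring step at a time, and streaming-softmax accumulators make the result exact after a full rotation. Real Ring Attention runs this across actual devices and overlaps the block transfer with the blockwise computation, which is where the near-zero-overhead scaling comes from; causal masking is omitted here for brevity.

```python
import numpy as np

# Single-process simulation of Ring Attention: the sequence is split into blocks,
# one per "device"; key/value blocks rotate around the ring while each device
# accumulates exact attention for its own query block via streaming softmax.
def ring_attention(q_blocks, k_blocks, v_blocks):
    n_dev = len(q_blocks)
    d = q_blocks[0].shape[-1]
    outputs = []
    for i in range(n_dev):                      # each "device" owns one query block
        q = q_blocks[i]
        m = np.full(q.shape[0], -np.inf)        # running max of logits per query row
        l = np.zeros(q.shape[0])                # running softmax denominator
        acc = np.zeros_like(q)                  # running weighted sum of values
        for step in range(n_dev):               # one KV block arrives per ring step
            j = (i + step) % n_dev
            k, v = k_blocks[j], v_blocks[j]
            s = q @ k.T / np.sqrt(d)            # attention logits for this block
            m_new = np.maximum(m, s.max(axis=-1))
            scale = np.exp(m - m_new)           # rescale old accumulators to the new max
            p = np.exp(s - m_new[:, None])
            l = l * scale + p.sum(axis=-1)
            acc = acc * scale[:, None] + p @ v
            m = m_new
        outputs.append(acc / l[:, None])
    return np.concatenate(outputs, axis=0)

# Sanity check against ordinary full attention.
rng = np.random.default_rng(0)
seq_len, d, n_dev = 16, 8, 4
q, k, v = rng.normal(size=(3, seq_len, d))
blocks = [np.split(a, n_dev) for a in (q, k, v)]
out = ring_attention(*blocks)
s = q @ k.T / np.sqrt(d)
w = np.exp(s - s.max(-1, keepdims=True))
assert np.allclose(out, (w / w.sum(-1, keepdims=True)) @ v, atol=1e-6)
```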

 

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

 

https://arxiv.org/abs/2404.07143

 

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-term linear attention mechanisms in a single Transformer block.

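The abstract only gestures at the mechanism, so here is a minimal single-head NumPy sketch of the idea as I read it: per segment, a read from a compressive memory that stores key/value associations from earlier segments (linear-attention retrieval with an ELU+1 feature map), ordinary softmax attention inside the segment, and a gate mixing the two. The fixed gate and the function names are my own placeholders; in the paper the gate is learned, causal masking is applied, and there is also a delta-rule variant of the memory update.

```python
import numpy as np

def elu_plus_one(x):
    # Non-negative feature map sigma(x) = ELU(x) + 1 used for the linear memory.
    return np.where(x > 0, x + 1.0, np.exp(x))

def infini_attention(segments_q, segments_k, segments_v, gate=0.5):
    d_k = segments_q[0].shape[-1]
    d_v = segments_v[0].shape[-1]
    M = np.zeros((d_k, d_v))   # compressive memory: fixed size regardless of input length
    z = np.zeros(d_k)          # normalization term for memory reads
    outputs = []
    for q, k, v in zip(segments_q, segments_k, segments_v):
        # 1) Read from the memory written by earlier segments.
        sq = elu_plus_one(q)
        mem_out = (sq @ M) / (sq @ z + 1e-6)[:, None]
        # 2) Ordinary softmax attention inside the current segment (no causal mask here).
        s = q @ k.T / np.sqrt(d_k)
        w = np.exp(s - s.max(-1, keepdims=True))
        local_out = (w / w.sum(-1, keepdims=True)) @ v
        # 3) Mix the two streams; the paper learns this gate, here it is fixed.
        outputs.append(gate * mem_out + (1.0 - gate) * local_out)
        # 4) Write this segment's key/value associations into the memory.
        sk = elu_plus_one(k)
        M = M + sk.T @ v
        z = z + sk.sum(axis=0)
    return np.concatenate(outputs, axis=0)

# e.g. a 64-token input processed as four 16-token segments
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 64, 32))
segments = [np.split(a, 4) for a in (q, k, v)]
print(infini_attention(*segments).shape)   # (64, 32)
```

The point of the sketch is the memory footprint: no matter how many segments stream through, M and z stay at d_k x d_v and d_k entries, which is what "bounded memory and computation" in the abstract refers to.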