2023/12/25: Towards 100x Speedup: Full Stack Transformer Inference Optimization


hellcat 2023. 12. 25. 19:49

https://yaofu.notion.site/Towards-100x-Speedup-Full-Stack-Transformer-Inference-Optimization-43124c3688e14cffaf2f1d6cbdf26c6c

Towards 100x Speedup: Full Stack Transformer Inference Optimization

Imagine two companies have equally powerful models. Company A can serve the model to 10 users with 1 GPU, but company B can serve 20 users. Who will win in the long run?

https://huggingface.co/blog/whisper-speculative-decoding

Speculative Decoding for 2x Faster Whisper Inference

OpenAI's Whisper is a general purpose speech transcription model that achieves state-of-the-art results across a range of different benchmarks and audio conditions. The latest large-v3 model tops the Op

https://huggingface.co/blog/moe#serving-techniques

Mixture of Experts Explained

With the release of Mixtral 8x7B (announcement, model card), a class of transformer has become the hottest topic in the open AI community: Mixture of Experts, or MoEs for short. In this blog post, we take a look at the building

https://www.baseten.co/blog/llm-transformer-inference-guide/

A guide to LLM inference and performance

Learn if LLM inference is compute or memory bound to fully utilize GPU power. Get insights on better GPU resource utilization.
