Daily-Trend-Review

2023/12/25: Towards 100x Speedup: Full Stack Transformer Inference Optimization

hellcat 2023. 12. 25. 19:49

https://yaofu.notion.site/Towards-100x-Speedup-Full-Stack-Transformer-Inference-Optimization-43124c3688e14cffaf2f1d6cbdf26c6c

 

Towards 100x Speedup: Full Stack Transformer Inference Optimization

Imagine two companies have equally powerful models. Company A can serve the model to 10 users with 1 GPU, but company B can serve 20 users. Who will win in the long run?

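As a rough illustration of why one GPU can serve only so many users at once, the back-of-the-envelope calculation below estimates how many users' KV caches fit next to the weights of a 7B-class model. Every number here (model shape, fp16 cache, 2K context, 80 GB GPU) is an assumption for the example, not a figure from the post.

```python
# Illustrative only: assumed 7B-class model shape, fp16 KV cache, 80 GB GPU.
layers, heads, head_dim = 32, 32, 128
bytes_per_elem = 2                            # fp16
ctx_len = 2048                                # tokens cached per user

# KV cache per token = 2 (K and V) * layers * heads * head_dim * bytes
kv_per_token = 2 * layers * heads * head_dim * bytes_per_elem
kv_per_user = kv_per_token * ctx_len
print(f"KV cache per user ~ {kv_per_user / 2**30:.2f} GiB")     # ~1.0 GiB

gpu_mem = 80 * 2**30                          # 80 GB-class GPU
weights = 7e9 * bytes_per_elem                # fp16 weights of a 7B model
free_for_cache = gpu_mem - weights            # ignores activations and overhead
print(f"Concurrent users at full context ~ {free_for_cache // kv_per_user:.0f}")
```

Anything that shrinks the weights or the per-user cache raises that count, which is the kind of lever the post surveys across the stack.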

 

https://huggingface.co/blog/whisper-speculative-decoding

 

Speculative Decoding for 2x Faster Whisper Inference

OpenAI's Whisper is a general-purpose speech transcription model that achieves state-of-the-art results across a range of benchmarks and audio conditions. The post shows how speculative decoding with a smaller assistant model can roughly double Whisper's inference speed while leaving the transcriptions unchanged.

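In transformers, speculative (assisted) decoding is switched on by passing an assistant_model to generate, which the ASR pipeline forwards via generate_kwargs. Below is a minimal sketch along those lines; the checkpoint pairing (whisper-large-v2 with whisper-tiny as a stand-in draft model; the post itself uses Distil-Whisper as the assistant) and the audio path are assumptions, so check the post for the exact setup it benchmarks.

```python
# Minimal sketch of speculative decoding for Whisper with Hugging Face transformers.
# Checkpoints and the audio file are placeholders; see the linked post for its setup.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Main (target) model that verifies the drafted tokens.
model_id = "openai/whisper-large-v2"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=dtype, low_cpu_mem_usage=True, use_safetensors=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Small draft (assistant) model sharing the same tokenizer; whisper-tiny as a stand-in.
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-tiny", torch_dtype=dtype, low_cpu_mem_usage=True, use_safetensors=True
).to(device)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=dtype,
    device=device,
    # The assistant drafts several tokens cheaply; the large model accepts or
    # rejects them in a single forward pass, so the output text stays the same.
    generate_kwargs={"assistant_model": assistant_model},
)

print(pipe("sample.wav")["text"])
```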

https://huggingface.co/blog/moe#serving-techniques

 

Mixture of Experts Explained

With the release of Mixtral 8x7B, a class of transformer model has become the hottest topic in the open-source AI community: Mixture of Experts, or MoEs for short. The post walks through the building blocks of MoEs, how they are trained, and the trade-offs to consider when serving them for inference; the link above points to its section on serving techniques.

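As a companion to that explanation, here is a stripped-down sparse MoE feed-forward layer with a learned router and top-2 expert selection, written in PyTorch. The dimensions, activation, and gating details are simplified assumptions for illustration, not the Mixtral implementation, and it loops over experts instead of using the batched kernels a real serving stack would need.

```python
# Toy sparse MoE layer: a router picks top-2 of 8 expert FFNs per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 512)
print(SparseMoE()(tokens).shape)               # torch.Size([10, 512])
```

Only top_k of the expert FFNs run for each token, which is why MoEs carry far more parameters than they spend compute on; serving them is largely about keeping all those mostly-idle parameters in memory, which is where the serving-techniques section picks up.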

https://www.baseten.co/blog/llm-transformer-inference-guide/

 

A guide to LLM inference and performance

Learn whether LLM inference is compute-bound or memory-bound so you can fully utilize GPU power, and get insights on better GPU resource utilization.

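The guide's central question, whether inference is compute-bound or memory-bound, boils down to comparing the GPU's ops:byte ratio with the arithmetic intensity of the decode step. Below is a rough sketch of that comparison; the hardware figures are assumed A10-class numbers rather than values quoted from the article, and KV-cache traffic is ignored.

```python
# Rough compute-vs-memory-bound check for LLM decoding (assumed A10-class figures).
FLOPS = 125e12       # ~fp16 tensor throughput, FLOPs per second (assumption)
BANDWIDTH = 600e9    # memory bandwidth, bytes per second (assumption)
ops_per_byte = FLOPS / BANDWIDTH
print(f"GPU ops:byte ratio ~ {ops_per_byte:.0f}")          # ~208

# Decoding one token for a batch of B sequences reads every fp16 weight (2 bytes)
# once, and each weight does ~2 FLOPs (multiply + add) per sequence in the batch.
def arithmetic_intensity(batch_size: int) -> float:
    return 2 * batch_size / 2                              # FLOPs per byte moved

for b in (1, 8, 64, 256):
    ai = arithmetic_intensity(b)
    bound = "memory-bound" if ai < ops_per_byte else "compute-bound"
    print(f"batch {b:>3}: ~{ai:>3.0f} FLOPs/byte -> {bound}")
```

At batch size 1 the intensity is about 1 FLOP per byte, two orders of magnitude below the GPU's ops:byte ratio, which is why single-stream decoding leaves most of the compute idle and why batching is the first lever for better utilization.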