Daily-Trend-Review

2024/01/27: Harmonizing Multi-GPUs

hellcat 2024. 1. 27. 11:36

Harmonizing Multi-GPUs: Efficient Scaling of LLM Inference

 

TitanML docs

The machine learning optimization company

docs.titanml.co

 

In the Fast Lane! Speculative Decoding - 10x Larger Model, No Extra Cost

 

TitanML docs

The machine learning optimization company

docs.titanml.co
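The core trick behind speculative decoding is that a small draft model proposes several tokens cheaply and the large target model only verifies them. Below is a minimal greedy-verification sketch (not TitanML's implementation): `target_next` and `draft_next` are hypothetical stand-ins for real model forward passes, and verification is simulated token by token rather than in one batched pass.

```python
def speculative_decode(target_next, draft_next, prompt, k=4, max_new=12):
    """Greedy speculative decoding sketch.

    target_next / draft_next: callables mapping a token sequence to that
    model's greedy next token (stand-ins for real LLM forward passes).
    The draft proposes k tokens; the target verifies them and keeps the
    longest agreeing prefix plus its own correction, so the output is
    identical to decoding with the target alone.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) the cheap draft model proposes k tokens autoregressively
        proposal, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) the target verifies the proposal: accept while it agrees,
        #    then append the target's own token at the first mismatch
        accepted, ctx = [], list(seq)
        for t in proposal:
            expect = target_next(ctx)
            if expect == t:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(expect)
                break
        else:
            # all k draft tokens accepted: one bonus token for free
            accepted.append(target_next(ctx))
        seq.extend(accepted)
    return seq[len(prompt):]
```

Because every emitted token is one the target model would have produced greedily, quality is unchanged; the speedup comes from the target verifying k tokens per pass instead of generating one.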

Large Language Models - the hardware connection

 

Large Language Models — the hardware connection | APNIC Blog

Guest Post: The role of networking when scaling LLM architectures' gigantic models.

blog.apnic.net

Tensor Parallelism

 

Tensor Parallelism - NADDOD Blog

Tensor parallelism alleviates memory issues in large-scale training. RoCE enables efficient communication for GPU tensor parallelism, accelerating computations.

www.naddod.com
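The idea the NADDOD post describes can be shown in a few lines: shard a two-layer MLP across devices Megatron-style (first weight split by columns, second by rows), let each "GPU" compute its partial result independently, and reassemble with a single all-reduce. This is a NumPy sketch of the math only; the device placement and RoCE/NCCL communication are what real frameworks add.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))    # batch of activations
W1 = rng.standard_normal((8, 16))  # first linear layer
W2 = rng.standard_normal((16, 8))  # second linear layer

n_gpu = 4
# column-parallel: each "GPU" holds a slice of W1's output columns
W1_shards = np.split(W1, n_gpu, axis=1)
# row-parallel: each "GPU" holds the matching slice of W2's rows
W2_shards = np.split(W2, n_gpu, axis=0)

# each device computes its partial output independently...
partials = [np.maximum(x @ w1, 0) @ w2 for w1, w2 in zip(W1_shards, W2_shards)]
# ...and one all-reduce (sum) reassembles the full result
y_parallel = np.sum(partials, axis=0)

# reference: the unsharded computation
y_full = np.maximum(x @ W1, 0) @ W2
assert np.allclose(y_parallel, y_full)
```

The column-then-row split works because the element-wise ReLU acts on disjoint column slices, so no communication is needed between the two layers; each GPU also only stores 1/n of each weight matrix, which is the memory relief the post mentions.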

Fast and Expressive LLM Inference with RadixAttention and SGLang

 

Fast and Expressive LLM Inference with RadixAttention and SGLang | LMSYS Org

Large Language Models (LLMs) are increasingly utilized for complex tasks that require multiple chained generation calls, advanced prompting techniques, co...

lmsys.org
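RadixAttention's key idea is keeping the KV cache of past requests in a radix tree so that requests sharing a prompt prefix (a system prompt, few-shot examples) reuse it instead of re-prefilling. A toy per-token trie (simpler than a true radix tree, and without SGLang's GPU tensors or eviction policy) is enough to show the reuse; `compute_kv` is a hypothetical stand-in for prefilling one token's KV entry.

```python
class PrefixCacheNode:
    def __init__(self):
        self.children = {}  # token -> child node
        self.kv = None      # stand-in for this token's KV-cache entry

class PrefixCache:
    """Toy prefix cache: a per-token trie standing in for a radix tree."""

    def __init__(self):
        self.root = PrefixCacheNode()

    def run(self, tokens, compute_kv):
        """Walk the trie, calling compute_kv only for uncached tokens.
        Returns how many tokens actually had to be (re)computed."""
        node, computed = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                child = PrefixCacheNode()
                child.kv = compute_kv(tok)  # "prefill" this token
                node.children[tok] = child
                computed += 1
            node = node.children[tok]
        return computed

# two requests sharing a long system-prompt prefix
cache = PrefixCache()
fake_kv = lambda tok: ("kv", tok)
first = cache.run(list("You are a helpful assistant. Answer Q1"), fake_kv)
second = cache.run(list("You are a helpful assistant. Answer Q2"), fake_kv)
```

The first request prefills every token; the second recomputes only the tokens after the shared prefix, which is where the multi-call prompting workloads in the post get their speedup.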

All you need to know about LLMs