Large Language Models - the hardware connection
LLM inference - HW/SW optimizations
HOW TO BUILD LOW-COST NETWORKS FOR LARGE LANGUAGE MODELS (WITHOUT SACRIFICING PERFORMANCE)?
Reducing Activation Recomputation in Large Transformer Models
Per-layer activation memory breakdown (s = sequence length, b = microbatch size, h = hidden dimension, a = number of attention heads; sizes are in bytes assuming 16-bit activations):

| Block | Operation | Activation memory | Block total |
|---|---|---|---|
| Attention | Q, K, V matrix multiplies (shared input) | 2sbh | 11sbh + 5as²b |
| | QKᵀ matrix multiply (Q and K) | 4sbh | |
| | Softmax output | 2as²b | |
| | Softmax dropout mask | as²b | |
| | Attention over values (V) | 2as²b (dropout output) + 2sbh (values) | |
| | Linear projection input | 2sbh | |
| | Attention dropout mask | sbh | |
| MLP | Linear #1 input | 2sbh | 19sbh |
| | GeLU input | 8sbh | |
| | Linear #2 input | 8sbh | |
| | Dropout mask | sbh | |
| Layer normalization | Layernorm #1 input | 2sbh | 4sbh |
| | Layernorm #2 input | 2sbh | |
| Total activations | | | sbh(34 + 5as/h) |
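
To make the formula concrete, below is a minimal Python sketch (not from the paper's code) that evaluates the per-layer total sbh(34 + 5as/h) from the table. The GPT-3-like settings in the example (s=2048, b=1, h=12288, a=96) are illustrative assumptions, not numbers taken from this post.

```python
def activations_per_layer_bytes(s: int, b: int, h: int, a: int) -> int:
    """Activation memory (bytes) stored by one transformer layer,
    assuming 16-bit activations, i.e. the formula sbh(34 + 5as/h)."""
    attention = 11 * s * b * h + 5 * a * s * s * b   # attention block: 11sbh + 5as²b
    mlp       = 19 * s * b * h                       # MLP block: 19sbh
    layernorm = 4 * s * b * h                        # two layer norms: 4sbh
    return attention + mlp + layernorm               # = sbh * (34 + 5*a*s/h)

if __name__ == "__main__":
    # Illustrative GPT-3-like settings: sequence length 2048, microbatch 1,
    # hidden size 12288, 96 attention heads.
    per_layer = activations_per_layer_bytes(s=2048, b=1, h=12288, a=96)
    print(f"{per_layer / 2**30:.2f} GiB of activations per layer")
```

With these assumed settings the formula gives roughly 2.7 GiB of activations per layer, which is the kind of number that motivates the paper's selective activation recomputation.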