Daily-Trend-Review
2024/02/12: Large Language Models - the hardware connection
hellcat
2024. 2. 12. 22:43
LLM inference - HW/SW optimizations
How to Build Low-cost Networks for Large Language Models (without Sacrificing Performance)?
Reducing Activation Recomputation in Large Transformer Models
Per-layer activation memory in bytes (fp16), where s = sequence length, b = microbatch size, h = hidden dimension, a = number of attention heads:

| Block | Operation | Activations stored | Subtotal |
|---|---|---|---|
| Attention | Q, K, V matrix multiplies | 2sbh | 11sbh + 5as²b |
| | QKᵀ | 4sbh | |
| | Softmax | 2as²b | |
| | Softmax dropout | as²b | |
| | Attention over values (V) | 2as²b (dropout output) + 2sbh (values) | |
| | Linear projection | 2sbh | |
| | Attention dropout | sbh | |
| MLP | Linear layer #1 input | 2sbh | 19sbh |
| | Linear layer #2 input | 8sbh | |
| | GeLU input | 8sbh | |
| | Dropout mask | sbh | |
| Layer normalization | LayerNorm #1 | 2sbh | 4sbh |
| | LayerNorm #2 | 2sbh | |
| Total activations | | sbh(34 + 5as/h) | |
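As a quick sanity check, the total above can be evaluated directly. The sketch below plugs an assumed GPT-3-175B-like layer configuration (s=2048, b=1, h=12288, a=96; these values are my illustration, not from the post) into the paper's formula sbh(34 + 5as/h):

```python
def activations_per_layer_bytes(s: int, b: int, h: int, a: int) -> float:
    """Activation memory (bytes) stored by one transformer layer in fp16
    with no recomputation, per the formula sbh * (34 + 5*a*s/h).

    s: sequence length, b: microbatch size,
    h: hidden dimension, a: number of attention heads.
    """
    return s * b * h * (34 + 5 * a * s / h)

# Assumed GPT-3-scale layer: s=2048, b=1, h=12288, a=96
mem = activations_per_layer_bytes(s=2048, b=1, h=12288, a=96)
print(f"{mem / 2**30:.2f} GiB per layer per microbatch")  # ~2.67 GiB
```

Note that the 5as/h term comes from the s×s attention score matrices (softmax, its dropout mask, and the dropout output), which grow quadratically with sequence length; this is the part that sequence-level optimizations such as selective recomputation target.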