Daily-Trend-Review

2024/02/12: Large Language Models - the hardware connection


Large Language Models - the hardware connection

 

LLM inference - HW/SW optimizations

 

How to Build Low-Cost Networks for Large Language Models (Without Sacrificing Performance)?

 

Reducing Activation Recomputation in Large Transformer Models





Per-layer activation memory from the paper (s: sequence length, b: microbatch size, h: hidden size, a: number of attention heads; sizes in bytes, assuming 2-byte activations and 1-byte dropout masks):

Attention block (subtotal: 11sbh + 5as²b)
  Q, K, V matrix multiplies: 2sbh
  QK^T: 4sbh
  Softmax: 2as²b
  Softmax dropout: as²b
  Attention over values (V): 2as²b (dropout output) + 2sbh (values)
  Linear projection: 2sbh
  Attention dropout: sbh

MLP (subtotal: 19sbh)
  First linear layer (input): 2sbh
  GeLU (input): 8sbh
  Second linear layer (input): 8sbh
  Dropout (mask): sbh

Layer normalization (subtotal: 4sbh)
  LayerNorm #1: 2sbh
  LayerNorm #2: 2sbh

Total activations per transformer layer = sbh(34 + 5as/h)
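To make the total concrete, below is a small Python sketch (my own illustration, not code from the paper) that evaluates sbh(34 + 5as/h) for one layer. The configuration (h = 12288, a = 96, s = 2048, b = 1, roughly GPT-3-175B-scale) is assumed for illustration; the formula's constants already bake in 2-byte activations and 1-byte dropout masks, so the result is in bytes.

```python
def activations_per_layer(s: int, b: int, h: int, a: int) -> float:
    """Activation bytes stored per transformer layer with no recomputation.

    s: sequence length, b: microbatch size, h: hidden size,
    a: number of attention heads.
    Uses the paper's per-layer estimate sbh * (34 + 5*a*s/h), whose
    constants already assume 2-byte activations and 1-byte dropout masks.
    """
    return s * b * h * (34 + 5 * a * s / h)

# Hypothetical GPT-3-175B-like layer: hidden size 12288, 96 heads,
# sequence length 2048, microbatch size 1.
mem = activations_per_layer(s=2048, b=1, h=12288, a=96)
print(f"{mem / 2**30:.1f} GiB of activations per layer")  # ~2.7 GiB
```

At this scale the 5as/h term (80 here) dominates the constant 34, which is why the paper targets the attention-internal activations (the as²b terms) for selective recomputation.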

 

 

https://magazine.sebastianraschka.com/