Daily-Trend-Review
2024/02/12: Large Language Models - the hardware connection
hellcat
2024. 2. 12. 22:43
LLM inference - HW/SW optimizations
How to Build Low-cost Networks for Large Language Models (without Sacrificing Performance)?
Reducing Activation Recomputation in Large Transformer Models
Per-layer activation memory in bytes (fp16), where s = sequence length, b = microbatch size, h = hidden dimension, a = number of attention heads:

| Block | Operation | Activations stored | Subtotal |
|---|---|---|---|
| Attention | Q, K, V matrix multiplies | 2sbh | 11sbh + 5as²b |
| | QKᵀ | 4sbh | |
| | Softmax | 2as²b | |
| | Softmax dropout | as²b | |
| | Attention over values (V) | 2as²b (dropout output) + 2sbh (values) | |
| | Linear projection | 2sbh | |
| | Attention dropout | sbh | |
| MLP | Linear layer #1 input | 2sbh | 19sbh |
| | Linear layer #2 input | 8sbh | |
| | GeLU input | 8sbh | |
| | Dropout mask | sbh | |
| Layer normalization | LayerNorm #1 | 2sbh | 4sbh |
| | LayerNorm #2 | 2sbh | |
| Total activations | | sbh(34 + 5as/h) | |
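As a quick sanity check, the total above can be evaluated directly. The sketch below plugs an assumed GPT-3-175B-like layer configuration (s=2048, b=1, h=12288, a=96; these values are my illustration, not from the post) into the paper's formula sbh(34 + 5as/h):

```python
def activations_per_layer_bytes(s: int, b: int, h: int, a: int) -> float:
    """Activation memory (bytes) stored by one transformer layer in fp16
    with no recomputation, per the formula sbh * (34 + 5*a*s/h).

    s: sequence length, b: microbatch size,
    h: hidden dimension, a: number of attention heads.
    """
    return s * b * h * (34 + 5 * a * s / h)

# Assumed GPT-3-scale layer: s=2048, b=1, h=12288, a=96
mem = activations_per_layer_bytes(s=2048, b=1, h=12288, a=96)
print(f"{mem / 2**30:.2f} GiB per layer per microbatch")  # ~2.67 GiB
```

Note that the 5as/h term comes from the s×s attention score matrices (softmax, its dropout mask, and the dropout output), which grow quadratically with sequence length; this is the part that sequence-level optimizations such as selective recomputation target.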