Transformer Math 101
We present basic math related to computation and memory usage for transformers
blog.eleuther.ai
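As a taste of the kind of estimates that post walks through, here is a minimal back-of-the-envelope sketch. It assumes the common ~2 bytes per parameter for fp16/bf16 weights and the C ≈ 6·N·D training-compute rule of thumb; the 7B-parameter / 2T-token numbers are purely hypothetical.

```python
# Back-of-the-envelope transformer estimates (illustrative numbers, not from the post).
N = 7e9   # model parameters (hypothetical 7B model)
D = 2e12  # training tokens (hypothetical 2T tokens)

fp16_weight_bytes = 2 * N    # ~2 bytes per parameter for fp16/bf16 weights
train_flops = 6 * N * D      # common C ~= 6*N*D training-compute rule of thumb

print(f"fp16 weights: {fp16_weight_bytes / 1e9:.0f} GB")  # ~14 GB just for weights
print(f"training compute: {train_flops:.1e} FLOPs")       # ~8.4e22 FLOPs
```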
LLM inference - HW/SW Optimizations
The original article is being translated and reviewed with permission from the original author on LinkedIn (Sharada Yeluri).
tulip-phalange-a1e.notion.site
Optimizing your LLM in production
Note: This blog post is also available as a documentation page on Transformers. Large Language Models (LLMs) such as GPT3/4, Falcon, and Llama are rapidly advancing in their ability to tackle human-centric tasks…
huggingface.co
Reducing Activation Recomputation in Large Transformer Models
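For context on what is being reduced: full activation recomputation is usually implemented as gradient checkpointing, where a block's activations are dropped in the forward pass and recomputed during backward. A minimal PyTorch sketch of that baseline (not the paper's sequence-parallel / selective method; layer sizes are arbitrary):

```python
import torch
from torch.utils.checkpoint import checkpoint

# A toy MLP block; with checkpointing its intermediate activations are not stored.
block = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)
x = torch.randn(8, 1024, requires_grad=True)

# Forward without saving the block's activations; they are recomputed during
# backward, trading extra compute for lower activation memory.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```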
Cornell ECE 5545: ML HW & Systems. Lecture 7: Quantization
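A minimal sketch of the symmetric absmax int8 scheme that quantization lectures typically start from (illustrative only, not taken from the lecture):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: w ~= scale * q with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = q.astype(np.float32) * scale  # dequantized approximation
print("max abs error:", np.abs(w - w_hat).max())
```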
Nvidia Unveils Most Powerful GPU: Blackwell B200 Unleashes AI Performance Speed
NVIDIA Blackwell B200: Unveiling the Most Powerful GPU for AI Performance Speed - NADDOD Blog
Discover NVIDIA's Blackwell B200, the ultimate GPU for unleashing AI performance speed. Learn about its breakthrough technology and how it enhances data center operations. Explore NADDOD's optical module technology and its seamless integration with NVIDIA…
www.naddod.com
State-space LLMs: Do we need Attention?
Mamba, StripedHyena, Based, research overload, and the exciting future of many LLM architectures all at once.
www.interconnects.ai
Open Source AI is AI we can Trust — with Soumith Chintala of Meta AI
The PyTorch creator riffs on geohot's Tinygrad, Chris Lattner's Mojo, Apple's MLX, the PyTorch Mafia, the upcoming Llama 3 and MTIA ASIC, AI robotics, and what it takes for open source AI to win!
www.latent.space
A little guide to building Large Language Models in 2024
FP8-LM: Training FP8 Large Language Models
1bitLLM/bitnet_b1_58-3B · Hugging Face
This is a reproduction of the BitNet b1.58 paper. The models are trained on the RedPajama dataset for 100B tokens. The hyperparameters, as well as the two-stage LR and weight-decay schedule, follow the suggestions in their follow-up paper. All models are open-source in the…
huggingface.co
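As a rough illustration of the ternary ("1.58-bit") weight format behind that reproduction, here is a sketch of my reading of the paper's absmean quantization (not the repo's actual code):

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-5):
    """Absmean ternary quantization: scale by mean |w|, then round-and-clip
    every weight to {-1, 0, +1}, as described in the BitNet b1.58 paper."""
    gamma = w.abs().mean()
    q = (w / (gamma + eps)).round().clamp(-1, 1)
    return q, gamma

w = torch.randn(256, 256)
q, gamma = ternarize(w)
w_hat = q * gamma      # effective weights used in the matmul
print(q.unique())      # tensor([-1., 0., 1.])
```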