- LLM Inference - HW/SW Optimizations
- Optimizing your LLM in production
- Reducing Activation Recomputation in Large Transformer Models
- Cornell ECE 5545: ML HW & Systems, Lecture 7: Quantization
- Nvidia Unveils Most Powerful GPU: Blackwell B200 Unleashes AI Performance Speed
- State-space LLMs: Do we need Attention?
- Open Source AI is AI we can Trust — with Soumith Chintala of Meta AI
- A little guide to building Large Language Models in 2024
- FP8-LM: Training FP8 Large Language Models