Large Language Models - the hardware connection

LLM inference - HW/SW optimizations

HOW TO BUILD LOW-COST NETWORKS FOR LARGE LANGUAGE MODELS (WITHOUT SACRIFICING PERFORMANCE)?

Reducing Activation Recomputation in Large Transformer Models

Activation memory per component of the attention block (bytes, assuming 16-bit activations; s: sequence length, b: microbatch size, h: hidden dimension, a: number of attention heads):

| Component                 | Activation memory                 |
|---------------------------|-----------------------------------|
| Q, K, V matrix multiplies | 2sbh (shared input)               |
| QK^T                      | 4sbh (Q and K)                    |
| Softmax                   | 2as²b                             |
| Softmax dropout           | as²b (mask)                       |
| Attention over Values (V) | 2as²b (dropout output) + 2sbh (V) |
| Total (attention block)   | 11sbh + 5as²b                     |

The components listed above sum to 8sbh + 5as²b; per the paper, the remaining 3sbh in the total comes from the attention block's output linear projection (2sbh of input activations) and the attention dropout mask (sbh).
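
As a quick cross-check of the table, here is a small Python sketch (not from the source; the function name and the GPT-3-like example dimensions are illustrative assumptions) that sums the per-component sizes and verifies they reproduce the 11sbh + 5as²b total:

```python
def attention_activation_bytes(s: int, b: int, h: int, a: int) -> int:
    """Per-layer activation memory (bytes) of the attention block,
    assuming 16-bit (2-byte) activations and 1-byte dropout masks,
    following the component breakdown in the table above.

    s: sequence length, b: microbatch size,
    h: hidden dimension,  a: number of attention heads.
    """
    qkv_input      = 2 * s * b * h               # shared input to the Q, K, V matmuls
    qk_matmul      = 4 * s * b * h               # both Q and K must be stored
    softmax_out    = 2 * a * s * s * b           # softmax output kept for backprop
    softmax_mask   = 1 * a * s * s * b           # softmax dropout mask (1 byte/elem)
    attn_over_v    = 2 * a * s * s * b + 2 * s * b * h  # dropout output + V
    linear_proj_in = 2 * s * b * h               # input to the output linear projection
    attn_dropout   = 1 * s * b * h               # attention dropout mask (1 byte/elem)

    total = (qkv_input + qk_matmul + softmax_out + softmax_mask
             + attn_over_v + linear_proj_in + attn_dropout)
    assert total == 11 * s * b * h + 5 * a * s * s * b  # matches the table's total
    return total

# Illustrative GPT-3-like dimensions (example values, not from the text):
# s=2048, b=1, h=12288, a=96 -> about 2.1 GiB of attention activations per layer.
print(attention_activation_bytes(2048, 1, 12288, 96))
```

With these example dimensions the 5as²b term (about 1.9 GiB) dwarfs the 11sbh term (about 0.26 GiB), which is the quadratic-in-sequence-length cost that selective activation recomputation targets.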