Posts in the 'Daily-Trend-Review' category (Page 4)

Daily-Trend-Review 107

2023/12/10: optimizing your llm in production

https://huggingface.co/blog/optimize-llm Optimizing your LLM in production Note: This blog post is also available as a documentation page on Transformers. Large Language Models (LLMs) such as GPT3/4, Falcon, and LLama are rapidly advancing in their ability to tackle human-centric tasks, establish huggingface.co The most effective techniques for efficient LLM deployment: Lower Precision..

Daily-Trend-Review 2023.12.10

2023/12/08: LLM transformer inference guide

https://www.baseten.co/blog/llm-transformer-inference-guide/ A guide to LLM inference and performance To attain the full power of a GPU during LLM inference, you have to know if the inference is compute bound or memory bound. Learn how to better utilize GPU resources. www.baseten.co Calculating the ops:byte ratio: for the A10, ops:byte = (125 TFLOPS)/(600 GB/s) = 208.3 ops/byte. A workload whose ops:byte is 208.3 or lower is memory-bound. ops:byte..
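The ops:byte arithmetic in the excerpt above can be reproduced with a minimal Python sketch. The 125 TFLOPS and 600 GB/s figures come from the excerpt; the function names and the example arithmetic-intensity values (62 and 300 ops/byte) are hypothetical and chosen only to illustrate the comparison.

```python
# Sketch of the ops:byte comparison. Function names and the example
# workload intensities are illustrative assumptions, not from the source.

def ops_to_byte_ratio(peak_flops: float, mem_bandwidth_bytes_per_s: float) -> float:
    """Hardware ops:byte ratio: peak compute divided by memory bandwidth."""
    return peak_flops / mem_bandwidth_bytes_per_s


def classify(workload_ops_per_byte: float, hardware_ops_per_byte: float) -> str:
    """A workload at or below the hardware ratio is limited by memory traffic."""
    if workload_ops_per_byte <= hardware_ops_per_byte:
        return "memory-bound"
    return "compute-bound"


A10_PEAK_FLOPS = 125e12   # 125 TFLOPS, as quoted in the excerpt
A10_MEM_BANDWIDTH = 600e9  # 600 GB/s, as quoted in the excerpt

a10_ratio = ops_to_byte_ratio(A10_PEAK_FLOPS, A10_MEM_BANDWIDTH)

print(f"A10 ops:byte = {a10_ratio:.1f}")
print(classify(62.0, a10_ratio))   # hypothetical low-intensity workload
print(classify(300.0, a10_ratio))  # hypothetical high-intensity workload
```

The same comparison applies to any GPU once its peak throughput and memory bandwidth are substituted.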

Daily-Trend-Review 2023.12.08

2023/12/06: The New Stack and Ops for AI (OpenAI dev)

A training dataset generated with GPT-4 can be used to fine-tune GPT-3.5 Turbo. GPT-3.5 Turbo..

Daily-Trend-Review 2023.12.08

2023/12/01: Accelerating Generative AI with PyTorch II: GPT, Fast

Accelerating Generative AI with PyTorch II: GPT, Fast This post is the second part of a multi-series blog focused on how to accelerate generative AI models with pure, native PyTorch. We are excited to share a breadth of newly released PyTorch performance features alongside practical examples to see how far we pytorch.org Adding Long Term Memory to Open..

Daily-Trend-Review 2023.12.01

PagedAttention + vLLM

Problems with Transformer serving: The Transformer generation process is memory-bound, so it cannot fully use the GPU's compute power, which limits serving throughput. Improving throughput requires batching multiple requests together, but batching many requests requires managing the memory for each request efficiently. The model weights are fixed in size, so how the KV cache is managed determines the maximum batch size. If the KV cache is managed inefficiently, the batch size is severely limited, which in turn caps LLM throughput. In existing LLM serving systems, the KV cach..
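To make the link between KV cache management and maximum batch size concrete, here is a rough Python sketch. It assumes standard multi-head attention with FP16 values; the model shape (40 layers, hidden size 5120), the 12 GiB of free memory, and the sequence lengths are hypothetical numbers picked for illustration, not figures from the post.

```python
# Rough estimate of how KV cache allocation bounds the batch size.
# All concrete numbers below are illustrative assumptions.

def kv_bytes_per_token(n_layers: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    """KV cache bytes for one token: a key and a value vector per layer."""
    return 2 * n_layers * hidden_size * bytes_per_value


def max_batch_size(free_bytes: int, tokens_reserved_per_request: int, per_token_bytes: int) -> int:
    """Number of requests that fit if each one reserves the given token count."""
    return free_bytes // (tokens_reserved_per_request * per_token_bytes)


GIB = 1024 ** 3
free_memory = 12 * GIB  # hypothetical memory left after loading the weights

per_token = kv_bytes_per_token(n_layers=40, hidden_size=5120)

# Reserving the full 2048-token maximum for every request up front.
preallocated = max_batch_size(free_memory, 2048, per_token)

# Reserving only what a request actually uses (assumed 512 tokens on average).
on_demand = max_batch_size(free_memory, 512, per_token)

print(per_token, preallocated, on_demand)
```

Under these assumptions, reserving memory for the maximum length fits far fewer requests than allocating only what each request uses, which is the inefficiency the excerpt describes.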

Daily-Trend-Review 2023.11.30

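Becoming a Good Development Leader

https://yozm.wishket.com/magazine/detail/2338/ Things I Have Thought About in Order to Become a Good Development Leader | 요즘IT In this post, I write about what I have reflected on over the past three years as a leader rather than an individual contributor (IC): what a good leader is, and what capabilities are needed to become one.. yozm.wishket.com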

Daily-Trend-Review 2023.11.29

2023/11/13: S-LoRA and more

S-LORA: SERVING THOUSANDS OF CONCURRENT LORA ADAPTERS A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions SIMPLIFYING TRANSFORMER BLOCKS Alternating updates for efficient transformers Posted by Xin Wang, Software Engineer, and Nishanth Dikkala, Research Scientist, Google Research Contemporary deep l..

Daily-Trend-Review 2023.11.13

MBU(Model Bandwidth Utilization)

Definition of MBU: MBU (Model Bandwidth Utilization) is a new metric for measuring hardware utilization. The closer MBU is to 100%, the better the system uses its available memory bandwidth. MBU = (achieved memory bandwidth) / (peak memory bandwidth), where achieved memory bandwidth = ((total model parameter size) + (KV cache size)) / TPOT and TPOT is Time Per Output Token. Example: 7B model with 16-bit precision, KV cache size ignored, TPOT = 14 ms/token, memory bandwidth = 2 TB/s: MBU = (14GB/14 ..
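The MBU formula and the example inputs above can be worked through in a short Python sketch. The inputs (7B parameters, 16-bit precision, KV cache ignored, TPOT of 14 ms, 2 TB/s peak bandwidth) are taken from the excerpt; the function name and its signature are assumptions made for illustration.

```python
# Worked example of the MBU formula using the inputs from the excerpt.
# The function name and parameter names are illustrative assumptions.

def model_bandwidth_utilization(
    param_bytes: float,
    kv_cache_bytes: float,
    tpot_seconds: float,
    peak_bandwidth_bytes_per_s: float,
) -> float:
    """MBU = achieved memory bandwidth / peak memory bandwidth."""
    achieved_bandwidth = (param_bytes + kv_cache_bytes) / tpot_seconds
    return achieved_bandwidth / peak_bandwidth_bytes_per_s


param_bytes = 7e9 * 2  # 7B parameters at 2 bytes each (16-bit) = 14 GB

mbu = model_bandwidth_utilization(
    param_bytes=param_bytes,
    kv_cache_bytes=0.0,                # KV cache ignored, as in the example
    tpot_seconds=14e-3,                # 14 ms per output token
    peak_bandwidth_bytes_per_s=2e12,   # 2 TB/s
)

print(f"MBU = {mbu:.0%}")
```

Moving 14 GB every 14 ms corresponds to about 1 TB/s of achieved bandwidth, which is half of the 2 TB/s peak.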

Daily-Trend-Review 2023.11.11

2023/11/11: Sliding Window Attention (SWA) mechanism

https://arxiv.org/abs/1904.10509?utm_source=pytorchkr Generating Long Sequences with Sparse Transformers Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We also introduce a) a arxiv.org https://arxiv.org/abs/2004...

Daily-Trend-Review 2023.11.11

2023/10/27: transformer-math

Transformer Math 101 We present basic math related to computation and memory usage for transformers blog.eleuther.ai MemGPT: Towards LLMs As Operating Systems Understanding the Performance of Transformer Inference Abstract The state of the art results in natural language processing tasks have been obtained by scaling up ..

Daily-Trend-Review 2023.10.27

AI, Quant 투자 공부 (AI and Quant Investing Study)

A space for studying AI and quant investing, run by an AI engineer who enjoys writing