Posts in the 'All categories' category (page 4)

All categories 135

2023/12/10: Optimizing your LLM in production

https://huggingface.co/blog/optimize-llm Optimizing your LLM in production. Note: This blog post is also available as a documentation page on Transformers. Large Language Models (LLMs) such as GPT3/4, Falcon, and LLama are rapidly advancing in their ability to tackle human-centric tasks, establish.. (huggingface.co) The most effective techniques for efficient LLM deployment: Lower Precision..

Daily-Trend-Review 2023.12.10

2023/12/08: LLM transformer inference guide

https://www.baseten.co/blog/llm-transformer-inference-guide/ A guide to LLM inference and performance. To attain the full power of a GPU during LLM inference, you have to know if the inference is compute bound or memory bound. Learn how to better utilize GPU resources. (www.baseten.co) Computing the ops:byte ratio: for the A10, ops:byte = (125 TFLOPS) / (600 GB/s) = 208.3 ops/byte. A workload whose ops:byte is 208.3 or lower is memory-bound. ops:byte..
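The ops:byte arithmetic in the note above can be reproduced with a minimal sketch. The function names `ops_per_byte` and `is_memory_bound` are hypothetical helpers introduced here for illustration; only the A10 figures (125 TFLOPS, 600 GB/s) come from the note.

```python
def ops_per_byte(peak_flops: float, mem_bandwidth_bytes_per_s: float) -> float:
    """Hardware ops:byte ratio: peak compute divided by peak memory bandwidth."""
    return peak_flops / mem_bandwidth_bytes_per_s


def is_memory_bound(arithmetic_intensity: float, hw_ops_per_byte: float) -> bool:
    """Per the note: a workload at or below the hardware ratio is memory-bound."""
    return arithmetic_intensity <= hw_ops_per_byte


# A10 figures quoted in the note: 125 TFLOPS and 600 GB/s.
a10_ratio = ops_per_byte(125e12, 600e9)
print(f"A10 ops:byte = {a10_ratio:.1f}")  # prints "A10 ops:byte = 208.3"
```

For example, a workload with a (hypothetical) arithmetic intensity of 62 ops/byte would give `is_memory_bound(62, a10_ratio) == True` on this GPU.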

Daily-Trend-Review 2023.12.08

2023/12/06: The New Stack and Ops for AI (OpenAI dev)

A training dataset generated with GPT-4 can be used to fine-tune GPT-3.5 Turbo. GPT-3.5 Turbo..

Daily-Trend-Review 2023.12.08

2023/12/01: Accelerating Generative AI with PyTorch II: GPT, Fast

Accelerating Generative AI with PyTorch II: GPT, Fast. This post is the second part of a multi-series blog focused on how to accelerate generative AI models with pure, native PyTorch. We are excited to share a breadth of newly released PyTorch performance features alongside practical examples to see how far we.. (pytorch.org) Adding Long Term Memory to Open..

Daily-Trend-Review 2023.12.01

PagedAttention + vLLM

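Problems with Transformer serving: The Transformer generation process is memory-bound, so it cannot make full use of the GPU's compute power, and this limits serving throughput. To improve throughput, multiple requests must be gathered and processed as a batch. Batching many requests, however, requires managing the memory for each request efficiently. The model weights have a fixed size, so the way the KV cache is managed determines the maximum batch size. If the KV cache is managed inefficiently, the batch size stays small and LLM throughput is limited. In existing LLM serving systems, the KV cach..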
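How KV cache size bounds the batch size can be sketched with rough arithmetic. This is a simplified estimate under stated assumptions, not vLLM's actual allocator: it assumes every request reserves a full `seq_len` of cache, and the model shape and GPU size below are hypothetical values chosen for illustration.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Bytes of KV cache one token needs: a K and a V vector in every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def max_batch_size(gpu_mem_bytes: int, weight_bytes: int,
                   seq_len: int, per_token_bytes: int) -> int:
    """Requests that fit when each one reserves seq_len tokens of KV cache."""
    free = gpu_mem_bytes - weight_bytes  # weights are a fixed cost
    if free <= 0:
        return 0
    return free // (seq_len * per_token_bytes)


# Hypothetical 7B-class model: 32 layers, 32 KV heads, head_dim 128, fp16.
per_token = kv_cache_bytes_per_token(32, 32, 128)
# Hypothetical 24 GB GPU holding 14 GB of weights, 2048-token reservations.
batch = max_batch_size(24 * 10**9, 14 * 10**9, 2048, per_token)
print(per_token, batch)  # prints "524288 9"
```

Reserving the full sequence length for every request is exactly the kind of waste that limits the batch size; allocating cache in smaller units as tokens are generated lets more requests share the same free memory.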

Daily-Trend-Review 2023.11.30

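Becoming a good development leader

https://yozm.wishket.com/magazine/detail/2338/ Things I considered in order to become a good development leader | 요즘IT. In this post, I write about what I have thought through over the past three years, as a leader rather than an individual contributor (IC), regarding what a good leader is and what capabilities it takes to become one.. (yozm.wishket.com)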

Daily-Trend-Review 2023.11.29

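From Developer to Architect

Chapter 1. Becoming a software architect. What a SW architect does: decides when and how the SW is delivered; makes sure it aligns with business goals; writes code, but designs larger and broader things rather than algorithms or individual pieces of code. Defining the problem from an engineering perspective: the Product Manager defines features; the SW architect works with the Product Manager, the Project Manager, and all stakeholders to shape business goals and requirements. Turns quality attributes into another set of requirements. Must keep checking constraints and features so that the SW architecture moves in the intended direction. Divides the SW system into pieces and builds a strategy so that each piece achieves its quality attributes and requirements. Drawing the big picture & Trade-off..

Book review 2023.11.25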

2023/11/13: S-LoRA and others

S-LORA: SERVING THOUSANDS OF CONCURRENT LORA ADAPTERS; A Survey on Hallucination in Large Language Model: Principles, Taxonomy, Challenges and Open Questions; SIMPLIFYING TRANSFORMER BLOCKS; Alternating updates for efficient transformers. Posted by Xin Wang, Software Engineer, and Nishanth Dikkala, Research Scientist, Google Research. Contemporary deep l..

Daily-Trend-Review 2023.11.13

MBU(Model Bandwidth Utilization)

MBU definition: MBU (Model Bandwidth Utilization) is a new metric for measuring HW utilization. The closer MBU is to 100%, the better the system uses its available bandwidth. MBU = (achieved memory bandwidth) / (peak memory bandwidth). achieved memory bandwidth = ((total model parameter size) + (KV cache size)) / TPOT, where TPOT is Time Per Output Token. Example: 7B model with 16-bit precision, KV cache size ignored, TPOT = 14 ms/token, memory bandwidth = 2 TB/sec. MBU = (14GB/14 ..
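The two formulas above translate directly into a short sketch. The function names are hypothetical; the inputs are the example's own figures (7B parameters at 16-bit precision, KV cache ignored, TPOT of 14 ms, 2 TB/sec peak bandwidth).

```python
def achieved_bandwidth(param_bytes: float, kv_cache_bytes: float,
                       tpot_s: float) -> float:
    """Bytes that must be read per output token, divided by time per output token."""
    return (param_bytes + kv_cache_bytes) / tpot_s


def mbu(param_bytes: float, kv_cache_bytes: float,
        tpot_s: float, peak_bandwidth: float) -> float:
    """Model Bandwidth Utilization: achieved over peak memory bandwidth."""
    return achieved_bandwidth(param_bytes, kv_cache_bytes, tpot_s) / peak_bandwidth


param_bytes = 7e9 * 2  # 7B parameters at 2 bytes each = 14 GB
value = mbu(param_bytes, 0.0, 14e-3, 2e12)  # KV cache ignored, as in the example
print(f"MBU = {value:.0%}")  # prints "MBU = 50%"
```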

Daily-Trend-Review 2023.11.11

2023/11/11: Sliding Window Attention (SWA) mechanism

https://arxiv.org/abs/1904.10509?utm_source=pytorchkr Generating Long Sequences with Sparse Transformers. Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$. We also introduce a) a.. (arxiv.org) https://arxiv.org/abs/2004...
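As a rough illustration of the sliding-window idea named in the title, the sketch below builds a causal mask in which each position attends only to itself and the previous `window - 1` positions. This is an assumed, generic formulation for illustration, not the exact sparse pattern from the linked paper; `sliding_window_mask` is a hypothetical helper.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]


mask = sliding_window_mask(5, 2)
for row in mask:
    # 'x' marks an allowed (query, key) pair, '.' a masked one.
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed entries, so the number of attended pairs grows linearly with sequence length instead of quadratically.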

Daily-Trend-Review 2023.11.11


AI, Quant Investing Study

A space for studying AI and quant investing, by an AI engineer who likes to write
