Efficient Streaming Language Models with Attention Sinks
Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value (KV) states consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length.
arxiv.org
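The paper's core observation is that keeping the KV entries of a handful of initial "attention sink" tokens, together with a rolling window of the most recent tokens, keeps the cache bounded while preserving generation quality. Below is a minimal PyTorch sketch of that eviction policy; the function name, default sizes, and tensor shapes are illustrative assumptions rather than the paper's code, and the positional re-assignment StreamingLLM performs inside the cache is omitted.

```python
import torch

def evict_kv(keys, values, num_sinks=4, window=2044):
    """StreamingLLM-style eviction sketch: keep the first `num_sinks`
    "attention sink" tokens plus the most recent `window` tokens.

    keys, values: [batch, num_heads, seq_len, head_dim]
    (Shapes and defaults are illustrative; the real method also
    re-assigns positions within the cache, which is omitted here.)
    """
    seq_len = keys.size(2)
    if seq_len <= num_sinks + window:
        return keys, values  # cache not full yet, nothing to evict
    sink_k, sink_v = keys[:, :, :num_sinks], values[:, :, :num_sinks]
    recent_k, recent_v = keys[:, :, -window:], values[:, :, -window:]
    return (torch.cat([sink_k, recent_k], dim=2),
            torch.cat([sink_v, recent_v], dim=2))

# Example: a cache of 3000 tokens is trimmed back to 4 + 2044 entries.
k = torch.randn(1, 8, 3000, 64)
v = torch.randn(1, 8, 3000, 64)
k, v = evict_kv(k, v)
print(k.shape)  # torch.Size([1, 8, 2048, 64])
```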
Ring Attention with Blockwise Transformers for Near-Infinite Context
HyperAttention: Long-context Attention in Near-Linear Time
Flash-Decoding for long context inference
Stanford CRFM
Motivation: Large language models (LLMs) such as ChatGPT or Llama have received unprecedented attention lately. However, they remain massively expensive to run. Even though generating a single response can cost only about $0.01 (a few seconds of an 8xA100 instance on AWS), the costs add up quickly when scaling to billions of users who may have multiple daily interactions with such LLMs.
crfm.stanford.edu
Efficient Memory Management for Large Language Model Serving with PagedAttention