Daily-Trend-Review

2023/11/11: Sliding Window Attention (SWA) mechanism

hellcat 2023. 11. 11. 08:13

 

https://arxiv.org/abs/1904.10509?utm_source=pytorchkr

 

Generating Long Sequences with Sparse Transformers

Transformers are powerful sequence models, but require time and memory that grows quadratically with the sequence length. In this paper we introduce sparse factorizations of the attention matrix which reduce this to $O(n \sqrt{n})$.

arxiv.org
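
To make the $O(n \sqrt{n})$ claim concrete, here is a rough sketch of the kind of strided sparsity pattern the paper describes, as I understand it: each position attends to a local window of roughly l previous positions plus every l-th earlier position, with l chosen near sqrt(n). The function and variable names are my own illustration, not code from the paper.

```python
import numpy as np

def strided_sparse_mask(n, l):
    """Boolean mask: True where query i may attend to key j."""
    i = np.arange(n)[:, None]                 # query positions
    j = np.arange(n)[None, :]                 # key positions
    local   = (j <= i) & (j > i - l)          # the previous l positions
    strided = (j <= i) & ((i - j) % l == 0)   # every l-th earlier position
    return local | strided

mask = strided_sparse_mask(n=16, l=4)  # l chosen near sqrt(16)
print(mask.sum(axis=1))  # each row attends to ~2*l positions, not n
```

With l ~ sqrt(n), each of the n rows touches about 2*sqrt(n) keys, which is where the n*sqrt(n) total cost comes from.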

https://arxiv.org/abs/2004.05150v2?utm_source=pytorchkr

 

Longformer: The Long-Document Transformer

Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with the sequence length.

arxiv.org
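
Since the post's topic is sliding window attention, here is a minimal NumPy sketch of the causal sliding-window masking that gives this kind of linear scaling: each query attends only to the `window` most recent positions. This is my own illustration; for clarity it still materializes the full score matrix, whereas real implementations compute only the diagonal band.

```python
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    """Causal SWA: query i attends to keys i-window+1 .. i, so attention
    cost grows as O(n * window) rather than O(n^2)."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)             # (n, n); a real kernel would
    i = np.arange(n)[:, None]                 # only compute the band
    j = np.arange(n)[None, :]
    outside = (j > i) | (j < i - window + 1)  # future tokens or beyond window
    scores = np.where(outside, -np.inf, scores)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
print(sliding_window_attention(q, k, v).shape)  # (16, 8)
```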

https://news.hada.io/

 

GeekNews - a development/tech/startup news service

A news site for people who like development news, new tech stories, startup information and know-how, and interesting things from around the world. Subscribable via email newsletter, Twitter, or Slack bot.

news.hada.io


Paper Summary #8 - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Shreyansh Singh

Paper: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness Link: https://arxiv.org/abs/2205.14135 Authors: Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré Code: https://github.com/HazyResearch/flash-attention

shreyansh26.github.io
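
The core idea in FlashAttention is to compute exact attention block by block with a running ("online") softmax, so the full n x n score matrix never has to be written to slow GPU memory. Below is a toy one-query sketch of that running-softmax accumulation; it is my own illustration of the trick, not the paper's kernel, which additionally tiles queries and manages SRAM explicitly.

```python
import numpy as np

def attention_one_query(q, K, V, block=4):
    """Exact softmax attention for one query, processing K/V in blocks
    with a running max and normalizer (online softmax)."""
    d = q.shape[0]
    m, denom = -np.inf, 0.0        # running max and softmax denominator
    acc = np.zeros(V.shape[1])     # running weighted sum of values
    for s in range(0, K.shape[0], block):
        scores = K[s:s+block] @ q / np.sqrt(d)
        m_new = max(m, scores.max())
        scale = np.exp(m - m_new)  # rescale earlier partial sums
        p = np.exp(scores - m_new)
        denom = denom * scale + p.sum()
        acc = acc * scale + p @ V[s:s+block]
        m = m_new
    return acc / denom

rng = np.random.default_rng(1)
K, V, q = rng.standard_normal((16, 8)), rng.standard_normal((16, 8)), rng.standard_normal(8)
s = K @ q / np.sqrt(8)
ref = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ V
print(np.allclose(attention_one_query(q, K, V), ref))  # True: exact, not approximate
```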

FlashDecoding++: Faster Large Language Model Inference on GPUs


7 Ways to Speed Up Inference of Your Hosted LLMs

TL;DR: techniques to speed up inference of LLMs to increase token generation speed and reduce memory consumption

betterprogramming.pub
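
One technique that comes up in virtually every list of this kind is KV caching (I have not re-checked whether it is among this article's seven): during autoregressive decoding, the keys and values of past tokens are stored so that each new token only computes its own K/V instead of re-running attention projections over the whole prefix. A toy single-head sketch, with made-up weights and names:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def decode_step(x_new, k_cache, v_cache):
    """One decoding step: compute K/V only for the new token, then attend
    the new query against everything cached so far."""
    k_cache.append(x_new @ Wk)
    v_cache.append(x_new @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)
    s = K @ (x_new @ Wq) / np.sqrt(d)
    w = np.exp(s - s.max()); w /= w.sum()
    return w @ V

k_cache, v_cache = [], []
for _ in range(5):                    # 5 decoding steps
    x = rng.standard_normal(d)        # stand-in for the current token embedding
    out = decode_step(x, k_cache, v_cache)
print(len(k_cache), out.shape)  # 5 (8,)
```

The trade-off is memory: the cache grows with sequence length, which is exactly why it pairs naturally with the sliding-window and sparse-attention ideas above.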

 

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

 

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

Open Source Society University

 

GitHub - ossu/computer-science: Path to a free self-taught education in Computer Science!


github.com

Multi-Modal RAG Stack
