Daily-Trend-Review

2023/07/21: MQA, LLaMA 2, FlashAttention-2

hellcat 2023. 7. 21. 08:12

Multi-Query Attention is All You Need

source: https://blog.fireworks.ai/multi-query-attention-is-all-you-need-db072e758055

 

by James K Reed, Dmytro Dzhulgakov, Dmytro Ivchenko, and Lin Qiao

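The core idea behind multi-query attention is that the model keeps many query heads but shares a single key/value head across all of them, which shrinks the key/value cache that dominates serving cost. A minimal numpy sketch of that idea (shapes and names are illustrative, not the post's code):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, Wq, Wk, Wv, n_heads):
    """Multi-query attention: n_heads query heads share ONE key/value head.

    x:  (seq, d_model)
    Wq: (d_model, n_heads * d_head)  -- per-head query projections
    Wk: (d_model, d_head)            -- single shared key projection
    Wv: (d_model, d_head)            -- single shared value projection
    """
    seq, _ = x.shape
    d_head = Wk.shape[1]
    q = (x @ Wq).reshape(seq, n_heads, d_head)   # (seq, heads, d_head)
    k = x @ Wk                                    # (seq, d_head), shared
    v = x @ Wv                                    # (seq, d_head), shared
    # Every query head attends over the same shared keys.
    scores = np.einsum("shd,td->hst", q, k) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)               # (heads, seq, seq)
    out = np.einsum("hst,td->shd", attn, v)       # (seq, heads, d_head)
    return out.reshape(seq, n_heads * d_head)
```

During autoregressive decoding this shrinks the cached keys/values by a factor of `n_heads` relative to multi-head attention, which is the serving-throughput win the post discusses.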

 

LLaMA 2: The Dawn of a New Era

source: https://betterprogramming.pub/the-dawn-of-a-new-era-llama2-b0b1a9175029

 

Key differences from LLaMA 1, safety & violations, Ghost Attention and model performance.


FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

source: https://crfm.stanford.edu/2023/07/17/flash2.html

 

Just within the last year, there have been several language models with much longer context than before: GPT-4 with context length 32k, MosaicML’s MPT with context length 65…

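FlashAttention computes exact attention in tiles without ever materializing the full score matrix. For contrast, here is a numpy sketch of the naive baseline it improves on, whose intermediate score matrix grows quadratically with sequence length (this is not the actual CUDA kernel):

```python
import numpy as np

def naive_attention(q, k, v):
    """Baseline attention that materializes the full (seq x seq) score
    matrix -- the quadratic memory cost FlashAttention avoids by
    computing the same result block-by-block in fast on-chip memory."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)             # (seq, seq): O(seq^2) memory
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                              # (seq, d)
```

At the 32k–65k context lengths the post mentions, that quadratic intermediate is what makes attention the bottleneck, which is exactly the cost FlashAttention-2's tiling and work partitioning attack.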

 

Can Longer Sequences Help Take the Next Leap in AI?

source: https://ai.stanford.edu/blog/longer-sequences-next-leap-ai/

 

Deep learning has revolutionized machine learning. To a first approximation, deeper has been better. However, there is another dimension to scale these models: the size of the input. Even the world’s most impressive models can only process long-form content…


How does in-context learning work? A framework for understanding the differences from traditional supervised learning

source: https://ai.stanford.edu/blog/understanding-incontext/

 


 

Generative AI - Learn the LangChain Basics by Building a Berlin Travel Guide

source: https://medium.com/google-cloud/generative-ai-learn-the-langchain-basics-by-building-a-berlin-travel-guide-5cc0a2ce4096

 

LangChain is a framework that’s like a Swiss army knife for large language models (LLMs).


 

Augmenting Language Models with Long-Term Memory

source: https://arxiv.org/abs/2306.07174

 

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory…

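The general idea the abstract describes — caching past context and retrieving the most relevant pieces for the current input instead of fitting everything in a fixed-size window — can be sketched as a toy lookup. This is only an illustration of the retrieval principle; the names and dot-product similarity are my assumptions, not the paper's actual architecture:

```python
import numpy as np

def retrieve_memory(query, mem_keys, mem_values, top_k=2):
    """Toy long-term-memory lookup (illustrative, not the paper's method):
    past inputs are cached as (key, value) vector pairs, and the current
    query fetches its top_k most similar cached entries, sidestepping the
    fixed input-length limit."""
    sims = mem_keys @ query            # dot-product similarity per cached entry
    top = np.argsort(-sims)[:top_k]    # indices of the best-matching entries
    return mem_values[top]
```

The retrieved vectors would then be fused back into the model's current context, so information from arbitrarily old inputs can still influence the prediction.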