
2023/07/21: MQA, LLaMA 2, FlashAttention-2

hellcat 2023. 7. 21. 08:12

Multi-Query Attention is All You Need

source: https://blog.fireworks.ai/multi-query-attention-is-all-you-need-db072e758055


by James K Reed, Dmytro Dzhulgakov, Dmytro Ivchenko, and Lin Qiao



LLaMA 2: The Dawn of a New Era

source: https://betterprogramming.pub/the-dawn-of-a-new-era-llama2-b0b1a9175029


Key differences from LLaMA 1, safety & violations, Ghost Attention and model performance.


FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

source: https://crfm.stanford.edu/2023/07/17/flash2.html


Stanford CRFM

Just within the last year, there have been several language models with much longer context than before: GPT-4 with context length 32k, MosaicML’s MPT with context length 65



Can Longer Sequences Help Take the Next Leap in AI?

source: https://ai.stanford.edu/blog/longer-sequences-next-leap-ai/


Deep learning has revolutionized machine learning. To a first approximation, deeper has been better. However, there is another dimension to scale these models: the size of the input. Even the world’s most impressive models can only process long-form cont


How does in-context learning work? A framework for understanding the differences from traditional supervised learning

source: https://ai.stanford.edu/blog/understanding-incontext/


The official Stanford AI Lab blog



Generative AI - Learn the LangChain Basics by Building a Berlin Travel Guide

source: https://medium.com/google-cloud/generative-ai-learn-the-langchain-basics-by-building-a-berlin-travel-guide-5cc0a2ce4096


LangChain is a framework that’s like a Swiss army knife for large language models (LLMs).



Augmenting Language Models with Long-Term Memory

source: https://arxiv.org/abs/2306.07174


Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Lon
