Full Stack Optimization of Transformer Inference: a Survey
How Long Can Open-Source LLMs Truly Promise on Context Length?
In this blogpost, we introduce our latest series of chatbot models, LongChat-7B and LongChat-13B, featuring a new level of extended context length up to 16K tokens.
lmsys.org
Efficiently Scaling Transformer Inference
Dissecting Batching Effects in GPT Inference
Machine learning models rely on batching to improve inference throughput, especially smaller computer vision models such as ResNet and DenseNet. GPT, as well as other large language models (LLMs), is the hottest model these days. Does batching still…
le.qun.ch
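Worth noting why batching changes the picture for LLM decode: each step re-reads the full weight matrices whether the batch holds one sequence or hundreds, so arithmetic intensity grows roughly linearly with batch size. A minimal sketch, assuming a 4096x4096 fp16 layer and A100-class peak figures (both assumptions, not numbers from the post):

```python
# Back-of-envelope: arithmetic intensity of one decode-step weight matmul
# at batch size b, for an assumed 4096x4096 fp16 weight matrix. The weights
# are read once per step regardless of b, so intensity grows with batch.

def arithmetic_intensity(b: int, d_in: int = 4096, d_out: int = 4096) -> float:
    flops = 2 * b * d_in * d_out                            # one multiply-add per weight per row
    bytes_moved = 2 * (d_in * d_out + b * (d_in + d_out))   # fp16: weights + in/out activations
    return flops / bytes_moved

RIDGE = 312e12 / 2.0e12  # assumed A100-class ridge point: ~156 FLOPs/byte

for b in (1, 8, 64, 256):
    ai = arithmetic_intensity(b)
    regime = "math-bound" if ai > RIDGE else "memory-bound"
    print(f"batch={b:3d}  {ai:6.1f} FLOPs/byte  -> {regime}")
```

At batch 1 the intensity is about 1 FLOP/byte, far below the ridge point; only at large batch does the matmul become compute-bound.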
Accelerating transformer inference on my RTX 4090
Justifying an unjustifiable purchase. (I am financially ruined)
www.ericjwang.com
GPU Performance Background User's Guide
The GPU is a highly parallel processor architecture, composed of processing elements and a memory hierarchy. At a high level, NVIDIA® GPUs consist of a number of Streaming Multiprocessors (SMs), on-chip L2 cache, and high-bandwidth DRAM. Arithmetic and other instructions are executed by the SMs…
docs.nvidia.com
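The guide's central rule of thumb is that an op's duration is bounded below by the larger of its compute time and its data-movement time. A hedged sketch of that rule, using assumed A100-class peaks rather than numbers from the guide:

```python
# Roofline-style lower bound on an op's duration: the larger of its
# compute time and its data-movement time. Peaks below are illustrative
# A100-class fp16 figures, not numbers taken from the guide.

PEAK_FLOPS = 312e12  # FLOPs/s (assumed)
PEAK_BW = 2.0e12     # bytes/s (assumed)

def op_time(flops: float, bytes_moved: float) -> tuple[float, str]:
    t_math = flops / PEAK_FLOPS
    t_mem = bytes_moved / PEAK_BW
    return max(t_math, t_mem), ("math-limited" if t_math > t_mem else "memory-limited")

# Example: a 4096x4096 fp16 matrix-vector product (a batch-1 decode step).
t, regime = op_time(flops=2 * 4096 * 4096, bytes_moved=2 * 4096 * 4096)
print(f"{t * 1e6:.1f} us, {regime}")  # bandwidth dominates: the op is memory-limited
```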
Code Llama: Open Foundation Models for Code
Math-Bound VS Memory-Bound Operations
Computation Bandwidth, Memory Bandwidth, and Data Reuse
leimao.github.io
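The distinction the post draws comes down to data reuse: a matmul performs 2MNK FLOPs over only MK + KN + MN operands, so larger matrices reuse each loaded value more times. A quick illustration (fp16 element size assumed):

```python
# Data reuse in an MxK @ KxN matmul: FLOPs scale as 2*M*N*K while the
# operands only scale as M*K + K*N + M*N, so larger matmuls reuse each
# loaded element more often and drift from memory-bound to math-bound.

def matmul_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

for s in (128, 1024, 8192):
    print(f"{s}^3 matmul: {matmul_intensity(s, s, s):,.0f} FLOPs/byte")
```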
Transformer Inference Arithmetic
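This post develops back-of-envelope formulas for decode cost; two of them are sketched below with an assumed 13B-parameter fp16 model and illustrative hardware numbers (all figures are assumptions, not taken from the post):

```python
# Two decode-time rules of thumb from transformer inference arithmetic:
#   time/token     ≈ model bytes / memory bandwidth   (batch-1, memory-bound)
#   KV cache/token ≈ 2 (K and V) * n_layers * d_model * bytes per element

N_PARAMS = 13e9               # assumed 13B-parameter model
BYTES_PER_ELEM = 2            # fp16
MEM_BW = 2.0e12               # bytes/s, assumed A100-class
N_LAYERS, D_MODEL = 40, 5120  # assumed LLaMA-13B-like shape

time_per_token = N_PARAMS * BYTES_PER_ELEM / MEM_BW
kv_per_token = 2 * N_LAYERS * D_MODEL * BYTES_PER_ELEM

print(f"~{time_per_token * 1e3:.0f} ms/token at batch 1")    # ~13 ms
print(f"~{kv_per_token / 1e6:.2f} MB of KV cache per token")  # ~0.82 MB
```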
Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking
GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2
Large language model quantization for affordable fine-tuning and inference on your computer
towardsdatascience.com
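For orientation, the bitsandbytes path the article compares looks roughly like this in transformers; the checkpoint name is a placeholder and the config values follow the QLoRA defaults:

```python
# Minimal sketch: 4-bit NF4 loading with bitsandbytes through transformers.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize linear-layer weights to 4 bits on load
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16, # matmuls still run in 16-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",            # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```

GPTQ, by contrast, quantizes post-training with a calibration set, so loading a GPTQ checkpoint goes through a different path than the on-the-fly config above.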
Understanding QLoRA & LoRA: Fine-tuning of LLMs
In this short note, we gently review the LoRA [1] and QLoRA [2] papers. Fine-tuning LLMs is a popular subject these days. These two papers have…
medium.com
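LoRA's core idea fits in a few lines: freeze the pretrained weight and train a low-rank update scaled by alpha/r. A self-contained PyTorch sketch (shapes and hyperparameters are illustrative, not from either paper's code):

```python
# Minimal LoRA linear layer: y = W x + (alpha/r) * B A x, with the
# pretrained W frozen and only the low-rank factors A, B trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)              # freeze pretrained weight
        self.lora_a = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(d_out, r))   # zero init: update starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(4096, 4096)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # only A and B: 2 * r * 4096 = 65,536
```

QLoRA keeps this recipe but stores the frozen base weights in 4-bit NF4, which is what makes single-GPU fine-tuning of large models affordable.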