Full Stack Optimization of Transformer Inference: a Survey
How Long Can Open-Source LLMs Truly Promise on Context Length?
In this blogpost, we introduce our latest series of chatbot models, LongChat-7B and LongChat-13B, featuring a new level of extended context length up to 16K tokens.
lmsys.org
Efficiently Scaling Transformer Inference
Dissecting Batching Effects in GPT Inference
Machine learning models rely on batching to improve inference throughput, especially smaller computer vision models such as ResNet and DenseNet. GPT, as well as other large language models (LLMs), is the hottest model these days. Does batching still…
le.qun.ch
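Worth noting why batching changes the picture for LLM decode: each step re-reads the full weight matrices whether the batch holds one sequence or hundreds, so arithmetic intensity grows roughly linearly with batch size. A minimal sketch, assuming a 4096x4096 fp16 layer and A100-class peak figures (both assumptions, not numbers from the post):

```python
# Back-of-envelope: arithmetic intensity of one decode-step weight matmul
# at batch size b, for an assumed 4096x4096 fp16 weight matrix. The weights
# are read once per step regardless of b, so intensity grows with batch.

def arithmetic_intensity(b: int, d_in: int = 4096, d_out: int = 4096) -> float:
    flops = 2 * b * d_in * d_out                            # one multiply-add per weight per row
    bytes_moved = 2 * (d_in * d_out + b * (d_in + d_out))   # fp16: weights + in/out activations
    return flops / bytes_moved

RIDGE = 312e12 / 2.0e12  # assumed A100-class ridge point: ~156 FLOPs/byte

for b in (1, 8, 64, 256):
    ai = arithmetic_intensity(b)
    regime = "math-bound" if ai > RIDGE else "memory-bound"
    print(f"batch={b:3d}  {ai:6.1f} FLOPs/byte  -> {regime}")
```

At batch 1 the intensity is about 1 FLOP/byte, far below the ridge point; only at large batch does the matmul become compute-bound.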
Accelerating transformer inference on my RTX 4090
Justifying an unjustifiable purchase. (I am financially ruined)
www.ericjwang.com
GPU Performance Background User's Guide
The GPU is a highly parallel processor architecture, composed of processing elements and a memory hierarchy. At a high level, NVIDIA® GPUs consist of a number of Streaming Multiprocessors (SMs), on-chip L2 cache, and high-bandwidth DRAM. Arithmetic and other instructions are executed by the SMs…
docs.nvidia.com
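The guide's central rule of thumb is that an op's duration is bounded below by the larger of its compute time and its data-movement time. A hedged sketch of that rule, using assumed A100-class peaks rather than numbers from the guide:

```python
# Roofline-style lower bound on an op's duration: the larger of its
# compute time and its data-movement time. Peaks below are illustrative
# A100-class fp16 figures, not numbers taken from the guide.

PEAK_FLOPS = 312e12  # FLOPs/s (assumed)
PEAK_BW = 2.0e12     # bytes/s (assumed)

def op_time(flops: float, bytes_moved: float) -> tuple[float, str]:
    t_math = flops / PEAK_FLOPS
    t_mem = bytes_moved / PEAK_BW
    return max(t_math, t_mem), ("math-limited" if t_math > t_mem else "memory-limited")

# Example: a 4096x4096 fp16 matrix-vector product (a batch-1 decode step).
t, regime = op_time(flops=2 * 4096 * 4096, bytes_moved=2 * 4096 * 4096)
print(f"{t * 1e6:.1f} us, {regime}")  # bandwidth dominates: the op is memory-limited
```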
Code Llama: Open Foundation Models for Code
Math-Bound VS Memory-Bound Operations
Computation Bandwidth, Memory Bandwidth, and Data Reuse
leimao.github.io
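The distinction the post draws comes down to data reuse: a matmul performs 2MNK FLOPs over only MK + KN + MN operands, so larger matrices reuse each loaded value more times. A quick illustration (fp16 element size assumed):

```python
# Data reuse in an MxK @ KxN matmul: FLOPs scale as 2*M*N*K while the
# operands only scale as M*K + K*N + M*N, so larger matmuls reuse each
# loaded element more often and drift from memory-bound to math-bound.

def matmul_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

for s in (128, 1024, 8192):
    print(f"{s}^3 matmul: {matmul_intensity(s, s, s):,.0f} FLOPs/byte")
```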
Transformer Inference Arithmetic
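This post develops back-of-envelope formulas for decode cost; two of them are sketched below with an assumed 13B-parameter fp16 model and illustrative hardware numbers (all figures are assumptions, not taken from the post):

```python
# Two decode-time rules of thumb from transformer inference arithmetic:
#   time/token     ≈ model bytes / memory bandwidth   (batch-1, memory-bound)
#   KV cache/token ≈ 2 (K and V) * n_layers * d_model * bytes per element

N_PARAMS = 13e9               # assumed 13B-parameter model
BYTES_PER_ELEM = 2            # fp16
MEM_BW = 2.0e12               # bytes/s, assumed A100-class
N_LAYERS, D_MODEL = 40, 5120  # assumed LLaMA-13B-like shape

time_per_token = N_PARAMS * BYTES_PER_ELEM / MEM_BW
kv_per_token = 2 * N_LAYERS * D_MODEL * BYTES_PER_ELEM

print(f"~{time_per_token * 1e3:.0f} ms/token at batch 1")    # ~13 ms
print(f"~{kv_per_token / 1e6:.2f} MB of KV cache per token")  # ~0.82 MB
```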
Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking
GPTQ or bitsandbytes: Which Quantization Method to Use for LLMs — Examples with Llama 2
Large language model quantization for affordable fine-tuning and inference on your computer
towardsdatascience.com
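For orientation, the bitsandbytes path the article compares looks roughly like this in transformers; the checkpoint name is a placeholder and the config values follow the QLoRA defaults:

```python
# Minimal sketch: 4-bit NF4 loading with bitsandbytes through transformers.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize linear-layer weights to 4 bits on load
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16, # matmuls still run in 16-bit
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",            # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```

GPTQ, by contrast, quantizes post-training with a calibration set, so loading a GPTQ checkpoint goes through a different path than the on-the-fly config above.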
Understanding QLoRA & LoRA: Fine-tuning of LLMs
In this short note, we gently review the LoRA [1] and QLoRA [2] papers. Fine-tuning LLMs is a popular subject these days. These two papers have…
medium.com
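LoRA's core idea fits in a few lines: freeze the pretrained weight and train a low-rank update scaled by alpha/r. A self-contained PyTorch sketch (shapes and hyperparameters are illustrative, not from either paper's code):

```python
# Minimal LoRA linear layer: y = W x + (alpha/r) * B A x, with the
# pretrained W frozen and only the low-rank factors A, B trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)              # freeze pretrained weight
        self.lora_a = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(d_out, r))   # zero init: update starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(4096, 4096)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # only A and B: 2 * r * 4096 = 65,536
```

QLoRA keeps this recipe but stores the frozen base weights in 4-bit NF4, which is what makes single-GPU fine-tuning of large models affordable.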