Post list for the 'View All Categories' category (page 3)

View All Categories (135)

2023/12/23: Optimizing your LLM in production

https://huggingface.co/blog/optimize-llm
Optimizing your LLM in production: Note: This blog post is also available as a documentation page on Transformers. Large Language Models (LLMs) such as GPT-3/4, Falcon, and Llama are rapidly advancing in their ability to tackle human-centric tasks, establish…

Daily-Trend-Review 2023.12.23

2023/12/23: RAG 101

https://developer.nvidia.com/blog/rag-101-demystifying-retrieval-augmented-generation-pipelines/
RAG 101: Demystifying Retrieval-Augmented Generation Pipelines (NVIDIA Technical Blog): Large language models (LLMs) have impressed the world with their unprecedented capabilities to comprehend and generate human-like responses. Their chat functionality provides a fast and natural…
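The core RAG loop (retrieve the most relevant documents for a query, then put them into the prompt) can be illustrated with a toy sketch. This is not the NVIDIA pipeline: the word-count "embedding" stands in for a real embedding model plus vector database, `embed`, `cosine`, `retrieve` and `build_prompt` are hypothetical names, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Augment the user question with the retrieved context before calling an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

docs = [
    "Mixtral 8x7B is a sparse mixture-of-experts model.",
    "Colab notebooks can be opened in a browser on an iPad.",
    "KV caching stores attention keys and values for past tokens.",
]
print(build_prompt("What does KV caching store?", docs, k=1))
```

In a production pipeline the documents would be chunked and embedded once at ingestion time, so only the query needs embedding per request.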

Daily-Trend-Review 2023.12.23

2023/12/23: how to make LLMs go fast

https://vgel.me/posts/faster-inference/
Table of contents: Why is simple inference so slow?; Metrics; Hardware; Compilers; Batching; Continuous Batching; Shrinking model weights; 16 bit floats; Even smaller!; KV caching; Multi-Query Attention; PagedAttention; Speculative Decoding; Threshold decoding?; Staged speculative decoding; Guided generation; Lookahead decoding; Prompt lookup decoding; Training time optimiza…
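KV caching, one of the techniques in the list above, can be shown with a single toy attention head in pure Python. This is a minimal sketch under assumed 2x2 weights (`W_Q`, `W_K`, `W_V` and the helper names are hypothetical): without a cache, every decoding step re-projects keys and values for the whole prefix; with a cache, each token's K and V are projected once and appended, and the attention outputs are identical.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exps)
    return [sum(e / total * v[i] for e, v in zip(exps, values))
            for i in range(len(values[0]))]

# Hypothetical 2x2 projection weights for a single attention head.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.9]]
W_V = [[1.0, 1.0], [0.0, 2.0]]

def decode_without_cache(xs):
    """At step t, re-project K and V for tokens 0..t: a quadratic number of projections."""
    outputs, kv_projections = [], 0
    for t in range(len(xs)):
        keys = [matvec(W_K, x) for x in xs[: t + 1]]
        values = [matvec(W_V, x) for x in xs[: t + 1]]
        kv_projections += 2 * (t + 1)
        outputs.append(attend(matvec(W_Q, xs[t]), keys, values))
    return outputs, kv_projections

def decode_with_cache(xs):
    """Project K and V once per new token and append them to the cache: linear projections."""
    k_cache, v_cache, outputs, kv_projections = [], [], [], 0
    for x in xs:
        k_cache.append(matvec(W_K, x))
        v_cache.append(matvec(W_V, x))
        kv_projections += 2
        outputs.append(attend(matvec(W_Q, x), k_cache, v_cache))
    return outputs, kv_projections

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
slow, n_slow = decode_without_cache(xs)
fast, n_fast = decode_with_cache(xs)
print(slow == fast, n_slow, n_fast)  # True 20 8
```

The cache trades memory for compute: it grows with sequence length, which is why the post also covers Multi-Query Attention and PagedAttention.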

Daily-Trend-Review 2023.12.23

2023/12/18: Mixtral 8x7B

https://mistral.ai/news/mixtral-of-experts
Mixtral of experts: A high quality Sparse Mixture-of-Experts. (mistral.ai)
Total parameters: 46.7B. Parameters actually active per token during generation: 12.9B.
Performance: on most benchmarks, it matches or outperforms Llama 2 70B and GPT-3.5.
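The gap between 46.7B total and 12.9B active parameters comes from sparse routing: for each token, at each layer, Mixtral's router picks 2 of its 8 experts. The sketch below shows top-2 routing for one token with hypothetical toy experts (the gate is a softmax over the two selected router logits, as the Mixtral paper describes it), then a back-of-the-envelope split of the published counts, assuming every non-expert parameter is used by every token and all experts are the same size.

```python
import math

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top2_moe(x, router_logits, experts):
    """Send one token to its two highest-scoring experts and mix their outputs.

    The gate is a softmax over the two selected router logits only; the other
    experts are never evaluated for this token.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:2]
    gates = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top

# Hypothetical experts: expert i just scales its input by i.
experts = [lambda x, s=i: [s * v for v in x] for i in range(8)]
logits = [0.1, 2.0, 0.3, 1.5, -1.0, 0.0, 0.2, 0.4]
out, chosen = top2_moe([1.0, 2.0], logits, experts)
print(chosen)  # [1, 3]

# Rough split of the published counts, assuming
# total = shared + 8 * expert and active = shared + 2 * expert (billions).
TOTAL_B, ACTIVE_B, N_EXPERTS, TOP_K = 46.7, 12.9, 8, 2
expert_b = (TOTAL_B - ACTIVE_B) / (N_EXPERTS - TOP_K)
shared_b = ACTIVE_B - TOP_K * expert_b
print(f"per-expert ~{expert_b:.2f}B, shared ~{shared_b:.2f}B")  # per-expert ~5.63B, shared ~1.63B
```

Here "per-expert" means one expert's feed-forward weights summed over all layers, and "shared" is everything else (attention, embeddings, norms, router).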

Daily-Trend-Review 2023.12.18

2023/12/14: Prompt Cache: Modular Attention Reuse For Low-Latency Inference

https://arxiv.org/abs/2311.04934
Prompt Cache: Modular Attention Reuse for Low-Latency Inference: We present Prompt Cache, an approach for accelerating inference for large language models (LLM) by reusing attention states across different LLM prompts. Many input prompts have overlapping text segments, such as system messages, prompt templates, and docu…
Prior work; problem statement: many input prompts have overlapping text seg…
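The idea of reusing attention states for shared prompt segments can be sketched as a cache keyed by module name. This is a toy illustration, not the paper's implementation: `ToyModel`, `PromptCache` and the schema format are hypothetical, and a token's "attention state" is just `(token, position)`. What it does show is the key property: each reusable module is encoded once at a position fixed by the schema, and later prompts only pay for their own new text.

```python
from dataclasses import dataclass, field

@dataclass
class ToyModel:
    """Stand-in for an LLM: the 'attention state' of a token is just (token, position)."""
    tokens_encoded: int = 0  # how many tokens were actually run through the model

    def encode(self, tokens, start_pos):
        self.tokens_encoded += len(tokens)
        return [(tok, start_pos + i) for i, tok in enumerate(tokens)]

@dataclass
class PromptCache:
    model: ToyModel
    schema: dict  # module name -> (start position, module text)
    cache: dict = field(default_factory=dict)

    def module_states(self, name):
        # Encode a reusable module once, at the position its schema assigns.
        if name not in self.cache:
            start, text = self.schema[name]
            self.cache[name] = self.model.encode(text.split(), start)
        return self.cache[name]

    def prefill(self, module_names, user_text):
        states = []
        for name in module_names:
            states.extend(self.module_states(name))
        # Only the prompt-specific text is encoded on every request.
        next_pos = max((pos for _, pos in states), default=-1) + 1
        return states + self.model.encode(user_text.split(), next_pos)

schema = {
    "system": (0, "You are a helpful assistant ."),
    "doc": (6, "Mixtral is a sparse mixture of experts model ."),
}
model = ToyModel()
pc = PromptCache(model, schema)
pc.prefill(["system", "doc"], "Summarize the document .")
print(model.tokens_encoded)  # 19 (6 + 9 + 4)
pc.prefill(["system", "doc"], "How many experts ?")
print(model.tokens_encoded)  # 23 (only the 4 new user tokens)
```

Fixing each module's positions in advance is what makes its cached states valid regardless of which prompt imports it.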

Daily-Trend-Review 2023.12.14

2023/12/12: Chiplet Cloud paper

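Paper: Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models
Problems with GPUs: serving LLMs on GPUs is difficult to scale.
GPT-3 throughput
Proposes Chiplet Cloud, a new chiplet-based ASIC AI supercomputer architecture for LLMs.
To address both the capital expenditure and the energy cost of running LLMs, the hardware system must be designed to minimize Total Cost of Ownership (TCO) per token.
Contributions of this paper:
Section #2: a study of current hardware for serving generative LLMs, motivating the need to build ASIC supercomputers.
Section #3: better TCO/token …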
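TCO per token combines the two costs named above: amortized capital expenditure and energy, divided by throughput, i.e. TCO/token = (CapEx / lifetime + power × electricity price) / tokens per second. The sketch below is a simplified model, not the paper's cost model: it assumes straight-line amortization, constant power draw and full utilization, ignores cooling, networking and financing, and uses made-up numbers.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def tco_per_token(capex_usd, lifetime_years, power_watts, usd_per_kwh, tokens_per_sec):
    """Simplified TCO/token: amortized hardware cost plus energy cost, per generated token.

    Assumes straight-line amortization, constant power draw and full utilization.
    """
    capex_usd_per_sec = capex_usd / (lifetime_years * SECONDS_PER_YEAR)
    energy_usd_per_sec = (power_watts / 1000) * usd_per_kwh / 3600  # kW * $/kWh = $/h
    return (capex_usd_per_sec + energy_usd_per_sec) / tokens_per_sec

# Hypothetical system: $300k of hardware over 3 years, 10 kW, $0.10/kWh, 5,000 tokens/s.
cost = tco_per_token(300_000, 3, 10_000, 0.10, 5_000)
print(f"${cost * 1e6:.2f} per million tokens")  # $0.69 per million tokens
```

With these numbers the capital term dominates (about 92% of the cost per second), which illustrates why cheaper hardware per unit of throughput matters as much as energy efficiency.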

Daily-Trend-Review 2023.12.14

2023/12/11: LLM and Transformers Series

LLM and Transformers Series:
Part 1: Are LLMs Just a Memory Trick?
Part 2: LLMs; Beyond Memorization
Part 3: Mathematically Assessing Closed-LLMs for Generalization
Part 4: Enhancing Safety in LLMs: A Rigorous Mathematical Examination of Jailbreaking
Part 5: In-Depth Analysis of Red Teaming in LLMs: A Mathematical and Empirical Approach
Part 6: Adversarial Attacks on LLM. A Mathematical an…

Daily-Trend-Review 2023.12.11

2023/12/11: LLM Visualization

https://bbycroft.net/llm
LLM Visualization

Daily-Trend-Review 2023.12.11

2023/12/11: Reproducible Performance Metrics for LLM inference

https://www.anyscale.com/blog/reproducible-performance-metrics-for-llm-inference
Reproducible Performance Metrics for LLM inference: Anyscale is releasing LLMPerf for benchmarking LLMs on current LLM offerings. See benchmarking results for Anyscale Endpoints vs Fireworks.ai.
Quantitative performance metrics for LLMs: completed requests per minute; TTFT (Time To First Token); ITL (Inter-Token Latency); End-to-End Lat…
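The metrics listed above can be computed from per-request timestamps. This is a minimal sketch, not LLMPerf's code: `RequestTrace` and the function names are hypothetical, and LLMPerf's own definitions may differ in detail (for example, in how ITL treats the first token). Here ITL is the mean gap between consecutive output tokens.

```python
from dataclasses import dataclass

@dataclass
class RequestTrace:
    sent_at: float      # time the request was sent (seconds)
    token_times: list   # arrival time of each output token (seconds)

def ttft(r):
    """Time To First Token: wait until the first output token arrives."""
    return r.token_times[0] - r.sent_at

def inter_token_latency(r):
    """Mean gap between consecutive output tokens (the wait for the first token is excluded)."""
    gaps = [b - a for a, b in zip(r.token_times, r.token_times[1:])]
    return sum(gaps) / len(gaps) if gaps else 0.0

def end_to_end_latency(r):
    """Time from sending the request until the last output token arrives."""
    return r.token_times[-1] - r.sent_at

def completed_requests_per_minute(traces, window_sec):
    """Throughput: number of completed requests in a window of window_sec seconds."""
    return len(traces) / window_sec * 60

r = RequestTrace(sent_at=0.0, token_times=[0.5, 0.6, 0.7, 0.9])
print(ttft(r), end_to_end_latency(r), round(inter_token_latency(r), 3))  # 0.5 0.9 0.133
```

TTFT mostly reflects prompt processing (prefill) and queueing, while ITL reflects per-token decoding speed, so the two are worth reporting separately rather than only end-to-end latency.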

Daily-Trend-Review 2023.12.11

2023/12/10: How to use Colab on an iPad

https://colab.research.google.com/notebooks/
Google Colaboratory Notebook: Run, share, and edit Python notebooks
Reference: https://st-soul.tistory.com/104

Daily-Trend-Review 2023.12.10


AI and Quant Investing Study

A space where an AI engineer who loves writing studies AI and quant investing
