
2023/06/22: Generative AI and more

1. Generative AI - Document Retrieval and Question Answering with LLMs
   source: https://medium.com/google-cloud/generative-ai-document-retrieval-and-question-answering-with-llms-2b0fb80ae76d
   Fine-tuning vs Indexing New Documents: fine-tuning takes hours, while indexed documents become available in real time.
   Context Size Limitation: most LLMs accept only about 4K tokens, so large amounts of data cannot be passed in directly. With the indexing approach, documents relevant or similar to the query are retrieved. The LLM is unlimited ..
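The indexing approach above can be sketched as a tiny retrieval-then-prompt loop: documents are embedded once, the most similar ones are retrieved at query time, and only those are packed into the limited context window. The `embed` function below is a toy bag-of-words stand-in for a real embedding model; all names here are illustrative, not from the article.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

documents = [
    "Fine-tuning an LLM can take several hours per update.",
    "Indexing new documents makes them searchable in real time.",
    "Most LLMs accept only about 4K tokens of context.",
]

# Retrieve only the relevant document and build a context-limited prompt.
context = retrieve("How fast can new documents be indexed?", documents, top_k=1)
prompt = "Answer using this context:\n" + "\n".join(context)
```

A production version would swap `embed` for a real embedding model and a vector index, but the retrieve-then-prompt shape stays the same.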

Daily-Trend-Review 2023.06.22

2023/05/07: LLM inference on a single GPU, efficient Transformers, and more

1. High-throughput Generative Inference of Large Language Models with a Single GPU
   source: https://arxiv.org/pdf/2303.06865.pdf
2. Deploying Large NLP Models: Infrastructure Cost Optimization
   source: https://neptune.ai/blog/nlp-models-infrastructure-cost-optimization
3. What Are Transformer Models and How Do They Work?
   source: https://txt.cohere.com/what-are-transformer-models/
4. Efficient Tran..


2023/03/06: LLaMA, OpenAI ChatGPT & Whisper APIs, and more

1. Full Stack Optimization of Transformer Inference: a Survey
   source: https://arxiv.org/pdf/2302.14017.pdf
   A survey paper with authors from UC Berkeley. Main topics: Transformer model architecture and performance bottlenecks, HW design, model optimization, and mapping Transformers to HW.
2. LLaMA Test!! Local Machine
   source: http://rentry.org//llama-tard
   Uses the LLaMA weights so that inference can run on a single card (LLaMA INT8 Inference..
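The idea behind running LLaMA on a single card is weight quantization: storing each weight as an 8-bit integer with a shared scale cuts memory roughly 4x versus FP32. The sketch below shows plain symmetric per-tensor INT8 quantization as an illustration of that idea; it is not the exact scheme used by the linked guide.

```python
def quantize_int8(weights):
    """Map floats to int8 range [-127, 127] with one symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

# Toy weight tensor: the largest magnitude (1.27) maps to -127.
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Real INT8 schemes add per-channel scales and outlier handling, but the compress-then-rescale principle is the same.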
