1. Full Stack Optimization of Transformer Inference: a Survey
   source: https://arxiv.org/pdf/2302.14017.pdf
   A survey paper written with participation from UC Berkeley authors.
   Main topics:
   - Transformer Model Architecture and Performance Bottlenecks
   - HW Design
   - Model Optimization
   - Mapping Transformers To HW

2. LLaMA Test!! Local Machine
   source: http://rentry.org//llama-tard
   Uses LLaMA's weights so the model can run on a single card (LLaMA INT8 Inference..
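
   A minimal sketch of what single-card INT8 inference might look like, assuming the LLaMA weights have already been converted to Hugging Face format and bitsandbytes is installed; the local checkpoint path is a placeholder, not part of the guide above.

   ```python
   # Sketch: load a converted LLaMA checkpoint with INT8 weight quantization
   # so it fits on a single GPU. Path below is a placeholder.
   import torch
   from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

   model_path = "./llama-7b-hf"  # placeholder: converted LLaMA checkpoint directory

   tokenizer = AutoTokenizer.from_pretrained(model_path)
   model = AutoModelForCausalLM.from_pretrained(
       model_path,
       quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # INT8 weights
       device_map="auto",          # place layers on the available GPU automatically
       torch_dtype=torch.float16,  # keep non-quantized parts in fp16
   )

   prompt = "The Transformer architecture is"
   inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
   outputs = model.generate(**inputs, max_new_tokens=50)
   print(tokenizer.decode(outputs[0], skip_special_tokens=True))
   ```

   INT8 quantization roughly halves the weight memory compared to fp16, which is what makes a 7B-scale model fit on a single consumer GPU.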