1. High-throughput Generative Inference of Large Language Models with a Single GPU source: https://arxiv.org/pdf/2303.06865.pdf 2. Deploying Large NLP Models: Infrastructure Cost Optimization source: https://neptune.ai/blog/nlp-models-infrastructure-cost-optimization 3. What Are Transformer Models and How Do They Work? source: https://txt.cohere.com/what-are-transformer-models/ 4. Efficient Tran..