
24/02/04: Fine-tune your own Llama 2 model in a Colab notebook

hellcat 2024. 2. 4. 22:07

Fine-Tune Your Own Llama 2 Model in a Colab Notebook


ML Blog - Fine-Tune Your Own Llama 2 Model in a Colab Notebook




Decoding Strategies in Large Language Models


ML Blog - Decoding Strategies in Large Language Models




Introduction to Weight Quantization


ML Blog - Introduction to Weight Quantization




LLM Inference Series: 4. KV caching, a deeper look


In this post, we will look at how big the KV cache, a common optimization for LLM inference, can grow and at common mitigation strategies.
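As a rough illustration of how quickly the KV cache can grow, here is a minimal sketch of the usual size estimate: two tensors (keys and values) per layer, each holding one vector per KV head per token. The function name `kv_cache_bytes` is hypothetical, and the example numbers assume a Llama 2 7B style configuration (32 layers, 32 KV heads, head dimension 128) stored in 16-bit precision; real frameworks may allocate more than this lower bound.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes (hypothetical helper, not a library API)."""
    # The leading 2 counts one key tensor plus one value tensor per layer.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Assumed Llama 2 7B style configuration, fp16 (2 bytes per element).
per_token = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=1)
full_context = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)

print(per_token / 2**20, "MiB per token")
print(full_context / 2**30, "GiB for one 4096-token sequence")
```

The estimate scales linearly with both sequence length and batch size, which is why long contexts and large batches put pressure on GPU memory.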



LLM Inference Series: 5. Dissecting model performance


In this post, we look deeper into the different types of bottleneck that affect model latency and explain what arithmetic intensity is.
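Arithmetic intensity is the ratio of floating-point operations to bytes moved to and from memory. The sketch below estimates it for a single matrix multiplication of shape (m, k) x (k, n). The helper name `matmul_arithmetic_intensity` is hypothetical, and the byte count assumes each operand and the output cross the memory bus exactly once, which is an idealization.

```python
def matmul_arithmetic_intensity(m, k, n, bytes_per_elem=2):
    """FLOPs per byte for an (m, k) x (k, n) matmul (hypothetical helper)."""
    # Each output element needs k multiplies and k adds.
    flops = 2 * m * k * n
    # Idealized traffic: read both inputs once, write the output once.
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


# One token at a time (batch of 1) against a 4096 x 4096 fp16 weight matrix.
single = matmul_arithmetic_intensity(m=1, k=4096, n=4096)
# The same weight matrix shared across a batch of 64 tokens.
batched = matmul_arithmetic_intensity(m=64, k=4096, n=4096)

print(f"batch 1:  {single:.2f} FLOPs/byte")
print(f"batch 64: {batched:.2f} FLOPs/byte")
```

With a batch of 1 the ratio stays close to 1 FLOP per byte, so the operation tends to be limited by memory bandwidth; batching reuses the same weights across more tokens and raises the ratio.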



How GPT models work: for data scientists and ML engineers


Bea Stollnitz - How GPT models work: for data scientists and ML engineers

Learn Azure ML and machine learning with Bea Stollnitz.



The Transformer architecture of GPT models


Bea Stollnitz - The Transformer architecture of GPT models



Some intuitions about large language models


Some intuitions about large language models — Jason Wei

An open question these days is why large language models work so well. In this blog post I will discuss six basic intuitions about large language models. Many of them are inspired by manually examining data, which is an exercise that I’ve found helpful a
