How continuous batching enables 23x throughput in LLM inference while reducing p50 latency
Why GPT-3.5 is (mostly) cheaper than Llama2
Implementation of Llama v2.0, FAISS in Python using LangChain
Optimize LLM Enterprise Applications through Embeddings and Chunking Strategy