https://developer.nvidia.com/blog/rag-101-demystifying-retrieval-augmented-generation-pipelines/
https://developer.nvidia.com/blog/rag-101-retrieval-augmented-generation-questions-answered/
https://github.com/NVIDIA/GenerativeAIExamples/blob/main/docs/rag/architecture.md