AI, Quant Investment Study

All posts: 135

2023/07/24: LongNet

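Microsoft Just Showed us the Future of ChatGPT with LongNet. Context Length (Sequence Length): sequence length and compute cost have a quadratic relationship. Doubling the length of the input text sequence quadruples the cost of running the chatbot. As a result, AI companies building LLMs have no choice but to cap the maximum input sequence size. Why sequence length matters: the richer the input, the better the result. → Unless the information is supplied in the prompt, a chatbot answering a question relies on the knowledge held in the weights it acquired during training. However, the model was trained on a large portion of Internet text with almost no filtering. Regarding the pre-trained knowledge in a pre-trained model ..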
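The quadratic relationship above can be checked with a few lines of arithmetic. This is only a rough sketch: `attention_cost` is a hypothetical helper that counts the entries of the n x n attention score matrix times an assumed hidden size, and it ignores every other part of the model.

```python
def attention_cost(seq_len: int, d_model: int = 1) -> int:
    """Relative cost of self-attention scores: n * n * d (assumed cost model)."""
    return seq_len * seq_len * d_model


# Doubling the sequence length (2048 -> 4096) under this cost model.
base = attention_cost(2048)
doubled = attention_cost(4096)
ratio = doubled / base
print(ratio)  # 4.0
```

Under the same assumption, a 4x longer input costs 16x as much, which is why providers cap the context length.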

Daily-Trend-Review 2023.07.24

2023/07/21(2) : In-Context Learning, Emergent Abilities,

1. Reducing LLM Costs & Latency with Semantic Cache source: https://portkey.ai/blog/reducing-llm-costs-and-latency-semantic-cache 2. In-Context Learning Approaches in Large Language Models source: https://towardsdatascience.com/in-context-learning-approaches-in-large-language-models-9c0c53b116a1 3. Llama 2: Open Foundation and Fine-Tuned Chat Models source: https://arxiv.org/pdf/2307.09288.pdf 4..

Daily-Trend-Review 2023.07.21

2023/07/21: MQA, LLaMA2, Flashattention2

Multi-Query Attention is All You Need source: https://blog.fireworks.ai/multi-query-attention-is-all-you-need-db072e758055 Multi-Query Attention is All You Need by James K Reed, Dmytro Dzhulgakov, Dmytro Ivchenko, and Lin Qiao blog.fireworks.ai LLaMA 2: The Dawn of a New Era source: https://betterprogramming.pub/the-dawn-of-a-new-era-llama2-b0b1a9175029 LLaMA 2: The Dawn of a New Era Key differe..

Daily-Trend-Review 2023.07.21

2023/07/18: Long Sequence

Can Longer Sequences Help Take the Next Leap in AI? source: https://ai.stanford.edu/blog/longer-sequences-next-leap-ai/ Long-sequence support in Transformers: increasing the sequence length, for performance and quality reasons,

Daily-Trend-Review 2023.07.18

2023/07/16: A Practical Introduction to LLMs, etc.

https://medium.com/towards-data-science/a-practical-introduction-to-llms-65194dda1148 A Practical Introduction to LLMs 3 levels of using LLMs in practice towardsdatascience.com A Practical Introduction to LLMs. What makes LLMs special: quantitatively, what distinguishes an LLM is the number of parameters in the model, on the order of 10B to 100B. Qualitatively, emergent properties appear as an LM grows; these are properties that show up suddenly once the LM reaches a sufficiently large size. Zero-shot Learning: the main innovation of GPT-3 is, in a variety of situations, Zero-shot ..

Daily-Trend-Review 2023.07.16

2023/07/11: GPT-4, Longnet, knowledge base

1. Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System source: https://arxiv.org/pdf/2304.13343.pdf 2. GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE source: https://www.semianalysis.com/p/gpt-4-architecture-infrastructure 3. LONGNET: Scaling Transformers to 1,000,000,000 Tokens source: https://arxiv.org/pdf/2307...

Daily-Trend-Review 2023.07.11

2023/07/10: An Infinite Memory ChatGPT?

https://medium.com/@ignacio.de.gregorio.noblejas/is-tiktok-planning-an-infinite-memory-chatgpt-c195b1a6eced Is TikTok Planning An Infinite Memory ChatGPT? ByteDance Research Hints At It medium.com Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System https://arxiv.org/pdf/2304.13343.pdf ByteDance announced SCM (Self-Controlled Memory), making it possible for chatbots to take unlimited input..

Daily-Trend-Review 2023.07.10

2023/07/07: Leveraging Large Language Models in SW Applications

→https://medium.com/@simon_attard/leveraging-large-language-models-in-your-software-applications-9ea520fb2f34 Leveraging Large Language Models in your Software Applications How can you leverage the capabilities of Large Language Models (LLMs) within your software applications? medium.com Overview: simply building a thin application on top of an LLM has the following problems. Responses to users are unpredictable and contain hallucinations. Responses, with respect to the application's data and use cases, ..

Daily-Trend-Review 2023.07.07

2023/07/06: Vector DB, Transformer, Context Window, vLLM, etc.

1. Vector databases source: https://medium.com/aimonks/vector-databases-7d46054e933 2. Leveraging Large Language Models in your Software Applications source: https://medium.com/@simon_attard/leveraging-large-language-models-in-your-software-applications-9ea520fb2f34 3. GPT in 60 Lines of NumPy source: https://jaykmody.com/blog/gpt-from-scratch/#gpt-architecture 4. In the spotlight as ChatGPT's frontal lobe (long-term memory store) ..

Daily-Trend-Review 2023.07.06

2023/07/01: Emerging Architectures for LLM Applications

https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/ Emerging Architectures for LLM Applications | Andreessen Horowitz A reference architecture for the LLM app stack. It shows the most common systems, tools, and design patterns used by AI startups and tech companies. a16z.com Design pattern: In-Context Learning. The core idea of In-Context Learning: use LLMs without fine-tuning; instead, on private context data ..
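A minimal sketch of the in-context learning pattern mentioned above, assuming the relevant context documents have already been retrieved. `build_prompt`, the prompt wording, and the sample strings are hypothetical illustrations, not taken from the a16z article; no model is called here.

```python
def build_prompt(question: str, context_docs: list[str]) -> str:
    """Put private context into the prompt instead of fine-tuning the model."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical private data that the base model was never trained on.
docs = ["Refunds are accepted within 30 days.", "Support hours are 9 to 18 KST."]
prompt = build_prompt("What is the refund window?", docs)
print(prompt)
```

The resulting string would be sent to an unmodified LLM; the model weights stay fixed and only the prompt changes per request.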

Daily-Trend-Review 2023.07.01


A space for studying AI and Quant investing, run by an AI engineer who likes to write

