The Large Language Model Playbook

Summary

Introduction

Table of Contents

  • Positional Embeddings
    • Fixed Positional Embeddings
    • Learned Positional Embeddings
    • Rotary Positional Embeddings (RoPE)
    • No Positional Embeddings (NoPE)
  • Attention
    • Multi-Headed Attention (MHA)
    • Multi-Query Attention (MQA)
    • Grouped-Query Attention (GQA)
    • Sliding-Window Attention (SWA)
    • Attention Sink
    • KV Cache
  • Sampling
    • Top-K
    • Top-P
    • Temperature
    • Speculative Sampling
  • Finetuning
    • Low-Rank Adaptation (LoRA) of LLMs
    • Efficient Finetuning of Quantized LLMs (QLoRA)
    • Reinforcement Learning from Human Feedback (RLHF)
    • Reinforcement Learning from AI Feedback (RLAIF)
  • Prompting
    • Chain of Thought (CoT)
    • Tree of Thought (ToT)
  • Batching
    • Continuous Batching