Introduction
Table of Contents
1. Positional Embeddings
   1.1. Fixed Positional Embeddings
   1.2. Learned Positional Embeddings
   1.3. Rotary Positional Embeddings (RoPE)
   1.4. No Positional Embeddings (NoPE)
2. Attention
   2.1. Multi-Headed Attention (MHA)
   2.2. Multi-Query Attention (MQA)
   2.3. Grouped-Query Attention (GQA)
   2.4. Sliding-Window Attention (SWA)
   2.5. Attention Sink
   2.6. KV Cache
3. Sampling
   3.1. Top-K
   3.2. Top-P
   3.3. Temperature
   3.4. Speculative Sampling
4. Finetuning
   4.1. Low Rank Adaptation (LoRA) of LLMs
   4.2. Efficient Finetuning of Quantized LLMs (QLoRA)
   4.3. Reinforcement Learning from Human Feedback (RLHF)
   4.4. Reinforcement Learning from AI Feedback (RLAIF)
5. Prompting
   5.1. Chain of Thought (CoT)
   5.2. Tree of Thought (ToT)
6. Batching
   6.1. Continuous Batching