Temperature
Temperature is a hyperparameter that controls the randomness of sampling tokens (often whole words or pieces of words) from a probability distribution. It is applied to the logits (the raw scores or predictions) before the Softmax operation. Intuitively, you can think of temperature as a knob that adjusts how conservatively or liberally the next token is sampled.
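Formally, with logits z_i and temperature T > 0, the temperature-scaled Softmax gives token probabilities

$$p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

As T approaches 0, the distribution concentrates on the highest-scoring token (approaching greedy selection); as T grows large, it approaches a uniform distribution.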
In code, applying temperature looks like this:
import numpy as np

def apply_temperature(logits, temperature):
    logits = np.asarray(logits, dtype=float) / temperature  # Apply temperature scaling
    logits -= np.max(logits)                                 # Shift logits for numerical stability
    probs = np.exp(logits) / np.sum(np.exp(logits))          # Softmax to get probabilities
    return np.random.choice(np.arange(len(logits)), p=probs) # Sample a token index from the distribution
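A quick usage sketch (the logit values here are invented for illustration):

logits = [2.0, 1.0, 0.2]                            # hypothetical scores for three tokens
print(apply_temperature(logits, temperature=0.7))   # lower T: index 0 wins most of the time
print(apply_temperature(logits, temperature=1.5))   # higher T: indices 1 and 2 show up more often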
Let's break down what happens:
- Scaling: The logits are divided by the temperature. Lower temperature (< 1) makes the model more confident in its top choices, whereas a higher temperature (> 1) makes the model more uncertain, effectively flattening the distribution.
- Softmax: After scaling, the logits are transformed into probabilities using the Softmax function.
- Sampling: Finally, a word is sampled from this distribution.
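To make the scaling step concrete, here is a small sketch that recomputes only the Softmax step for a few temperatures (the logit values are made up and the helper name is arbitrary):

import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits, dtype=float) / temperature   # temperature scaling
    scaled -= np.max(scaled)                                  # shift for numerical stability
    return np.exp(scaled) / np.sum(np.exp(scaled))            # Softmax

logits = [3.0, 1.0, 0.5]                       # hypothetical logits for three tokens
print(softmax_with_temperature(logits, 0.5))   # ~[0.98, 0.02, 0.01] -- sharper, top choice dominates
print(softmax_with_temperature(logits, 1.0))   # ~[0.82, 0.11, 0.07] -- distribution unchanged
print(softmax_with_temperature(logits, 2.0))   # ~[0.60, 0.22, 0.17] -- flatter, more diverse samples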
When to Use Temperature Scaling
Temperature is widely applicable across sampling-based decoding methods and provides fine-grained control over the randomness of the generated text. Whether you sample from the full distribution or restrict it with Top-K or nucleus (Top-p) sampling, a temperature parameter lets you tune the output toward the quality-diversity trade-off you need. (It has no effect on greedy decoding, which always picks the highest-scoring token regardless of scaling.)
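As one illustration, here is a sketch that applies temperature before a Top-K filter (the helper name, the cutoff k=5, and the default temperature are arbitrary choices for the example, not a reference implementation):

import numpy as np

def sample_top_k_with_temperature(logits, temperature=0.8, k=5):
    logits = np.asarray(logits, dtype=float) / temperature        # temperature scaling first
    top_k_idx = np.argsort(logits)[-k:]                           # keep the k highest-scoring tokens
    top_k_logits = logits[top_k_idx] - np.max(logits[top_k_idx])  # shift for numerical stability
    probs = np.exp(top_k_logits) / np.sum(np.exp(top_k_logits))   # Softmax over the survivors
    return top_k_idx[np.random.choice(len(top_k_idx), p=probs)]   # map back to the original index

logits = np.random.randn(50)                    # stand-in for a 50-token vocabulary
print(sample_top_k_with_temperature(logits))    # index of the sampled token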
Limitations and Considerations
- Hyperparameter Tuning: The right temperature is task-dependent and usually found empirically; values that are too low tend to produce repetitive, near-deterministic text, while values that are too high can produce incoherent output.
- Context-Insensitive: A single temperature is applied uniformly at every decoding step, regardless of context; whether this matters depends on your use case.