# Temperature

Temperature is a hyperparameter used to control the randomness in the probabilistic sampling of tokens (words, in most cases) from a distribution. It's applied to the logits (the raw scores or predictions) before the Softmax operation. Intuitively, you can think of the temperature as a knob to adjust how conservatively or liberally you want to sample the next token.

Here's the basic formula to apply temperature:

```python
import numpy as np

def apply_temperature(logits, temperature):
    logits = np.asarray(logits) / temperature   # Apply temperature scaling
    exps = np.exp(logits - np.max(logits))      # Subtract the max for numerical stability
    probs = exps / np.sum(exps)                 # Softmax to get probabilities
    return np.random.choice(len(logits), p=probs)  # Sample a token index from the distribution
```

Let's break down what happens:

- **Scaling:** The logits are divided by the temperature. A lower temperature (< 1) makes the model more confident in its top choices, whereas a higher temperature (> 1) makes the model more uncertain, effectively flattening the distribution.
- **Softmax:** After scaling, the logits are transformed into probabilities using the Softmax function.
- **Sampling:** Finally, a token is sampled from this distribution.
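To see the scaling step in action, here is a small illustration (with made-up logit values) of how the same distribution sharpens at low temperature and flattens at high temperature:

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])

for t in (0.5, 1.0, 2.0):
    probs = softmax(logits / t)
    # Lower T puts more probability mass on the top token; higher T spreads it out
    print(f"T={t}: {np.round(probs, 3)}")
```

At `T=0.5` the top token dominates, while at `T=2.0` the three probabilities move much closer together.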

### When to Use Temperature Scaling

Temperature is widely applicable across different sampling methods and provides fine-grained control over the randomness of output text. Whether you are using greedy decoding, Top-K, or nucleus sampling, adding a temperature parameter can help you adjust the output to meet specific quality-diversity criteria.
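As a concrete illustration of combining temperature with another sampling method, here is a minimal sketch of Top-K sampling with a temperature knob. The function name `top_k_sample` and its signature are illustrative choices for this example, not a library API:

```python
import numpy as np

def top_k_sample(logits, k=5, temperature=1.0, rng=None):
    # Illustrative sketch: apply temperature first, then restrict
    # sampling to the k highest-scoring tokens
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    top_idx = np.argsort(scaled)[-k:]                # indices of the k largest logits
    top = scaled[top_idx] - np.max(scaled[top_idx])  # stabilize before exponentiating
    probs = np.exp(top) / np.sum(np.exp(top))        # softmax over the k candidates only
    return top_idx[rng.choice(len(top_idx), p=probs)]
```

The same pattern applies to nucleus (Top-P) sampling: scale the logits by the temperature before deciding which tokens survive the cutoff.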

### Limitations and Considerations

- **Hyperparameter Tuning:** The choice of temperature can have a significant impact on your results, so it usually needs to be tuned empirically for each task.
- **Context-Insensitive:** Temperature scaling applies the same amount of smoothing at every decoding step; it does not adapt to the context, which may or may not be a limitation based on your use case.
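One way to build intuition while tuning is to measure the entropy of the sampling distribution as the temperature varies. The logit values below are made up for illustration:

```python
import numpy as np

def entropy(probs):
    # Shannon entropy in bits; higher means a flatter, more random distribution
    return -np.sum(probs * np.log2(probs))

logits = np.array([3.0, 1.5, 0.5, -1.0])

for t in (0.25, 1.0, 4.0):
    scaled = logits / t
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Entropy grows monotonically with temperature for a fixed set of logits
    print(f"T={t}: entropy = {entropy(probs):.2f} bits")
```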