# Top-P

While Top-K sampling restricts the sampling pool to the K most likely next words, Top-P sampling, also known as "nucleus sampling," adds a twist. Instead of specifying a set number of top candidates (K), you specify a probability mass (P) and sample only from the smallest group of words that have a collective probability greater than P.

Let's implement Top-P sampling as a NumPy function to make the mechanics concrete:

```
import numpy as np

def top_p_sampling(logits, p):
    # Convert logits to probabilities with a numerically stable softmax
    probs = np.exp(logits - np.max(logits))
    probs /= np.sum(probs)
    # Sort probabilities in descending order
    sorted_indices = np.argsort(probs)[::-1]
    sorted_probs = probs[sorted_indices]
    # Cumulative probability of the sorted distribution
    cum_probs = np.cumsum(sorted_probs)
    # Smallest prefix whose cumulative probability reaches p (always at least 1 word)
    cutoff = np.searchsorted(cum_probs, p) + 1
    nucleus = sorted_indices[:cutoff]
    # Renormalize within the nucleus and sample in proportion to probability
    nucleus_probs = sorted_probs[:cutoff] / np.sum(sorted_probs[:cutoff])
    return np.random.choice(nucleus, p=nucleus_probs)
```

Here's the step-by-step breakdown:

- **Sort and Convert:** Convert the logits to probabilities and sort them in descending order.
- **Cumulative Sum:** Calculate the cumulative sum of the sorted probabilities.
- **Thresholding:** Identify the smallest set of words whose collective probability mass reaches the threshold (P).
- **Sampling:** Randomly sample the next word from this set of valid candidates, in proportion to their renormalized probabilities.
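The steps above can be traced on concrete numbers. The five-element logit vector below is a made-up illustration, not from a real model:

```
import numpy as np

# Hypothetical logits for a tiny 5-word vocabulary
logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])

# Sort and Convert: stable softmax, then descending order
probs = np.exp(logits - logits.max())
probs /= probs.sum()
order = np.argsort(probs)[::-1]

# Cumulative Sum over the sorted probabilities
cum = np.cumsum(probs[order])

# Thresholding: smallest prefix whose mass reaches p = 0.8
cutoff = np.searchsorted(cum, 0.8) + 1
nucleus = order[:cutoff]
print(nucleus)  # the two lowest-probability words are excluded
```

With these values the probabilities are roughly [0.55, 0.20, 0.12, 0.09, 0.03], so the nucleus for p = 0.8 contains only the first three words.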

### When to Use Top-P Sampling

Top-P sampling is particularly useful when you want more adaptive and context-sensitive text generation. Unlike Top-K, which has a fixed number of candidates, Top-P allows for a variable number of candidates based on the context, making it more flexible.
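This adaptive behavior is easy to sketch. In the hypothetical example below, a confident (peaked) distribution yields a nucleus of one word, while an uncertain (flat) distribution over the same vocabulary admits every word:

```
import numpy as np

def nucleus_size(logits, p):
    # Number of candidates needed to reach cumulative probability p
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    cum = np.cumsum(np.sort(probs)[::-1])
    return int(np.searchsorted(cum, p) + 1)

peaked = np.array([5.0, 1.0, 0.5, 0.2, 0.1])  # model is confident
flat = np.array([1.0, 0.9, 0.8, 0.7, 0.6])    # model is uncertain
print(nucleus_size(peaked, 0.9), nucleus_size(flat, 0.9))
```

A fixed K would sample from the same number of candidates in both situations; Top-P narrows or widens the pool automatically.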

### Limitations and Considerations

- Computational Cost: The sorting operation increases the computational cost slightly compared to Top-K sampling.
- Hyperparameter Sensitivity: The choice of P can significantly influence the generated text. A smaller P restricts sampling to fewer candidates, making the text more deterministic, while a larger P admits more candidates, making it more diverse and random.

Top-P sampling provides an adaptive method for balancing the trade-off between diversity and informativeness in generated text. It has gained popularity in several NLP applications, from automated customer service to creative writing aids.
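One way to build intuition for this sensitivity is to sweep P over a fixed (hypothetical) distribution and count how many candidates survive the threshold:

```
import numpy as np

# Hypothetical logits for a 5-word vocabulary
logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])
probs = np.exp(logits - logits.max())
probs /= probs.sum()
cum = np.cumsum(np.sort(probs)[::-1])

sizes = []
for p in (0.5, 0.8, 0.95):
    k = int(np.searchsorted(cum, p) + 1)  # candidates needed to reach p
    sizes.append(k)
    print(f"p={p}: {k} candidate(s)")
```

Even on this toy distribution the candidate pool grows from 1 word at p = 0.5 to 4 words at p = 0.95, so small changes in P can noticeably shift the character of the output.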