Temperature, Top-K and Top-P visualization

[Interactive visualization. Parameters: Temperature = 1.0, Top-K = 6, Top-P = 1.0. Next token probabilities: blue (97.89%), purple (1.09%), violet (0.35%), vio (0.24%), not (0.24%), Blue (0.19%), green (0.00%), gray (0.00%), grey (0.00%), black (0.00%).]
Understand this visualization

This visualization shows the probabilities for the top 10 most likely next tokens for the selected prompt, and how those probabilities are affected by changing the temperature, top-k, and top-p parameters.

Parameters
  • Temperature: The temperature parameter controls the randomness of the predictions. A temperature of 1.0 leaves the probability distribution generated by the model unchanged. Increasing the temperature makes the distribution more even - tokens that were less likely to be selected become more likely, and tokens that were more likely to be selected become less likely, which makes the output of the model more creative. Conversely, decreasing the temperature makes the tokens that were more likely to be selected even more likely, and the tokens that were less likely to be selected even less likely, making the model output more predictable.
  • Top-K: The top-k parameter limits how many of the most likely tokens are considered when selecting the next token. A top-k of 1 will always return the most likely token, while a top-k of 10 will consider all 10 tokens shown here.
  • Top-P (Nucleus Sampling): The top-p parameter filters the vocabulary considered for the next token based on cumulative probability. It selects the smallest set of the most probable tokens whose cumulative probability exceeds the threshold `p`. For example, if `p=0.9`, it keeps adding the most likely tokens until their combined probability reaches 0.9, and then discards the rest. A `p` of 1.0 considers all tokens. When used with Top-K, only tokens that satisfy *both* conditions are kept. A code sketch combining all three parameters follows this list.
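
To make the interaction between these parameters concrete, here is a minimal Python sketch of how temperature, top-k, and top-p could be combined when picking the next token. The probabilities are the hard-coded example values from the visualization above; real samplers operate on logits over the full vocabulary, and the function names here are illustrative rather than taken from any particular library.

```python
import math
import random

# Hard-coded example distribution from the visualization above
# (only the top tokens; a real model covers the full vocabulary).
probs = {
    "blue": 0.9789, "purple": 0.0109, "violet": 0.0035,
    "vio": 0.0024, "not": 0.0024, "Blue": 0.0019,
}

def apply_temperature(probs, temperature):
    """Rescale by temperature: divide log-probabilities by T, then re-normalize."""
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items() if p > 0}
    total = sum(scaled.values())
    return {tok: p / total for tok, p in scaled.items()}

def apply_top_k(probs, k):
    """Keep only the k most likely tokens, then re-normalize."""
    top = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    total = sum(top.values())
    return {tok: p / total for tok, p in top.items()}

def apply_top_p(probs, p_threshold):
    """Keep the smallest set of most likely tokens whose cumulative probability
    reaches p, then re-normalize."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= p_threshold:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

def sample_next_token(probs, temperature=1.0, top_k=6, top_p=1.0):
    """Apply temperature, then top-k and top-p filtering, and sample one token."""
    probs = apply_temperature(probs, temperature)
    probs = apply_top_k(probs, top_k)
    probs = apply_top_p(probs, top_p)
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(probs, temperature=0.7, top_k=3, top_p=0.9))
```

Note that libraries differ in exactly how top-k and top-p interact (for example, whether the distribution is re-normalized between the two filters); this sketch applies them one after the other only to mirror the intuition described above.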

While this visualization only shows the top 10 tokens for the prompt, in reality the large language model will return the likelihood for thousands of tokens - 262,144 in the case of Gemma 3 1B, the model used to generate the list of tokens.

Note that, while the data in this visualization was extracted from a real LLM prompt, the values are hard-coded, and there's no LLM running in the background. To learn more about how the data was generated using Gemma 3 1B, take a look at this Colab.
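
The linked Colab shows the actual extraction process. As a rough, hypothetical sketch of what such an extraction can look like with the Hugging Face `transformers` library (the model id and prompt below are assumptions, not the values used for this page), the top next-token probabilities can be read off the model's logits:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # assumed repository id; check the model card for the exact name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The sky is"  # hypothetical prompt, not necessarily the one used in the visualization
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token, over the full vocabulary

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=10)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.2%}")
```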