
LLM Settings ⚙️
Understanding the core parameters that control LLM behavior, including Temperature, Top P, Max Length, and Penalties. Learn how to tune these settings for deterministic or creative outputs.
This content is adapted from Prompting Guide: LLM Settings. It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.
Configuring the LLM
When designing and testing prompts, you typically interact with the LLM via an API or a playground. You can configure several parameters to get different results. Tweaking these settings is crucial for improving the reliability and desirability of responses.
1. Temperature
Temperature controls the "randomness" of the output.
- Lower Temperature (e.g., 0.1 - 0.3): Makes the model more deterministic. It will consistently pick the highest-probability next token. Best for factual QA, data extraction, and technical tasks.
- Higher Temperature (e.g., 0.7 - 1.0): Encourages randomness and creativity. Best for poem generation and other creative tasks.
Recommendation: Use a lower temperature for tasks where accuracy is paramount, and a higher temperature for creative tasks.
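Conceptually, temperature rescales the model's logits before they are turned into a probability distribution for sampling. A minimal, self-contained Python sketch over a toy list of logits (not any real model's API; the function name and the greedy-at-zero convention are assumptions for illustration):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from logits scaled by temperature.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random).
    """
    if temperature <= 0:
        # Temperature 0 is conventionally greedy decoding: always pick the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting distribution.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1
```

At temperature 0 this always returns the highest-logit token; as temperature grows, lower-probability tokens get sampled more often.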
2. Top P (Nucleus Sampling)
Top P is an alternative to temperature, also known as nucleus sampling. It restricts the model to only consider tokens that make up a certain percentage of the probability mass.
- Low Top P (e.g., 0.1): Only considers the most confident tokens.
- High Top P (e.g., 0.9): Allows the model to look at a broader range of possible words.
Note: It is generally recommended to alter either Temperature or Top P, but not both.
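Nucleus sampling can be sketched in a few lines: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches Top P, and renormalize. A toy illustration over a plain probability list (the function name and representation are assumptions for this example):

```python
def nucleus_filter(probs, top_p):
    """Return the renormalized distribution over the Top-P nucleus.

    probs: list of token probabilities (summing to ~1.0)
    top_p: cumulative probability threshold in (0, 1]
    """
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # nucleus is complete
    # Renormalize so the kept probabilities sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

With `top_p=0.1` only the single most confident token typically survives; with `top_p=0.9` many candidates remain in play.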
3. Max Length
This setting manages the maximum number of tokens the model can generate in its response.
- Purpose: Prevents long-winded or irrelevant responses and helps control API costs.
- Units: Measured in tokens (roughly 4 characters per token in English).
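As a rough illustration of the 4-characters-per-token heuristic above, here is a hypothetical budgeting helper (the exact ratio varies by tokenizer, language, and content, so treat this only as a ballpark estimate):

```python
def estimate_tokens(text, chars_per_token=4):
    """Roughly estimate token count for English text.

    Uses the common ~4 characters/token heuristic; real counts
    depend on the model's tokenizer.
    """
    return max(1, round(len(text) / chars_per_token))
```

This kind of estimate is handy for sanity-checking that a prompt plus its expected completion fits within your max-length (and cost) budget before calling an API.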
4. Stop Sequences
A stop sequence is a specific string (like \n or User:) that tells the model to stop generating any further tokens.
- Example: If you want a model to generate a single list item, you might set the stop sequence to "2." to ensure it doesn't start a second item.
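In effect, a stop sequence truncates the output at the first match, discarding the sequence itself. A simplified post-hoc sketch (real backends stop generation token by token rather than trimming afterwards, so this is only an approximation of the behavior):

```python
def apply_stop_sequences(text, stop_sequences):
    """Truncate text at the earliest occurrence of any stop sequence.

    The stop sequence itself is not included in the returned text.
    """
    earliest = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            earliest = min(earliest, idx)
    return text[:earliest]
```

For the list example above, stopping on "2." keeps only the first item of a numbered list.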
5. Penalties (Frequency & Presence)
These parameters help reduce word repetition and encourage diversity.
- Frequency Penalty: Penalties are applied to tokens based on how many times they have already appeared in the text. The more often a token has appeared, the stronger the penalty it receives.
- Presence Penalty: Applies a flat, one-time penalty to any token that has appeared at least once, regardless of how often. This encourages the model to talk about new topics/words rather than sticking to the same ones.
Comparison: Frequency penalty is "incremental" (more usage = higher penalty), while Presence penalty is "boolean" (appeared? yes = penalty).
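The two penalties are often described (e.g., in OpenAI's API documentation) as a per-token logit adjustment of roughly the form `logit -= count * frequency_penalty + (count > 0) * presence_penalty`. A toy sketch under that assumption, with token counts tracked in a plain dict:

```python
def apply_penalties(logits, generated_counts,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that have already been generated.

    generated_counts: dict mapping token index -> times generated so far.
    Frequency penalty scales with the count ("incremental");
    presence penalty is a flat one-time hit ("boolean").
    """
    adjusted = list(logits)
    for token, count in generated_counts.items():
        if count > 0:
            adjusted[token] -= count * frequency_penalty + presence_penalty
    return adjusted
```

Note how a token seen three times is penalized three times as hard by the frequency penalty, but takes exactly the same presence penalty as a token seen once.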
Conclusion
Before starting with complex prompts, remember that results may vary depending on the specific model version (e.g., gpt-4o vs gpt-3.5-turbo). Finding the "sweet spot" for these settings usually requires a bit of experimentation for your specific use case.