
LLM Settings ⚙️
Understanding the core parameters that control LLM behavior, including Temperature, Top P, Max Length, and Penalties. Learn how to tune these settings for deterministic or creative outputs.
This content is adapted from Prompting Guide: LLM Settings. It has been curated and organized for educational purposes on this portfolio. No copyright infringement is intended.
Configuring the LLM
When designing and testing prompts, you typically interact with the LLM via an API or a playground. You can configure several parameters to get different results. Tweaking these settings is crucial for improving the reliability and desirability of responses.
1. Temperature
Temperature controls the "randomness" of the output.
- Lower Temperature (e.g., 0.1 - 0.3): Makes the model more deterministic. It will consistently pick the highest-probability next token. Best for factual QA, data extraction, and technical tasks.
- Higher Temperature (e.g., 0.7 - 1.0): Encourages randomness and creativity. Best for poem generation and other creative tasks.
Recommendation: Use a lower temperature for tasks where accuracy is paramount, and a higher temperature for creative tasks.
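Conceptually, temperature rescales the model's logits before they are turned into a probability distribution for sampling. A minimal, self-contained Python sketch over a toy list of logits (not any real model's API; the function name and the greedy-at-zero convention are assumptions for illustration):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Sample a token index from logits scaled by temperature.

    Lower temperature sharpens the distribution (more deterministic);
    higher temperature flattens it (more random).
    """
    if temperature <= 0:
        # Temperature 0 is conventionally greedy decoding: always pick the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    # Softmax with max-subtraction for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting distribution.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1
```

At temperature 0 this always returns the highest-logit token; as temperature grows, lower-probability tokens get sampled more often.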
2. Top P (Nucleus Sampling)
Top P is an alternative to temperature, also known as nucleus sampling. It restricts the model to only consider tokens that make up a certain percentage of the probability mass.
- Low Top P (e.g., 0.1): Only considers the most confident tokens.
- High Top P (e.g., 0.9): Allows the model to look at a broader range of possible words.
Note: It is generally recommended to alter either Temperature or Top P, but not both.
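Nucleus sampling can be sketched in a few lines: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches Top P, and renormalize. A toy illustration over a plain probability list (the function name and representation are assumptions for this example):

```python
def nucleus_filter(probs, top_p):
    """Return the renormalized distribution over the Top-P nucleus.

    probs: list of token probabilities (summing to ~1.0)
    top_p: cumulative probability threshold in (0, 1]
    """
    # Sort token indices by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # nucleus is complete
    # Renormalize so the kept probabilities sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```

With `top_p=0.1` only the single most confident token typically survives; with `top_p=0.9` many candidates remain in play.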
3. Max Length
This setting manages the maximum number of tokens the model can generate in its response.
- Purpose: Prevents long-winded or irrelevant responses and helps control API costs.
- Units: Measured in tokens (roughly 4 characters per token in English).
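As a rough illustration of the 4-characters-per-token heuristic above, here is a hypothetical budgeting helper (the exact ratio varies by tokenizer, language, and content, so treat this only as a ballpark estimate):

```python
def estimate_tokens(text, chars_per_token=4):
    """Roughly estimate token count for English text.

    Uses the common ~4 characters/token heuristic; real counts
    depend on the model's tokenizer.
    """
    return max(1, round(len(text) / chars_per_token))
```

This kind of estimate is handy for sanity-checking that a prompt plus its expected completion fits within your max-length (and cost) budget before calling an API.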
4. Stop Sequences
A stop sequence is a specific string (like \n or User:) that tells the model to stop generating any further tokens.
- Example: If you want a model to generate a single list item, you might set the stop sequence to "2." to ensure it doesn't start a second item.
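In effect, a stop sequence truncates the output at the first match, discarding the sequence itself. A simplified post-hoc sketch (real backends stop generation token by token rather than trimming afterwards, so this is only an approximation of the behavior):

```python
def apply_stop_sequences(text, stop_sequences):
    """Truncate text at the earliest occurrence of any stop sequence.

    The stop sequence itself is not included in the returned text.
    """
    earliest = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            earliest = min(earliest, idx)
    return text[:earliest]
```

For the list example above, stopping on "2." keeps only the first item of a numbered list.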
5. Penalties (Frequency & Presence)
These parameters help reduce word repetition and encourage diversity.
- Frequency Penalty: Penalties are applied to tokens based on how many times they have already appeared in the text. The more often a token has appeared, the stronger the penalty it receives.
- Presence Penalty: Applies a flat, one-time penalty to any token that has appeared at least once, regardless of how often. This encourages the model to talk about new topics/words rather than sticking to the same ones.
Comparison: Frequency penalty is "incremental" (more usage = higher penalty), while Presence penalty is "boolean" (appeared? yes = penalty).
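The two penalties are often described (e.g., in OpenAI's API documentation) as a per-token logit adjustment of roughly the form `logit -= count * frequency_penalty + (count > 0) * presence_penalty`. A toy sketch under that assumption, with token counts tracked in a plain dict:

```python
def apply_penalties(logits, generated_counts,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that have already been generated.

    generated_counts: dict mapping token index -> times generated so far.
    Frequency penalty scales with the count ("incremental");
    presence penalty is a flat one-time hit ("boolean").
    """
    adjusted = list(logits)
    for token, count in generated_counts.items():
        if count > 0:
            adjusted[token] -= count * frequency_penalty + presence_penalty
    return adjusted
```

Note how a token seen three times is penalized three times as hard by the frequency penalty, but takes exactly the same presence penalty as a token seen once.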
Conclusion
Before starting with complex prompts, remember that results may vary depending on the specific model version (e.g., gpt-4o vs gpt-3.5-turbo). Finding the "sweet spot" for these settings usually requires a bit of experimentation for your specific use case.