
Prompt Engineering and LLM - Improvement Guide

by Peter Szalontay, August 01, 2024


Introduction

Despite its seemingly endless complexity, LLM prompt engineering is more important than ever, influencing industries across the globe. LLMs themselves are designed to improve content generation, simplify and personalize individual customer experiences, and much more.

At the same time, LLMs are not perfect, and regular users can face quite a few challenges and shortcomings when interacting with them. This article explains the topic in detail, with practical examples along the way.

The Definition of a Prompt and an LLM

Prompts are not particularly difficult to explain outside of an AI context: they are specific commands that describe what needs to be done. The same idea applies to LLMs, where prompts are used to request specific information or actions from the model.

A prompt can take many different forms, including examples, questions, instructions, and even context in the form of additional information. Some of the latest LLMs can also accept images as part of their prompts; one of the most famous examples is GPT-4.

The overall quality of a prompt’s result varies greatly depending on the prompt itself. A simple text prompt such as “What is the capital of Ireland?” will receive a short, not exceptionally detailed response. At the other end of the spectrum, it is possible to create and refine very specific, detailed prompts for complicated use cases: different text styles, tones, or even different answer lengths.
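To make this contrast concrete, here is a minimal sketch of both kinds of prompt, assuming the OpenAI Python SDK; the model name and the ask() helper are illustrative, not prescriptive:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; any chat-capable model works
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# A bare question yields a short, generic answer.
print(ask("What is the capital of Ireland?"))

# A refined prompt pins down style, tone, and length.
print(ask(
    "What is the capital of Ireland? Answer in two sentences, "
    "in a friendly travel-guide tone, and mention one landmark."
))
```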

What is a Prompt Category?

Prompts vary greatly depending on what is needed from a model, so it is much easier to separate them into major categories, such as:

  1. Opinion-seeking prompts. “What would happen if we could clone other people?”
  2. Role-based prompts. “As an English teacher, create a short list of topics that a student needs to cover to learn the basics of the language”
  3. Information-seeking prompts. “What is the capital of Ireland?”
  4. Comparative prompts. “What are the benefits and shortcomings of iOS compared to Android?”
  5. Instruction-based prompts. “Set a timer for fifteen minutes”
  6. Reflective prompts. Uses multiple prompts on the same topic as a means of making the answer more accurate and to the point.
  7. Context-providing prompts. “I want to learn more about yoga as an activity. What can I use as a resource at first?”

These categories should be relatively self-explanatory, and the example included with each one should make it easy to recognize. Note that some of them do not apply to LLMs: instruction-based prompts, for example, assume the ability to perform actions the way assistants such as Alexa or Siri do, which Large Language Models lack.
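As a quick illustration, the categories above can be laid out as plain data, with each example ready to be sent to a model; the dictionary and its keys below are purely illustrative:

```python
# Example prompts per category, mirroring the list above. Instruction-based
# prompts are omitted because a plain LLM cannot act on them, and reflective
# prompting is a multi-turn pattern rather than a single string.
PROMPTS_BY_CATEGORY = {
    "opinion-seeking": "What would happen if we could clone other people?",
    "role-based": (
        "As an English teacher, create a short list of topics that a "
        "student needs to cover to learn the basics of the language."
    ),
    "information-seeking": "What is the capital of Ireland?",
    "comparative": "What are the benefits and shortcomings of iOS compared to Android?",
    "context-providing": "I want to learn more about yoga as an activity. What can I use as a resource at first?",
}

for category, prompt in PROMPTS_BY_CATEGORY.items():
    print(f"[{category}] {prompt}")
```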

What Is an LLM and Its Architecture?

LLM stands for Large Language Model: a model that can perform various Natural Language Processing (NLP) tasks such as language generation, classification, and more. Every LLM has to learn from a genuinely massive dataset before it can respond to queries and prompts, using a complex training process to extract knowledge from raw text.

The combination of an LLM’s design principles and the model’s underlying structure is commonly referred to as the LLM’s architecture. A typical LLM architecture consists of multiple stacked “layers”, with each layer performing its share of essential tasks; a compressed sketch of one such layer follows below.
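As a rough stand-in for an architecture diagram, here is a minimal sketch of a single transformer decoder layer, the kind of building block that GPT-style models stack dozens of times. It assumes PyTorch and omits details such as causal masking, positional encodings, and dropout:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One decoder layer: self-attention followed by a feed-forward network,
    each wrapped in a residual connection and layer normalization."""

    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # expand
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),  # project back
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attn_out)      # residual connection + norm
        x = self.norm2(x + self.ffn(x))   # residual connection + norm
        return x

tokens = torch.randn(2, 16, 768)          # (batch, sequence, embedding dim)
print(TransformerBlock()(tokens).shape)   # torch.Size([2, 16, 768])
```

A full model chains many of these layers between an input embedding and an output projection over the vocabulary.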

The Basics of Prompt Engineering

The definition of prompt engineering stems directly from the definition of a prompt given above: prompt engineering is the process of refining communication with an LLM.

The direct dependency between a response’s quality and a prompt’s quality has driven the explosive popularity of prompt engineering as a profession within the AI market. The main goal of a prompt engineer is to analyze and refine existing prompts so that all kinds of requests receive detailed, thorough responses.

Prompt engineering has dramatically changed the overall user experience with LLMs, even though the first publicly available model of this kind appeared only in 2020. Early LLMs, such as GPT-3, required specific and detailed prompts to return quality information.

Those early LLMs are mostly outdated by now, and newer versions are much better suited to handling both simple and complex requests. Most of this can be attributed to two factors: the work of prompt engineers, and the ability of modern LLMs to “hold” far more context than before.

Three main factors contribute heavily to the overall quality of an LLM response to this day: the wording of the prompt, the context supplied alongside it, and the model’s own generation parameters.

The process of prompt refining is continuous and necessary to achieve the best possible result for the end user. Many models also expose parameters that users can adjust manually to shape the response.
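For instance, most chat APIs expose sampling parameters directly. A minimal sketch, again assuming the OpenAI Python SDK, with purely illustrative values:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user", "content": "Describe Dublin in one paragraph."}],
    temperature=0.2,  # lower values make output more focused and repeatable
    top_p=0.9,        # restricts sampling to the most likely tokens
    max_tokens=150,   # caps the length of the reply
)
print(response.choices[0].message.content)
```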

Common Strategies for LLM Prompt Engineering

LLMs remain a very fluid, ever-evolving industry, constantly introducing new strategies and capabilities. Some options also change depending on the model the end user is working with. However, several basic prompt engineering strategies transfer to most models on the market.
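One strategy that transfers almost universally is few-shot prompting, where the prompt itself contains worked examples of the desired output. A minimal sketch, assuming the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI()

# Few-shot prompting: earlier user/assistant turns act as worked examples,
# teaching the model the expected format before the real query arrives.
messages = [
    {"role": "system", "content": "Classify the sentiment of each review."},
    {"role": "user", "content": "The battery died after a week."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Fantastic screen, great value."},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Delivery was fast but the box was crushed."},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)  # expected: a one-word label
```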

How Prompt Engineering Works with LLMs

The introduction of prompt engineering brought an entirely new way of interacting with language models. The original approach, referred to as Machine Learning Engineering, consisted of three phases: collecting and preparing a large dataset, training or fine-tuning a model on it, and deploying the result.

Some of the most problematic aspects of this approach included its dependence on massive datasets, the heavy human involvement required to prepare them, and slow, expensive iteration.

Prompt engineering, by contrast, operates in a much simpler fashion.

Despite its overall simplicity, the process still has three phases: writing a prompt, evaluating the model’s response, and refining the prompt based on that feedback.

While prompt engineering still relies on feedback and prompt improvement to operate, there is no need for massive datasets from the get-go, which greatly reduces the need for human involvement. Its performance is also better on average than the older workflow, because modern LLMs are extremely powerful in their own right, and the iteration speed is surprisingly high as well.

Prompt engineering’s ability to work with context allows an LLM to receive new knowledge at request time while still leveraging its built-in capabilities for following instructions, applying logical reasoning, and so on. Other advantages of introducing context-relevant information in prompts include better security and lower training costs.
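A minimal sketch of that idea: fresh, request-time knowledge is placed in the prompt, and the model is instructed to answer from it alone. The facts below are purely illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Request-time context: knowledge the model was never trained on.
context = (
    "Internal FAQ (2024-07): Standard shipping takes 3-5 business days. "
    "Orders over $50 ship free. Returns are accepted within 30 days."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "Answer using only the provided context. "
                       "If the context does not cover the question, say so.",
        },
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: Is shipping free on a $60 order?",
        },
    ],
)
print(response.choices[0].message.content)
```

Restricting the model to the supplied context is also what delivers the security benefit mentioned above: the model is less likely to improvise answers from unvetted training data.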

Best Practices for LLM Prompt Engineering 

Prompt engineering in the context of LLM fine-tuning can be surprisingly tricky, considering how vast and varied the knowledge of these models can be. Getting an LLM to produce output that satisfies the user’s needs is also harder than it might seem, which is why many people struggle to get the responses they want.

Experimentation is one of the critical factors here, and several other methodologies can also be used to improve response quality across the board.

Plenty of other advice could be given here, including prioritizing important information and breaking queries down into smaller, more manageable chunks. The main takeaway is that LLM prompt engineering is not magic, and it may take some time to get the best possible response for a specific question or topic.
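To make the experimentation point concrete, here is a small sketch that tries several phrasings of the same request so their outputs can be compared side by side; the variants are illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Several phrasings of the same request. Comparing the outputs side by side
# is the quickest way to discover which framing the model handles best.
variants = [
    "Explain recursion.",
    "Explain recursion to a ten-year-old in three sentences.",
    "Explain recursion with a short Python example and one pitfall to avoid.",
]

for prompt in variants:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    print(f"--- {prompt}\n{response.choices[0].message.content}\n")
```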

LLM Applications and Use Cases

ChatGPT prompt engineering may be the most well-known variation of the process, but plenty of other LLMs are available on the market, and most of them are capable of a wide range of tasks.

Most of these tasks are relatively self-explanatory, but a few concrete examples help complete the picture.

“Answer the question”

Answering a question is different from telling the LLM to report something to you. This particular request may or may not come with additional context attached.

“Summarize the content”

Despite the name, the content does not have to be supplied by the user to receive a summary of a topic; the model can also summarize material it already knows. In fact, this is where most question-and-answer prompts belong.
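Both flavors of the request look identical on the wire. A minimal sketch, assuming the OpenAI Python SDK; the article variable is a placeholder for real text:

```python
from openai import OpenAI

client = OpenAI()

def summarize(prompt: str) -> str:
    """Send a single summarization prompt and return the reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Summarizing user-supplied content...
article = "..."  # placeholder for the actual text to condense
print(summarize(f"Summarize the following in three bullet points:\n\n{article}"))

# ...versus summarizing a topic the model already knows about.
print(summarize("Summarize the history of yoga in one paragraph."))
```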

Limitations of AI as a Concept

One significant disadvantage of AI models with regard to prompt engineering is the limit on the number of tokens per request. Luckily, this issue has been mitigated to a degree by larger and more complex models such as GPT-4, which is why ChatGPT prompt engineering is now at its most effective: the model can also “remember” far more context than before.
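Prompt length is measured in tokens rather than characters. A small sketch using OpenAI's tiktoken library to check whether a prompt fits; the 8,192-token figure matches the base GPT-4 context window, and the reserve value is an illustrative choice:

```python
import tiktoken  # OpenAI's tokenizer library

CONTEXT_WINDOW = 8192  # base GPT-4 context window, in tokens

def fits(prompt: str, reserved_for_reply: int = 500) -> bool:
    """Check whether a prompt leaves enough room for the model's reply."""
    encoding = tiktoken.encoding_for_model("gpt-4")
    n_tokens = len(encoding.encode(prompt))
    return n_tokens + reserved_for_reply <= CONTEXT_WINDOW

print(fits("What is the capital of Ireland?"))  # True: a handful of tokens
```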

The issue of potential bias also persists to this day. Since all LLMs are trained on large amounts of human-generated data, a model trained on biased data will be just as biased in its results and answers.

Another issue is the primary reason most LLMs require some form of regular human oversight: an improperly configured model may perform tasks in a completely different manner than expected, going overboard or not doing enough in the end.

AI solutions also still struggle with complex human-oriented tasks. The biggest reason for this is that they cannot “feel” the way humans do, which allows false predictions and decisions to be made without proper supervision.

Conclusion

Despite its fluid, ever-evolving state, the LLM remains a young and still-developing topic in the modern IT environment, and it is new even within the much older sphere of AI. At the same time, that very state is what makes LLM prompt engineering so powerful in the right hands.

Three key takeaways can be gathered from this article: experimentation, context, and understanding. Experimenting with different prompt engineering queries helps identify the best ones for a specific situation. Providing context to any LLM makes its response more accurate and specific. Understanding how an LLM operates, and how it differs from a human conversation, is a great way to raise the overall quality of interaction with LLMs.

Prompt engineering in all of its forms has opened up plenty of new possibilities for NLP applications, including virtual assistants, content generation, translation, and much more. Ongoing research will continue improving existing models and the results they deliver to end users, and LLM prompt engineering will most likely remain one of the most significant contributors to that progress.