Key Takeaways
- ChatGPT is a transformer model for chat-based interactions.
- GPT is pretrained on vast internet data for generating responses.
- You can use ChatGPT for context-rich prompts, writing styles, and creative brainstorming.
Some people use ChatGPT like Google, while others expect it to perform like a support rep or digital assistant. However, ChatGPT is neither a search engine nor a human, resulting in disappointing answers. Understanding how the AI tool works can help you get better responses out of it.
Understanding the GPT Model Powering ChatGPT
There’s a belief that knowing someone’s (or something’s) name can give you power over them, and I’m surprised at how much it holds true for ChatGPT. The moment I understood what ChatGPT stood for, I gained a deeper insight into the tool’s inner workings, which in turn allowed me to use it better.
For reference, ChatGPT is short for Chat Generative Pre-Trained Transformer. This means it’s basically an application of the GPT (Generative Pre-trained Transformer) architecture tailored for conversational, (i.e., chat-based) interactions. Now, here’s a quick breakdown of what each of those words means so you can get a better understanding.
What Is The “Transformer” in GPT?
The “Transformer” model is a neural network architecture used for natural language processing. It’s a revolutionary tool in the AI space which forms the backbone of ChatGPT along with all the popular AI chatbots we have today.
The basic concept behind transformer models is to analyze a wide range of inputs all at once (i.e. in parallel) to determine which elements are the most relevant. This is in stark contrast to previous models, which analyzed text sequentially, making it harder to find more important or relevant parts of a passage.
This process of analyzing the text as a whole is called “self-attention” and was first conceptualized by a group of AI researchers at Google Brain in their 2017 paper “Attention Is All You Need.”
Using “self-attention”, Transformer models can look at an entire sequence of words simultaneously. This allows it to understand context and relationships between words much more effectively. As a result, transformer models can capture context from human-provided input and then generate a contextually relevant output that actually makes sense.
How Is GPT “Pretrained”?
GPT is “Pretrained” on a massive dataset—almost the entirety of the open internet. By processing this vast amount of text, GPT learns patterns, relationships, and structures in language. It also learns about various facts and information, which may or may not be true, but these get stored in its dataset, which it can use when generating a response.
The presence of so much misinformation (expected from internet data) in its training dataset is one of the main reasons why ChatGPT can output incorrect information when you ask it a question—a phenomenon called hallucination.
To mitigate this issue, human annotators and data curators come into the picture. Their job is to filter the training data and remove low-quality or inappropriate content. Additionally, techniques like Reinforcement Learning from Human Feedback (RLHF) are used to fine-tune the model, enhancing its factual accuracy and alignment with user expectations.
Thanks to this curated training approach, GPT models often consistently provide accurate information for specific queries, even though the internet contains plenty of misinformation.
The model learns to distinguish reliable information based on patterns and consistency across its training data. If a piece of information appears frequently and consistently from reputable sources, the model is more likely to consider it reliable.
How does GPT “Generate” Content?
By now, you should have a solid understanding of how ChatGPT knows what it knows—its pretraining, and how it comprehends your input text—the transformer model. The last piece of the puzzle is to understand how it’s able to generate contextually relevant responses!
While ChatGPT uses self-attention to process the input all at once, it generates responses one word at a time. Each word is predicted based on the context of the previous words and your input—a process called autoregressive generation.
Imagine you’re playing a word game where you look at the words that came before and guess what word should come next. You keep on guessing the next word one at a time until you finish a sentence and, eventually, the entire thought! That’s basically how GPT generates a response—only it does this at superhuman speed and has a vast “vocabulary” at its disposal.
For example, let’s say you ask ChatGPT: “What is the capital of France?”
It will then generate a response one word at a time like this:
The
The capital
The capital of
The capital of France
The capital of France is
The capital of France is Paris.
The reason why it was able to accurately say “Paris” instead of, let’s say, “New York” is due to a combination of its extensive pretraining on diverse and vast datasets and the Human Reinforcement Learning it underwent. This training helps the model understand factual information and prioritize accurate responses.
So, as you can see, GPT does not pull pre-written answers from a database when you ask a question. Instead, it generates new text each time you ask it a question based on its training data.
How Did Learning This Help Me Better Use ChatGPT?
Understanding the inner workings of ChatGPT has significantly improved my ability to leverage this powerful tool. Here are some key insights and tips that directly stem from knowing how ChatGPT functions:
I Can Leverage Its Contextual Understanding
One of the best parts of the GPT model is that it excels at deriving context from your input. Unlike Google searches where brevity is ideal, with ChatGPT, you’re better off entering verbose prompts. Even if you end up including redundant information, you can rest assured that the tool is smart enough to look past the unnecessary bits and zone in on important parts to provide a more thorough and useful answer.
Moreover, you also have the opportunity to ask follow-up questions. This means if you didn’t like a particular response because of its formatting or lack of information, you can cite that in your next input. ChatGPT will now look at all the previous chat history, which includes all your inputs and its past generations, to create a better response that’s more in line with your expectations.
For example, instead of asking:
Tell me about renewable energy.
Try the prompt:
I'm interested in renewable energy sources, particularly solar and wind power. Can you provide detailed information on how these technologies work, their benefits, and their impact on the environment?
I Utilize Its Vast Pretraining
GPT is pre-trained on a huge collection of data. This means it’s aware of various writing styles, tones, and voices. You can potentially utilize this to augment your own writing and give it that extra style and panache. For example, I can write a boring old email and then ask ChatGPT to make it more engaging or professional or even make it read like it was written by Seth Godin.
The vast training set also means ChatGPT is aware of a lot of different topics and concepts. You can use this to your advantage to do some topical exploring or brainstorming. I mean, what is creativity but finding a connection between topic A and topic B? You can use ChatGPT to rapidly cycle through different permutations and combinations of related topics and stumble into new novel ideas. It’s a great way to get your brain’s juices flowing, especially when you’re stuck in a creative slump.
I Use It For Fiction, Not Facts
While, thanks to RLHF, ChatGPT does generate more accurate answers than it used to, it’s still not immune to hallucinations and can display misinformation with all the confidence in the world. Now, it’s easy to catch these hallucinations if you’re well-versed in the topic you’re exploring with ChatGPT. However, if you’re exploring stuff you know little about, then it’s hard to trust ChatGPT, especially once you know how ChatGPT works.
This is why I never use ChatGPT to get factual answers. Instead, I use it to explore topics I am already knowledgeable about and express them in a more accessible language. For example, I might write down, in my own words, how to solve differential equations and then use ChatGPT to make my method easier to understand for a 10-year-old.
By keeping these tips in mind and understanding the underlying mechanisms of ChatGPT, you can craft more effective prompts, engage in more productive conversations, and ultimately extract more value from this powerful AI tool.