
This article is written by Andrei Danila, a Machine Learning Engineer, who explains the inner workings of ChatGPT at three levels of complexity.
Since its launch in late 2022, ChatGPT has transformed the tech world and brought generative artificial intelligence (AI) into the mainstream. However, its inner workings are still a mystery to most people, so in this article, I will explain them at three levels of complexity. By the end, you should have a better understanding of this groundbreaking technology. Hey, you might as well start calling yourself a machine learning expert!
Imagine ChatGPT as a super-smart “human” who has read an immense amount of text. This “human” has a fantastic memory and can quickly come up with answers to almost anything you ask. In reality, this “human” is a neural network—a type of complex computer model that learns from experience. ChatGPT, for instance, has been trained on vast amounts of text data from the internet, covering everything from cooking recipes and history lessons to complex scientific articles and popular fiction. All of this knowledge allows it to generate responses based on the patterns it learned from that data.
By combining this understanding with its memory-like structure, ChatGPT can predict what to say next based on context, even if it has never seen a specific question before. It’s like a highly advanced version of autocomplete on your phone, but way more sophisticated. The model takes your input, processes it based on its training, and generates a response that best matches what a human might say. The result? It can handle conversations, answer questions, and even assist in creative writing—just like a friendly, chatty companion who has read the entire internet.
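The autocomplete analogy can be made concrete with a toy sketch, nothing like ChatGPT's real scale or method: count which word tends to follow which in a sample text, then predict the most frequent follower. The sample sentence below is invented for illustration.

```python
from collections import Counter, defaultdict

# A crude analogue of phone autocomplete (vastly simpler than ChatGPT):
# count which word follows which in some text, then predict the most
# frequent follower. The sample text is made up for this example.
text = "the cat sat on the mat and the cat slept"
words = text.split()

followers = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    followers[prev][nxt] += 1

def predict(word):
    # Most common word seen immediately after `word` in the sample text.
    return followers[word].most_common(1)[0][0]

print(predict("the"))  # prints "cat", since "cat" follows "the" most often
```

ChatGPT, of course, conditions on the whole conversation rather than just the previous word, but the core idea of "predict the likely next word from what came before" is the same.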
To make a neural network like ChatGPT capable of understanding and generating human-like text, it first needs to be trained. Training involves showing the model lots of examples and asking it to predict what comes next. A loss function measures how “wrong” each prediction is, and the model learns by adjusting its parameters to make that loss as small as possible. Imagine giving the model pairs of sentences: one as the input (source) and the other as the expected output (target). The model keeps tweaking itself until the predicted output matches the target as closely as possible.
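The idea of minimizing a loss can be sketched with a deliberately tiny example. The single-weight model, learning rate, and squared-error loss below are illustrative assumptions, not anything from ChatGPT's actual training setup, but the loop shows the same principle: predict, measure the error, and nudge the parameters to shrink it.

```python
# Toy illustration of loss minimization: a model with one weight learns
# to map an input x to a target y via gradient descent on a squared error.
def train(pairs, lr=0.1, steps=100):
    w = 0.0  # the model's single parameter, adjusted during training
    for _ in range(steps):
        for x, y in pairs:
            pred = w * x               # the model's prediction
            grad = 2 * (pred - y) * x  # gradient of the loss (pred - y)**2
            w -= lr * grad             # nudge w in the direction that lowers the loss
    return w

# The targets follow y = 3x, so training should drive w toward 3.
pairs = [(1, 3), (2, 6), (3, 9)]
w = train(pairs)
print(round(w, 2))  # converges to 3.0
```

Real models like ChatGPT do exactly this in spirit, except with billions of parameters, a cross-entropy loss over next-word predictions, and far more sophisticated optimizers.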
Training ChatGPT isn’t a quick process—it requires enormous amounts of computing power and diverse data. The model is trained on a vast range of text sources, helping it learn the nuances of human language. This training happens in massive data centers filled with powerful graphics processing units (GPUs), all working day and night to process this diverse data and refine the model’s internal parameters. It takes weeks or even months to train a model like this, and the costs are astronomical—not just for the cutting-edge hardware but also the electricity needed to keep everything running. Just think about the electricity bill! However, all of this effort allows the model to generalize well, generating coherent answers to questions it has never explicitly seen before by relying on the patterns it learned during training.
The real magic behind ChatGPT is something called a “transformer” model, introduced in the groundbreaking 2017 research paper “Attention Is All You Need” by Vaswani et al. Transformers use a mechanism called “attention,” which allows the model to focus on the most relevant parts of the input when making a prediction. You can think of attention as a way of calculating the similarity between words in a sentence, allowing the model to understand which words are related and how they connect. This gives ChatGPT a deep understanding of context: how “apple” could mean either a fruit or a tech company, depending on the other words around it.
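Here is a minimal sketch of attention-as-similarity, using made-up 2-D word vectors (real models learn embeddings with hundreds or thousands of dimensions, plus learned projections this sketch omits). A dot product scores how similar the query word is to each context word, and a softmax turns those scores into attention weights that sum to one.

```python
import math

def softmax(scores):
    # Convert raw similarity scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Dot product as a similarity measure between the query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Hypothetical 2-D embeddings: scoring "apple" against a fruity context.
context = {"ate": [0.9, 0.1], "an": [0.1, 0.1], "iPhone": [0.0, 0.9]}
query_apple = [1.0, 0.2]
weights = attention_weights(query_apple, list(context.values()))
for word, wt in zip(context, weights):
    print(word, round(wt, 2))  # "ate" receives the largest weight
```

With these invented vectors, “ate” wins the most attention, which is exactly the signal that would push the model toward the fruit sense of “apple” rather than the company.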
At its core, the transformer model assigns probabilities to every possible next word in a given context. Essentially, it scans its vocabulary, ranks each word by likelihood, and then picks a likely candidate, often simply the word with the highest probability, or one sampled from the top few. This is why some people call it a “stochastic parrot”: it doesn’t truly “understand” in the human sense but rather assembles text in a statistically probable way based on its training. Despite this limitation, the approach allows ChatGPT to create remarkably coherent and human-like responses, making it feel as though you’re chatting with someone who knows a bit about everything.
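The ranking step can be sketched as follows. The four-word vocabulary and raw scores (logits) are invented for illustration; in the sketch, a softmax converts scores into probabilities, and greedy decoding, the simplest strategy, picks the top one.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize to probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores the model might assign after "I baked an apple ...".
vocab = ["pie", "juice", "stock", "phone"]
logits = [2.1, 1.3, 0.2, 0.4]

probs = softmax(logits)
next_word = vocab[probs.index(max(probs))]  # greedy decoding: take the top word
print(next_word)  # prints "pie"
```

In practice, systems like ChatGPT usually sample from this distribution (controlled by settings such as temperature) rather than always taking the single top word, which is why the same prompt can yield different answers.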
From the basics of what ChatGPT does to the intricacies of transformer models, there’s a lot going on behind the scenes to make this AI function the way it does. At its heart, it’s a blend of sophisticated neural networks, vast amounts of data, and smart training techniques. Even though it’s not truly intelligent, it does an incredible job of mimicking human-like conversation through an elaborate mix of math and pattern recognition. Now, armed with this understanding, you’re ready to dive deeper into the world of AI—or at least impress your friends at your next dinner party with a cool explanation of how ChatGPT works.
But what about the future? Is AI just a passing fad, or is it here to stay? It’s hard to say for certain, but its impact seems to be growing. In the next few years, we might see more advanced AI systems becoming part of our daily lives in unexpected ways. Whether these changes will be revolutionary or just incremental improvements is something we’ll have to wait and see.
