On the four ingredients that made 2022 the AI moment, the interface nobody talks about, and a way of thinking about technological change that you can use for the rest of your career.

A model nobody cared about
In June of 2020, OpenAI released GPT-3. It was, at the time, the largest language model ever built — 175 billion parameters, trained on roughly 300 billion tokens of text filtered down from a 45-terabyte web crawl, capable of writing essays, answering questions, generating code, and producing prose that was, to many readers, indistinguishable from human writing. The technical press covered it with a mix of awe and anxiety. Researchers called it a breakthrough. Sam Altman, OpenAI’s CEO, publicly warned people not to overhype it.
And then, for about two and a half years, almost nobody outside of the AI research community used it.
GPT-3 was available through an API — a programmer’s interface that required you to write code to interact with the model. If you were a developer, you could build applications on top of it. If you were a researcher, you could run experiments with it. If you were a normal person who wanted to ask it a question, you couldn’t. There was no place to type. There was no chat window. There was no “talk to GPT-3” button anywhere on the internet. The most powerful language model in the world was sitting behind a developer console, waiting for someone to build a front door.
On November 30, 2022, OpenAI built the front door. They called it ChatGPT. Within five days, it had a million users. Within two months, it had a hundred million — making it the fastest-growing consumer application in the history of the internet. The technology that had been sitting quietly for two and a half years became, overnight, the most talked-about product on Earth.
Here is the question I want to spend this post answering, because the answer teaches you something that goes far beyond AI: why did that particular tool, in that particular moment, work?
The short answer is that November 30, 2022 wasn’t a single breakthrough. It was a confluence — four ingredients arriving at the same table, finally in the right amounts, at the right time. And none of them, alone, would have been enough.
Ingredient One: The architecture — Attention Is All You Need
The foundational ingredient was a technical breakthrough that happened five years before ChatGPT launched, in a paper that almost nobody outside of machine learning has read.
In 2017, a team of eight researchers at Google — Ashish Vaswani and seven co-authors — published a paper titled “Attention Is All You Need” in the proceedings of the NeurIPS conference. The paper introduced a new neural network architecture called the transformer, and it changed everything.
Before the transformer, the dominant architectures for processing language were recurrent neural networks and their variants (LSTMs, GRUs), which processed text sequentially — one word at a time, left to right, maintaining a running memory of what had come before. This worked, but it was slow, and the running memory degraded over long sequences. The transformer replaced this sequential processing with something called self-attention, a mechanism that allows the model to look at every word in a sequence simultaneously and learn which words are most relevant to which other words, regardless of how far apart they are. The result was dramatically faster training (because you could parallelize the computation across all words at once instead of processing them one at a time) and dramatically better performance on long-range dependencies (because the model could directly attend to a word five hundred tokens back instead of hoping the running memory hadn’t degraded by then).
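The core of the mechanism fits in a few lines. This is a minimal, single-head sketch of scaled dot-product self-attention using NumPy; the toy dimensions and random weight matrices are illustrative, not anything from a real model. The point to notice is that every token's output is computed in one pass of matrix multiplies, with no loop over positions:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence.

    x: (seq_len, d_model) token embeddings; Wq/Wk/Wv project them into
    query, key, and value spaces. Every token attends to every other
    token in one matrix multiply -- no sequential recurrence, so the
    whole sequence is processed in parallel.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv           # (seq_len, d_k) each
    scores = q @ k.T / np.sqrt(k.shape[-1])     # relevance of each token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v                          # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # one output vector per token, all computed at once
```

The `scores` matrix is the whole trick: token one can attend to token five hundred as directly as to its neighbor, which is exactly the long-range dependency that a running memory struggled with.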
The transformer is the T in GPT. Without it, none of the models that followed — GPT-2, GPT-3, GPT-4, BERT, PaLM, Claude, Gemini — would have been possible. It is the architectural foundation on which the entire current generation of AI is built. And it was published five years before ChatGPT, in a paper that most people who use ChatGPT every day have never heard of.
Ingredient Two: Scale — the discovery that more is different
The architecture was necessary but not sufficient. The transformer made it possible to build large language models. The second ingredient was the discovery that making them very large produced qualitatively different behavior — not just incremental improvement, but the emergence of capabilities that smaller models simply did not have.
GPT-2, released in 2019, had 1.5 billion parameters. GPT-3, released in 2020, had 175 billion — more than a hundred times larger. The jump was not just quantitative. GPT-3 could do things GPT-2 could not do at all: few-shot learning (performing a new task after being shown just a few examples), zero-shot reasoning (attempting a task it had never been trained on), code generation, and coherent long-form writing. These capabilities were not programmed in. They emerged from scale — from the combination of a larger model, more training data, and more computation.
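Few-shot learning is worth making concrete, because it sounds more exotic than it is. Operationally, it is just prompt construction: you place a handful of worked examples in the prompt, and the model infers the pattern and continues it. A sketch (the examples are made up, and the model call itself is omitted):

```python
# Few-shot prompting: the "training" happens entirely inside the prompt.
# The model sees a few input -> output pairs and continues the pattern.
examples = [
    ("cheese", "fromage"),
    ("house", "maison"),
    ("book", "livre"),
]
query = "tree"

prompt = "Translate English to French.\n\n"
for english, french in examples:
    prompt += f"English: {english}\nFrench: {french}\n\n"
prompt += f"English: {query}\nFrench:"

print(prompt)
# Given this text, a completion-style model of GPT-3's scale will
# typically continue with the right translation -- a task it was
# never explicitly trained to perform.
```

GPT-2 largely could not do this; GPT-3 could. Nothing in the architecture changed to enable it. Scale did.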
This phenomenon — capabilities appearing suddenly as models get bigger, rather than improving gradually — is one of the most debated and fascinating findings in modern AI research. The researchers who built these systems did not predict many of the emergent capabilities in advance. They built bigger models, ran them, and discovered that the bigger models could do things nobody had told them to do. Scale, it turned out, was not just “more of the same.” Scale was a phase transition, the way heating water doesn’t just make it hotter water — at some point, it becomes steam. GPT-3 was steam. GPT-2 was hot water. Same substance, different state.
Ingredient Three: The polish — RLHF and the InstructGPT breakthrough
Here is where most popular accounts of the ChatGPT story stop. Architecture plus scale equals ChatGPT. Transformer plus big data equals AI revolution. That framing is not wrong, but it is incomplete, because it leaves out the ingredient that made GPT-3 go from “impressive but unreliable research tool” to “thing your grandmother can use.”
In March 2022 — eight months before ChatGPT launched — OpenAI published a paper describing a model called InstructGPT. The paper introduced a training technique called Reinforcement Learning from Human Feedback, or RLHF. The idea was simple in concept: after training the model on text data (the standard approach), you add a second phase where human evaluators rate the model’s outputs for quality, helpfulness, and safety. Those ratings are used to train a separate “reward model,” which is then used to fine-tune the original model’s behavior through reinforcement learning. The model learns, in effect, what humans consider a good response versus a bad one, and it adjusts its behavior accordingly.
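The reward-model step is the heart of the technique, and a toy version makes the logic visible. This is an illustrative sketch only, not the real pipeline: real RLHF fits a large neural reward model and then fine-tunes the language model with reinforcement learning, while here a linear reward model is fit to synthetic pairwise preferences using a Bradley-Terry-style logistic loss. All the data and feature names are invented:

```python
import numpy as np

# Humans compare pairs of responses; we fit a reward model r(x) = w . x
# so that the preferred response in each pair scores higher.
rng = np.random.default_rng(1)
n_pairs, n_features = 200, 6

# Synthetic "response features" with a hidden ground truth: feature 0
# (call it helpfulness) is what drives human preference.
preferred = rng.normal(size=(n_pairs, n_features)) + np.eye(n_features)[0]
rejected = rng.normal(size=(n_pairs, n_features))

w = np.zeros(n_features)
lr = 0.1
for _ in range(500):
    # P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))
    margin = (preferred - rejected) @ w
    p = 1.0 / (1.0 + np.exp(-margin))
    # Gradient of the negative log-likelihood of the human choices
    grad = ((p - 1.0)[:, None] * (preferred - rejected)).mean(axis=0)
    w -= lr * grad

# The learned reward model can now score and rank new candidate
# responses; in full RLHF that reward signal is used to fine-tune the
# language model itself.
candidates = rng.normal(size=(5, n_features))
best = candidates[np.argmax(candidates @ w)]
print(w.round(2))  # the weight on feature 0 should dominate
```

The model never sees a rule saying "be helpful." It sees only which of two outputs a human picked, over and over, and the reward model distills those choices into a signal the base model can be trained against.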
The difference was measurable and dramatic. InstructGPT, despite being much smaller than GPT-3 (1.3 billion parameters versus 175 billion), was preferred by human evaluators over GPT-3 in head-to-head comparisons. A model less than one-hundredth the size was producing outputs that humans rated as better, because the RLHF training had aligned the model’s behavior with what humans actually wanted — clear answers, helpful explanations, honest caveats — rather than what was statistically likely in the training data.
RLHF is the polish layer. Without it, you have a brilliant but erratic model that sometimes produces genius and sometimes produces nonsense with equal confidence. With it, you have a model that behaves like a reasonably helpful, reasonably cautious conversational partner. The raw capability came from the architecture and the scale. The usability came from the polish. And usability, as we are about to see, turned out to be the ingredient that mattered most.
Ingredient Four: The interface — and this is the one nobody talks about
I want to make a claim that I think is underappreciated in most analysis of the 2022 AI moment, and I want to make it as clearly as I can, because it has implications far beyond this one product launch.
The interface was the breakthrough.
Not the model. Not the scale. Not the RLHF polish. The interface. The chat window. The simple, empty text box with a cursor blinking in it, waiting for you to type a question in plain English and get an answer back in plain English. That was the thing that changed everything.
GPT-3 had existed for two and a half years. It had 175 billion parameters. It could write essays and generate code and hold conversations. And almost nobody used it, because the only way to interact with it was through an API — a programmer’s tool, designed for programmers, accessible only to people who could write code. The most powerful language model in the world was locked behind a developer console, and the lock was not technical. It was experiential. There was no way for a normal human being to sit down and talk to the thing.
ChatGPT changed one thing: it gave the model a face. A chat window. A conversational interface that reframed the interaction from “submit a prompt to a language model and parse the JSON response” to “ask a question and get an answer.” The underlying technology was GPT-3.5 — an improvement over GPT-3, but not a revolutionary leap. The revolutionary leap was the interface, which took a research tool and turned it into a social experience.
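The reframing can be made concrete in a few lines. This is a toy contrast, not OpenAI's actual API: `complete` below is a stand-in for any raw completion endpoint, with a hard-coded placeholder reply so the sketch runs on its own. The chat wrapper does nothing clever; it just maintains a transcript and re-feeds it as the prompt, so the user never has to think in prompts at all:

```python
# `complete` stands in for a raw completion endpoint: hand it text,
# get back a continuation. The placeholder reply is hard-coded here.
def complete(prompt: str) -> str:
    return " Paris."

# The API framing: you construct a piece of writing for the model to
# finish, call the endpoint, and parse what comes back.
continuation = complete("The capital of France is")

# The chat framing: the same model, wrapped in a loop that keeps a
# transcript and rebuilds the prompt every turn. The interface does
# the prompt engineering the user never sees.
def chat_turn(history: list[str], user_message: str) -> str:
    history.append(f"User: {user_message}")
    prompt = "\n".join(history) + "\nAssistant:"
    reply = complete(prompt).strip()
    history.append(f"Assistant: {reply}")
    return reply

history: list[str] = []
answer = chat_turn(history, "What is the capital of France?")
```

Same engine underneath in both halves. Only the second one feels like asking a question and getting an answer.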
One observer put it perfectly: ChatGPT shifted the user’s relationship to the model from “a piece of writing for the model to finish” to “a question calling for an answer.” That is not a technical change. That is a design change. It is a change in how the human being sitting at the keyboard understands what they are doing. And that change — the reframing — is what produced a hundred million users in two months.
This is the ingredient most analyses leave out. They give you three: transformer architecture, scale, RLHF. And those three are necessary. But without the fourth — the interface that made the technology legible to a non-technical human being — GPT-3 would still be sitting behind its API, used by developers, unknown to the public, waiting for someone to build the front door.
The front door was the breakthrough.
The wave that proves the confluence
If 2022 had only produced ChatGPT, you could argue that the moment was about one product, one company, one good decision about interface design. But 2022 didn’t produce just ChatGPT. In an eight-month window, it produced an entire wave of generative AI tools that hit the public simultaneously:
DALL-E 2 was announced in April 2022 and opened to everyone on September 28. Midjourney entered open beta on July 12, 2022. Stable Diffusion was publicly released in August 2022. ChatGPT launched on November 30, 2022.
Four major generative AI products, from different companies, using different architectures, applied to different media (images and text), all arriving in the same eight-month window. That is not a coincidence. That is a systemic convergence — a moment when multiple ingredients that had been developing independently for years reached a threshold simultaneously.
The transformer architecture (2017) had matured enough to support both language and image generation at scale. The scale of available training data and computation had crossed a critical threshold. The alignment and fine-tuning techniques (RLHF for language, classifier-free guidance for images) had gotten good enough to produce usable outputs. And the interfaces — Discord bots for Midjourney, web apps for DALL-E and Stable Diffusion, a chat window for ChatGPT — had finally made the technology accessible to non-technical users.
No single ingredient caused the wave. The wave was the confluence of all of them.

Confluence thinking: a lens you can use for the rest of your career
Here is the part of this post I want you to carry forward long after the specific details of transformers and RLHF have been superseded by whatever comes next.
The 2022 AI moment was not a single breakthrough. It was a confluence — a moment when multiple independent ingredients, each developing on its own timeline, converged in a way that produced something none of them could have produced alone. The transformer was necessary. Scale was necessary. RLHF was necessary. The interface was necessary. Remove any one of them and the moment doesn’t happen. The model without the interface sits unused. The interface without the model has nothing to offer. The model without RLHF is too erratic to trust. RLHF without the scale has nothing to polish.
This pattern — big shifts come from multiple ingredients aligning, not single breakthroughs — is not unique to AI. It is how almost every major technological change in history actually happened, if you look closely enough.
The printing press was a confluence: movable type (which had existed in China centuries earlier), oil-based ink (which Gutenberg adapted from painting), the wine press (which he repurposed as the mechanical frame), and affordable paper (which had recently become available in Europe). Remove any one of those ingredients and Gutenberg’s press doesn’t work, or doesn’t scale, or doesn’t transform European civilization the way it did.
The iPhone was a confluence: capacitive multi-touch screens, miniaturized processors, mobile broadband networks, and a software ecosystem (the App Store) that let third-party developers build on top of it. Remove any one of those and you get a different, lesser product — a PDA, a fancy phone, a media player, but not the thing that remade the world.
The pattern repeats across every domain of human invention. The breakthrough that looks, in retrospect, like a single moment is almost always, on closer inspection, a confluence of ingredients that were developing independently and converged at a specific point in time.
Confluence thinking is the habit of looking at a new technology — or any major shift — and asking: what are the ingredients? How many are there? Which ones are mature and which ones are still developing? And what happens when the last one crosses the threshold? This is a thinking tool, not a technical skill. It works whether you’re analyzing AI, evaluating a startup, reading a history book, or planning your own career. The person who can identify the ingredients of a coming confluence before the confluence happens is the person who is standing in the right place when the wave arrives.
What I want you to take with you
The next time somebody tells you that a single product or a single company or a single genius “invented” something that changed the world, be polite, and then look for the ingredients. The world almost never changes because of one thing. It changes because of four things, or five, or six, arriving at the same table at the same time, each one necessary and none of them sufficient.
GPT-3 sat for two and a half years before the world noticed. The model was ready. The scale was there. The polish was coming. But the front door hadn’t been built yet. When somebody finally built the front door — a simple chat window, a blinking cursor, a place to type a question in plain English — a hundred million people walked through it in two months.
The model was the engine. The interface was the door. The door was the breakthrough.
Remember that. Not just for AI. For everything you will ever build. The most powerful engine in the world is useless if nobody can find the door.
Build the door.
Sources and further reading
On the transformer architecture: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017), “Attention Is All You Need,” Advances in Neural Information Processing Systems (NeurIPS 2017). This paper introduced the transformer architecture and has been cited tens of thousands of times. It is the foundational technical document for the entire current generation of large language models.
On GPT-3 and the discovery of emergent capabilities at scale: Brown, T. B., et al. (2020), “Language Models are Few-Shot Learners,” Advances in Neural Information Processing Systems (NeurIPS 2020). This is the GPT-3 paper, documenting the 175-billion-parameter model and its few-shot and zero-shot learning capabilities. GPT-3 was released via API in June 2020 and licensed exclusively to Microsoft in September 2020.
On InstructGPT and RLHF: Ouyang, L., et al. (2022), “Training language models to follow instructions with human feedback,” arXiv preprint arXiv:2203.02155 (later published in NeurIPS 2022). This paper describes the InstructGPT model and the RLHF technique that became the polish layer for ChatGPT. The finding that a 1.3B-parameter InstructGPT model was preferred by human evaluators over the 175B-parameter GPT-3 is one of the most striking results in the alignment literature.
On ChatGPT’s launch and adoption: ChatGPT was released on November 30, 2022, built on GPT-3.5 (a fine-tuned variant of GPT-3). It reached 1 million users in five days and 100 million users in approximately two months, making it the fastest-growing consumer application in internet history at the time.
On the 2022 generative AI wave timeline: DALL-E 2 was announced April 6, 2022, entered beta in July, and opened to the public September 28, 2022. Midjourney entered open beta July 12, 2022. Stable Diffusion was publicly released August 2022. ChatGPT launched November 30, 2022. All four products arrived within an eight-month window, from different companies, using different architectures, applied to different media.
On the interface-as-breakthrough framing: The observation that ChatGPT shifted the user’s relationship from “a piece of writing for the model to finish” to “a question calling for an answer” draws on analysis from multiple sources covering the launch, including coverage in The Verge, The New York Times, and the essay “What Was ChatGPT?” (2025) at cyberneticforests.com.
On confluence as a pattern in technological change: The concept of technological convergence is discussed across the innovation literature. For the printing press example: Eisenstein, E. L. (1979), The Printing Press as an Agent of Change, Cambridge University Press. For the general principle that major innovations are typically confluences of multiple independent advances: Arthur, W. B. (2009), The Nature of Technology: What It Is and How It Evolves, Free Press.
Note to readers: the four-ingredient framework presented here — transformer architecture, scale, RLHF, and the chat interface — synthesizes concepts that exist independently in the technical literature. The contribution of this post is the specific framing, weighting, and connection of those ingredients, particularly the elevation of the interface as a co-equal ingredient alongside the technical components. The concept of “confluence thinking” as a transferable analytical lens is, to my knowledge, original to this series. Verify the primary sources yourself before quoting.