📝 Transformer Changed Everything
The $1.5 billion research paper that launched it all
Hello, and thanks for reading One AI Thing. Understand artificial intelligence, one thing at a time.
👀 Today’s Thing: Transformer Changed Everything
🤖 Ever wonder what the GPT in ChatGPT and GPT-4 stands for? Generative pre-trained transformer. Generative, as in it can generate human-like content. Pre-trained, as in trained on a ton of text. Transformer, as in an artificial neural network based on the transformer architecture. While generative pre-training had already been a thing for a while in machine learning circles, the invention of the transformer in 2017 paved the way for large language models (LLMs) and other machine learning advances, primarily in natural language processing (NLP) and computer vision (CV). The current Gen AI craze is, in many ways, a direct result of the invention of the transformer.
🎧 Get an introduction to the AI boom and the key concepts behind it in my podcast episode with Will Ramey, Sr. Director and Global Head of Developer Programs at Nvidia.
📖 Backstory
☞ The transformer was introduced in the … iconic? seminal? highly influential? industry-upending-five-years-later??? … 2017 research paper, Attention Is All You Need. The paper was authored by researchers from Google Brain and Google Research. One of them was an intern.
☞ The paper, presented at the 2017 NeurIPS conference, introduced the transformer model as a new way of tackling NLP tasks. Unlike recurrent neural networks (RNNs), the then state of the art in NLP, the transformer relied solely on something called an attention mechanism for improved performance. While RNNs can use attention mechanisms to boost their own performance, transformers gain a leg up by dropping recurrence entirely, which opens up the benefits of parallel computation (faster training) while avoiding the performance drop-offs that come with long-range dependencies. (If you'd like to see that attention mechanism in code, there's a short sketch at the end of this section.)
☞ An August 2017 Google Research blog post, written by Jakob Uszkoreit (one of Attention’s co-authors), introduced the paper, saying transformers seemed “particularly well suited for language understanding”:
On top of higher translation quality, the Transformer requires less computation to train and is a much better fit for modern machine learning hardware, speeding up training by up to an order of magnitude.
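For the hands-on crowd, here's a minimal sketch of the scaled dot-product attention at the core of the paper, written in plain NumPy. The toy shapes and random inputs are my own illustration, not anything from the paper itself:

```python
# A minimal sketch of scaled dot-product attention:
# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # how strongly each query matches every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                         # weighted sum of the values

# Every token attends to every other token in a single matrix multiply --
# no step-by-step recurrence, which is why training parallelizes so well.
rng = np.random.default_rng(0)
tokens, d_k = 4, 8                             # 4 tokens, 8-dimensional embeddings (toy sizes)
Q = rng.standard_normal((tokens, d_k))
K = rng.standard_normal((tokens, d_k))
V = rng.standard_normal((tokens, d_k))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```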
🔑 Keys to Understanding
🥇 Standard deep learning frameworks, including TensorFlow and PyTorch, now ship transformer implementations out of the box, and transformers underpin many well-known language models, as mentioned above. Stanford researchers grouped large pre-trained models like these under the term “foundation models” in an August 2021 paper, writing, “The sheer scale and scope of foundation models over the last few years have stretched our imagination of what is possible.” (There's a short PyTorch sketch after the reading list below if you want to see what “out of the box” looks like.)
🥈 Six of Attention’s eight authors are now founders of tech startups that have raised more than $1.5 billion in funding between them: Ashish Vaswani and Niki Parmar (Adept AI), Noam Shazeer (Character.ai), Jakob Uszkoreit (Inceptive Nucleics), Aidan Gomez (Cohere), and Illia Polosukhin (NEAR Protocol).
🥉 If my clunky summaries of transformers, RNNs, and attention left you wanting a more comprehensive understanding of how the models work and why transformers were such a leap forward, try:
This Nvidia explainer,
This Packt paper summary, and
This Towards Data Science deep dive.
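And here's that promised PyTorch sketch: a minimal example of stacking a transformer encoder from the framework's built-in modules. The layer sizes and dummy inputs below are arbitrary choices for illustration, not a recommended configuration:

```python
# A hedged sketch of the transformer building blocks that ship with PyTorch.
import torch
import torch.nn as nn

# One encoder layer = multi-head self-attention + feed-forward network.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# A batch of 2 sequences, each 10 tokens long, already embedded into 512 dims.
dummy_embeddings = torch.randn(2, 10, 512)
contextualized = encoder(dummy_embeddings)
print(contextualized.shape)  # torch.Size([2, 10, 512])
```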
🕵️ Need More?
Searching for a certain kind of AI thing? Reply to this email and let me know what you'd like to see more of.
Until the next thing,
- Noah
p.s. Want to sign up for the One AI Thing newsletter or share it with a friend? You can find me here.