Brains Behind AI: The Silent Heroes Making AI Work
We often talk about how smart AI has become, but rarely do we pause to ask: what gives AI its power? Is it just clever coding and algorithms? Not quite. Behind the scenes, there’s an entire world of hardware—chips, circuits, and processors—working silently to make the magic happen.
In this article, we’ll break down AI hardware in the simplest way possible—using real-life examples, analogies, and zero jargon. By the end of this journey, you'll know exactly what powers the smart devices around you.
🧩 1. Why AI Needs Special Hardware
AI is like a really smart brain—but for it to think fast and smart, it needs a strong body. That strong body comes in the form of specialized hardware—chips and components that help AI learn, make decisions, and act quickly.
Regular computers were designed for spreadsheets and emails. But AI? It deals with thousands of images, videos, and words—all at once. That’s a job for a different kind of machine.
🕰️ 2. From CPUs to GPUs: The Evolution of AI Hardware Power
🧠 CPUs: The Generalists
The CPU (Central Processing Unit) is like the manager of your computer—it handles a wide range of tasks like opening apps, browsing the web, or playing music. CPUs are great at sequential processing, which means they do one thing at a time, but they do it really well and very precisely.
A modern CPU might have 4 to 16 powerful cores, each capable of handling complex logic. This works perfectly for general-purpose computing tasks that don’t need a lot of repetition or simultaneous actions.
But when artificial intelligence, especially deep learning, entered the scene—it broke the mold.
🚧 The Problem with CPUs for AI
AI tasks—especially training Large Language Models (LLMs) like GPT—are incredibly data-heavy. They involve:
Processing millions of inputs (like words, images, or data points)
Performing billions of calculations (like matrix multiplications)
Doing it again and again during training cycles
CPUs simply couldn’t keep up. They weren’t built to handle so many repetitive, parallel computations at once. As a result, training a large AI model on CPUs would take weeks or even months, with enormous energy consumption and inefficiency.
🎮 Enter GPUs: The Accidental AI Hero
Originally designed to power video games, Graphics Processing Units (GPUs) were made to render images and graphics in real-time. Video games require rendering thousands of pixels, textures, and lighting effects simultaneously—which means GPUs were already good at parallel processing.
A GPU doesn’t have just a few powerful cores like a CPU. Instead, it has hundreds or even thousands of smaller cores. While each core is not very powerful on its own, together they can tackle thousands of tasks at the same time.
This turned out to be perfect for AI.
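To see why, it helps to look at the math itself. A matrix multiply breaks into many small dot products, and each one can be computed independently of the others. Here's a minimal pure-Python sketch (real AI code uses GPU libraries, but the structure of the work is the same):

```python
# Toy sketch: a matrix multiply splits into many independent dot products,
# exactly the kind of work a GPU's thousands of cores can share.

def dot(row, col):
    """One output cell = one small, independent task."""
    return sum(a * b for a, b in zip(row, col))

def matmul(A, B):
    # Every cell of the result is computed independently of the others,
    # so a GPU could hand each one to a different core.
    cols_B = list(zip(*B))  # transpose B to iterate over its columns
    return [[dot(row, col) for col in cols_B] for row in A]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

A CPU works through those cells a few at a time; a GPU can compute huge numbers of them simultaneously, which is the whole story of the speedup in miniature.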
🤯 Why GPUs Work So Well for AI and LLMs
Training an AI model—especially a Large Language Model—means running millions of matrix operations, such as multiplying huge grids of numbers. These operations are:
Predictable
Repetitive
Easily divisible into smaller tasks
In computer science, this is called being “embarrassingly parallel”—a workload that splits into many independent pieces needing almost no coordination between them. It’s so parallel-friendly, it would almost be embarrassing to run it one step at a time!
🔍 Example: Imagine training an AI to understand language. It reads thousands of sentences at once, processes them into vectors, runs them through dozens of layers in a neural network, and does this millions of times. Instead of doing this one sentence at a time (like a CPU would), a GPU can process hundreds of sentences in parallel across its cores.
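This idea of batching can be sketched in a few lines. The helper below pushes a whole batch of sentence vectors through one linear layer at once (the sizes and numbers are made up for illustration; real models use GPU tensor libraries):

```python
# Minimal sketch of batch processing: instead of pushing one sentence
# vector through a layer at a time, we push a whole batch together.
# On a GPU, all rows of the batch would be processed simultaneously.

def layer(batch, weights):
    """Apply one linear layer to every vector in the batch."""
    return [[sum(x * w for x, w in zip(vec, col)) for col in zip(*weights)]
            for vec in batch]

# A "batch" of 3 toy sentence vectors, each with 2 features.
batch = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
weights = [[2.0, 0.0],
           [0.0, 3.0]]  # a tiny 2-in, 2-out layer

print(layer(batch, weights))  # [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]
```

The key point: nothing in the batch depends on anything else in the batch, so making the batch bigger just means more parallel work for the GPU's cores.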
This made GPUs the default workhorse for training AI models, especially deep learning models like convolutional neural networks (for image recognition) and transformers (used in LLMs like ChatGPT).
💡 Real-World Impact:
What used to take weeks on CPUs could now be done in days or even hours on a high-end GPU cluster.
GPUs enabled researchers to experiment faster, train larger models, and push the boundaries of what AI could do.
🚀 3. Beyond GPUs: Meet TPUs and NPUs – The AI Specialists
As AI models became larger, smarter, and more power-hungry, even the mighty GPUs started hitting limits—especially in terms of speed, energy efficiency, and cost. That’s when tech companies started designing hardware exclusively for AI.
🔷 TPUs (Tensor Processing Units): Google's AI Powerhouse
Developed by Google, TPUs are custom-built chips designed specifically for machine learning tasks, especially the kind used in deep learning models like transformers.
What makes TPUs special?
They are optimized for tensor operations, which are the core math behind most AI models (like multiplying large matrices of numbers).
Instead of handling general graphics like GPUs, TPUs focus only on what AI needs—making them faster and more efficient for tasks like training and inference.
They use a component called a Matrix Multiply Unit (MXU) to handle billions of operations per second, all tailored to neural networks.
💡 Fun fact: Google’s AI models like BERT and Gemini are trained on TPU-powered cloud infrastructure.
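What does a “tensor operation” actually look like? A tensor is just an n-dimensional grid of numbers, and the workhorse operation is multiply-and-accumulate across those grids—the math an MXU performs in hardware. Here's an illustrative sketch of a batched matrix multiply, where a 3-D tensor is treated as a stack of matrices:

```python
# Sketch of a "tensor operation": a tensor is an n-dimensional grid of
# numbers, and the core op is multiply-and-accumulate over those grids.
# A TPU's MXU does this kind of math in hardware. (Illustrative only.)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# A 3-D tensor = a stack (batch) of matrices; a batched matmul multiplies
# each matrix in the stack by the same weight matrix.
batch = [[[1, 0], [0, 1]],   # identity matrix
         [[2, 0], [0, 2]]]   # scaled identity
weights = [[1, 2],
           [3, 4]]

result = [matmul(m, weights) for m in batch]
print(result)  # [[[1, 2], [3, 4]], [[2, 4], [6, 8]]]
```

Because a TPU only needs to do this one family of operations well, its circuitry can be far denser and more power-efficient at it than a general-purpose chip.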
🔶 NPUs (Neural Processing Units): AI at the Edge
NPUs are the AI engines inside your smartphone, tablet, or IoT device. While TPUs power the cloud, NPUs bring intelligence right to your pocket.
Here’s what NPUs do well:
Handle on-device AI tasks like face unlock, voice recognition, and camera enhancements—without sending data to the cloud.
Use low power and are optimized for real-time inference, making them perfect for mobile and embedded systems.
Let your device react quickly, securely, and even offline.
📱 Example: When your phone unlocks instantly just by seeing your face—even in the dark—that’s your NPU analyzing facial patterns and making a decision in milliseconds.
💾 4. Memory, Storage & Interconnects: The AI Support Crew
Having a fast chip is great—but if it has to wait for data, performance crashes.
🧠 RAM & Cache:
These are like the short-term memory of your AI system.
RAM holds active data while training or running models.
Cache is even faster memory built right into the chip, keeping the most recently used data within instant reach.
More RAM means bigger models and batches fit in memory—faster learning, less lag.
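A quick back-of-the-envelope calculation shows why model size dictates memory needs: a model's parameters must sit in memory while it runs, and at roughly 2 bytes per parameter (16-bit precision) the numbers add up fast. The figures below are rough illustrations, not exact requirements:

```python
# Back-of-the-envelope sketch: memory needed just to hold a model's
# weights, assuming 16-bit (2-byte) precision per parameter.
# Real systems need extra memory for activations, gradients, etc.

def model_memory_gb(n_params, bytes_per_param=2):
    """Approximate memory to hold the weights alone."""
    return n_params * bytes_per_param / 1e9

print(f"{model_memory_gb(7e9):.0f} GB")   # 7-billion-parameter model -> 14 GB
print(f"{model_memory_gb(70e9):.0f} GB")  # 70-billion-parameter model -> 140 GB
```

That's why a chatbot-sized model won't fit on an ordinary laptop's RAM, and why AI servers are packed with high-capacity, high-bandwidth memory.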
💽 SSDs:
Storage matters too. Solid-State Drives (SSDs) are much faster than old HDDs and can handle the huge datasets AI needs.
Faster storage = quicker model load and smoother training.
🔗 Interconnects:
Think of these as data highways between chips, memory, and storage.
Slow interconnects = traffic jams. Fast ones keep data flowing without delays.
Together, these parts make sure your AI chip isn’t just powerful—but also efficient.
📱 5. AI Hardware in Everyday Life
You interact with AI hardware more than you think:
📷 Smartphone cameras use NPUs to beautify selfies in real-time.
🎮 Gaming consoles run AI for non-player characters (NPCs) using GPUs.
🚗 Self-driving cars use custom chips to analyze surroundings in milliseconds.
So yes, AI hardware isn’t just for scientists—it’s already in your pocket, your car, and your home.
⚠️ 6. The Challenges with AI Hardware
As powerful as AI hardware is, it comes with its own set of challenges:
🔥 Heat & Power: More computing = more electricity = more heat. Managing this is a constant struggle.
💸 Cost: High-performance chips are expensive—often thousands of dollars.
📦 Supply Issues: Just like with PS5s, sometimes the demand is so high, manufacturers can’t keep up.
🧩 Size Constraints: AI is moving to smaller devices. How do you fit powerful chips into tiny gadgets?
🔮 7. The Future: Where AI Hardware is Headed
A few trends to keep your eye on:
⚛️ Quantum Computing: Could make today's fastest hardware look like sloths—still experimental, but full of potential.
🌍 Edge Computing: Bringing AI to your phone, car, and even your fridge. No cloud required!
🍃 Green AI: Designing hardware that’s energy-efficient and environmentally friendly.
🎉 Final Thoughts: The Unsung Heroes
While AI software gets all the applause, it’s the hardware that makes the magic possible.
So next time your voice assistant nails your question, or your camera blurs the background perfectly—remember the tiny yet mighty chips inside, working tirelessly to power your smart world.
📚 TL;DR
| Term | What it does | Where it's found |
| --- | --- | --- |
| CPU | General processing | All computers |
| GPU | Parallel processing for AI & graphics | Gaming PCs, AI labs |
| TPU | Supercharged tensor math | Google servers |
| NPU | Fast AI in small devices | Smartphones |
| RAM/Cache | Fast data access | All systems |
| SSD | Quick storage | Laptops, AI servers |