Google’s Tensor Processing Unit: How TPUs Are Reshaping AI Chips

As companies invest in the infrastructure needed to run artificial intelligence, Google’s tensor processing unit (TPU) has become even more valuable. Here’s what to know about TPUs and how they’ll continue to reshape the AI chip landscape.

Written by Matthew Urwin
Published on Jun. 06, 2025
Summary: Google’s tensor processing unit (TPU) is an AI-focused chip built to accelerate machine learning tasks. Used in products like Search and Gemini, TPUs support large-scale model training and inference. The latest generation, Ironwood, aims to improve efficiency and reduce latency for enterprise AI.

The rise of artificial intelligence has led to a spike in demand for the hardware needed to run AI models, especially AI chips. According to Statista, the AI chip market reached $71 billion in 2024 and is set to expand by as much as 30 percent in 2025. Nvidia has long dominated this sector, but Google has been making strides with its tensor processing unit, or TPU.

Tensor Processing Unit (TPU) Definition

A tensor processing unit (TPU) is Google’s computer chip that specializes in accelerating AI model training and inference — a model’s ability to make predictions based on new data. Google uses TPUs to support products like Search, Maps and Gemini.

A TPU is Google’s computer chip designed specifically for developing AI models. TPUs excel at processing logical and mathematical tasks, making them ideal for accelerating machine learning workloads. And they’ll only continue to grow more powerful as the race to produce AI chips heats up amid geopolitical tensions.

Here’s everything you need to know about the tensor processing unit and what lies ahead for Google’s prized chip.


 

What Is a Tensor Processing Unit (TPU)?

A tensor processing unit is a computer chip built by Google to handle the computational demands of machine learning workloads. As an application-specific integrated circuit (ASIC), a TPU specializes in training AI models and performing AI inference — the ability of a model to analyze and draw insights from new data.

Google created the TPU for its internal use, making it compatible with the TensorFlow framework — Google’s open-source library of tools and resources dedicated to building, training and deploying machine learning models. This framework has the capacity to run ML models on advanced chips like TPUs and scale these models as needed.
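To make this concrete, below is a minimal sketch of how a TensorFlow program typically connects to a Cloud TPU and creates a distribution strategy. The empty `tpu=""` argument is a placeholder assumption (on a Cloud TPU VM the local TPU is usually auto-detected); the exact setup depends on the environment.

```python
import tensorflow as tf

# Connect to a Cloud TPU. The tpu argument is a placeholder; on a Cloud TPU VM
# it can typically be left empty so the local TPU is auto-detected.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# A TPUStrategy replicates computation across the TPU's cores,
# so TensorFlow models can train on them in parallel.
strategy = tf.distribute.TPUStrategy(resolver)
print("TPU cores available:", strategy.num_replicas_in_sync)
```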

 

How Do TPUs Work? 

Let’s start with the building blocks of TPUs. A tensor is a data structure that stores numbers, objects and other types of data in a multi-dimensional format. It’s similar to a table of data stored in rows and columns, or a matrix — just in more than two dimensions. To handle tensors, TPUs perform many multiplications and additions as part of what’s called matrix processing. For this reason, a TPU possesses thousands of interconnected components called multiply-accumulators that form a physical matrix known as a systolic array.
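As a small illustration (a NumPy sketch, not TPU-specific code), a matrix is a two-dimensional tensor, while something like a batch of images is naturally a higher-dimensional tensor:

```python
import numpy as np

# A matrix is a 2-D tensor: rows x columns.
matrix = np.zeros((3, 4))           # shape (3, 4)

# A batch of RGB images is a 4-D tensor: batch x height x width x channels.
images = np.zeros((32, 224, 224, 3))

print(matrix.ndim, images.ndim)     # 2 and 4 dimensions
```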

There are a couple of other features that are key to understanding how TPUs function: 

  • Matrix Multiply Unit (MXU): The matrix multiply unit (MXU) enables a TPU to process large volumes of data in parallel, so it can complete matrix operations. With this ability, TPUs can accelerate ML workloads and solve neural network problems. 
  • High Bandwidth Memory (HBM): High bandwidth memory (HBM) allows TPUs to retrieve data from memory more quickly, removing potential data bottlenecks and ensuring the MXU is supplied with a steady stream of data. 

What does this look like in practice? To start, a host device gathers data in a queue. The TPU on the device then takes this data and stores it in HBM, where it performs the computations. For each computation, the TPU loads parameters and data from the HBM into the MXU. Within the MXU, each multiply-accumulator performs a multiplication and passes the result to the next one from left to right, with the final output being the sum of these multiplications. The host device can then read and save these results in its memory.    
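To make that dataflow concrete, here is a toy Python sketch of the multiply-accumulate pattern a systolic array implements in hardware: each cell multiplies a weight by an input value and adds the result to a running partial sum passed along the row. This is a simplified illustration, not how the hardware is actually programmed.

```python
# Toy illustration of a row of multiply-accumulators (MACs).
# Each cell multiplies a weight by an activation and adds the result
# to the partial sum passed in from its left-hand neighbor.

weights = [2.0, 0.5, -1.0, 3.0]       # parameters loaded from HBM
activations = [1.0, 4.0, 2.0, 0.5]    # input data loaded from HBM

partial_sum = 0.0
for w, x in zip(weights, activations):
    partial_sum += w * x              # one multiply-accumulate step

# The final partial sum is one element of the output matrix:
# the dot product of the weight row and the input column.
print(partial_sum)                    # 2*1 + 0.5*4 + (-1)*2 + 3*0.5 = 3.5
```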

Because TPUs excel at tensor calculations, they can solve problems involving more complex and larger data sets than central processing units (CPUs) and graphics processing units (GPUs) can. This makes them a natural fit for neural networks, as well as machine learning and deep learning models, which use tensors to process massive volumes of data and inform their predictions.

 

Evolution of TPUs

TPUs are primed to continue shaping how machine learning models are trained and run, thanks to adjustments Google has made over the years.   

TPU v1

In 2015, Google introduced its first version of the tensor processing unit, TPU v1. The company realized that CPUs and GPUs lacked the resources required to build and scale AI applications. That’s why TPU v1 focused on inference, enabling machine learning models to complete tasks more quickly. This initial generation of TPUs accelerated machine learning workflows run in TensorFlow.  

TPU v2

In 2017, Google released the second version of the TPU, known as TPU v2. The second generation introduced Google’s first TPU pod — a group of interconnected TPU chips that distribute workloads among one another, completing tasks in parallel. TPU v2 thus had the computational resources to execute both inference and model training. Google’s engineers also avoided over-specializing the design, keeping TPU v2 flexible and relevant in the years ahead.

TPU v3

In 2018, Google took another step forward with TPU v3. The third generation of TPUs offered greater memory bandwidth and larger TPU pods, performing better than previous generations on inference and model training. In addition, TPU v3 used liquid cooling to operate more efficiently than earlier chips, which relied on air cooling.

TPU v4

In 2021, Google further cemented TPUs as essential to AI innovation with the announcement of TPU v4. The fourth generation of TPUs offered even more memory bandwidth and larger TPU pods. In particular, it used optical circuit switching to speed up communication between chips within TPU pods.


 

TPUs vs. CPUs vs. GPUs vs. NPUs

To get a better sense of how TPUs are used, it helps to view them alongside their counterparts in CPUs, GPUs and NPUs. 

Central Processing Unit (CPU) 

A central processing unit acts as the brain of a computer, following instructions from a computer program or operating system to complete various tasks. The CPU is known for being general-purpose, performing everyday functions like saving files, enabling text editing and integrating with other devices like a computer mouse.

CPUs can manage more complex tasks as well, including running gaming software, performing mathematical calculations and even handling smaller deep learning problems like AI inference in lightweight models. However, CPUs offer far less parallelism, largely progressing through calculations one at a time in what’s known as sequential processing. In other words, a CPU isn’t built for the massive parallelism AI workloads demand. GPUs and TPUs are much more powerful and better suited for advanced AI and machine learning tasks.

Graphics Processing Unit (GPU)

Graphics processing units are specialized chips that use parallel processing to execute far more complex calculations than CPUs. GPUs can either exist as separate chips or be integrated into a CPU to cover graphics needs. 

While initially intended for image and video processing, GPUs now encompass other applications like crypto mining, simulations and machine learning. GPUs emphasize parallel processing, making them a viable option for training neural networks and working with machine learning workloads to a degree. But GPUs are still not designed specifically for AI and ML tasks the way TPUs are. Plus, they consume a lot of energy and are less efficient than TPUs.

Neural Processing Unit (NPU)

A neural processing unit is dedicated to neural network processing. Artificial neural networks aim to mimic how the human brain works, with two neurons that frequently exchange information forging a stronger connection. As a result, NPUs are built to accelerate neural network tasks — especially AI inference — directly in hardware.

Compared to TPUs, NPUs can deliver equivalent performance and consume a minimal amount of energy to execute operations. The difference lies in where each type of chip is applied: TPUs are tailored more toward cloud environments while NPUs are intended for edge computing, particularly mobile devices.

 

TPU Use Cases 

Tensor processing units play a major role in training AI models, but they can be used for far more than just AI research.  

Conducting AI Model Research and Training

TPUs can be helpful for building, training and fine-tuning models, as demonstrated by Google’s uses of these chips. For example, Google used TPUs to train models like BERT and PaLM. In the case of PaLM, Google trained the model via TPU pods.  
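As a hedged sketch of what training on TPUs looks like in a TensorFlow/Keras workflow (the model and data here are placeholders, and the script falls back to the default strategy when no TPU is reachable):

```python
import tensorflow as tf

# Use a TPU if one is reachable; otherwise fall back to the default strategy
# so the same script can be developed locally.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except Exception:  # no TPU available in this environment
    strategy = tf.distribute.get_strategy()

# Build and compile the (placeholder) model inside the strategy scope so its
# variables are replicated across the available devices.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Placeholder data; real training jobs stream examples with tf.data pipelines.
x = tf.random.normal((1024, 20))
y = tf.random.normal((1024, 1))
model.fit(x, y, epochs=1, batch_size=128)
```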

Supporting Cloud Applications 

Since TPUs can process massive volumes of data, Google relies on them to power many of its most well-known applications, including Photos, Maps and Search. The family of Gemini models and Google’s Gemini chatbot were also trained on TPUs.

Building Chatbots

Tensor processing units can be used to train models in more complex areas like natural language processing and speech recognition, allowing users to create chatbots. For instance, Anthropic established a partnership with Google Cloud to train its AI models and deploy its chatbot Claude to a wider audience. 

Developing Self-Driving Cars

TPUs can enhance edge computing, which involves processing data at or near the data source itself. Take autonomous driving, for example. Self-driving car company Waymo uses TPUs to train neural networks before testing its models through simulations. This enables models to handle large amounts of data and react to the car’s environment in real time. 

Designing Facial Recognition Systems

TPUs can strengthen the training of computer vision systems, helping them learn to identify particular objects. This is useful for applications like facial recognition, resulting in systems that can recognize a user’s face to unlock their phone or pick out individuals in crowds as part of security camera systems.

Creating Recommendation Engines

TPUs can equip models with AI inference, enabling models to make predictions based on new data. This is especially crucial in tools like recommendation engines that suggest new shows, products or other content to users based on their past habits.  


 

Advantages of TPUs

Because they’re tailor-made to facilitate machine learning training and development, TPUs offer several benefits.   

Accelerated AI Model Training

TPUs perform well in areas like linear algebra and use parallel processing to handle complex computations, making them ideal for machine learning workloads. Due to their high performance, TPUs can cut down the amount of time it takes to train ML models and accelerate model deployment.  

Reduced Costs and Energy Usage

TPUs can complete more demanding computations while using less energy than GPUs. Combine this with the fact that TPUs can lower costs by shortening the time needed to train and deploy models, and TPUs stand out as a more efficient option for developing ML models. 

Easy Scalability 

TPUs are meant to operate at scale, adjusting to the complexity and growing volumes of data associated with machine learning problems. This makes it easier for teams to design advanced applications that support language translation, voice recognition and other services.

Cloud Compatibility

TPUs are accessible via Google’s TensorFlow framework, allowing users to quickly build and deploy machine learning models. And because users can work with TPUs through Google Cloud, they don’t need to spend extra on additional hardware. 


 

Challenges of TPUs

Although TPUs provide clear upsides for those working with machine learning, they also come with some limitations to consider. 

Higher Costs Compared to Other Chips

While TPUs possess the capacity to complete more in-depth computations, this ability may not be relevant for all projects. CPUs and GPUs could be viable and cheaper alternatives to TPUs for smaller, less complex problems. 

Restricted Framework Support

TPUs are designed to work in tandem with Google’s TensorFlow framework. Other options like JAX and PyTorch do exist, but they don’t match the convenience and TPU resources that come with using TensorFlow.  

Access Solely Through Google Cloud

A Google Cloud subscription is necessary to access TPUs, forcing those working with TPUs to rely heavily on Google’s ecosystem. This stands in contrast to CPUs and GPUs, which are created by different vendors. 

Less Flexible Than CPUs and GPUs

By nature, TPUs only specialize in machine learning applications, such as working with neural networks. This makes them much less versatile than CPUs and GPUs, which are intended for more general use.  

 

Future of TPUs and AI Hardware

Google’s latest generation of TPUs hints at a future where the company’s chips play an increasingly important role in the AI chip landscape.

Improving AI Inference Capabilities

In April 2025, Google released its seventh-generation TPU called Ironwood. The TPU demonstrates higher performance than previous generations while reducing latency to more efficiently complete computations, addressing concerns about the energy consumption of generative AI. Ironwood is intended to spearhead the “age of inference,” shifting away from models that merely retrieve real-time information to models that generate insights and serve as more active collaborators. 

This move also signals a pivot from enhancing the training process to making deployment more seamless for business applications. As a result, Google has positioned itself to continue being a dominant player in the AI ecosystem as it adapts its TPUs for an era where more companies are regularly running AI solutions at scale. 

Making Data Centers More Sustainable

Google’s advancements in TPUs could hold game-changing implications for AI infrastructure. According to an internal Google study, the Trillium generation of TPUs proved to be three times more carbon-efficient than previous generations like TPU v4 when managing AI workloads. 

The study comes at a time when data centers have been seen as a contributor to AI’s energy problem. If Google’s Trillium TPUs can be deployed successfully, they could help data centers reduce their carbon footprint and run AI tasks much more efficiently. 

Expanding Access Beyond TensorFlow

Although TPUs are designed around TensorFlow, users can also turn to alternatives like JAX and PyTorch. Cloud TPUs can run JAX and PyTorch code to perform calculations, often using a TPU slice — a subset of the chips within a TPU pod. Still, JAX and PyTorch aren’t as easy to use with TPUs as TensorFlow.
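For example, here is a minimal JAX sketch (assuming it runs on a Cloud TPU VM with the TPU build of JAX installed) that lists the available devices and runs a computation on them; elsewhere, the same code simply falls back to CPU:

```python
import jax
import jax.numpy as jnp

# On a Cloud TPU VM this lists the TPU cores in the slice;
# on other machines it lists CPU devices instead.
print(jax.devices())

# A simple matrix multiplication; JAX dispatches it to the first
# available accelerator (a TPU core when one is present).
a = jnp.ones((1024, 1024))
b = jnp.ones((1024, 1024))
c = jnp.dot(a, b)
print(c.shape, float(c[0, 0]))
```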

But there’s reason to believe Google could further open up access to its TPUs outside of TensorFlow. If the company wants to be a key cog in an ever-growing AI ecosystem, it makes sense to broaden TPUs’ accessibility beyond Google Cloud subscribers. This is mere speculation, but it may not be too far-fetched given the apparent direction of Google’s chip strategy as revealed through its Ironwood announcement.  

Fueling Increased AI Chip Competition

Google’s upgrades to its TPUs only add more fuel to the heated race between AI chip makers. While Google’s TPU must contend with Nvidia’s GPU, it also faces fierce competition from Amazon’s Inferentia, Meta’s MTIA and Microsoft’s Maia, as well as other producers like Groq and Cerebras. Meanwhile, Chinese researchers have developed the first-ever TPU consisting of carbon nanotubes, adding another energy-efficient chip to a crowded field. 

At the same time, Google’s technology has aided competitors. For instance, Apple used Google’s TPUs to train its models that equip products with Apple Intelligence. Even if Google doesn’t become the top AI chip company, its TPUs ensure it will remain essential to the progression of the AI chip sector as a whole.

Frequently Asked Questions

Is a TPU better than a GPU?

TPUs can solve more complex problems and do so more efficiently than GPUs, but deciding between the two depends on the situation. GPUs can handle some machine learning tasks and perform other actions like running simulations and mining crypto. For more demanding ML workloads, a TPU may be a better fit.

What is a tensor processing unit (TPU)?

A tensor processing unit (TPU) is Google’s computer chip designed for training AI models and giving models the ability to make predictions from new data. Google uses TPUs to support popular applications like Search, Maps and Photos.

Why did Google build TPUs?

TPUs were built specifically to accelerate machine learning workloads. They have the capacity to perform advanced calculations in parallel, making them ideal for training and deploying machine learning models and neural networks.

Can you buy a TPU?

No, TPUs are not sold individually. To access TPUs, users must sign up for a Google Cloud subscription.

What are TPUs used for?

TPUs excel at training complicated machine learning models, deep learning models and neural networks, making them especially useful for AI research. However, they can also be used to create chatbots, develop facial recognition systems and design self-driving cars.

Can PyTorch run on TPUs?

Yes, users can run PyTorch code and perform calculations on TPUs, as illustrated in the sketch below. However, PyTorch receives less support compared to TensorFlow.
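As a small illustrative sketch (assuming the torch_xla package is installed on a Cloud TPU VM), PyTorch code targets a TPU through an XLA device:

```python
import torch
import torch_xla.core.xla_model as xm

# Acquire the TPU as an XLA device and create tensors on it.
device = xm.xla_device()
a = torch.randn(512, 512, device=device)
b = torch.randn(512, 512, device=device)

# Operations are traced and compiled for the TPU; mark_step() flushes
# the pending computation so it actually executes on the device.
c = a @ b
xm.mark_step()
print(c.shape)
```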
