Google has developed its second-generation tensor processing unit (TPU): four 45-teraflops chips packed onto a 180 TFLOPS module, to be used for machine learning and artificial intelligence. The company is bringing it to the cloud, and TPU-based computation will be available through Google Compute Engine later this year.

Typically in machine-learning workloads, initial training and model building are separate from the subsequent pattern matching against the model. The former workload is the one that depends most heavily on massive compute power, and it’s this that has generally been done on GPUs. Google’s first-generation TPUs were used for the second part: making inferences based on the model, to recognize images, language, or whatever. According to Google, those first-generation custom chips are 15 to 30 times faster and 30 to 80 times more power-efficient than CPUs and GPUs for these workloads, and the company has already been using them for its AlphaGo Go-playing computer, as well as its search results.

The new TPUs are optimized for both workloads, allowing the same chips to be used for both training and making inferences. Each card has its own high-speed interconnects, and 64 of the cards can be linked into what Google calls a pod, with 11.5 petaflops total; one petaflops is 10^15 floating point operations per second.
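
The pod figure follows directly from the per-chip number. The short Python sketch below only redoes that arithmetic with the figures quoted in this article; the constant and function names are illustrative and are not part of any Google API.

```python
# Figures quoted in the article; names are illustrative only.
TFLOPS_PER_CHIP = 45      # one second-generation TPU chip
CHIPS_PER_MODULE = 4      # chips on one TPU module (card)
MODULES_PER_POD = 64      # cards linked into one pod


def module_tflops():
    """Peak teraflops of one four-chip TPU module."""
    return TFLOPS_PER_CHIP * CHIPS_PER_MODULE


def pod_petaflops():
    """Peak petaflops of a 64-module pod (1 petaflops = 1,000 teraflops)."""
    return module_tflops() * MODULES_PER_POD / 1000


if __name__ == "__main__":
    print(f"module: {module_tflops()} TFLOPS")
    print(f"pod:    {pod_petaflops()} PFLOPS")
```

The product comes to 11.52 petaflops, which Google rounds to 11.5. These are peak figures; the article does not say what sustained throughput a pod achieves.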

A 64-card TPU pod, for 11.5 petaflops of computation.

Making comparisons with other machine-learning solutions is difficult. Most GPUs have their performance measured in terms of single-precision FLOPS, which use 32-bit numbers. The GPUs can typically also operate in double-precision mode (64-bit numbers) and half-precision mode (16-bit numbers). Sometimes, these alternate modes simply halve (for double precision) or double (for half precision) the overall performance, but that’s not universal. Machine-learning workloads tend to use these half-precision modes when they can. Google’s first-generation TPUs, however, don’t use floating point at all; they use 8-bit integer approximations to floating point.
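
To give a feel for what the narrower formats give up, the Python sketch below stores one value as a 32-bit float, as a 16-bit float, and as an 8-bit integer, then reads it back. The float round trips use the standard library's struct module. The 8-bit step uses a simple linear scale chosen purely for illustration; it is an assumption, not the scheme Google's TPUs use, which the article does not describe.

```python
import struct


def round_trip(value, fmt):
    """Store a Python float in the given struct float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def quantize_int8(value, scale):
    """Map a float to a signed 8-bit integer with a linear scale.

    Hypothetical scheme for illustration, not the TPU's actual method.
    """
    q = round(value / scale)
    return max(-128, min(127, q))  # clamp to the signed 8-bit range


def dequantize_int8(q, scale):
    """Recover the approximate float that the 8-bit integer stands for."""
    return q * scale


weight = 0.1234567                       # arbitrary example value
single = round_trip(weight, "<f")        # 32-bit single precision
half = round_trip(weight, "<e")          # 16-bit half precision
scale = 1.0 / 127                        # assumes values lie in [-1, 1]
q = quantize_int8(weight, scale)
approx = dequantize_int8(q, scale)

if __name__ == "__main__":
    for name, stored in (("float32", single), ("float16", half), ("int8", approx)):
        print(f"{name}: {stored!r}  error={abs(stored - weight):.2e}")
```

For this value the error grows at each step down in width, which is the trade that inference hardware accepts in exchange for smaller, faster arithmetic units.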

Quite how floating point performance maps to these integer workloads isn’t clear, and the ability to use the new TPU for training suggests that Google may be using 16-bit floating point instead. But as a couple of points of comparison: AMD’s forthcoming Vega GPU should offer 13 TFLOPS of single-precision and 25 TFLOPS of half-precision performance, and the machine-learning accelerator that Nvidia announced recently, the Volta GPU-based Tesla V100, can offer 15 TFLOPS single precision and 120 TFLOPS for “deep learning” workloads. Nvidia is making similar promises to Google, too, boasting of substantially accelerated training. Microsoft has been using FPGAs for similar workloads, though, again, a performance comparison is tricky; the company has demonstrated more than 1 exa-operation per second (that is, 10^18 operations per second), though it didn’t disclose how many chips that used or the nature of each operation.


Accelerated Machine Learning

Machine learning (ML) has the power to greatly simplify our lives. Improvements in speech recognition and language understanding help all of us interact more naturally with technology. Businesses rely on ML to strengthen network security and reduce fraud. Advances in medical imaging enabled by ML can increase the accuracy of medical diagnoses and expand access to care, ultimately saving lives.

Speed Up Machine Learning Workloads

Cloud TPUs were designed from the ground up to accelerate machine learning workloads. Each Cloud TPU delivers up to 180 teraflops of performance, supplying the computational power to train and run cutting-edge machine learning models. Cloud TPUs can help you transform your business or create the next research breakthrough.

On-Demand ML Supercomputing

Access powerful machine learning accelerators on demand—no up-front capital investment required. Whether your task requires Cloud TPUs for hours or weeks, you can get exactly the machine learning acceleration you need without creating your own datacenter.

Easy On-Ramp to Cloud

Because TensorFlow is open source, it’s easy to take ML workloads you’re already running in TensorFlow and try them on Cloud TPUs. Use TensorFlow’s high-level APIs to move models across CPUs, GPUs, and TPUs with minimal code changes.

Access Google’s AI Innovation

Access the same accelerators Google uses to develop world-class machine learning products. Cloud TPUs are purpose-built to accelerate state-of-the-art machine learning workloads, including both training and prediction.