Inside an AI Chip


[Graphic: microscopic view of an AI chip with four labeled components: NPU (neural processing unit), memory (DRAM), GPU (graphics processing unit), and CPU (central processing unit).]

All chips are semiconductors, and AI chips are a specific segment of the semiconductor market that is expected to see significant growth. These AI chips typically come in the form of a “system-on-chip” (SoC): a computer chip that contains multiple functions beyond the central processing unit (CPU), the workhorse for all basic computing functions. An SoC also includes parts that process images and video, store data in memory, and perform machine learning (ML) tasks (the AI component), many of which are themselves different types of chips.

Most chips are based on an underlying architecture whose core intellectual property is dominated by a handful of firms, chief among them ARM Holdings. But when it comes to design, there isn’t exactly a “standard” SoC for AI chips because these are increasingly enhanced for specialized AI functions. Yet, while chip design has become more fragmented and diverse, most SoCs contain the main components outlined in the graphic above.

AI Chips in Brief

Chips can have many shapes, sizes, and functions. But what exactly makes a chip, or SoC, an “AI chip”? To answer this question, it’s worth taking a step back to understand the linkage between chips and AI, as well as the end-use markets that these chips serve.

AI is hardly a new technology—the field was pioneered as early as the mid-1950s. But AI’s potential could not be realized at the time because computing power and data availability were insufficient. The rise of AI in recent years is largely due to the confluence of several key factors: maturation of AI algorithms, abundance of data, and a semiconductor industry that now produces chips capable of handling the computational needs of AI functions.

No advanced chips, no AI applications.

But how exactly does AI work in practice? One example would be when you point a smartphone camera at a real house cat and the device can instantly distinguish that cat from, say, a lynx or a serval. This sort of image recognition is easy for a human, but exceptionally difficult for a computer.

That is why AI algorithms first need to be “trained” to recognize a cat consistently and with high accuracy. Training happens in the cloud, in a layered “neural network” loosely modeled on the human brain.

Google’s representation of the layered computations needed to train algorithms to recognize a cat.

GPUs have been the preferred chips for AI training because their main attribute is parallel processing. This may change in the future as companies develop their own chips optimized for AI training, like Google’s Tensor Processing Unit (TPU). The company claims that 8 TPUs can achieve the same performance as 32 GPUs.
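To make the link between layered networks and parallel processing concrete, here is a minimal sketch in plain Python. The toy network, its weights, and the function names are all hypothetical, and the code shows only the forward pass through the layers; real training repeats passes like this many times while adjusting the weights. The point to notice is that each neuron in a layer is an independent calculation, which is the kind of work a parallel chip can spread across many cores.

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer: each output neuron is an independent dot product."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # This loop body never uses another neuron's result, so a parallel
        # processor could evaluate every neuron in the layer at the same time.
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU activation: negative sums become 0
    return outputs


def forward(inputs, layers):
    """Feed the inputs through the layers in order; layers themselves run sequentially."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical toy network: 3 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -1.0, 0.25], [1.0, 1.0, 1.0]], [0.0, -1.0]),
    ([[2.0, 0.5]], [0.1]),
]
hidden = dense_layer([1.0, 2.0, 4.0], layers[0][0], layers[0][1])
result = forward([1.0, 2.0, 4.0], layers)
```

In this sketch the sequential Python loop stands in for what a GPU would do simultaneously; the arithmetic is the same, only the scheduling differs.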

Inference, the step in which a trained algorithm is applied to new data, usually happens at the “edge,” meaning inside devices such as phones, laptops, surveillance cameras, or autonomous vehicles (AVs). Use cases for AI inference are diverse, so chips need to be customized for particular functions. For instance, the additional AI inference computation required to graduate from a Level 3 AV (“no eyes needed”) to a Level 4 AV (“no driver needed”) is so significant as to require a different AI chip.

The next generation of inference-focused AI chips will be dominated by ASICs (application-specific integrated circuits), with a smaller but still growing role for FPGAs (field-programmable gate arrays). The main difference between the two is that ASICs are less flexible and are customized for specific functions, while FPGAs can be re-programmed after manufacturing to perform new functions, although they are less efficient at these functions than ASICs. This diversity of AI chip designs in an ASIC-heavy marketplace opens the door to new market entrants.

The choice of the type of AI chip comes down to the particular end-use. For instance, Apple has been using ASICs in its latest generation of AI chips, while California-based Xilinx specializes in FPGAs.

Primer on AI Chip Fabrication

Manufacturing AI chips involves the same sophisticated process used for all advanced semiconductor chips. It can be divided into three basic stages: design, fabrication, and assembly and packaging. Firms that maintain all three stages in-house are known as “integrated device manufacturers” (IDMs), such as Intel and Samsung. But the industry has seen the rise of “fabless” firms, such as Broadcom and Qualcomm, that specialize in chip design and outsource fabrication to specialized “foundries” and outsource assembly and packaging to other contractors. We now examine fabrication and assembly in more detail.

Front End

Wafer fabrication takes place in foundries and can involve more than 300 discrete steps. The process starts with round silicon wafers, around 5-8 inches in diameter (though wafer sizes are increasing), which are doped with chemicals such as boron to prepare the silicon for the next step.

Photomasking is then used to imprint circuitry patterns onto the doped wafers. This process involves one of the most expensive tools in the fabrication process, photolithography machines, which are mainly supplied by the Dutch company ASML. The most advanced of these machines, capable of making 7nm chips using “Extreme Ultraviolet” lithography, carries a price tag of up to $120 million, roughly the cost of a Boeing 737.

A metal layer is then laid over the imprinted wafer, and precision instruments etch out electrical circuits in the precise pattern that had been imprinted on the silicon wafer through photomasking. These steps are repeated to create many square “dies” (tiny individual chips) across almost the entire surface of the silicon wafer—similar to a book of stamps.
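How many dies fit on one wafer can be estimated with simple geometry. The sketch below uses a common rule-of-thumb approximation that is not taken from this article: divide the wafer area by the die area, then subtract a term for the partial dies lost along the round edge. The function name and the example dimensions are assumptions chosen for illustration.

```python
import math


def estimate_dies_per_wafer(wafer_diameter_mm, die_width_mm, die_height_mm):
    """Rule-of-thumb estimate of whole dies per round wafer (an approximation only)."""
    die_area = die_width_mm * die_height_mm
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Rectangular dies cannot tile a circle perfectly, so some are lost at the edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area)
    return int(wafer_area / die_area - edge_loss)


# Hypothetical example: a 200 mm (roughly 8-inch) wafer with 10 mm x 10 mm dies.
large_dies = estimate_dies_per_wafer(200, 10, 10)
# Halving each die dimension fits far more dies on the same wafer.
small_dies = estimate_dies_per_wafer(200, 5, 5)
```

The estimate ignores real-world details such as the spacing needed for cutting and any exclusion zone at the wafer edge, so actual counts will differ.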

Back End

This is usually the point in the production process when wafers leave the foundry and are received by an “outsourced semiconductor assembly and testing” (OSAT) contractor. Each completed wafer is probed and tested with specialized equipment, and bad dies are marked for discard with a black dot. The proportion of good dies is the wafer’s yield.
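The yield calculation described above is a simple ratio. A minimal sketch, assuming test results are recorded as one pass/fail flag per die (the function name and sample data are hypothetical):

```python
def wafer_yield(die_test_results):
    """Return the fraction of dies on a wafer that passed testing."""
    if not die_test_results:
        raise ValueError("a wafer must have at least one tested die")
    good_dies = sum(1 for passed in die_test_results if passed)
    return good_dies / len(die_test_results)


# Hypothetical wafer with 10 dies, 2 of which failed and would be marked for discard.
sample_results = [True, True, False, True, True, True, False, True, True, True]
sample_yield = wafer_yield(sample_results)  # 8 good dies out of 10
```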

After quality control, the final stage is assembly and packaging, which turns the dies into chips usable by end-device producers. Each die is cut from the wafer and placed on a circuit board, usually in an area smaller than a baby’s fingernail. At this point, a fully functional semiconductor device with millions of electronic components is formed.

But such devices are tiny and fragile and need to be packaged in ceramic or plastic to prevent damage. The type of packaging differs depending on the specific purpose of the semiconductor.

And just like that, the chips are ready to be shipped to the company that ordered them.

Composed of multiple processor cores, the CPU is the heart of any SoC. All computing devices have one. It performs the bulk of computing tasks, including AI inference in the cloud or in edge devices, and the number of cores largely determines a CPU’s speed and processing power.

A more specialized chip that was designed to process images, the GPU became important in ML because it can handle the parallel computations needed to crunch the data that enable AI applications like image recognition. But the GPU’s task-specific nature also makes it less flexible than the CPU, so it is preferred for AI training purposes rather than for inference at the edge.

The least flexible of the processing units, the NPU is specifically designed to execute ML algorithms. It typically complements the CPU by accelerating a specific AI function that involves highly complicated matrix computations, which is why it is often called an “AI accelerator.” NPUs thereby free up CPU capacity and increase overall chip efficiency. The “Neural Engine” in Apple’s latest A13 Bionic SoC is one example of an NPU.
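The matrix computations mentioned above boil down to a very large number of multiply-accumulate steps. The sketch below is an illustration in plain Python, not a description of how any particular NPU works; the function name and the sample matrices are hypothetical. It multiplies two small matrices and counts the multiply-accumulate operations involved.

```python
def matmul_with_mac_count(a, b):
    """Multiply matrices a (rows x inner) and b (inner x cols), counting multiply-accumulates."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    mac_count = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # One multiply-accumulate: the basic step an accelerator repeats in bulk.
                result[i][j] += a[i][k] * b[k][j]
                mac_count += 1
    return result, mac_count


# Hypothetical 2 x 2 example: rows * inner * cols = 8 multiply-accumulates.
product, macs = matmul_with_mac_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because the count grows as rows × inner × cols, matrices with thousands of rows and columns require billions of these steps, which is the workload a dedicated accelerator takes off the CPU.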

Memory devices are integral to SoCs, because they store enormous volumes of data for on-demand retrieval. The more memory a chip has, the faster it can retrieve data, and the more efficiently it can execute AI applications. These devices, which usually take the form of dynamic random access memory (DRAM), are composed of tiny cells that store bits of data on capacitors. As the name implies, that storage is “dynamic” and not permanent, because the capacitors need to be refreshed periodically to avoid data loss.
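The need for periodic refresh can be illustrated with a toy model. Everything here is an assumption made for illustration: the class, the leak rate, and the read threshold are hypothetical and not real DRAM parameters. A stored 1 is modeled as a charged capacitor that leaks a little on every tick; a refresh reads the cell and rewrites it at full charge.

```python
class DramCell:
    """Toy model of one DRAM cell: a capacitor whose charge leaks over time."""

    FULL_CHARGE = 1.0
    READ_THRESHOLD = 0.5  # hypothetical: charge above this level reads as a 1

    def __init__(self, bit, leak_per_tick=0.125):
        self.charge = self.FULL_CHARGE if bit else 0.0
        self.leak_per_tick = leak_per_tick  # hypothetical leak rate

    def tick(self):
        # Charge drains away a little with every unit of time.
        self.charge = max(0.0, self.charge - self.leak_per_tick)

    def read(self):
        return 1 if self.charge > self.READ_THRESHOLD else 0

    def refresh(self):
        # Read the current value and write it back at full strength.
        self.charge = self.FULL_CHARGE if self.read() else 0.0


def bit_after(ticks, refresh_every=None):
    """Store a 1, let time pass, and report what the cell reads back."""
    cell = DramCell(1)
    for t in range(1, ticks + 1):
        cell.tick()
        if refresh_every and t % refresh_every == 0:
            cell.refresh()
    return cell.read()


without_refresh = bit_after(10)                # charge leaks away, the 1 is lost
with_refresh = bit_after(10, refresh_every=3)  # periodic refresh preserves the 1
```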


A close-up view of the intricate cross-hatching patterns of transistors on a chip, each of which is microscopic in size. For instance, the current generation of advanced AI chips has transistors that are just 7nm wide, less than three times the width of a human DNA strand. Such a 7nm AI chip, which Apple has rolled out, can contain some 8.5 billion transistors.