23.1 C
Miami
Thursday, November 21, 2024

AI Alone Isn’t Ready for Chip Design

- Advertisement -spot_imgspot_img
- Advertisement -spot_imgspot_img

Chip design has come a long way since 1971, when Federico Faggin finished sketching the first commercial microprocessor, the Intel 4004, using little more than a straightedge and colored pencils. Today’s designers have a plethora of software tools at their disposal to plan and test new integrated circuits. But as chips have grown staggeringly complex—with some comprising hundreds of billions of transistors—so have the problems designers must solve. And those tools aren’t always up to the task.

Modern chip engineering is an iterative process of nine stages, from system specification to
packaging. Each stage has several substages, and each of those can take weeks to months, depending on the size of the problem and its constraints. Many design problems have only a handful of viable solutions out of 10100 to 101000 possibilities—a needle-in-a-haystack scenario if ever there was one. Automation tools in use today often fail to solve real-world problems at this scale, which means that humans must step in, making the process more laborious and time-consuming than chipmakers would like.

Not surprisingly, there is a growing interest in using
machine learning to speed up chip design. However, as our team at the Intel AI Lab has found, machine-learning algorithms are often insufficient on their own, particularly when dealing with multiple constraints that must be satisfied.

In fact, our recent attempts at developing an AI-based solution to tackle a tricky design task known as floorplanning (more about that task later) led us to a far more successful tool based on non-AI methods like classical search. This suggests that the field shouldn’t be too quick to dismiss traditional techniques. We now believe that hybrid approaches combining the best of both methods, although currently an underexplored area of research, will prove to be the most fruitful path forward. Here’s why.

The Perils of AI Algorithms

One of the biggest bottlenecks in chip design occurs in the physical-design stage, after the architecture has been resolved and the logic and circuits have been worked out. Physical design involves geometrically optimizing a chip’s layout and connectivity. The first step is to partition the chip into high-level functional blocks, such as CPU cores, memory blocks, and so on. These large partitions are then subdivided into smaller ones, called macros and standard cells. An average system-on-chip (SoC) has about 100 high-level blocks made up of hundreds to thousands of macros and thousands to hundreds of thousands of standard cells.

Next comes floorplanning, in which functional blocks are arranged to meet certain design goals, including high performance, low power consumption, and cost efficiency. These goals are typically achieved by minimizing wirelength (the total length of the nanowires connecting the circuit elements) and white space (the total area of the chip not occupied by circuits). Such floorplanning problems fall under a branch of mathematical programming known as combinatorial optimization. If you’ve ever played Tetris, you’ve tackled a very simple combinatorial optimization puzzle.

Floorplanning, in which CPU cores and other functional blocks are arranged to meet certain goals, is one of many stages of chip design. It is especially challenging because it requires solving large optimization problems with multiple constraints.Chris Philpot

Chip floorplanning is like Tetris on steroids. The number of possible solutions, for one thing, can be astronomically large—quite literally. In a typical SoC floorplan, there are approximately 10250 possible ways to arrange 120 high-level blocks; by comparison, there are an estimated 1024 stars in the universe. The number of possible arrangements for macros and standard cells is several orders of magnitude larger still.

Given a single objective—squeezing functional blocks into the smallest possible silicon area, for example—commercial floorplanning tools can solve problems of such scale in mere minutes. They flounder, however, when faced with multiple goals and constraints, such as rules about where certain blocks must go, how they can be shaped, or which blocks must be placed together. As a result, human designers frequently resort to trial and error and their own ingenuity, adding hours or even days to the production schedule. And that’s just for one substage.

Despite the triumphs in machine learning over the past decade, it has so far had relatively little impact on chip design. Companies like Nvidia have begun
training large language models (LLMs)—the form of AI that powers services like Copilot and ChatGPT—to write scripts for hardware design programs and analyze bugs. But such coding tasks are a far cry from solving hairy optimization problems like floorplanning.

At first glance, it might be tempting to throw
transformer models, the basis for LLMs, at physical-design problems, too. We could, in theory, create an AI-based floorplanner by training a transformer to sequentially predict the physical coordinates of each block on a chip, similarly to how an AI chatbot sequentially predicts words in a sentence. However, we would quickly run into trouble if we tried to teach the model to place blocks so that they do not overlap. Though simple for a human to grasp, this concept is nontrivial for a computer to learn and thus would require inordinate amounts of training data and time. The same thing goes for further design constraints, like requirements to place blocks together or near a certain edge.

An illustration of a floorplan and a B*-tree data structure.A simple floorplan [left] can be represented by a B*-tree data structure [right].Chris Philpot

So, we took a different approach. Our first order of business was to choose an effective data structure to convey the locations of blocks in a floorplan. We landed on what is called a B*-tree. In this structure, each block is represented as a node on a binary tree. The block in the bottom left corner of the floorplan becomes the root. The block to the right becomes one branch; the block on top becomes the other branch. This pattern continues for each new node. Thus, as the tree grows, it encapsulates the floorplan as it fans rightward and upward.

A big advantage of the B*-tree structure is that it guarantees an overlap-free floorplan because block locations are relative rather than absolute—for example, “above that other block” rather than “at this spot.” Consequently, an AI floorplanner does not need to predict the exact coordinates of each block it places. Instead, it can trivially calculate them based on the block’s dimensions and the coordinates and dimensions of its relational neighbor. And voilà—no overlaps.

With our data structure in place, we then trained several machine-learning models—specifically, graph neural networks, diffusion models, and transformer-based models—on a dataset of millions of optimal floorplans. The models learned to predict the best block to place above or to the right of a previously placed block to generate floorplans that are optimized for area and wirelength. But we quickly realized that this step-by-step method was not going to work. We had scaled the floorplanning problems to around 100 blocks and added hard constraints beyond the no-overlap rule. These included requiring some blocks to be placed at a predetermined location like an edge or grouping blocks that share the same voltage source. However, our AI models wasted time pursuing suboptimal solutions.

We surmised that the hangup was the models’ inability to backtrack: Because they place blocks sequentially, they cannot retrospectively fix earlier bad placements. We could get around this hurdle using techniques like a reinforcement-learning agent, but the amount of exploration such an agent required to train a good model would be impractical. Having reached a dead end, we decided to ditch block-by-block decision making and try a new tack.

Returning to Chip Design Tradition

A common way to solve massive combinatorial optimization problems is with a search technique called
simulated annealing (SA). First described in 1983, SA was inspired by metallurgy, where annealing refers to the process of heating metal to a high temperature and then slowly cooling it. The controlled reduction of energy allows the atoms to settle into an orderly arrangement, making the material stronger and more pliable than if it had cooled quickly. In an analogous manner, SA progressively homes in on the best solution to an optimization problem without having to tediously check every possibility.

Here’s how it works. The algorithm starts with a random solution—for our purposes, a random floorplan represented as a B*-tree. We then allow the algorithm to take one of three actions, again at random: It can swap two blocks, move a block from one position to another, or adjust a block’s width-to-height ratio (without changing its area). We judge the quality of the resulting floorplan by taking a weighted average of the total area and wirelength. This number describes the “cost” of the action.

If the new floorplan is better—that is, it decreases the cost—we accept it. If it’s worse, we also initially accept it, knowing that some “bad” decisions could lead in good directions. Over time, however, as the algorithm keeps adjusting blocks randomly, we accept cost-increasing actions less and less frequently. As in metalworking, we want to make this transition gradually. Just as cooling a metal too quickly can trap its atoms in disorderly arrangements, restricting the algorithm’s explorations too soon can trap it in suboptimal solutions, called local minima. By giving the algorithm enough leeway to dodge these pitfalls early on, we can then coax it toward the solution we really want: the global minimum (or a good approximation of it).

We had much more success solving floorplanning problems with SA than with any of our machine-learning models. Because the SA algorithm has no notion of placement order, it can make changes to any block at any time, essentially allowing the algorithm to correct for earlier mistakes. Without constraints, we found it could solve highly complex floorplans with hundreds of blocks in minutes. By comparison, a chip designer working with commercial tools would need hours to solve the same puzzles.

An illustration of a series of numbered squares.  Using a search technique called simulated annealing, a floorplanning algorithm starts with a random layout [top]. It then tries to improve the layout by swapping two blocks, moving a block to another position, or adjusting a block’s aspect ratio.Chris Philpot

Of course, real-world design problems have constraints. So we gave our SA algorithm some of the same ones we had given our machine-learning model, including restrictions on where some blocks are placed and how they are grouped. We first tried addressing these hard constraints by adding the number of times a floorplan violated them to our cost function. Now, when the algorithm made random block changes that increased constraint violations, we rejected these actions with increasing probability, thereby instructing the model to avoid them.

Unfortunately, though, that tactic backfired. Including constraints in the cost function meant that the algorithm would try to find a balance between satisfying them and optimizing the area and wirelength. But hard constraints, by definition, can’t be compromised. When we increased the weight of the constraints variable to account for this rigidity, however, the algorithm did a poor job at optimization. Instead of the model’s efforts to fix violations resulting in global minima (optimal floorplans), they repeatedly led to local minima (suboptimal floorplans) that the model could not escape.

Moving Forward with Machine Learning

Back at the drawing board, we conceived a new twist on SA, which we call constraints-aware SA (CA-SA). This variation employs two algorithmic modules. The first is an SA module, which focuses on what SA does best: optimizing for area and wirelength. The second module picks a random constraint violation and fixes it. This repair module kicks in very rarely—about once every 10,000 actions—but when it does, its decision is always accepted, regardless of the effect on area and wirelength. We can thus guide our CA-SA algorithm toward solutions that satisfy hard constraints without hamstringing it.

Using this approach, we developed an open-source floorplanning tool that runs multiple iterations of CA-SA simultaneously. We call it
parallel simulated annealing with constraints awareness, or Parsac for short. Human designers can choose from the best of Parsac’s solutions. When we tested Parsac on popular floorplanning benchmarks with up to 300 blocks, it handily beat every other published formulation, including other SA-based algorithms and machine-learning models.

An illustration a series of colored blocks.Without constraints awareness, a regular simulated-annealing algorithm produces a suboptimal floorplan that cannot be improved. In this case, Block X gets trapped in an invalid position. Any attempt to fix this violation leads to several other violations.Chris Philpot

These established benchmarks, however, are more than two decades old and do not reflect modern SoC designs. A major drawback is their lack of hard constraints. To see how Parsac performed on more realistic designs, we added our own constraints to the benchmark problems, including stipulations about block placements and groupings. To our delight, Parsac successfully solved high-level floorplanning problems of commercial scale (around 100 blocks) in less than 15 minutes, making it the fastest known floorplanner of its kind.

We are now developing another non-AI technique based on geometric search to handle floorplanning with oddly shaped blocks, thus diving deeper into real-world scenarios. Irregular layouts are too complex to be represented with a B*-tree, so we went back to sequential block placing. Early results suggest this new approach could be even faster than Parsac, but because of the no-backtracking problem, the solutions may not be optimal.

Meanwhile, we are working to adapt Parsac for
macro placements, one level more granular than block floorplanning, which means scaling from hundreds to thousands of elements while still obeying constraints. CA-SA alone is likely too slow to efficiently solve problems of this size and complexity, which is where machine learning could help.

An illustration of 3 charts and a series of colored squares.  Parsac solves commercial-scale floorplanning problems within 15 minutes, making it the fastest known algorithm of its kind. The initial layout contains many blocks that violate certain constraints [red]. Parsac alters the floorplan to minimize the area and wire-length while eliminating any constraint violations.

Given an SA-generated floorplan, for instance, we could train an AI model to predict which action will improve the layout’s quality. We could then use this model to guide the decisions of our CA-SA algorithm. Instead of taking only random—or “dumb”—actions (while accommodating constraints), the algorithm would accept the model’s “smart” actions with some probability. By co-operating with the AI model, we reasoned, Parsac could dramatically reduce the number of actions it takes to find an optimal solution, slashing its run time. However, allowing some random actions is still crucial because it enables the algorithm to fully explore the problem. Otherwise, it’s apt to get stuck in suboptimal traps, like our failed AI-based floorplanner.

This or similar approaches could be useful in solving other complex combinatorial optimization problems beyond floorplanning. In chip design, such problems include optimizing the routing of interconnects within a core and Boolean circuit minimization, in which the challenge is to construct a circuit with the fewest gates and inputs to execute a function.

A Need for New Benchmarks

Our experience with Parsac also inspired us to create
open datasets of sample floorplans, which we hope will become new benchmarks in the field. The need for such modern benchmarks is increasingly urgent as researchers seek to validate new chip-design tools. Recent research, for instance, has made claims about the performance of novel machine-learning algorithms based on old benchmarks or on proprietary layouts, inviting questions about the claims’ legitimacy.

We released two datasets, called FloorSet-Lite and FloorSet-Prime, which are available now on
GitHub. Each dataset contains 1 million layouts for training machine-learning models and 100 test layouts optimized for area and wirelength. We designed the layouts to capture the full breadth and complexity of contemporary SoC floorplans. They range from 20 to 120 blocks and include practical design constraints.

An illustration of a series of red and blue geometric shapes. To develop machine learning for chip design, we need many sample floorplans. A sample from one of our FloorSet datasets has constraints [red] and irregularly shaped blocks, which are common in real-world designs.Chris Philpot

The two datasets differ in their level of complexity. FloorSet-Lite uses rectangular blocks, reflecting early design phases, when blocks are often configured into simple shapes. FloorSet-Prime, on the other hand, uses irregular blocks, which are more common later in the design process. At that point, the placement of macros, standard cells, and other components within blocks has been refined, leading to nonrectangular block shapes.

Although these datasets are artificial, we took care to incorporate features from commercial chips. To do this, we created detailed statistical distributions of floorplan properties, such as block dimensions and types of constraints. We then sampled from these distributions to create synthetic floorplans that mimic real chip layouts.

Such robust, open repositories could significantly advance the use of machine learning in chip design. It’s unlikely, however, that we will see fully AI based solutions for prickly optimization problems like floorplanning. Deep-learning models dominate tasks like object identification and language generation because they are exceptionally good at capturing statistical regularities in their training data and correlating these patterns with desired outputs. But this method does not work well for hard combinatorial optimization problems, which require techniques beyond pattern recognition to solve.

Instead, we expect that hybrid algorithms will be the ultimate winners. By learning to identify the most promising types of solution to explore, AI models could intelligently guide search agents like Parsac, making them more efficient. Chip designers could solve problems faster, enabling the creation of more complex and power-efficient chips. They could even combine several design stages into a single optimization problem or pursue multiple designs concurrently. AI might not be able to create a chip—or even resolve a single design stage—entirely on its own. But when combined with other innovative approaches, it will be a game changer for the field.

Source link

- Advertisement -spot_imgspot_img

Highlights

- Advertisement -spot_img

Latest News

- Advertisement -spot_img