
Your Laptop Isn’t Ready for LLMs. Yet…


Odds are the PC in your office today isn’t ready to run AI large language models (LLMs).

Today, most users interact with LLMs via an online, browser-based interface. The more technically inclined might use an application programming interface or command line interface. In either case, the queries are sent to a data center, where the model is hosted and run. It works well, until it doesn’t; a data-center outage can take a model offline for hours. Plus, some users might be unwilling to send personal data to an anonymous entity.

Running a model locally on your computer could offer significant benefits: lower latency, better understanding of your personal needs, and the privacy that comes with keeping your data on your own machine.

However, the average laptop that's more than a year old can run close to zero useful AI models locally. Such a machine might have a four- to eight-core processor (CPU), no dedicated graphics chip (GPU) or neural-processing unit (NPU), and 16 gigabytes of RAM, leaving it underpowered for LLMs.

Even new, high-end PC laptops, which often include an NPU and a GPU, can struggle. The largest AI models have over a trillion parameters, which requires memory in the hundreds of gigabytes. Smaller versions of these models are available, even prolific, but they often lack the intelligence of larger models, which only dedicated AI data centers can handle.
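The memory math behind that claim is simple: each parameter stored at 16-bit precision takes 2 bytes, so weights alone quickly outgrow a laptop. A back-of-the-envelope sketch (the model sizes below are illustrative, not tied to any specific product):

```python
def model_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold a model's weights,
    ignoring activations, KV cache, and runtime overhead."""
    return num_params * bytes_per_param / 1e9

# A 1-trillion-parameter model at 16-bit (2-byte) precision:
print(model_memory_gb(1e12, 2))   # 2000.0 GB -- far beyond any laptop
# A 7-billion-parameter model quantized to 4 bits (0.5 bytes):
print(model_memory_gb(7e9, 0.5))  # 3.5 GB -- fits in 16 GB of RAM
```

This is why the small models that fit on laptops are heavily shrunken versions of their data-center counterparts.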

The situation is even worse when other AI features aimed at making the model more capable are considered. Small language models (SLMs) that run on local hardware either scale back these features or omit them entirely. Image and video generation are difficult to run locally on laptops, too, and until recently they were reserved for high-end tower desktop PCs.

That’s a problem for AI adoption.

To make running AI models locally possible, the hardware found inside laptops and the software that runs on it will need an upgrade. This is the beginning of a shift in laptop design that will give engineers the opportunity to abandon the last vestiges of the past and reinvent the PC from the ground up.

NPUs enter the chat

The most obvious way to boost a PC’s AI performance is to place a powerful NPU alongside the CPU.

An NPU is a specialized chip designed for the matrix multiplication calculations that most AI models rely on. These matrix operations are highly parallelized, which is why GPUs (which were already better at highly parallelized tasks than CPUs) became the go-to option for AI data centers.
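The structure of that workload is easy to see in code. Every output element of a matrix product can be computed independently of the others, which is what lets hundreds of small execution units work on it at once. A minimal sketch in plain Python (an NPU or GPU would run these loops in parallel in hardware):

```python
def matmul(A, B):
    """Naive matrix multiply: C[i][j] = sum over k of A[i][k] * B[k][j].
    No output element depends on any other, so all of them can be
    computed simultaneously -- the parallelism NPUs and GPUs exploit."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

C = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C)  # [[19, 22], [43, 50]]
```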

However, because NPUs are designed specifically to handle these matrix operations—and not other tasks, like 3D graphics—they’re more power efficient than GPUs. That’s important for accelerating AI on portable consumer technology. NPUs also tend to provide better support for low-precision arithmetic than laptop GPUs. AI models often use low-precision arithmetic to reduce computational and memory needs on portable hardware, such as laptops.
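Low precision pays off because it shrinks both memory and the bandwidth needed to feed the compute units. The sketch below shows the idea with a simplified symmetric int8 scheme (illustrative only; production quantizers are more sophisticated):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: one float scale plus one signed
    byte per value, a 4x memory reduction versus 32-bit floats."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, s = quantize_int8(weights)
approx = dequantize(q, s)  # close to the originals, at 1/4 the storage
```

The small rounding error this introduces is usually an acceptable trade for fitting a model into laptop-class memory.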

“With the NPU, the entire structure is really designed around the data type of tensors [a multidimensional array of numbers],” said Steven Bathiche, technical fellow at Microsoft. “NPUs are much more specialized for that workload. And so we go from a CPU that can handle three [trillion] operations per second (TOPS), to an NPU” in Qualcomm’s Snapdragon X chip, which can power Microsoft’s Copilot+ features. This includes Windows Recall, which uses AI to create a searchable timeline of a user’s usage history by analyzing screenshots, and Windows Photos’ Generative erase, which can remove the background or specific objects from an image.

Qualcomm was arguably the first to provide an NPU for Windows laptops, and it kickstarted an NPU TOPS arms race that now includes AMD and Intel; the competition is already pushing NPU performance upward.

In 2023, prior to Qualcomm’s Snapdragon X, AMD chips with NPUs were uncommon, and those that existed delivered about 10 TOPS. Today, AMD and Intel have NPUs that are competitive with Snapdragon, providing 40 to 50 TOPS.

Dell’s upcoming Pro Max Plus AI PC will up the ante with a Qualcomm AI 100 NPU that promises up to 350 TOPS, improving performance by a staggering 35 times compared with that of the best available NPUs just a few years ago. Drawing that line up and to the right implies that NPUs capable of thousands of TOPS are just a couple of years away.

How many TOPS do you need to run state-of-the-art models with hundreds of billions of parameters? No one knows exactly. It’s not possible to run these models on today’s consumer hardware, so real-world tests just can’t be done. But it stands to reason that we’re within striking distance of those capabilities. It’s also worth noting that LLMs are not the only use case for NPUs. Vinesh Sukumar, Qualcomm’s head of AI and machine learning product management, says AI image generation and manipulation is an example of a task that’s difficult without an NPU or high-end GPU.
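You can at least bound the question with a common rule of thumb: generating one token with a dense transformer costs roughly two operations per parameter. Under that approximation (a crude one; real throughput is often limited by memory bandwidth rather than compute), tokens per second is just available TOPS divided by per-token cost:

```python
def tokens_per_second(tops: float, params_billions: float) -> float:
    """Crude compute-bound estimate: ~2 operations per parameter
    per generated token. Ignores memory bandwidth, which is often
    the real bottleneck on laptop hardware."""
    ops_per_token = 2 * params_billions * 1e9
    return tops * 1e12 / ops_per_token

# A 50-TOPS NPU on a 7-billion-parameter model, compute-bound:
print(round(tokens_per_second(50, 7)))  # ~3571 tokens/s in theory
```

The gap between that theoretical ceiling and real-world output is exactly why no one can say how many TOPS will be "enough."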

Building balanced chips for better AI

Faster NPUs will handle more tokens per second, which in turn will deliver a faster, more fluid experience when using AI models. Yet there’s more to running AI on local hardware than throwing a bigger, better NPU at the problem.

Mike Clark, corporate fellow design engineer at AMD, says that companies that design chips to accelerate AI on the PC can’t put all their bets on the NPU. That’s in part because AI isn’t a replacement for, but rather an addition to, the tasks a PC is expected to handle.

“We must be good at low latency, at handling smaller data types, at branching code—traditional workloads. We can’t give that up, but we still want to be good at AI,” says Clark. He also noted that “the CPU is used to prepare data” for AI workloads, which means an inadequate CPU could become a bottleneck.

NPUs must also compete or cooperate with GPUs. On the PC, that often means a high-end AMD or Nvidia GPU with large amounts of built-in memory. The Nvidia GeForce RTX 5090’s specifications quote an AI performance up to 3,352 TOPS, which leaves even the Qualcomm AI 100 in the dust.

That comes with a big caveat, however: power. Though extremely capable, the RTX 5090 is designed to draw up to 575 watts on its own. Mobile versions for laptops are more miserly but still draw up to 175 W, which can quickly drain a laptop battery.

Simon Ng, client AI product manager at Intel, says the company is “seeing that the NPU will just do things much more efficiently at lower power.” Rakesh Anigundi, AMD’s director of product management for Ryzen AI, agrees. He adds that low-power operation is particularly important because AI workloads tend to take longer to run than other demanding tasks, like encoding a video or rendering graphics. “You’ll want to be running this for a longer period of time, such as an AI personal assistant, which could be always active and listening for your command,” he says.

These competing priorities mean chip architects and system designers will need to make tough calls about how to allocate silicon and power in AI PCs, especially those that often rely on battery power, such as laptops.

“We have to be very deliberate in how we design our system-on-a-chip to ensure that a larger SoC can perform to our requirements in a thin and light form factor,” said Mahesh Subramony, senior fellow design engineer at AMD.

When it comes to AI, memory matters

Squeezing an NPU alongside a CPU and GPU will improve the average PC’s performance in AI tasks, but it’s not the only revolutionary change AI will force on PC architecture. There’s another that’s perhaps even more fundamental: memory.

Most modern PCs have a divided memory architecture rooted in decisions made over 25 years ago. Limitations in bus bandwidth led GPUs (and other add-in cards that might require high-bandwidth memory) to move away from accessing a PC’s system memory and instead rely on the GPU’s own dedicated memory. As a result, powerful PCs typically have two pools of memory, system memory and graphics memory, which operate independently.

That’s a problem for AI. Models require large amounts of memory, and the entire model must load into memory at once. The legacy PC architecture, which splits memory between the system and the GPU, is at odds with that requirement.

“When I have a discrete GPU, I have a separate memory subsystem hanging off it,” explained Joe Macri, vice president and chief technology officer at AMD. “When I want to share data between our [CPU] and GPU, I’ve got to take the data out of my memory, slide it across the PCI Express bus, put it in the GPU memory, do my processing, then move it all back.” Macri said this increases power draw and leads to a sluggish user experience.
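The cost Macri describes is easy to quantify. PCI Express 4.0 x16 peaks at roughly 32 gigabytes per second, so shuttling a model's worth of data across the bus adds latency that a unified memory pool avoids entirely. A back-of-the-envelope sketch (the bandwidth figure is a nominal peak, not measured throughput):

```python
def transfer_seconds(data_gb: float, bandwidth_gb_s: float) -> float:
    """Time to copy a block of data across a bus at a given bandwidth."""
    return data_gb / bandwidth_gb_s

# Copying 20 GB of model weights over PCIe 4.0 x16 (~32 GB/s peak):
print(transfer_seconds(20, 32))  # 0.625 seconds -- each way
# With unified memory, the same data is accessed in place: no copy.
```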

The solution is a unified memory architecture that provides all system resources access to the same pool of memory over a fast, interconnected memory bus. Apple’s in-house silicon is perhaps the most well-known recent example of a chip with a unified memory architecture. However, unified memory is otherwise rare in modern PCs.

AMD is following suit in the laptop space. The company announced a new line of APUs targeted at high-end laptops, Ryzen AI Max, at CES (Consumer Electronics Show) 2025.

Ryzen AI Max places the company’s Ryzen CPU cores, Radeon-branded GPU cores, and an NPU rated at 50 TOPS on a single piece of silicon with a unified memory architecture. Because of this, the CPU, GPU, and NPU can all access up to 128 GB of system memory, shared among all three. AMD believes this strategy is ideal for memory and performance management in consumer PCs. “By bringing it all under a single thermal head, the entire power envelope becomes something that we can manage,” said Subramony.
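Shared access to 128 GB changes what is feasible locally. Inverting the earlier memory arithmetic gives a rough ceiling on model size (again ignoring room needed for activations and the operating system):

```python
def max_params_billions(memory_gb: float, bytes_per_param: float) -> float:
    """Largest model, in billions of parameters, whose weights
    alone fit in the given amount of memory."""
    return memory_gb * 1e9 / bytes_per_param / 1e9

print(max_params_billions(128, 2))    # 64.0  -- a 64B model at FP16
print(max_params_billions(128, 0.5))  # 256.0 -- at 4-bit quantization
```

That puts genuinely capable mid-size models, not just heavily pruned ones, within reach of a laptop-class chip.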

The Ryzen AI Max is already available in several laptops, including the HP Zbook Ultra G1a and the Asus ROG Flow Z13. It also powers the Framework Desktop and several mini desktops from less well-known brands, such as the GMKtec EVO-X2 AI mini PC.

Intel and Nvidia will also join this party, though in an unexpected way. In September, the former rivals announced an alliance to sell chips that pair Intel CPU cores with Nvidia GPU cores. While the details are still under wraps, the chip architecture will likely include unified memory and an Intel NPU.

Chips like these stand to drastically change PC architecture if they catch on. They’ll offer access to much larger pools of memory than before and integrate the CPU, GPU, and NPU into one piece of silicon that can be closely monitored and controlled. These factors should make it easier to shuffle an AI workload to the hardware best suited to execute it at a given moment.

Unfortunately, they’ll also make PC upgrades and repairs more difficult, as chips with a unified memory architecture typically bundle the CPU, GPU, NPU, and memory into a single, physically inseparable package on a PC mainboard. That’s in contrast with traditional PCs, where the CPU, GPU, and memory can be replaced individually.

Microsoft’s bullish take on AI is rewriting Windows

MacOS is well regarded for its attractive, intuitive user interface, and Apple Silicon chips have a unified memory architecture that can prove useful for AI. However, Apple’s GPUs aren’t as capable as the best ones used in PCs, and its AI tools for developers are less widely adopted.

Chrissie Cremers, cofounder of the AI-focused marketing firm Aigency Amsterdam, told me earlier this year that although she prefers macOS, her agency doesn’t use Mac computers for AI work. “The GPU in my Mac desktop can hardly manage [our AI workflow], and it’s not an old computer,” she said. “I’d love for them to catch up here, because they used to be the creative tool.”


That leaves an opening for competitors to become the go-to choice for AI on the PC—and Microsoft knows it.

Microsoft launched Copilot+ PCs at the company’s 2024 Build developer conference. The launch had problems, most notably the botched release of its key feature, Windows Recall, which uses AI to help users search through anything they’ve seen or heard on their PC. Still, the launch was successful in pushing the PC industry toward NPUs, as AMD and Intel both introduced new laptop chips with upgraded NPUs in late 2024.

At Build 2025, Microsoft also revealed Windows’ AI Foundry Local, a “runtime stack” that includes a catalog of popular open-source large language models. While Microsoft’s own models are available, the catalog includes thousands of open-source models from Alibaba, DeepSeek, Meta, Mistral AI, Nvidia, OpenAI, Stability AI, xAI, and more.

Once a model is selected and implemented into an app, Windows executes AI tasks on local hardware through the Windows ML runtime, which automatically directs AI tasks to the CPU, GPU, or NPU hardware best suited for the job.

AI Foundry also provides APIs for local knowledge retrieval and low-rank adaptation (LoRA), advanced features that let developers customize the data an AI model can reference and how it responds. Microsoft also announced support for on-device semantic search and retrieval-augmented generation, features that help developers build AI tools that reference specific on-device information.
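LoRA's appeal on memory-constrained hardware is arithmetic: instead of fine-tuning a full weight matrix, it trains two thin low-rank factors, cutting the trainable parameter count by orders of magnitude. A sketch of the bookkeeping (the layer dimensions and rank below are illustrative):

```python
def lora_params(d_in: int, d_out: int, rank: int):
    """Full fine-tuning trains a d_in x d_out weight matrix; LoRA
    instead trains factors A (d_in x rank) and B (rank x d_out),
    approximating the weight update as the product A @ B."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

full, lora = lora_params(4096, 4096, 8)
print(full, lora)    # 16777216 vs 65536 trainable parameters
print(full // lora)  # 256x reduction
```

That reduction is what makes on-device customization of a model plausible at laptop power budgets.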

“[AI Foundry] is about being smart. It’s about using all the processors at hand, being efficient, and prioritizing workloads across the CPU, the NPU, and so on. There’s a lot of opportunity and runway to improve,” said Bathiche.

Toward AGI on PCs

The rapid evolution of AI-capable PC hardware represents more than just an incremental upgrade. It signals a coming shift in the PC industry that’s likely to wipe away the last vestiges of the PC architectures designed in the ’80s, ’90s, and early 2000s.

The combination of increasingly powerful NPUs, unified memory architectures, and sophisticated software-optimization techniques is closing the performance gap between local and cloud-based AI at a pace that has surprised even industry insiders, such as Bathiche.

It will also nudge chip designers toward ever-more-integrated chips that share a unified memory subsystem and bring the CPU, GPU, and NPU onto a single piece of silicon, even in high-end laptops and desktops. AMD’s Subramony said the goal is to have users “carrying a mini workstation in your hand, whether it’s for AI workloads, or for high compute. You won’t have to go to the cloud.”

A change that massive won’t happen overnight. Still, it’s clear that many in the PC industry are committed to reinventing the computers we use every day in a way that optimizes for AI. Qualcomm’s Vinesh Sukumar even believes affordable consumer laptops, much like data centers, should aim for AGI.

“I want a complete artificial general intelligence running on Qualcomm devices,” he said. “That’s what we’re trying to push for.”
