
Arm's senior VP explains AI's true impact, and it's not what you think

Few companies drive their industries as much as Arm, which controls most of the production-level instruction set architectures and CPU core designs behind the mobile device market. Arm technology also powers a growing share of the up-and-coming self-driving vehicle sphere, underpins rapidly advancing data center processing, and enables over 300 billion Internet of Things devices and counting.

Instruction set architectures, or ISAs, define how third-party software interacts with the hardware’s 1s and 0s, allowing developers to write code that works across microchips of different designs. Arm owns the intellectual property rights for, and develops, nearly all of today’s high-efficiency ISAs and many of the physical system-on-a-chip designs they communicate with.

Despite its significance, most consumers don’t understand or realize Arm’s role in cutting-edge electronics development. Partly in light of three new Arm Cortex CPUs launching at Computex in June, Chris Bergey, Arm Senior Vice President and General Manager of its Client Line of Business, sat down with Android Police to discuss the company’s efforts to empower on-device AI processing for the future.


Arm’s advanced AI aims

A multifaceted approach to progress

A screenshot from a video of Arm's Chris Bergey talking about the future of AI in 2023

Building on the Cortex CPUs powering countless phones, the newly announced cores matter. Arm’s top-of-the-line Cortex-X925 boasts a 35% instruction-per-clock performance boost, letting more complex code run at higher speeds with lower latency and only minimal increases in power draw, particularly under single-instruction, multiple-data (SIMD) parallel processing, the technique behind much of today’s on-device AI performance.
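To picture what SIMD buys, here’s a minimal sketch using GCC/Clang vector extensions rather than any Arm-specific instruction; the compiler lowers the vector type to whatever SIMD unit the target CPU offers:

```c
/* Minimal SIMD sketch: one operation applied to four floats at once.
 * Uses GCC/Clang vector extensions, not Arm-specific intrinsics;
 * build with: cc -O2 simd_sketch.c */
#include <stdio.h>

typedef float f32x4 __attribute__((vector_size(16))); /* 4 x 32-bit floats */

int main(void) {
    f32x4 a = {1.0f, 2.0f, 3.0f, 4.0f};
    f32x4 b = {10.0f, 20.0f, 30.0f, 40.0f};
    f32x4 c = a + b; /* one vector add instead of four scalar adds */

    for (int i = 0; i < 4; i++)
        printf("c[%d] = %.1f\n", i, c[i]);
    return 0;
}
```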

I planned to bring up the newly formed coalition for re-energizing x86 ISAs consisting of leaders like Intel, AMD, and Microsoft. Bergey beat me to it right away, referencing the announcement and proudly explaining, “That, quite frankly, is something we’ve been doing for 20 years. We move the architecture forward and provide the consistency that allows developers to target a piece of hardware and go anywhere with it.”

A graphic outlining the basics of Arm's Kleidi libraries

To facilitate that forward movement, Arm’s newly announced Kleidi libraries (which, Bergey pointed out, are planned for inclusion in Google’s extensive AI stack) act as a framework for AI programming. That includes the KleidiCV library set, which covers the computer vision techniques expanding in fields like self-driving automotive and the industrial Internet of Things.

If you ask me what the power of Arm is, it’s the 20 million developers … It is hard to make great hardware, but it’s even harder to build an AI software ecosystem.

The Kleidi libraries are built from self-contained micro-kernels, small basic software routines providing the necessary tools for developers to build features. With no dependencies on external libraries and an explicit focus on supporting the PyTorch community, the forward-looking Kleidi layer forms the basis of Arm’s ever-expanding AI and ML toolkit. Unleashing Kleidi in the AI world lets developers create solutions that work, then build on those successful techniques as hardware and software continue introducing cutting-edge features.
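Kleidi’s real interfaces aren’t covered here, but the micro-kernel idea is easy to picture: one small, dependency-free routine for a hot operation that a framework like PyTorch can dispatch to. Below is a hypothetical sketch of an 8-bit dot product (the inner loop of quantized matrix multiplication); the name and signature are invented for illustration.

```c
/* Hypothetical illustration of a "micro-kernel": a small, dependency-free
 * routine (an 8-bit integer dot product, the core of quantized matrix
 * multiplication) that a framework could dispatch to. NOT Kleidi's API. */
#include <stdint.h>
#include <stdio.h>

int32_t dot_s8(const int8_t *a, const int8_t *b, int n) {
    int32_t acc = 0;                      /* widened accumulator avoids overflow */
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}

int main(void) {
    int8_t a[4] = {1, -2, 3, 4};
    int8_t b[4] = {5, 6, -7, 8};
    printf("%d\n", dot_s8(a, b, 4));      /* 1*5 - 2*6 - 3*7 + 4*8 = 4 */
    return 0;
}
```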

Announced after the Kleidi libraries but meant for different partners, the CSS for Client framework defines the physical specifications of Arm’s latest CPUs and GPUs on the nominal 3nm fabrication node. It’s distributed to third-party designers like Qualcomm and MediaTek to minimize research and development costs, ultimately speeding deployment.

Arm’s dedication to advanced, on-device parallel processing

Solid foundations lead to stable structures

A graphic pointing out the benefits of SVE2 implementation

Early in our chat, Bergey mentioned features that stand out among Arm’s efforts. He circled back to SVE2 when I asked for concrete examples of Arm’s innovation.

The second iteration of the Scalable Vector Extension lets compatible chips execute a single instruction across multiple data points, the foundation of efficient parallel processing. Instead of NEON’s fixed 128-bit vectors, SVE2 supports vector widths from 128 to 2048 bits, in multiples of 128, and the same code runs regardless of which width the hardware implements. That means easier programming, greater compatibility, and real parallel performance gains when correctly implemented.
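The practical payoff is vector-length-agnostic code. The sketch below uses Arm’s published ACLE intrinsics for a predicated add loop; it never hard-codes a width, so the same source serves any SVE2 implementation (building and running it requires an SVE2-capable toolchain and CPU, e.g. cc -O2 -march=armv8-a+sve2):

```c
/* Vector-length-agnostic loop with Arm's ACLE SVE intrinsics: svcntw()
 * asks the hardware how many 32-bit lanes it has, and the predicate
 * masks off the loop tail, so no width is ever hard-coded. */
#include <arm_sve.h>
#include <stdint.h>

void vec_add(float *dst, const float *a, const float *b, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {   /* lanes per vector */
        svbool_t pg = svwhilelt_b32(i, n);        /* predicate for the tail */
        svfloat32_t va = svld1(pg, &a[i]);
        svfloat32_t vb = svld1(pg, &b[i]);
        svst1(pg, &dst[i], svadd_x(pg, va, vb));
    }
}
```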

The power of transistor-level privacy

I asked Mr. Bergey how Arm technology, operating as it does down at the transistor level, makes an industry-wide difference in security. “Think of Arm as creating the fundamental building blocks and working explicitly to keep CPUs and platforms secure,” he said. We aren’t talking about tracking-cookie security, but about potentially exploitable hardware and ISA idiosyncrasies that weaken otherwise resilient systems.


As background, Bergey began, “As you probably know, over 60% of software security holes are due to memory buffer overruns.” A buffer overrun occurs when a program writes outside the memory addresses it was assigned, leaving adjacent data vulnerable to interception or manipulation. That figure is as high as it is because languages like C and C++ have no inherent mechanism for preventing or flagging overruns.
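The failure mode is easy to demonstrate in C, the language where most of these bugs live. In this deliberately broken sketch, strcpy writes past an 8-byte buffer; exactly what gets corrupted depends on the compiler’s stack layout, and nothing in the language stops it:

```c
/* Classic buffer overrun: strcpy has no idea how big `buf` is, so it
 * writes past the end, corrupting whatever the compiler placed next to
 * it. Undefined behavior, demonstrated deliberately. */
#include <stdio.h>
#include <string.h>

int main(void) {
    char neighbor[8] = "token42";               /* adjacent data */
    char buf[8];
    strcpy(buf, "nineteen characters");         /* 19 chars + NUL into 8 bytes */
    printf("neighbor is now: %s\n", neighbor);  /* likely clobbered */
    return 0;
}
```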

A simple diagram explaining buffer overrun

Source: Cobalt.io

He went on, “That’s one of the things the v9 architecture has been very focused on. In fact, we were the first platform to widely deploy MTE,” or the Memory Tagging Extension. “MTE puts a specific tag on the memory to make sure it can’t spill over the designated partition edges.”

Bergey noted that Arm deployed MTE alongside the Unified Architecture developed by OPC, an industrial automation standards group, in 2006. He continued, “Google enabled developer MTE access in Android 14, and some chip producers have enabled it at an OEM level.” While MTE is an order of magnitude removed from user-level security measures like two-factor authentication, it’s still closely involved in protecting sensitive data from exploits.
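A toy model makes the mechanism clearer. Hardware MTE tags each 16-byte memory granule with 4 bits and checks it against a tag carried in the pointer’s unused top byte; the plain-C simulation below mimics that check (the struct and functions are invented for illustration, since the real checks happen in silicon):

```c
/* Toy model of memory tagging: memory granules and pointers each carry
 * a small tag, and every access checks that they match. Hardware MTE
 * does this per 16-byte granule with 4-bit tags; this only simulates it. */
#include <stdio.h>

#define GRANULES 8

static unsigned char mem_tag[GRANULES];            /* tag per granule */

typedef struct { int granule; unsigned char tag; } tagged_ptr;

static tagged_ptr tag_alloc(int granule, unsigned char tag) {
    mem_tag[granule] = tag;                        /* tag the memory */
    return (tagged_ptr){ granule, tag };           /* tag the "pointer" */
}

static void access_granule(tagged_ptr p, int offset) {
    int g = p.granule + offset;
    if (g < 0 || g >= GRANULES || mem_tag[g] != p.tag)
        printf("tag check FAILED at granule %d (overrun caught)\n", g);
    else
        printf("granule %d ok\n", g);
}

int main(void) {
    tagged_ptr p = tag_alloc(2, 0x5);  /* one "allocation" at granule 2 */
    access_granule(p, 0);              /* tags match: allowed */
    access_granule(p, 1);              /* neighbor keeps tag 0: caught */
    return 0;
}
```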

A diagram showing the basics of MTE

Source: Arm

And where does the senior VP himself land? “I do believe in the power of on-device AI, and privacy is clearly one of the reasons.”

Impacts of endogenous efficiency

Arm has remained laser-focused on low-power computation since its first microchip. The 6MHz, 0.1W, plastic-encased ARM V1 was cheap, energy-efficient, and thermally negligible out of necessity. It also worked perfectly when the team first fired it up. In benchmarks, it trounced a similarly clocked Intel chip by a factor of 10 while matching a 32-bit Motorola implementation clocked at 17MHz.

The Arm V1, Arm's first microchip

Source: Computing History UK

Arm’s first microchip, the Arm V1.

But this isn’t a history lesson. Ars Technica has that covered. I’m pointing out Arm’s ultra-lightweight history because some things never change. What do Nokia, Apple, Nintendo, Google, Amazon, Qualcomm, and Nvidia have in common? Arm’s efficient microchips and ISAs have proven integral to at least one successful device from each of those iconic tech houses.

“The idea of low-power computing has really proliferated everywhere,” said Bergey. “We now have a very significant footprint in data center and cloud companies like AWS, with its Graviton line of CPUs based on Arm implementations.”


“Every large cloud computing company has announced that they are offering Arm platforms,” Bergey went on. “And why is that? Arm has these very power-efficient designs.” Earlier in our talk, Bergey referenced the current nuclear-powered AI discourse, which I’d been eager to discuss. “We’ve already talked about the importance of power in the data center. Arm managed to put 128 CPU cores in a single processor with a TDP of 250W.” That’s roughly the combined base TDP of two 14th-gen Intel Core chips, which together muster far fewer cores.

Influencing advancement from every side

A slide outlining details of Arm's CSS for Client updates

As fascinating as Arm’s industry-shaping ISAs and CPUs are, they aren’t the whole story. “There’s a lot of different IP,” Bergey explained, “but if you ask me what the power of Arm is, it’s the 20 million developers. It’s the software ecosystem. It is hard to make great hardware, but it’s even harder to build an AI software ecosystem. And that’s true throughout the history of CPUs.”

Bergey again stressed the developer community’s importance. “We have quite a few different frameworks planned that build on Kleidi, so developers can consistently compile code and have it do the right thing relative to hardware capabilities. That allows developers to take advantage of new architecture features down the road, many of which are AI-based.


“AI-based architecture is one level. On another level, we strongly believe a heterogeneous approach is right for AI,” Bergey offered, “which is how we view the CPU, GPU, and accelerator, often an NPU in today’s mobile space.” Bergey expounded on Arm’s mission: “We focus on ensuring our CPUs and GPUs run those AI workloads as well as possible. A lot of developers really value the breadth of the CPU across the hardware ecosystem.

“Everything targets the CPU to start with,” Bergey clarified. “But as far as AI apps, 70% of them basically stay running on the CPU.” At this point, we dug into a couple of topics I’ve wondered about for months.
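The interview doesn’t describe Arm’s actual scheduling machinery, but the heterogeneous logic Bergey sketches reduces to a simple decision: the CPU is the universal fallback, and the GPU or NPU takes over when present and sufficient. A hypothetical sketch, with every name and threshold invented for illustration:

```c
/* Hypothetical heterogeneous dispatch: every device has a CPU, so
 * workloads target it first, then move to a GPU or NPU when one is
 * present and the workload fits. Not any real Arm API. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { BACKEND_CPU, BACKEND_GPU, BACKEND_NPU } backend_t;

typedef struct {
    bool has_gpu;
    bool has_npu;
    int  npu_tops;       /* NPU throughput budget, in TOPS */
} device_caps;

/* Choose where to run a model that needs `required_tops` of compute. */
backend_t pick_backend(device_caps d, int required_tops) {
    if (d.has_npu && d.npu_tops >= required_tops)
        return BACKEND_NPU;   /* dedicated accelerator wins when it fits */
    if (d.has_gpu)
        return BACKEND_GPU;   /* GPU as the middle ground */
    return BACKEND_CPU;       /* the CPU is always there as a fallback */
}

int main(void) {
    device_caps budget_phone = { .has_gpu = true, .has_npu = false, .npu_tops = 0 };
    backend_t b = pick_backend(budget_phone, 40);
    printf("running on backend %d\n", b);   /* falls back to the GPU */
    return 0;
}
```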

AI is inside basically everything

Expanding models will continue inspiring specialized hardware

A graphic introducing Arm's new 2024 CPU cores

Going deeper into Arm’s heterogeneous approach, Bergey acknowledged that manufacturers’ AI tools often leverage the NPU and GPU, but he maintained the CPU’s significance. “It’s really about giving developers multiple ways to expand their approach to a solution and access to performance. Some methods are heavily compute-bound, some memory-bound, and some depend on whether you’re building a camera app with novel filters, a large language model, or something multimodal.”

I brought up focused hardware inclusions like AI-only memory partitions. Broadly, and without reference to specific companies’ components, Bergey outlined how increased parallel processing shapes hardware decisions. “It’s all about the model. Say you start with 5 billion parameters, quantize those to 4 bits, and end up needing 2.5GB of memory to load it. That’s a significant chunk of DRAM, and is it static? Not really. Loading that model back and forth and actually running it heavily exercises the DRAM bus due to transactions with the CPU.


“Keeping that up comes down to power management. Is it better to hold the model in non-volatile flash storage? How often do you bring it back into the memory stack? Is it a one-model-fits-all scenario, or will you swap in different models? One company’s model works for most developers until somebody wants to introduce their own. So, the models themselves drive large amounts of DRAM.”
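Bergey’s arithmetic is worth making explicit: parameters, times bits per parameter, divided by eight bits per byte, gives the raw weight footprint. A quick sketch of that back-of-the-envelope math:

```c
/* The back-of-the-envelope math from the quote: a 5-billion-parameter
 * model quantized to 4 bits per weight. */
#include <stdio.h>

int main(void) {
    double params = 5e9;             /* 5 billion parameters */
    double bits_per_weight = 4.0;    /* 4-bit quantization */
    double bytes = params * bits_per_weight / 8.0;
    printf("weights alone: %.1f GB of DRAM\n", bytes / 1e9); /* 2.5 GB */
    return 0;
}
```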

Bergey continued, “That’s actually one of the limiting aspects. Think about all the talk of AI in the flagship space, which is great. There are amazing things from the likes of Google and Samsung, and different options in China, and for different Android configurations.

“Now think about that cascading down to lower price tiers. The computing elements still exist, but we’re making the CPU super capable. That way, you don’t necessarily need to use a [high-end] 40 TOPS NPU, because you probably can’t afford it. Maybe you use a small NPU, or run everything on the CPU and GPU. But you still need to increase your DRAM footprint, and in certain geographies, DRAM pricing is already feeling that pressure.


“Thinking about the future of AI on handsets, there are both computing and memory elements, and that’s not just an Android-specific thing. Even as other mobile operating systems move forward with these features and tie them to certain models, they’re increasing their generic memory stacks, even though they might typically ship with much less memory.”

AI techniques are already steering the industry’s hardware decisions, and they will keep doing so.

What is and is not really AI

And how consumers could tell the difference

A pair of robot arms reaching to touch fingertips

Tools like camera imaging algorithms and language translators have used machine learning components for years. In that light, I pressed Mr. Bergey on what features leverage novel AI techniques (versus giving machine learning lip service) and how machine learning of the past differs fundamentally from today’s AI.

“That’s a tough one,” Bergey admitted, offering, “Here’s my perspective. Everybody asks things like, ‘What’s the best AI app?’ or ‘How much do you use AI each day?’ Maybe you have specific tools on your phone, like a ChatGPT client, that you can clearly identify as an AI app.


“For a lot of the various applications of smartphones today, though, AI is only just starting to become part of the process. Cameras used it for a long time, whether it was picking the best burst shot, or using the Magic Eraser, or automatically superimposing graphics. Those were the first set of tools that I’d say brought AI into the phone arena.

A stylized picture of a crowd member filming a live performance

Source: Arm

“The large language models and multimodal models get into the more creative elements, but that doesn’t necessarily mean those are the only AI apps. Transcribing audio is a great example. It used to be just literal transcription, but now you can ask for a 300-word summary or just the main points. The AI exists in the form of better recommendations on varied questions around data.”


Bergey momentarily pivoted away from software. “AI impacts actually go all the way down to the modem. Looking at 6G standards, AI is starting to occupy space that used to be only typical signal processing. Both on the transmitter side and the handset’s 6G modem, you’ll see AI used alongside more traditional methods.”

A screenshot from video of Arm's Chris Bergey speaking at MWC 2022

Source: Arm via YouTube

I watched Mr. Bergey build the answer I had long sought in real time. “I think, right or wrong, AI will continue to be more ubiquitous, even as just a service running in the background. Look at the evolution of GPUs. Previously, you would always ultimately rasterize the content somehow. Even frame interpolation is pretty stochastic compared to how rendering always used to work. Now, with AI, we can rasterize even fewer frames and better approximate the direction something’s moving in the game or engine. So you’ll see more AI in graphics.”


“And gaming is getting huge, to the point that the agents in games are actually AI-enabled, and it makes a difference. It’s not just a dumb algorithm that keeps running an enemy into a corner. And you can tell. The AI-enabled character can figure out what moves you made before and try something different this time.

“In the next several years, we’ll see more AI-derived computation techniques because it just makes sense from an energy efficiency point of view. It’s really just infiltrating everywhere, good or bad.”

The Arm-powered Fujitsu A64FX supercomputer

Source: Arm

With that, I knew I’d found my answer. For better or worse, AI is infiltrating everything, influencing decision-making processes, increasing efficiency, and integrating with traditional computation. Until I spoke with Mr. Bergey, I didn’t realize I’d been asking the wrong question.

Instead of asking what AI is, ask, “What is AI doing?”

And how developers are using it

In one sense, Bergey’s thought-provoking perspective implies that the term “AI” is getting even more nebulous and harder to pin down. Rather than a discrete discipline separate from previous techniques, AI is another tool in developers’ deep toolbox, and it will play an increasingly important and inextricable role in all development.

We can’t predict where AI is headed, but the wheel is turning, and it won’t slow down. One thing is certain. Arm’s innovation lies at the heart of today’s bleeding-edge computational techniques and those of the future. As Bergey mused, “It all starts with the model.” As data sets are parsed, parameters quantized, algorithms trained, and the system written to memory banks, raw performance and register-transfer level communication play inescapable roles.

AI is infiltrating everything, influencing decision-making processes, increasing efficiency, and integrating directly with traditional computation.

That’s where you’ll find Arm, the patent-heavy, talent-rich conductive layer between hardware and software. By directing powerful, efficient communications, it brings AI into the mainstream and lays the foundations for the next generation of computing, but in more meaningful ways than an LLM chatbot.
