Wed. Jun 12th, 2024


Arm Cortex-X925

Mobile chipset development continues to advance at a brisk pace, delivering superior gaming performance, faster AI features, and more power-efficient PCs. Arm, one of the companies charting the course, has announced its 2024 selection of CPU and GPU cores to power these growing use cases.

Some (but not all) of 2025’s next-generation top-tier smartphones will be powered by Arm’s newly announced cores. Arm has been giving out fewer details on its CPU and GPU technologies in recent years, but let’s examine the announcements in closer detail to see what we can expect.

The big one: Arm Cortex-X925 core

The flagship CPU in Arm’s 2024 portfolio is the powerhouse Arm Cortex-X925. Despite the name change, this is the direct successor to last generation’s Armv9.2 Cortex-X4 found in processors like the Qualcomm Snapdragon 8 Gen 3. We had anticipated this core to be called the Cortex-X5, but Arm has changed the moniker to match other products in this year’s portfolio.

Headline figures for the Arm Cortex-X925 include a 15% IPC improvement over the Cortex-X4. This extends to 36% once the gains from moving to 3nm manufacturing, higher clock speeds in excess of 3.6GHz, and larger caches are factored in. AI performance sees even bigger potential gains, with some models running 46% faster on the CPU than on the X4. The bottom line is that single-core CPU capabilities will see a significant uplift in the next generation.
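As a rough sanity check on those headline numbers, the sketch below splits the 36% total into the 15% IPC gain and whatever is left for process, clocks, and caches. It assumes the gains compose multiplicatively, which is our simplification rather than anything Arm has stated, and it uses the approximate peak clocks quoted in this article (about 3.6GHz versus about 3.4GHz).

```python
def residual_gain(total_gain: float, known_gain: float) -> float:
    """Gain left over after removing a known factor, assuming gains multiply."""
    return (1 + total_gain) / (1 + known_gain) - 1


total_uplift = 0.36   # Arm's claimed overall single-core uplift
ipc_uplift = 0.15     # Arm's claimed like-for-like IPC uplift

# Everything that is not IPC: 3nm process, higher clocks, larger caches.
non_ipc_uplift = residual_gain(total_uplift, ipc_uplift)

# Approximate peak clocks from this article (assumed exact for illustration).
clock_uplift = 3.6 / 3.4 - 1

print(f"Non-IPC share of the uplift: {non_ipc_uplift:.1%}")
print(f"Of which clock speed alone:  {clock_uplift:.1%}")
```

Under these assumptions, roughly 18% of the uplift comes from factors other than IPC, and only about 6% of that is explained by the clock speed increase on its own.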

| | Cortex-X925 | Arm Cortex-X4 | Arm Cortex-X3 | Arm Cortex-X2 |
|---|---|---|---|---|
| Peak clock speed | ~3.6GHz | ~3.4GHz | ~3.25GHz | ~3.0GHz |
| Decode width | 10 instructions | 10 instructions | 6 instructions (8 mop) | 5 instructions |
| Dispatch pipeline depth | 10 cycles | 10 cycles | 11 cycles for instructions (9 cycles for mop) | 10 cycles |
| OoO execution window | 1,500 (2x 750) | 768 (2x 384) | 640 (2x 320) | 576 (2x 288) |
| Execution units | (assumed) 6x ALU (some 2-cycle), 2x ALU/MAC, 2x ALU/MAC/DIV, 3x Branch | 6x ALU, 1x ALU/MAC, 1x ALU/MAC/DIV, 3x Branch | 4x ALU, 1x ALU/MUL, 1x ALU/MAC/DIV, 2x Branch | 2x ALU, 1x ALU/MAC, 1x ALU/MAC/DIV, 2x Branch |
| Architecture | ARMv9.2 | ARMv9.2 | ARMv9 | ARMv9 |

The gains from 3nm are an important part of the performance uplift expected for this generation. Arm has worked extensively to optimize its design for its partners on both FinFET and GAA processes (aka TSMC and Samsung). That leaves the 15% like-for-like improvement over the previous model, which comes down to several key changes in the X925’s microarchitecture.

In the processing core, for example, the X925 now has six SIMD units (the powerful number crunchers that batch compute floating point math and AI workloads), up from four, allowing it to do more heavy math in parallel. This likely accounts for most of the core’s AI/ML performance boost. There’s also an additional integer multiply unit and an extra floating point compare unit, which again increase the core’s sheer number-crunching capabilities when fully fed. Arm is reluctant to discuss die area size these days, but the X925 must be getting pretty big.

Arm Client 2024 CPU Reference Cluster

Robert Triggs / Android Authority

Another interesting change is that some of the ALUs have been switched to dedicated versions for 2-cycle instructions. This helps avoid stalls in the regular 1-cycle units but presumably means that these ALUs can’t perform some of the simpler arithmetic. This seems like the sort of design change that only detailed use-case data would point to.

Instruction dispatch remains 10-wide, but Arm has doubled the X925’s maximum number of instructions in flight to a colossal 1,500. Likewise, there’s twice the L1 instruction cache bandwidth and double the L1 instruction lookup table size to speed up instruction fetching. Meanwhile, the backend gains an extra load pipeline to bring more data in from memory. In other words, there are plenty of out-of-order instructions in flight to keep those number-crunching units busy.

That’s a lot of jargon, but the themes are very familiar from previous years: an ever-wider front end feeding an increasingly insatiable execution engine. In that sense, the X925 is an update to the X4 rather than a wholesale redesign. Even so, performance will take a solid leap forward again in 2025, though a fair chunk of the benefit also comes from the move to 3nm.

Power-efficient Arm Cortex-A725 and A520

Sadly, Arm hasn’t provided as many details about the equally important Cortex-A725, the new middle core that’ll form the backbone of upcoming mobile SoCs.

Arm claims that the A725 is 25% more efficient than the A720 and offers the option for higher peak performance if required. Again, though, this figure assumes the move to 3nm, and Arm hasn’t given us a standard metric for IPC performance gains. However, it claims a 20% boost to L3 traffic, which helps realize some extra performance.

On the microarchitecture level, Arm increased the re-order buffer and instruction issue queue sizes, improving throughput. A new 1MB L2 cache configuration also allows the core to reach a higher performance level. But if that’s it, the A725 is a minor revision of the A720, which was already an optimization of 2022’s A710 core.

Arm Cortex A725 efficiency graph


This leads us to the refreshed Cortex-A520, certainly the least exciting model in this year’s CPU trio. The core architecture remains unchanged. Instead, Arm has optimized the A520 footprint for upcoming 3nm processes, resulting in a 15% energy-efficiency gain.

Looking at Arm’s power efficiency curves, this generation has an even greater crossover between the Cortex-A725 and A520. While the A520 can still reach the very lowest power levels for standby and low-clock tasks, the A725 can deliver vastly more performance for the same power as a maxed-out A520. In other words, many tasks run much faster and just as efficiently on the A725. It’s little wonder that Arm’s 2024 reference design suggests just two A520s, further reducing the number of small cores from what we see in current-generation chipsets.

Vastly improved gaming with the Immortalis G925

Arm continues to upgrade its GPU line-up too, with the Immortalis G925, Mali G725, and Mali G625. As with last year’s range, silicon partners need to use a larger core count to ensure robust ray tracing performance and leverage the Immortalis branding. A configuration of ten to 24 cores (the maximum is up from 16 last gen) is classed as Immortalis, six to nine cores make a G725 implementation, and one to five cores make a budget G625 setup.
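To make the branding tiers concrete, here is a minimal sketch that maps a core count to the product name using the ranges given above. The function name `gpu_tier` is purely illustrative and not part of any Arm tooling.

```python
def gpu_tier(core_count: int) -> str:
    """Return the 2024 Arm GPU brand for a given core count.

    Ranges follow the article: 10-24 cores is Immortalis G925,
    6-9 cores is Mali G725, and 1-5 cores is Mali G625.
    """
    if 10 <= core_count <= 24:
        return "Immortalis G925"
    if 6 <= core_count <= 9:
        return "Mali G725"
    if 1 <= core_count <= 5:
        return "Mali G625"
    raise ValueError(f"unsupported core count: {core_count}")


for cores in (4, 8, 14):
    print(cores, "cores ->", gpu_tier(cores))
```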

Regardless of the configuration, each G925 core promises a 30% reduction in power consumption when built on 3nm, up to 37% improved performance, and a whopping 52% gain in ray tracing over last-gen’s Immortalis G720. That last metric has a big caveat: it requires developers to leverage new APIs to designate targets as “intricate objects,” which the G925 then traces with reduced fidelity. Think leaves or grass that are very expensive to compute individually but that players won’t notice if ray traced at lower accuracy. It’s a neat idea, but entirely dependent on developers knowing about it and then coding for it.

Arm Immortalis G925 Performance

In real-world games, Arm is claiming even more significant gains with 14 Immortalis G925 cores versus 12 older G720 cores. Of course, that’s not a like-for-like comparison, so take it with a pinch of salt. But giving Arm the benefit of the doubt, I’d guess that you can fit 14 G925 cores in the space of 12 of the previous G720s, but that’s entirely my speculation.

Still, for just two more cores, Arm touts a 72% performance improvement in Call of Duty, 49% in Genshin Impact, 46% in Diablo Immortal, and a 29% gain in Fortnite. The key lies in the core’s new Fragment Prepass technique. The TL;DR is that this vastly improves hidden object culling (think a player or object hidden behind a wall), reducing CPU load for these big performance gains. Games with complex geometry benefit most, hence the performance differences between CoD and Fortnite.
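Because those figures compare 14 new cores against 12 old ones, a quick way to get closer to a like-for-like number is to divide out the extra cores. The sketch below does this under the assumption that performance scales linearly with core count, which is our simplification and is unlikely to hold exactly in real games.

```python
NEW_CORES = 14  # Immortalis G925 configuration in Arm's comparison
OLD_CORES = 12  # Immortalis G720 configuration in Arm's comparison

# Arm's claimed whole-GPU gains, as quoted in this article.
claimed_gains = {
    "Call of Duty": 0.72,
    "Genshin Impact": 0.49,
    "Diablo Immortal": 0.46,
    "Fortnite": 0.29,
}


def per_core_gain(total_gain: float) -> float:
    """Estimate the per-core gain, assuming linear scaling with core count."""
    return (1 + total_gain) * OLD_CORES / NEW_CORES - 1


for game, gain in claimed_gains.items():
    print(f"{game}: {gain:.0%} claimed, {per_core_gain(gain):.1%} per core")
```

With that assumption, the Call of Duty figure works out to roughly 47% per core and the Fortnite figure to roughly 11%, so the ranking between games is unchanged but the smaller gains shrink considerably.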

If you want a more in-depth explanation, Arm has replaced traditional Z-buffer Hidden Surface Removal (HSR) techniques, such as forward pixel kill or primitive re-ordering, with its fragment prepass technology. The key difference is that it removes the need to re-order the Z-buffer (depth buffer) to make culling decisions, reducing driver CPU cycles by up to 43% per thread. This is all done in hardware, meaning there is no overhead for developers, but it doesn’t benefit all games equally.
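Arm hasn’t published the details of its hardware implementation, so the following is only a toy software model of the general depth-prepass idea, not of the G925 itself. It compares how many fragments get shaded when a depth test runs in submission order against a two-pass approach that resolves visibility first. The `Fragment` type and both function names are hypothetical.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    pixel: tuple[int, int]  # screen position
    depth: float            # smaller means closer to the camera
    draw_id: int            # which draw call produced the fragment


def shaded_in_order(fragments: list[Fragment]) -> int:
    """Shade every fragment that passes the depth test at the time it arrives."""
    depth_buffer: dict[tuple[int, int], float] = {}
    shaded = 0
    for frag in fragments:
        if frag.depth < depth_buffer.get(frag.pixel, float("inf")):
            depth_buffer[frag.pixel] = frag.depth
            shaded += 1  # may be overwritten later, so the work can be wasted
    return shaded


def shaded_with_prepass(fragments: list[Fragment]) -> int:
    """Resolve the nearest depth per pixel first, then shade only the winners."""
    nearest: dict[tuple[int, int], float] = {}
    for frag in fragments:  # pass 1: depth only, no shading
        if frag.depth < nearest.get(frag.pixel, float("inf")):
            nearest[frag.pixel] = frag.depth
    # pass 2: shade only fragments that are actually visible
    return sum(1 for frag in fragments if frag.depth == nearest[frag.pixel])


# Worst case for in-order testing: geometry submitted back to front.
scene = [
    Fragment((0, 0), 0.9, draw_id=0),  # far wall
    Fragment((0, 0), 0.5, draw_id=1),  # crate in front of the wall
    Fragment((0, 0), 0.1, draw_id=2),  # player in front of the crate
    Fragment((1, 0), 0.9, draw_id=0),  # wall only
]

print("In-order shading:", shaded_in_order(scene))      # 4 fragments shaded
print("With prepass:    ", shaded_with_prepass(scene))  # 2 fragments shaded
```

In this model the prepass result doesn’t depend on submission order, which is the property that removes the need for the driver to sort geometry on the CPU.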

What about AI?

Arm Immortalis G925 Machine Learning

No 2024 announcement is complete without AI, and Arm had a fair bit to say here despite not having a dedicated AI accelerator to augment its more traditional CPU and GPU parts. Instead, Arm is banking on the more developer-friendly and universal appeal of the CPU and, to a lesser extent, the GPU to tout its AI capabilities.

For instance, Arm points out that most third-party AI Android apps run on the CPU rather than an accelerator, as few have invested the development resources to support the numerous SoC API platforms. In lieu of a more universal API, Arm is banking on the CPU to remain an essential component for AI. That said, this is much easier to say when you don’t have skin in the mobile AI accelerator market.

Still, Arm has some performance numbers to trot out here. The Arm Cortex-X925 boasts a 42% faster time to first token with an 8-billion-parameter LLaMA 3 model and a 46% faster time with a 3.8-billion-parameter Phi 3 model. AI CPU inference is also up 59% compared to the Cortex-X4, with GPU inference capabilities receiving a 36% boost over last year’s reference platform. Similarly, the new GPU (in a 14-core versus 12-core configuration) is up to 50% faster in natural language processing, 41% faster in image segmentation, and 32% faster for speech-to-text.

Those are all very welcome improvements to help make AI apps more responsive, but it’s worth remembering that neither a CPU nor a GPU is as fast and efficient as a dedicated AI accelerator.

What to expect from next-gen products


Arm’s next-gen cores are destined for 2025 flagship smartphones, with Samsung and MediaTek likely to be the biggest mobile silicon vendors to leverage these cutting-edge technologies. Qualcomm is moving to a new custom CPU core for the Snapdragon 8 Gen 4, which means that the majority of flagship Android phones in 2025 probably won’t use Arm Cortex-X925 or Immortalis-G925.

Likewise, the upcoming major wave of Windows on Arm laptops is powered entirely by Qualcomm’s Snapdragon X Elite platform. Again, this platform uses custom CPU cores rather than Arm’s Cortex. Arm didn’t have much to say about specific plans for Arm-based PCs, likely given Qualcomm’s exclusivity deal with Microsoft, which is rumored to end in 2024. Still, it’s entirely possible that we might see other silicon vendors use Arm Cortex-X cores, quite possibly the new X925, for rival chipsets at some point in 2025. For instance, Arm envisions a PC chip with up to 12 Cortex-X925 CPU cores to push performance well beyond mobile.

Although Arm announced its latest client technologies in the first half of the year, partner chipsets will be announced near the end of 2024 at the earliest. Smartphones powered by the Cortex-X925 and/or Immortalis-G925 are expected to land in consumer hands in early 2025.
