M1 Ultra deep dive: Workstation performance at a fraction of the power

At its “Peek Performance” event this week, Apple had a lot more in store for us than a new iPhone SE and an extra-thick Mac mini, otherwise known as the Mac Studio. But Apple’s newest desktop Mac is bigger than a Mac mini for a good reason: it’s meant to house and cool the M1 Ultra, a surprise new high-performance chip. Or rather a higher-performance chip, as the M1 Max had already turned heads.

As with the rest of the M1 line, it’s not just the claimed performance of Apple’s new multi-chip package that impresses. It’s that Apple is doing it all at a fraction of the power of its competitors. With the M1 Ultra, Apple will trade blows with performance PC hardware that uses hundreds of watts of power, while itself consuming way less than 100 watts. It’s a performance-per-watt advantage that AMD, Intel, and Nvidia are not likely to catch for some time.

Why buy one M1 Max when you can buy two?

The impressive M1 Max takes the M1 architecture and blows it up. There are 10 CPU cores (eight performance and two efficiency), 32 GPU cores, two media/video engines, and a 16-core Neural Engine, all tied to either 32GB or 64GB of memory with a crazy-wide interface that provides a stunning 400GB/sec of bandwidth.

It’s also gigantic, at 57 billion transistors. That’s about double the transistor count of a massive consumer GPU like the GeForce RTX 3090. Increasing performance by making an even bigger chip would be an economic disaster. It’s a wonder that Apple can sell a 57-billion-transistor chip at consumer prices, even at the high end.

So what did Apple do? It designed M1 Max with a really high-speed interconnect, so it can literally slap two of them into the same package, tie them together, and boom: A massive 114-billion-transistor chip with double the performance!

It’s not that simple, of course. Apple’s interconnect is called UltraFusion, and it puts both dies together in the same package with a massive 2.5TB/sec of bandwidth between them. Apple claims it has twice the interconnection density of any other technology out there. That’s enough speed to make the whole thing look like one big chip to software, and let all the cores on one chip access the memory connected to the other one without limitation.

This is similar to AMD’s chiplet design on a modern Zen processor with its “Infinity Fabric” connection, only much faster.

All the cores, all the bandwidth

Because the M1 Ultra is two M1 Max chips tied together in a single package with a very high-speed interconnect, it basically has double of everything the M1 Max has. That’s 20 CPU cores (16 performance, four efficiency), 64 GPU cores, 32 Neural Engine cores, and 64GB or 128GB of RAM with a mind-blowing 800GB/sec of bandwidth. That’s many times more bandwidth than the fastest desktop CPUs, and it is exceeded only by the most expensive graphics cards, the kind that cost a thousand dollars or more.
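The “double of everything” arithmetic is easy to check. Here is a minimal Python sketch that takes the M1 Max figures quoted in this article and scales them by the number of dies in the package. The dictionary layout and the `fuse` helper are purely illustrative names, not anything Apple publishes.

```python
# M1 Max figures as quoted in the article; the key names are illustrative.
M1_MAX = {
    "cpu_performance_cores": 8,
    "cpu_efficiency_cores": 2,
    "gpu_cores": 32,
    "neural_engine_cores": 16,
    "media_engines": 2,
    "max_memory_gb": 64,
    "memory_bandwidth_gb_s": 400,
    "transistors_billion": 57,
}


def fuse(die: dict, count: int = 2) -> dict:
    """Scale every per-die resource by the number of dies in the package."""
    return {name: value * count for name, value in die.items()}


M1_ULTRA = fuse(M1_MAX)

for name, value in M1_ULTRA.items():
    print(f"{name}: {value}")
```

This only models the spec sheet. It says nothing about how well real workloads scale across the two dies, which is what first-hand benchmarks will have to show.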

Apple claims it will greatly exceed the performance of a Core i9-12900K, and use 100W less power when matching its performance.

It also means double the media processing engines: four instead of two. These are responsible for encoding and decoding ProRes, H.264, HEVC, and other common media formats. If you do video production for a living, the M1 Ultra is going to make those big, complex 4K video export jobs laughably fast. In fact, Apple says the M1 Ultra can play back 18 simultaneous streams of 8K ProRes 422 video. If you are the kind of video professional who understands what that means, you probably just spit coffee out your nose.

Not just performance, performance per watt

Naturally, you’d expect the M1 Ultra to deliver about double the performance of the M1 Max, and that’s essentially what Apple claims. That means you can expect a Geekbench 5 single-thread score still just under 1,800 (individual cores have not been made faster; there are just more of them) and a multi-core score around 24,000. That’s around 80 percent faster than the leading consumer desktop processors from AMD or Intel.
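To see what those numbers imply, here is a rough Python sketch that works backward from the expected multi-core score. It assumes perfectly linear scaling across the two dies, which is an idealization, and every figure is an approximation taken from the paragraph above rather than a measured result.

```python
# Approximate Geekbench 5 figures implied by the article, not measurements.
ultra_multi_core = 24_000      # expected M1 Ultra multi-core score
speedup_over_desktop = 1.80    # "around 80 percent faster"
dies = 2                       # two M1 Max dies per M1 Ultra

# ASSUMPTION: multi-core performance scales linearly with die count.
implied_max_multi_core = ultra_multi_core / dies

# Score the leading consumer desktop chips would have for the 80 percent gap to hold.
implied_desktop_multi_core = ultra_multi_core / speedup_over_desktop

print(f"Implied M1 Max multi-core: {implied_max_multi_core:,.0f}")
print(f"Implied desktop multi-core: {implied_desktop_multi_core:,.0f}")
```

In other words, the 80 percent claim only holds if the best consumer desktop chips score somewhere in the low 13,000s on the same test.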

To get more performance, you have to look to Intel’s Xeon or AMD’s Threadripper workstation processors, or server CPUs, all of which have many more cores and use hundreds of watts of power.

And that’s really the key to M1 Ultra. Looking at Apple’s charts, it appears to never really draw more than about 100 watts of power. That’s about half the draw of an Xbox Series X or PlayStation 5.

The GPU, with 64 cores and 800GB/sec of bandwidth to work with, is capable of about 21 teraflops, according to Apple. That’s about double a PlayStation 5, and on par with a GeForce RTX 3070 or Radeon RX 6800 XT. Of course, Apple’s GPU doesn’t have all the same features as Nvidia’s or AMD’s latest (there’s no ray tracing acceleration, for a start), and teraflops are not the best way to measure GPU performance. But Apple’s chip is consuming a couple of hundred watts less power while delivering this sort of performance.
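A back-of-the-envelope Python sketch makes the performance-per-watt gap concrete. The 21 teraflops and the roughly 100-watt ceiling come from Apple’s claims and charts as described above. The 300-watt figure for a discrete card is an assumption: it simply adds “a couple of hundred watts” to Apple’s number and is not a measured value for any specific GPU.

```python
def tflops_per_watt(tflops: float, watts: float) -> float:
    """Throughput per unit of power; higher is better."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tflops / watts


# Apple's claimed GPU throughput against the ~100 W ceiling read off its charts.
# That ceiling covers the whole chip, so this likely understates the GPU alone.
ultra = tflops_per_watt(21.0, 100.0)

# ASSUMPTION: a discrete GPU with similar throughput drawing about 200 W more,
# i.e. roughly 300 W. Illustrative only, not a measured figure.
discrete = tflops_per_watt(21.0, 300.0)

print(f"M1 Ultra: {ultra:.2f} TFLOPS/W")
print(f"Discrete GPU (assumed): {discrete:.2f} TFLOPS/W")
print(f"Advantage: {ultra / discrete:.1f}x")
```

Under those assumptions the M1 Ultra comes out around three times more efficient per teraflop, with the usual caveat that teraflops are a crude proxy for real GPU performance.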

We’re skeptical about the M1 Ultra matching the RTX 3090’s performance, but on performance per watt, it’s probably not even close.

We’ll have to reserve judgment for first-hand benchmarks, though an unverified Geekbench score that popped up late Tuesday night looks extremely promising. According to the results, the M1 Ultra is within spitting distance of AMD’s 64-core Ryzen Threadripper 3990X, a processor that costs as much as the entire Mac Studio. What Apple appears to have built here is a chip that delivers the same performance as a workstation-class CPU and a high-end gaming GPU at a fraction of the power. It’s almost certain that we’ll see faster chips from Intel or AMD before long, and desktop GPUs already run faster, but it’ll be quite some time before they match this performance while using this little power.

This looks like workstation desktop performance at gaming laptop power levels. It’s time for the Windows PC superfans to go grab their bag of excuses and caveats, again.

Source: Macworld