The Ampere Altra Review: 2x 80 Cores Arm Server Performance Monster
by Andrei Frumusanu on December 18, 2020 6:00 AM EST
Posted in: Servers, Neoverse N1, Ampere, Altra
As we’re wrapping up 2020, one last large review item for the year is Ampere’s long-promised new Altra Arm server processor. This has indeed been the year in which Arm servers had their breakthrough; Arm’s new Neoverse-N1 CPU core is the IP designer’s first true dedicated server core, promising focused performance and efficiency for the datacentre.
Earlier in the year we had the chance to test out the first Neoverse-N1 silicon in the form of Amazon’s Graviton2 inside the AWS EC2 cloud compute offering. The Graviton2 seemed like a very impressive design, but was rather conservative in its goals, and it’s also a piece of hardware that the general public cannot access outside of Amazon’s own cloud services.
Ampere Computing, founded in 2017 by former Intel president Renée James, built upon initial IP and design talent of AppliedMicro’s X-Gene CPUs, and with Arm Holdings becoming an investor in 2019, is at this moment in time the sole “true” merchant silicon vendor designing and offering up Neoverse-N1 server designs.
To date, the company has had a few products out in the form of the eMAG chips, but with rather disappointing performance figures - understandable given that those were essentially legacy products based on the old X-Gene microarchitecture.
Ampere’s new Altra product line, on the other hand, is the culmination of several years of work and close collaboration with Arm – and the company’s first “true” product which can be viewed as Ampere pedigree.
Today, with hardware in hand, we’re finally taking a look at the very first publicly available high-performance Neoverse based Arm server hardware, designed for nothing less than maximum achievable performance, aiming to battle the best designs from Intel and AMD.
Mount Jade Server with Altra Quicksilver
Ampere has supplied us with the company’s server reference design, dubbed “Mount Jade”, a 2-socket 2U rack unit server. The server came supplied with two Altra Q80-33 processors, Ampere’s top-of-the-line SKU, each featuring 80 cores running at up to 3.3GHz, with TDP reaching up to 250W per socket.
The server was designed in close collaboration with Wiwynn for this dual-socket variant, and with GIGABYTE for the single-socket variant, as previously hinted by the two companies’ announcements of leading hyperscale deployments of the Altra platforms. The Ampere-branded Mount Jade DVT reference motherboard comes in a typical server blue colour scheme and features 2 sockets with up to 16 DIMM slots per socket, reaching up to 4TB DRAM capacity per socket, although our review unit came equipped with 256GB per socket across 8 DIMMs to fully populate the chip’s 8-channel memory controllers.
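As a quick sanity check on the memory figures above, the following Python sketch works through the arithmetic. It assumes binary units (1TB = 1024GB) and identically sized DIMMs, neither of which is stated explicitly in the text, and the helper name `dimm_size_gb` is purely illustrative.

```python
def dimm_size_gb(total_gb: float, dimm_count: int) -> float:
    """Capacity of each DIMM, assuming all DIMMs are the same size."""
    return total_gb / dimm_count

# Review unit: 256GB per socket across 8 DIMMs (one per memory channel).
review_dimm_gb = dimm_size_gb(256, 8)

# Maximum configuration: 4TB per socket across 16 slots
# (assumption: 1TB = 1024GB).
max_dimm_gb = dimm_size_gb(4 * 1024, 16)

# Total DRAM in the 2-socket review system.
review_system_gb = 2 * 256

print(f"Review unit DIMM size: {review_dimm_gb:.0f}GB")
print(f"DIMM size needed for 4TB per socket: {max_dimm_gb:.0f}GB")
print(f"Review system total: {review_system_gb}GB")
```

Under these assumptions, the review unit uses 32GB DIMMs, while reaching the 4TB-per-socket maximum would require a 256GB module in each of the 16 slots.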
This is also our first look at Ampere’s first-generation socket design. The company doesn’t really market any particular name for the socket, but it’s a massive LGA4926 socket with a pin-count in excess of any other commercial server socket from AMD or Intel. The retention mechanism is somewhat similar to that of AMD’s SP3 system, tensioned by a 5-point screw system.
The chip itself is absolutely humongous and amongst the current publicly available processors is the biggest in the industry, out-sizing AMD’s SP3 form-factor packaging, coming in at around 77 x 66.8mm – about the same length but considerably wider than AMD’s counterparts.
Although it’s a massive chip with a huge IHS, the Mount Jade server surprised me with its cooling solution as the included 250W type cooler only made contact with about 1/4th the surface area of the heat spreader.
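To put the contact-area observation in rough numbers, here is a small Python sketch. The IHS dimensions are not given in the text, so it uses the quoted package footprint as an upper bound on the heat spreader area; that substitution is an assumption, and the result should be read as a ceiling rather than a measurement.

```python
# Package footprint quoted in the review (approximate).
PACKAGE_LENGTH_MM = 77.0
PACKAGE_WIDTH_MM = 66.8

# Assumption: the IHS is no larger than the package, so the package
# footprint is only an upper bound on the heat spreader area.
package_area_mm2 = PACKAGE_LENGTH_MM * PACKAGE_WIDTH_MM

# The cooler pedestal touches roughly a quarter of the IHS surface.
CONTACT_FRACTION = 0.25
max_contact_area_mm2 = package_area_mm2 * CONTACT_FRACTION

print(f"Package footprint: about {package_area_mm2:.0f} mm^2")
print(f"Pedestal contact area: at most about {max_contact_area_mm2:.0f} mm^2")
```

Under these assumptions the pedestal covers no more than roughly 1,300 mm² of a footprint a little over 5,100 mm².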
Ampere here doesn’t have a recessed “lip” around the IHS for the mounting bracket to hold onto the chip like on AMD or Intel systems, so the IHS surface is actually recessed in relation to the bracket, which means you cannot have a flat-surface cooler design across the whole of the chip surface.
Instead, the included 250W design cooler uses a huge vapour chamber design with a “pedestal” to make contact with the chip. Ampere explains that they’ve experimented with different designs and found that a smaller area pedestal actually worked better for heat dissipation – siphoning heat off from the actual chip die which is notably smaller than the IHS and chip package.
The cooler design is quite complex, with vertical fin stacks dissipating heat directly off the vapour chamber, with additional large horizontal fins dissipating heat from 6 U-shaped heat pipes that draw heat from the vapour chamber. It’s definitely a more complex and high-end design than what we’re used to in server coolers.
Although the Mount Jade server is definitely a very interesting piece of hardware, our focus today lies with the new Altra processors themselves, so let’s dive into the new Q80-33 80-core chip next.
148 Comments
Wilco1 - Monday, December 21, 2020 - link
Why would they introduce Graviton if it would run at a loss??? A significant percentage of AWS is already Graviton (probably 20% by now). If anything Graviton increases profitability due to vertical integration and other cost reduction.
mode_13h - Monday, December 21, 2020 - link
First, there's a fundamental disparity between an in-house CPU and a 3rd Party one, where Amazon can cut out some overheads by building their own. So, that already skews the price-comparison.
The other question is whether Amazon is partially-subsidizing the price of their Graviton2 instances as an incentive to get more people to switch. For a business, the least risky thing is to stay on x86, so Amazon needs to present an immediate and significant cost savings to get people to switch. After they've switched and ARM server cores have had more time to mature, Amazon can charge more and make back a good return on investment.
I obviously don't know if that's what they're doing, but we don't know that it's not. So, you really can't read much into their current pricing. That's all I'm saying.
mode_13h - Sunday, December 20, 2020 - link
Finally, I guess you missed this part, in the discussion of SPECjbb:
> One thing that did come to mind immediately when I saw the results was SMT.
> Due to this being a transactional data-plane resident type of workload,
> SMT will undoubtedly help a lot in terms of performance,
> so I tested out the EPYC chip figures with SMT disabled,
> and indeed max-jOPS went down to 209.5k for the 2S THP enabled results,
> meaning that SMT accounts for a 29.7% performance benefit in this benchmark.
...
> It’s generally these kinds of workloads that SMT works best on,
> and that’s why IBM can deploy SMT4 or SMT8 processors,
> and the type of workloads Marvell’s ThunderX was trying to carve a niche for itself with SMT4.
mode_13h - Sunday, December 20, 2020 - link
As the article mentions, Marvell’s ThunderX did support SMT on ARMv8-A.
Were SMT's reputation not bruised by all the recent side-channel exploits, perhaps it would be showing up in some of ARM's own cores. Maybe their V-series will get it, since that's a much larger core.
Wilco1 - Monday, December 21, 2020 - link
ThunderX2/X3 and Neoverse E1 have SMT, but neither has been hugely successful. SMT doesn't provide a significant benefit across a wide range of workloads, so adding another core remains simpler and cheaper. And yes, security is another nail in the coffin.
EthiaW - Saturday, December 19, 2020 - link
The performance of Graviton2 meets our expectation for Neoverse N1 (or Cortex A76) better. How can Q80 manage to deliver so much higher IPC with the same architecture? Incredible.
Brutalizer - Saturday, December 19, 2020 - link
One old Oracle SPARC T8 cpu does 153.500 Java max-JOPS SPECjbb2015. And the crit-JOPS value is 90.000. Easily smashing all cpus here.
https://blogs.oracle.com/bestperf/specjbb2015:-spa...
satai - Saturday, December 19, 2020 - link
Benchmarked by Oracle... Definitely trustworthy.
zepi - Saturday, December 19, 2020 - link
SPECJBB graphs kill me.
For the love of god, please keep the axis scaling identical!
Same applies to every single metric always. If you provide separate graphs for different products, please make sure that axis-scaling is the same in all images!
Andrei Frumusanu - Sunday, December 20, 2020 - link
The graphs are generated by the benchmark itself.