Arm's New Mali-G77 & Valhall GPU Architecture: A Major Leap
by Andrei Frumusanu on May 27, 2019 12:00 AM EST
Alongside today’s announcement of the new Cortex-A77 CPU microarchitecture, the arguably bigger news is Arm’s unveiling of the new Valhall GPU architecture and the new Mali-G77 GPU. It’s been three years since the unveiling of the Bifrost architecture, and as the industry and workloads continue to evolve, so must the company’s GPUs.
Valhall and the new Mali-G77 follow up on the last three generations of Mali GPUs with some significant improvements in performance, density and efficiency. While last year’s G76 introduced some large changes to the compute architecture of the execution engines, the G77 goes a lot further and departs from Arm’s relatively unusual compute core design.
A look back at Bifrost – third time’s the charm
It’s no big secret that the last few years haven’t been very kind to Arm’s GPU IP offerings. When the first Bifrost GPU, the Mali-G71, was announced back in 2016 and productised later that year in the Kirin 960 and Exynos 8895, we had expected good performance and efficiency gains.
Bifrost was Arm’s first scalar GPU architecture, departing from the previous generation’s (Midgard: T-600, 700 & 800 series) vector instruction design. The change was fundamental and akin to what we saw desktop GPU vendors like AMD and Nvidia introduce with their new GCN and Tesla architectures last decade.
Unfortunately, the first two generations of Bifrost, the Mali-G71 and the subsequent G72, weren’t very good GPUs. Arm’s two leading licensees, HiSilicon and Samsung, both came out with quite disappointing SoCs in terms of GPU performance and efficiency during these two generations. The Kirin 960 and 970 in particular were extremely bad in this regard, and I’d argue this had quite a lot of impact on Huawei and Honor’s product planning and marketing.
GFXBench Manhattan 3.1 Offscreen Power Efficiency (System Active Power)
Device (SoC) | Mfc. Process | FPS | Avg. Power (W) | Perf/W Efficiency |
iPhone XS (A12) Warm | 7FF | 76.51 | 3.79 | 20.18 fps/W |
iPhone XS (A12) Cold / Peak | 7FF | 103.83 | 5.98 | 17.36 fps/W |
Galaxy S10+ (Snapdragon 855) | 7FF | 70.67 | 4.88 | 14.46 fps/W |
Galaxy S10+ (Exynos 9820) | 8LPP | 68.87 | 5.10 | 13.48 fps/W |
Galaxy S9+ (Snapdragon 845) | 10LPP | 61.16 | 5.01 | 11.99 fps/W |
Huawei Mate 20 Pro (Kirin 980) | 7FF | 54.54 | 4.57 | 11.93 fps/W |
Galaxy S9 (Exynos 9810) | 10LPP | 46.04 | 4.08 | 11.28 fps/W |
Galaxy S8 (Snapdragon 835) | 10LPE | 38.90 | 3.79 | 10.26 fps/W |
LeEco Le Pro3 (Snapdragon 821) | 14LPP | 33.04 | 4.18 | 7.90 fps/W |
Galaxy S7 (Snapdragon 820) | 14LPP | 30.98 | 3.98 | 7.78 fps/W |
Huawei Mate 10 (Kirin 970) | 10FF | 37.66 | 6.33 | 5.94 fps/W |
Galaxy S8 (Exynos 8895) | 10LPE | 42.49 | 7.35 | 5.78 fps/W |
Galaxy S7 (Exynos 8890) | 14LPP | 29.41 | 5.95 | 4.94 fps/W |
Meizu PRO 5 (Exynos 7420) | 14LPE | 14.45 | 3.47 | 4.16 fps/W |
Nexus 6P (Snapdragon 810 v2.1) | 20SoC | 21.94 | 5.44 | 4.03 fps/W |
Huawei Mate 8 (Kirin 950) | 16FF+ | 10.37 | 2.75 | 3.77 fps/W |
Huawei Mate 9 (Kirin 960) | 16FFC | 32.49 | 8.63 | 3.77 fps/W |
Huawei P9 (Kirin 955) | 16FF+ | 10.59 | 2.98 | 3.55 fps/W |
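The efficiency column in the table is simply frame rate divided by average system active power. As a minimal sketch of that arithmetic, the snippet below recomputes it for a handful of rows. The row labels are shortened to SoC names, and the function names are illustrative, not anything from the benchmark tooling. Because the FPS and power columns are themselves rounded, the recomputed figure can differ from the listed one by a few hundredths.

```python
# Rows copied from the table above: (SoC, fps, average power in W, listed fps/W).
# Only a subset of rows is included to keep the sketch short.
ROWS = [
    ("A12 (warm)", 76.51, 3.79, 20.18),
    ("Snapdragon 855", 70.67, 4.88, 14.46),
    ("Exynos 9820", 68.87, 5.10, 13.48),
    ("Kirin 980", 54.54, 4.57, 11.93),
    ("Kirin 970", 37.66, 6.33, 5.94),
    ("Exynos 8895", 42.49, 7.35, 5.78),
]


def perf_per_watt(fps: float, avg_power_w: float) -> float:
    """Return efficiency in frames per second per watt."""
    return fps / avg_power_w


def recompute(rows):
    """Return (SoC, recomputed fps/W rounded to 2 decimals, listed fps/W) per row."""
    return [
        (name, round(perf_per_watt(fps, power), 2), listed)
        for name, fps, power, listed in rows
    ]


if __name__ == "__main__":
    for name, computed, listed in recompute(ROWS):
        print(f"{name:16s} computed {computed:5.2f} fps/W, listed {listed:5.2f} fps/W")
```

Read this way, the gap between the G71/G72 generation (Exynos 8895, Kirin 970) and the G76 generation (Exynos 9820, Kirin 980) is roughly a doubling in fps/W.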
The last iteration of the Bifrost architecture, the Mali-G76, was a more significant jump for Arm, and the IP was largely able to resolve some of the critical issues of its predecessors, resulting in relatively good results for the Exynos 9820 and Kirin 980 chipsets.
Unfortunately, while Arm was catching up and fixing Bifrost’s issues, the competition didn’t stand still and kept pushing the envelope. Qualcomm’s Adreno GPU architecture has been leading the mobile landscape for several years now, and even though the Adreno 640 didn’t post quite as impressive improvements this year, it still clearly leads Arm in terms of performance, efficiency and density. More worrisome is the fact that Apple’s GPU in the A12 was a major jump in performance and efficiency, performing far better than even Qualcomm’s best, not to speak of Arm’s own Mali GPUs.
Introducing Valhall – A major revamp
Today we’ll be covering Arm’s brand-new GPU architecture: Valhall (an anglicized version of the Old Norse Valhöll, a.k.a. Valhalla). The new architecture brings a brand-new ISA and compute core design that tries to address the major shortcomings of the Bifrost architecture, and looks to be a lot more similar to the design approaches adopted by other GPU vendors.
The first iteration of the Valhall GPU is the new Mali-G77 which will implement all of the architectural and micro-architectural improvements we’ll be discussing today.
What’s being promised is a 30% gain in energy efficiency as well as area density (at ISO-performance & process) and a 60% increase in performance of machine learning inferencing workloads on the GPU.
More interestingly, upcoming end-of-2019 and 2020 SoCs are projected to see a 40% increase in performance over 2019 devices. Next-generation SoCs are projected to have only minor process node improvements, so most of the gains quoted here are due to the architectural and microarchitectural leaps made by the new Mali-G77 GPU.
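To put the quoted percentages in concrete terms, here is a rough sketch that applies them to the Exynos 9820 (Mali-G76) row from the GFXBench table earlier in the article. The choice of baseline is an assumption made purely for illustration, as is reading the 30% efficiency gain as a 1.3x improvement in perf/W at the same frame rate. The two figures are treated independently; nothing here is a measured or predicted result.

```python
# Baseline taken from the Exynos 9820 (Mali-G76) row of the GFXBench table.
# Using this row as the reference point is an assumption for illustration only.
BASELINE_FPS = 68.87
BASELINE_POWER_W = 5.10

PERF_UPLIFT = 0.40        # projected SoC-level performance gain over 2019 devices
EFFICIENCY_UPLIFT = 0.30  # promised efficiency gain at ISO-performance and process


def projected_fps(fps: float, uplift: float = PERF_UPLIFT) -> float:
    """Scale a baseline frame rate by the projected performance gain."""
    return fps * (1.0 + uplift)


def iso_performance_power(power_w: float, uplift: float = EFFICIENCY_UPLIFT) -> float:
    """Power needed for the same frame rate if perf/W improves by `uplift`."""
    return power_w / (1.0 + uplift)


if __name__ == "__main__":
    print(f"Projected frame rate: {projected_fps(BASELINE_FPS):.1f} fps")
    print(f"ISO-performance power: {iso_performance_power(BASELINE_POWER_W):.2f} W")
```

Note that a 30% efficiency gain corresponds to dividing power by 1.3 (about a 23% reduction), not to subtracting 30% from the power figure.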
42 Comments
darkich - Monday, May 27, 2019
40% more performance just from design improvements? That's ridiculous, if true..
spaceship9876 - Monday, May 27, 2019
I really hope they release a Mali-G32 replacement for the G31 with this new architecture, a smaller die with lower power consumption and better performance would be great for entry level phones.
KECHEES - Tuesday, May 28, 2019
And come to think of it, The other Mali gpu was fab on 8nm. so given that 7nm euv is supposedly 50% more efficient, we should be looking at a staggering performance improvement that's way above Arm's 40% target
ballsystemlord - Monday, May 27, 2019
Spelling and grammar corrections (Hint: have someone read what you're writing so that you don't make so many dumb mistakes).
"Valhall and the new Mali-G77 follow up on the last three generation of Mali GPUs with some significant improvements in performance,..."
Missing s:
"Valhall and the new Mali-G77 follow up on the last three generations of Mali GPUs with some significant improvements in performance,..."
"...the new ISA is said to be more compiler friendly and adapted and designed to better aligned with modern APIs such as Vulkan."
Missing "be":
"...the new ISA is said to be more compiler friendly and adapted and designed to be better aligned with modern APIs such as Vulkan."
"Dwelling deeper into the structure of the execution engine,..."
Very awkward, try delving:
"Delving deeper into the structure of the execution engine,..."
"One single has more instances on the primary datapath, and less instances of the control and I-cache,..."
Single what, engine? Maybe "EE"?
"One single EE has more instances on the primary datapath, and less instances of the control and I-cache,..."
"On the hit-path, the texture cache itself has been improved and is now 32KB and is able of 16 texels/cycle throughput."
Missing words, maybe:
"On the hit-path, the texture cache itself has been improved and is now 32KB and is able to process 16 texels/cycle throughput."
"Arm states that fundamentally frequency between the G76 and G77 shouldn't change much at all, an internally Arm still targets an 850MHz sign-off."
"and" not "an"
"Arm states that fundamentally frequency between the G76 and G77 shouldn't change much at all, and internally Arm still targets an 850MHz sign-off."
warreo - Monday, May 27, 2019
Not to say we should excuse journalists for less than stellar writing, but having read his stuff for a long time, with Andrei you have to accept the good (technical expertise) with the "could use improvement" (writing/word choice). There's no one out there that offers the kind of analysis and insights Andrei does, so as a reader I continue to read his articles with great interest and don't let the typos and writing bother me.
phoenix_rizzen - Tuesday, May 28, 2019
I don't mind the typos and wording issues and grammar issues ... if this was a blog where the content was written and posted directly by the author.
What really bugs me is that Anandtech (and Ars, and other news sites) supposedly have editors on staff, yet these issues still slip through. :( There was a time when articles would pass through two or three stages of proofing to make sure these kinds of things didn't make it to press. But, it seems even for-pay "newspapers" these days are lacking in the QA/proofing department, so there's not much we can expect from for-free news sites. :(
Andrei Frumusanu - Tuesday, May 28, 2019
Thanks for the corrections.
eastcoast_pete - Monday, May 27, 2019
As mentioned in my post on Andrei's A77 article, I believe that at least some of these efforts are also to help establish ARM's designs as believable competition in the ultraportable space. With the graphics, that won't apply to Qualcomm, but is vital for Huawei and Samsung, as they rely on ARM-designed GPUs. A hexa- or octacore A77 with 12 or 16 of these might just be able to go head-to-head with Intel's low power chips.
Andrei Frumusanu - Tuesday, May 28, 2019
Currently the big issue with Mali and ultra-portable is the fact that Arm has no plans for Windows drivers. Thus aside from ChromeOS devices, they're not really targeting that form-factor as much on the GPU as they are on the CPU (because Qualcomm uses the CPU).
eastcoast_pete - Wednesday, May 29, 2019
Andrei, that's an important point. Also shows that MS is not as full-throated in its Windows-on-ARM as they let on. While I believe that some of the existing graphics support in Windows for QC's Adreno House is due to QC doing a lot of the heavy lifting, I don't believe that ARM would say no to a collaborative effort with MS to get MALI supported in Windows.