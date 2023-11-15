The rumors are true: Microsoft has created its own custom AI chip that can be used to train large language models and potentially avoid costly reliance on Nvidia. Microsoft has also created its own Arm-based CPUs for cloud workloads. Both custom silicon chips are designed to power its Azure data centers and prepare the company and its enterprise customers for an AI-filled future.

Microsoft’s Azure Maia AI chip and Arm-powered Azure Cobalt CPUs are arriving in 2024, following a surge in demand this year for Nvidia’s H100 GPUs, which are widely used to train and operate generative image tools and large language models. These GPUs are in such high demand that some have even fetched prices over $40,000 on eBay.

“Microsoft has a really long history in silicon development,” Rani Borkar, head of Azure hardware systems and infrastructure at Microsoft, explains in an interview with The Verge. Microsoft collaborated on silicon for the Xbox more than 20 years ago and has even co-engineered chips for its Surface devices. “These efforts build on that experience,” Borkar says. “In 2017, we began architecting a cloud hardware stack and we began that journey, putting us on track to build our new custom chips.”

The new Azure Maia AI chip and Azure Cobalt CPUs are both built in-house at Microsoft, combined with a deep overhaul of its entire cloud server stack to optimize performance, power, and cost. “We are rethinking cloud infrastructure for the age of AI, and optimizing virtually every layer of that infrastructure,” says Borkar.

The first two custom silicon chips designed by Microsoft for its cloud infrastructure. Image: Microsoft

The Azure Cobalt CPU, named after the color blue, is a 128-core chip built on the Arm Neoverse CSS design and optimized for Microsoft. It is designed to power common cloud services on Azure. “We put a lot of thought into not only making it highly performant, but also making sure we were conscious of power management,” explains Borkar. “We made some very deliberate design choices, including the ability to control performance and power consumption per core and on each virtual machine.”

Microsoft is currently testing its Cobalt CPUs on workloads such as Microsoft Teams and SQL Server, with plans to make virtual machines available to customers for a variety of workloads next year. Although Borkar would not draw direct comparisons with Amazon’s Graviton 3 servers that are available on AWS, there should be some noticeable performance advantages over the Arm-based servers that Microsoft is currently using for Azure. “Our initial testing shows that our performance is up to 40 percent better than what’s currently in our data centers that use commercial Arm servers,” says Borkar. Microsoft is not sharing full system specifications or benchmarks yet.

Microsoft’s Maia 100 AI accelerator, named after the bright blue star, is designed to run cloud AI workloads such as large language model training and inference. It will be used to power some of the company’s largest AI workloads on Azure, including part of a multibillion-dollar partnership with OpenAI, where Microsoft powers all of OpenAI’s workloads. The software giant is collaborating with OpenAI on the design and testing phases of Maia.

“When Microsoft first shared their design for the Maia chip, we were excited and we have worked closely with them to refine and test it with our models,” says Sam Altman, CEO of OpenAI. “Azure’s end-to-end AI architecture, now optimized to silicon with Maia, paves the way to train more capable models and make those models cheaper for our customers.”

Built on a 5-nanometer TSMC process, Maia has 105 billion transistors – about 30 percent fewer than the 153 billion found on AMD’s own Nvidia competitor, the MI300X AI GPU. “Maia supports our first implementation of the sub 8-bit data types, MX data types, in order to co-design hardware and software,” says Borkar. “This helps us support faster model training and inference times.”

Microsoft is part of a group that includes AMD, Arm, Intel, Meta, Nvidia, and Qualcomm that is standardizing next-generation data formats for AI models. Microsoft is building on the collaborative and open work of the Open Compute Project (OCP) to adapt entire systems to the needs of AI.

A probe station used to test Microsoft’s Azure Cobalt system-on-chip. Image: Microsoft

“Maia is the first fully liquid-cooled server processor built by Microsoft,” explains Borkar. “The goal here was to enable a higher density of servers at higher efficiency. Because we’re reimagining the entire stack, we intentionally thought about every layer, so these systems are actually going to fit into our current data center footprint.”

That is key to Microsoft getting these AI servers up and running more quickly without having to make room for them in data centers around the world. Microsoft created a unique rack to hold Maia server boards, complete with a “Sidekick” liquid chiller, which works like a radiator you’d find in your car or a fancy gaming PC to cool the surface of Maia chips.

As well as sharing MX data types, Microsoft is also sharing its rack designs with its partners so they can use them on systems with other silicon. But the Maia chip designs won’t be shared more widely, with Microsoft keeping them in-house.

Maia 100 is currently being tested on GPT 3.5 Turbo, the same model that powers ChatGPT, Bing AI workloads, and GitHub Copilot. Microsoft is in the early stages of deployment and, as with Cobalt, is not yet ready to release exact Maia specifications or performance benchmarks.

Maia 100 server rack and “Sidekick” cooling. Image: Microsoft

This makes it difficult to understand how Maia will compare to Nvidia’s popular H100 GPU, the recently announced H200, or even AMD’s latest MI300X. Borkar didn’t want to discuss comparisons, instead reiterating that partnerships with Nvidia and AMD are still very important to the future of Azure’s AI cloud. “At the scale at which the cloud operates, it is really important to optimize and integrate every layer of the stack, maximize performance, diversify the supply chain, and obviously give our customers infrastructure options,” says Borkar.

Diversification of supply chains is important for Microsoft, especially since Nvidia is the dominant supplier of AI server chips right now and companies are rushing to buy these chips. Estimates suggest that OpenAI needed more than 30,000 of Nvidia’s older A100 GPUs to commercialize ChatGPT, so Microsoft’s own chips could help lower the cost of AI for its customers. Microsoft has also developed these chips for its own Azure cloud workloads, rather than to sell to others as Nvidia, AMD, Intel, and Qualcomm all do.

“I see it as a complement, not a competition with them,” Borkar insists. “Today we have both Intel and AMD in our cloud compute, and similarly on AI we are announcing AMD where we already have Nvidia today. These partners are very important to our infrastructure, and we really want to give our customers options.”

You may have noticed the Maia 100 and Cobalt 100 naming, which suggests that Microsoft is already designing second-generation versions of these chips. “It’s a series, it’s not just 100 and it’s over… but we’re not going to share our roadmaps,” says Borkar. It’s not yet clear how often Microsoft will deliver new versions of Maia and Cobalt, but given the speed of AI I wouldn’t be surprised to see a Maia 100 successor arrive at a similar pace to Nvidia’s H200 announcement (about 20 months).

The key thing now will be how quickly Microsoft deploys Maia to accelerate the rollout of its broader AI ambitions, and how these chips will affect pricing for access to AI cloud services. Microsoft isn’t ready to talk about this new server pricing just yet, but we’ve already seen the company quietly launch its Copilot for Microsoft 365 at a premium of $30 per user, per month.

Copilot for Microsoft 365 is currently limited to Microsoft’s largest customers, with enterprise users having to commit to at least 300 users to get on the list for its new AI-powered Office assistant. As Microsoft moves forward with even more Copilot features and a Bing Chat rebranding this week, Maia may soon help balance the demand for the AI chips that power these new experiences.

