Why purpose-built artificial intelligence chips may be key to your generative AI strategy

by Phil Goldstein

While generative artificial intelligence and its powerful, disruptive possibilities have recently attracted a great deal of attention, one aspect of the technology deserves closer investigation and understanding: the hardware necessary to deliver those transformational results.

If your company is training a new machine learning model, or customizing or refining an existing model with proprietary data, you likely have great interest in the underlying technology used to accomplish that work. The chips that provide the computing power for artificial intelligence models matter a great deal, because these small pieces of silicon are a significant factor in determining how quickly, inexpensively, and sustainably you can train and run your models.

Excitement about the possibilities of generative AI has led to a significant uptick in demand for computing power for generative AI applications and their underlying models. In fact, in the broad market, there is now a shortage of graphics processing units (GPUs), chips that are critical for training and running many ML models.

Not only is this kind of computing power now scarce; even when companies can get access to the chips, building and using generative AI models and applications can be incredibly expensive. These models rely on billions or even trillions of parameters and require massive amounts of data to train and run properly, which can be cost-prohibitive. Increasingly, as more organizations look to use generative AI tools as part of their wider cloud strategies, specialized hardware, specifically silicon chipsets, will provide greater benefits for those building, training, and running the models that power generative AI applications.

Amazon Web Services has been working to democratize access to generative AI in a number of ways. It is introducing tools like Amazon Bedrock, which allows developers to access foundation models through an API. It has also been designing custom silicon built specifically to train machine learning models and to run them for inference. These purpose-built chips offer superior price-performance, making it quicker, less costly, and less energy-intensive to train and run the models that power generative AI.
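As a hedged sketch of what "access foundation models through an API" looks like in practice: the snippet below builds a request body in the Amazon Titan Text format and shows (in comments) how it would be sent via the Bedrock runtime API with boto3. The model ID and request-body fields are illustrative assumptions; check the Bedrock documentation for the exact schema of the model you use.

```python
import json

def build_titan_request(prompt: str, max_tokens: int = 256) -> str:
    """Serialize a Titan-style text-generation request body (assumed schema)."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": max_tokens},
    })

# The actual invocation requires AWS credentials and Bedrock model access:
#
#   import boto3
#   bedrock = boto3.client("bedrock-runtime")
#   response = bedrock.invoke_model(
#       modelId="amazon.titan-text-express-v1",   # assumed model ID
#       contentType="application/json",
#       accept="application/json",
#       body=build_titan_request("Summarize what purpose-built AI chips do."),
#   )
#   print(json.loads(response["body"].read()))
```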

Amazon Web Services works with major partners including NVIDIA, Intel, and AMD to offer the broadest set of accelerators in the cloud for machine learning and generative AI applications. Amazon Web Services recently announced that Amazon Elastic Compute Cloud (Amazon EC2) P5 instances, tailored for AI and ML workloads, will be powered by the latest NVIDIA H100 Tensor Core GPUs. Amazon Web Services has also invested significantly over the last several years to build AI- and ML-focused chips in-house, including Amazon Web Services Trainium and Amazon Web Services Inferentia.

“I’m seeing custom silicon designed by public cloud providers for general compute, AI training, AI inferencing, storage acceleration, and all levels of networking,” says Patrick Moorhead, the founder, CEO, and chief analyst of research firm Moor Insights & Strategy. “Custom-designed silicon, when architected and executed well, can provide public cloud customers with lower costs, higher performance, and in some cases unique features and time to market, and capabilities drawing less power.”

Moorhead says he sees Amazon Web Services as the current leader in custom silicon produced by public cloud providers, adding that he does not “designate this carelessly at all.”

“If you look at Amazon Web Services’ publicly disclosed custom-designed silicon in market, I believe Amazon Web Services currently has the broadest, deepest, and most differentiated capabilities across general compute, AI, security, and network offload with Graviton, Trainium, Inferentia, and Nitro.”

Amazon Web Services Trainium chips help reduce machine learning training costs

It can take months and cost tens of millions of dollars to train generative AI foundation models with hundreds of billions of parameters. That’s why Amazon Web Services introduced Amazon Web Services Trainium, which is specifically designed to speed up the training of machine learning models and lower its cost by up to 50 percent. Each Trainium accelerator includes two second-generation NeuronCores that are purpose-built for deep learning algorithms.

To help customers reduce their time to train and thereby also reduce their cost of training, Amazon Web Services launched Trainium-based Trn1 instances in 2022. These instances supported network bandwidth of 800 gigabits per second (Gbps), but as foundation models have grown larger and more complex, they have required more bandwidth for training. In April 2023, Amazon Web Services introduced Trn1n instances, which double the network bandwidth to 1,600 Gbps and boost training performance by 20 percent over Trn1 instances.

Amazon Web Services Inferentia chips help accelerate the deployment of generative AI applications

In 2018, Amazon Web Services introduced Amazon Web Services Inferentia, its first chip purpose-built for ML inference, the process by which AI applications make predictions and decisions in real time. The chips power Amazon EC2 Inf1 instances, designed to provide high performance and cost efficiency for deep learning inference workloads. Amazon Web Services Inferentia2 chips, the second generation, deliver up to four times higher throughput and up to 10 times lower latency than first-generation Inferentia chips. Earlier this year, Amazon Web Services introduced Inferentia2-based Inf2 instances, which deliver up to 40 percent better price performance than other comparable Amazon EC2 instances when deploying generative AI models.

What’s key is that the Inferentia chips do all this while delivering high performance, including exceptionally high throughput and low latency. This allows customers to run generative AI applications and receive recommendations or generate content nearly instantly, which is valuable for anyone using a generative AI application.

Running generative AI applications is also highly energy-intensive, and Amazon Web Services and its customers increasingly see sustainability as an important consideration. That’s why Amazon Web Services designed Inf2 instances to deliver 50 percent better performance per watt than other comparable Amazon EC2 instances.

Amazon Web Services customers will undoubtedly use a range of different chips for generative AI. But given the opportunities they have to use the technology to transform their business and customer experience, and the cost of training and running the necessary models, the price-performance that Trainium and Inferentia offer is compelling.
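The division of labor described above can be sketched as a small lookup. The helper below is a hypothetical convenience, not an AWS API; the mapping of workload type to EC2 instance family is drawn from this article.

```python
# Hypothetical helper mapping the two workload types discussed above to the
# purpose-built EC2 instance families named in this article (Trn1/Trn1n for
# Trainium-based training, Inf2 for Inferentia2-based inference).
def suggest_instance_family(workload: str) -> str:
    families = {
        "training": "trn1",   # AWS Trainium (Trn1/Trn1n instances)
        "inference": "inf2",  # AWS Inferentia2 (Inf2 instances)
    }
    if workload not in families:
        raise ValueError(f"unknown workload: {workload!r}")
    return families[workload]
```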

Johnson & Johnson, OctoML see AI benefits from purpose-built Amazon Web Services chips

Healthcare products company Johnson & Johnson uses AI and ML algorithms in a wide range of use cases, including accelerating drug discovery, delivering personalized medicine, optimizing supply chains, improving clinical trial outcomes, and streamlining regulatory compliance. The company is exploring using Amazon Web Services custom silicon for a variety of reasons.

Sairam Menon, software engineering manager for the AI product line at Johnson & Johnson, notes that the company’s research and development department processes massive amounts of data, including for genomics, molecular structures, clinical trial results, and more. The company therefore needs to be extremely efficient in how it processes those datasets to help its researchers predict the efficacy of drugs, optimize molecular designs, and identify potential drug candidates.

Additionally, Johnson & Johnson, like other pharmaceutical companies, is subject to strict regulatory requirements. AI and ML tools help it analyze regulatory compliance data by extracting relevant information to help the company quickly identify potential compliance issues and ensure it is adhering to regulatory standards.

“Specialized hardware that is built for AI/ML, such as … neural processing cores, custom field programmable gate arrays, and application-specific integrated circuit accelerators such as Trainium and Inferentia can help us unlock the cost efficiency, performance, and energy efficiency that we look for when we scale our workloads,” Menon says.

The demand for applications using generative AI large language models means that “reducing the operational cost to use the specialized ML accelerators would help us increase the model quality for the same spend, and achieve more with less,” Menon adds.

Johnson & Johnson is also focused on sustainability, Menon says, noting that advanced, energy-efficient innovations in AI and ML silicon, along with other architectural energy-saving tactics, allow the company to use AI sustainably and meet its commitments to address climate change.

Some companies that are using customized AI and ML silicon from Amazon Web Services are doing so to help their customers gain access to generative AI models. Through its OctoAI service, OctoML gives developers access to a library of some of the fastest and most cost-effective open-source foundation models available today, including the fastest Stable Diffusion endpoint on the market and the 65-billion-parameter LLaMA language model.

“A big part of how we enable … optimization and cost-effectiveness is by accelerating model performance on a range of hardware targets,” says Jason Knight, cofounder and vice president of ML at OctoML. “This has been especially important as developers deal with the scarcity of the powerful A100 GPU and look for alternatives to run popular models with the latencies that their users require.”

Knight says that customers have been eager to add Amazon Web Services Inferentia2 to the mix of chips they are working with and to understand the price-performance advantages Inferentia can deliver for their models. “Building on Inf2, we believe we can unlock future waves of new developers and new applications building on generative AI, and get closer to a time when any developer can easily and cost-effectively build on generative AI models,” he says.

Why purpose-built chips are important when choosing a cloud provider

Improving the price-performance for compute-intensive workloads while delivering high energy efficiency and ease of use is key to ensuring that more customers can realize the full promise of generative AI technology.

Experts at research firms such as Forrester Research suggest that organizations should consider a cloud service provider’s complete technology stack, as that is where much of the innovation occurs in the public cloud market and where different players’ level of investment varies the most.

Forrester rated Amazon Web Services as a leader on both the “Current Offering” and “Strategy” axes of measurement in the recently published “Forrester Wave for Public Cloud Development and Infrastructure Platforms, Q4 2022, Global,” which evaluated the 10 most significant vendors in the market.

One of the leading factors Forrester recommends organizations consider is purpose-built silicon that cloud providers like Amazon Web Services are producing. Increasingly, hardware innovation, and silicon in particular, is serving as a differentiating factor for cloud providers and allowing organizations to pursue new use cases.

Learn more

  • Amazon Web Services Silicon Innovation (overview)
  • Amazon Web Services Trainium (overview)
  • Amazon Web Services Inferentia (overview)
  • Amazon Web Services Silicon Innovation Day (video)
  • Generative AI on Amazon Web Services (overview)