Amazon Web Services ParallelCluster 3.3.0 now supports On-Demand Capacity Reservations
On-Demand Capacity Reservations (ODCRs) enable you to reserve compute capacity for your Amazon EC2 instances in a specific Availability Zone for any duration. When you use them with Amazon Web Services ParallelCluster, they help to ensure your HPC workloads have enough resources to complete successfully and on time.
It has long been possible for Amazon Web Services ParallelCluster to make use of ODCRs via manual setup. However, with Amazon Web Services ParallelCluster 3.3.0, you can now add and modify ODCRs for your HPC cluster directly within your Amazon Web Services ParallelCluster configuration.
This post explains what ODCRs are, how this new feature works, and how to configure your HPC cluster to use them.
What are On-Demand Capacity Reservations?
There are two kinds of On-Demand Capacity Reservations: open ODCRs and targeted ODCRs.
Under an open ODCR, you do not have to provide a reservation identifier when an Amazon EC2 instance is launched. Instead, instances launched after the start of the reservation that match the reservation by instance type, platform, and Availability Zone are automatically allocated to the ODCR.
Using targeted ODCRs requires you to provide either the ODCR identifier or Resource Group ARN at instance launch time. When the reservation expires, no more instance launches can take place. This can be a good way to meter utilization while you also ensure sufficient capacity.
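To make the open versus targeted distinction concrete, here is a minimal sketch of how a reservation request might be assembled with boto3. The post itself does not use boto3, so treat the helper names, the Availability Zone (us-east-1a), and the instance count as assumptions for illustration. The one setting that matters here is InstanceMatchCriteria, which selects between the two kinds of reservation.

```python
from typing import Any, Dict


def build_reservation_request(
    instance_type: str,
    availability_zone: str,
    instance_count: int,
    targeted: bool = True,
    platform: str = "Linux/UNIX",
) -> Dict[str, Any]:
    """Build keyword arguments for the EC2 CreateCapacityReservation call."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "InstanceType": instance_type,
        "InstancePlatform": platform,
        "AvailabilityZone": availability_zone,
        "InstanceCount": instance_count,
        # "open": matching instances are allocated to the reservation automatically.
        # "targeted": a launch must name the reservation or its resource group.
        "InstanceMatchCriteria": "targeted" if targeted else "open",
        # No end date: the reservation stays active until it is cancelled.
        "EndDateType": "unlimited",
    }


def create_reservation(request: Dict[str, Any], region: str) -> str:
    """Send the request to EC2 and return the new reservation identifier."""
    import boto3  # imported lazily so the builder above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_capacity_reservation(**request)
    return response["CapacityReservation"]["CapacityReservationId"]


# Hypothetical values: 8 x c6a.48xlarge in us-east-1a, targeted.
request = build_reservation_request("c6a.48xlarge", "us-east-1a", 8)
print(request["InstanceMatchCriteria"])
```

Calling create_reservation(request, "us-east-1") would submit the request and needs valid credentials. The builder on its own is safe to run anywhere.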
What’s New?
Amazon Web Services ParallelCluster has historically supported open ODCRs so long as the instance types it launched matched the reservation attributes. Since ParallelCluster 3.1.1, you have been able to use targeted ODCRs by editing a file on the cluster head node to add certain EC2 API parameters.
Amazon Web Services ParallelCluster 3.3.0 improves this experience. Now, you configure ODCRs directly in the Amazon Web Services ParallelCluster configuration file. You specify a combination of capacity reservation identifier, capacity resource group ARN, and cluster placement group for each compute resource in your Slurm queues (Figure 1). Importantly, you can add or remove capacity reservations to your queues dynamically, without disrupting cluster operations.
If you’ve used

Figure 1: ParallelCluster configurations now support capacity reservations in combination with networking placement groups.
How to use ODCRs with Amazon Web Services ParallelCluster 3.3.0
To use targeted ODCRs with ParallelCluster, you will need to create a new cluster. Update to or install ParallelCluster to version 3.3.0 following
Configuring ODCR
You can configure an ODCR for any compute resource in any Slurm queue. The configuration details will vary depending on whether you are using a single instance type per Slurm queue Compute Resource or multiple instance types.
In the case of single instance types, you have two options. You can target a single reservation by setting CapacityReservationId. Or, you can target a capacity reservation resource group by setting CapacityReservationResourceGroupArn.
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: q1
      ComputeResources:
        - Name: cr1
          InstanceType: c6a.48xlarge
          MinCount: 1
          MaxCount: 8
          CapacityReservationTarget:
            CapacityReservationId: cr-01234567890abdef0
            # OR #
            CapacityReservationResourceGroupArn: arn:aws:resource-groups:us-east-1:123456791537:group/MyCRGroup
Note that you must use the Amazon Web Services Command Line Interface (Amazon Web Services CLI) when creating capacity reservation groups for use with Amazon Web Services ParallelCluster, rather than the Amazon Web Services Management Console. This is because the console only supports creation of Tag- and Stack-based resource groups, and these are not supported by ParallelCluster.
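The group can also be created from a script instead of the CLI. The sketch below uses boto3, which calls the same Resource Groups API as the CLI. boto3 and the helper names are assumptions rather than something this post prescribes; the group name MyCRGroup comes from the example above. The two configuration entries mark the group as a capacity reservation pool and restrict its members to capacity reservations.

```python
from typing import Any, Dict


def build_group_request(group_name: str) -> Dict[str, Any]:
    """Build keyword arguments for the Resource Groups CreateGroup call."""
    return {
        "Name": group_name,
        "Configuration": [
            # Marks the group as a pool of EC2 capacity reservations.
            {"Type": "AWS::EC2::CapacityReservationPool"},
            # Restricts group membership to capacity reservations only.
            {
                "Type": "AWS::ResourceGroups::Generic",
                "Parameters": [
                    {
                        "Name": "allowed-resource-types",
                        "Values": ["AWS::EC2::CapacityReservation"],
                    }
                ],
            },
        ],
    }


def create_group(request: Dict[str, Any], region: str) -> str:
    """Create the group and return its ARN for the cluster configuration."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("resource-groups", region_name=region)
    response = client.create_group(**request)
    return response["Group"]["GroupArn"]


group_request = build_group_request("MyCRGroup")
print([entry["Type"] for entry in group_request["Configuration"]])
```

The ARN returned by create_group is the value you would place in CapacityReservationResourceGroupArn.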
In the case of multiple instance types per Compute Resource, you can only use a CapacityReservationResourceGroupArn. Create a capacity reservation for each InstanceType in your list of Instances and add it to the resource group. Then, specify the group’s ARN in your cluster configuration like this:
…
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: q2
      ComputeResources:
        - Name: cr1
          Instances:
            - InstanceType: c6a.24xlarge
            - InstanceType: r6a.24xlarge
            - InstanceType: m6a.24xlarge
          MinCount: 1
          MaxCount: 8
          CapacityReservationTarget:
            CapacityReservationResourceGroupArn: arn:aws:resource-groups:us-east-1:123456791537:group/MyCRGroup
...
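A resource group only helps once it holds reservations. As a rough sketch, several reservations (for example, one per instance type in the queue) could be added to the group like this. The reservation identifiers are made-up placeholders, the helper names are hypothetical, and the ARN format assumes the standard aws partition.

```python
from typing import List


def reservation_arn(region: str, account_id: str, reservation_id: str) -> str:
    """Return the ARN of a capacity reservation (standard `aws` partition)."""
    return f"arn:aws:ec2:{region}:{account_id}:capacity-reservation/{reservation_id}"


def add_reservations_to_group(group: str, arns: List[str], region: str) -> List[str]:
    """Add reservations to a resource group and return the ARNs that were added."""
    import boto3  # imported lazily so the ARN helper works without boto3

    client = boto3.client("resource-groups", region_name=region)
    # `Group` accepts either the group name or the group ARN.
    response = client.group_resources(Group=group, ResourceArns=arns)
    return response["Succeeded"]


# Placeholder reservation IDs, one per instance type in the queue.
reservation_ids = [
    "cr-0aaaaaaaaaaaaaaa1",  # c6a.24xlarge (hypothetical)
    "cr-0bbbbbbbbbbbbbbb2",  # r6a.24xlarge (hypothetical)
    "cr-0ccccccccccccccc3",  # m6a.24xlarge (hypothetical)
]
arns = [reservation_arn("us-east-1", "123456791537", rid) for rid in reservation_ids]
print(arns[0])
```

Passing arns to add_reservations_to_group("MyCRGroup", arns, "us-east-1") would register all three reservations in one call.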
Using Cluster Placement Groups
A cluster placement group (CPG) is a logical grouping of instances within a single Availability Zone. CPGs offer low network latency and high network throughput, which helps increase the performance of tightly coupled HPC workloads. You can use them with ODCRs by creating a Cluster Placement Group ODCR (CPG ODCR).
To create a Cluster Placement Group ODCR:
- Create a Cluster Placement Group (if you do not have one already).
- Create a targeted ODCR, specifying the Cluster Placement Group name when you do so.
- Create a capacity reservation resource group to hold the ODCR.
Add either your reservation identifier or the group ARN to the cluster configuration. Then, add the networking placement group name as shown in this example:
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: q1
      ComputeResources:
        - Name: cr1
          InstanceType: c6a.48xlarge
          MinCount: 1
          MaxCount: 8
          CapacityReservationTarget:
            CapacityReservationId: cr-01234567890abdef0
          Networking:
            PlacementGroup:
              Name: my-placement-group
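The first two steps for a CPG ODCR might look like the following with boto3. This is a sketch under assumptions: boto3, the helper names, and the Availability Zone are not from this post. Note that the EC2 API identifies the placement group by ARN rather than by name, and the ARN format shown assumes the standard aws partition.

```python
from typing import Any, Dict


def placement_group_arn(region: str, account_id: str, name: str) -> str:
    """Return the ARN of a placement group (standard `aws` partition)."""
    return f"arn:aws:ec2:{region}:{account_id}:placement-group/{name}"


def build_cpg_reservation_request(
    instance_type: str,
    availability_zone: str,
    instance_count: int,
    pg_arn: str,
) -> Dict[str, Any]:
    """Build CreateCapacityReservation arguments tied to a placement group."""
    return {
        "InstanceType": instance_type,
        "InstancePlatform": "Linux/UNIX",
        # Must be the same Availability Zone the placement group is used in.
        "AvailabilityZone": availability_zone,
        "InstanceCount": instance_count,
        "InstanceMatchCriteria": "targeted",
        "PlacementGroupArn": pg_arn,
    }


def create_cpg_odcr(name: str, request: Dict[str, Any], region: str) -> str:
    """Create the cluster placement group, then the reservation inside it."""
    import boto3  # imported lazily so the builders work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.create_placement_group(GroupName=name, Strategy="cluster")
    response = ec2.create_capacity_reservation(**request)
    return response["CapacityReservation"]["CapacityReservationId"]


pg_arn = placement_group_arn("us-east-1", "123456791537", "my-placement-group")
cpg_request = build_cpg_reservation_request("c6a.48xlarge", "us-east-1a", 8, pg_arn)
print(cpg_request["PlacementGroupArn"])
```

The identifier returned by create_cpg_odcr is what you would set as CapacityReservationId, alongside the placement group name in the Networking section.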
Updating Your Cluster
You can dynamically update or modify capacity reservation configurations on your cluster once it is running Amazon Web Services ParallelCluster 3.3.0. By default, you will need to stop your compute fleet, update the cluster, then restart the fleet to make changes. However, you can change the Slurm QueueUpdateStrategy to either DRAIN or TERMINATE, as we discussed in our
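The default stop, update, restart sequence can be scripted around the pcluster CLI. The sketch below only assembles the three commands; the cluster name and configuration path are hypothetical, and wrapping the CLI from Python is our assumption, not something the post prescribes.

```python
import subprocess
from typing import List


def update_commands(cluster_name: str, config_path: str) -> List[List[str]]:
    """Return the pcluster commands for the default update sequence, in order."""
    fleet = ["pcluster", "update-compute-fleet", "--cluster-name", cluster_name]
    return [
        # 1. Ask the compute fleet to stop.
        fleet + ["--status", "STOP_REQUESTED"],
        # 2. Apply the edited configuration (for example, a new reservation).
        [
            "pcluster", "update-cluster",
            "--cluster-name", cluster_name,
            "--cluster-configuration", config_path,
        ],
        # 3. Ask the compute fleet to start again.
        fleet + ["--status", "START_REQUESTED"],
    ]


def run_update(cluster_name: str, config_path: str) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for command in update_commands(cluster_name, config_path):
        subprocess.run(command, check=True)


# Hypothetical cluster name and configuration file.
for command in update_commands("my-cluster", "cluster-config.yaml"):
    print(" ".join(command))
```

Both the fleet stop and the cluster update are asynchronous, so a real script would poll (for instance with pcluster describe-compute-fleet) and wait for each step to finish before issuing the next command. run_update leaves that out for brevity.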
Troubleshooting ODCRs
Once you have created an ODCR, it’s straightforward to get it working with Amazon Web Services ParallelCluster 3.3.0. However, your request to create a capacity reservation can, itself, fail. It’s worth looking at three reasons this can happen.
- Your request can fail when there isn’t currently enough Amazon EC2 capacity for the requested instance type in the desired Availability Zone. You can address this by waiting until later, switching Availability Zones, or changing instance types.
- It can fail when the requested number of instances exceeds your service quota. You can resolve this by requesting an increase in your service quota for On-Demand instances. Ensure that the quota is high enough to accommodate your capacity reservation and any other concurrent instances you need to run.
- Your request can fail due to its Cluster Placement Group. The capacity reservation and the placement group have to be in the same Availability Zone. Furthermore, you can only create capacity reservations for instance types that support Cluster Placement Groups. Most instance types are supported, but you can learn about the exceptions in the Cluster Placement Groups documentation.
Summary
In Amazon Web Services ParallelCluster 3.3.0, it’s easier than ever to use On-Demand Capacity Reservations (ODCRs) to reserve exactly the amount of Amazon EC2 instance capacity you need to complete your HPC workloads. Open capacity reservations are designed to work by default, while targeted reservations and reservations with placement groups use a new configuration mechanism. You’ll need to update your Amazon Web Services ParallelCluster installation and update your clusters to take advantage of this new capability.
We’d love to know what you think after trying out ODCRs with Amazon Web Services ParallelCluster, and how we can improve this new feature. Reach out to us on Twitter at