Amazon Web Services releases open-source software Palace for cloud-based electromagnetics simulations of quantum computing hardware

by Sebastian Grimberg, Hugh Carson, and Andrew Keller

Today, we are introducing Palace, for PArallel, LArge-scale Computational Electromagnetics, a parallel finite element code for full-wave electromagnetics simulations. Palace is used at the Amazon Web Services Center for Quantum Computing to perform large-scale 3D simulations of complex electromagnetics models and enable the design of quantum computing hardware. We developed it with the scalability and elasticity of the cloud in mind, and to take advantage of the cloud-based high-performance computing (HPC) products and services available on Amazon Web Services.

We are making Palace freely available on GitHub as an open-source project for electromagnetic modeling workloads, not limited to those in quantum computing, which users can run on systems ranging from their own laptops to supercomputers.

Why did we build Palace?

Computational modeling typically requires scientists and engineers to make compromises between model fidelity, wall-clock time, and computing resources. Recent advances in cloud-based HPC have put the power of supercomputers into the hands of scientists and engineers around the world, and users now want to use that power to accelerate existing simulation workloads and to simulate larger, more complex models.

Palace uses scalable algorithms and implementations from the scientific computing community and supports recent advancements in computational infrastructure to deliver state-of-the-art performance. On Amazon Web Services, this includes the Elastic Fabric Adapter (EFA) for fast networking and HPC-optimized Amazon Elastic Compute Cloud (EC2) instances using customized Intel processors or Amazon Web Services Graviton processors for superior price-performance. Open-source software like Palace also allows users to exploit elastic cloud-based HPC to perform arbitrary numbers of simulations in parallel when exploring large parametric design spaces, unconstrained by proprietary software licensing models.

Lastly, we built Palace because, while there are many highly performant, open-source tools for a wide range of applications in computational physics, there are few open-source solutions for massively parallel, finite element-based computational electromagnetics. Palace supports a wide range of simulation types: eigenmode analysis, driven simulations in the frequency and time domains, and electrostatic and magnetostatic simulations for lumped parameter extraction. As an open-source project, it is also fully extensible by developers looking to add new features for problems of industrial relevance. Much of Palace is made possible by the MFEM finite element discretization library, which enables high-performance, scalable finite element research and application development.

Palace adds to the ecosystem of open-source software supporting cloud-based numerical simulation and HPC, which enables the development of custom solutions and cloud infrastructure for simulation services and gives you more choice than existing alternatives.

Examples of electromagnetics simulations for quantum hardware design

In this section, we present two example applications that demonstrate some of the key features of Palace and its performance as a numerical simulation tool. For all of the presented applications, we configured our cloud-based HPC cluster to compile and run Palace using GCC v11.3.0, OpenMPI v4.1.4, and EFA v1.21.0 on Amazon Linux 2. We used COMSOL Multiphysics for geometry preparation and mesh generation in each case, but Palace supports a wide variety of mesh file formats to accommodate diverse workflows, including those that use entirely open-source software.

Transmon qubit and readout resonator

The first example considers a common problem encountered in the design of superconducting quantum devices: the simulation of a single transmon qubit coupled to a readout resonator, with a terminated coplanar waveguide (CPW) transmission line for input/output. The superconducting metal layer is modeled as an infinitely thin, perfectly conducting surface on top of a c-plane sapphire substrate.

An eigenmode analysis is used to compute the linearized transmon and readout resonator mode frequencies, decay rates, and corresponding electric and magnetic field distributions. Two finite element models are considered: a fine model with 246.2 million degrees of freedom, and a coarse model with 15.5 million degrees of freedom whose computed frequencies differ by 1% from those of the fine model. For the interested reader, the governing Maxwell’s equations are discretized on a tetrahedral mesh using third-order, H(curl)-conforming Nédélec elements in the fine model and first-order elements of the same type in the coarse model. Figure 1 shows the 3D geometry of the transmon model and a view of the mesh used for simulation. Figure 2 shows visualizations of the magnetic field energy density for each of the two computed eigenmodes.
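To connect these numbers to the discretization, the following minimal sketch shows two standard relations; it is not Palace's API or output format, and the function names are hypothetical. The first function counts global degrees of freedom for first-kind Nédélec elements of order p on a tetrahedral mesh (p per edge, p(p-1) per face, p(p-1)(p-2)/2 per cell), which illustrates why moving from first to third order greatly increases problem size on a given mesh. The second converts a complex eigenfrequency into a quality factor and energy decay rate, assuming an exp(iωt) time convention in which a decaying mode has Im(ω) > 0.

```python
import math


def nedelec_dofs(order: int, n_edges: int, n_faces: int, n_cells: int) -> int:
    """Global DOF count for first-kind Nedelec (H(curl)) elements of the given
    order on a tetrahedral mesh, before any boundary DOFs are eliminated."""
    p = order
    per_edge = p                        # tangential moments along each edge
    per_face = p * (p - 1)              # face-interior moments (zero for p = 1)
    per_cell = p * (p - 1) * (p - 2) // 2  # cell-interior moments (zero for p < 3)
    return per_edge * n_edges + per_face * n_faces + per_cell * n_cells


def mode_q_and_decay(f_complex_hz: complex) -> tuple[float, float]:
    """Quality factor and energy decay rate (1/s) for a complex eigenfrequency
    f = f_r + i*f_i, ASSUMING fields vary as exp(i*omega*t), so the field
    amplitude decays as exp(-2*pi*f_i*t)."""
    omega = 2.0 * math.pi * f_complex_hz
    kappa = 2.0 * omega.imag       # energy decays twice as fast as amplitude
    q = omega.real / kappa         # Q = Re(omega) / kappa = f_r / (2 f_i)
    return q, kappa
```

For a single tetrahedron (6 edges, 4 faces, 1 cell), `nedelec_dofs` gives 6, 20, and 45 degrees of freedom for orders 1, 2, and 3. A hypothetical mode at 5 GHz with a 10 kHz imaginary part has Q = 250,000.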


Figure 1: 3D simulation model for the transmon qubit and readout resonator geometry (left). On the right, a view of the surface mesh used for discretization.


Figure 2: Visualization, using ParaView, of the magnetic field energy density, scaled by the maximum over the entire computational domain, computed from the simulated linearized transmon eigenmode (left) and readout resonator eigenmode (right).

For each of the two models, we vary the number of cores used for the simulation to investigate the scalability of Palace on Amazon Web Services across a variety of EC2 instance types. Figure 3 plots the simulation wall-clock times and computed speedup factors for the coarse model, and Figure 4 plots them for the higher-fidelity fine model. When scaling out on EC2, we observe simulation times of approximately 1.5 minutes and 12 minutes for the coarse and fine models, respectively. Notice also that the c7g.16xlarge instance type, featuring the latest-generation Amazon Web Services Graviton3 processor, improves on the previous-generation c6gn.16xlarge and often matches the performance of the latest Intel-based instance types.
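The speedup factors in these strong-scaling plots are conventionally computed relative to the smallest core count. The sketch below shows that calculation together with parallel efficiency, assuming S(N) = T(N0)/T(N) and E(N) = S(N)/(N/N0); the function name is hypothetical, and the timings in the usage note are made-up illustrative values, not data read from Figures 3 or 4.

```python
def scaling_table(cores, times):
    """Speedup and parallel efficiency relative to the first (smallest) core count.

    cores: core counts in increasing order; times: matching wall-clock times.
    Returns a list of (cores, time, speedup, efficiency) tuples.
    """
    if not cores or len(cores) != len(times):
        raise ValueError("cores and times must be non-empty and the same length")
    base_cores, base_time = cores[0], times[0]
    rows = []
    for n, t in zip(cores, times):
        speedup = base_time / t            # S(N) = T(N0) / T(N)
        ideal = n / base_cores             # ideal (linear) speedup
        efficiency = speedup / ideal       # E(N) = S(N) / (N / N0)
        rows.append((n, t, speedup, efficiency))
    return rows
```

With hypothetical timings of 400 s, 200 s, and 125 s on 64, 128, and 256 cores, the speedups are 1, 2, and 3.2, and the efficiencies are 1.0, 1.0, and 0.8.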


Figure 3: Simulation wall-clock times and speedup factors for the coarse transmon qubit example model with 15.5 million degrees of freedom.


Figure 4: Simulation wall-clock times and speedup factors for the fine transmon qubit example model with 246.2 million degrees of freedom.

Superconducting metamaterial waveguide

The second example demonstrating the capabilities and performance of Palace is the simulation of a superconducting metamaterial waveguide based on a chain of lumped-element microwave resonators. This model is constructed to predict the transmission properties of the device presented in Zhang et al., Science 379 (2023). The frequency response of the metamaterial is computed using a driven simulation over the range of 4 to 8 GHz with a resolution of 1 MHz, using Palace’s adaptive fast frequency sweep algorithm.
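This post does not detail how the adaptive fast frequency sweep works internally. As a hedged illustration of the general idea behind such sweeps, the toy sketch below builds a greedy, projection-based reduced-order model. It solves the full system at a few frequencies, projects onto the span of those solutions, and adds a new sample wherever the reduced model's residual is largest. The damped system A(ω) = K + iωC - ω²M, the spring-chain test matrices, the tolerance, and all function names are hypothetical stand-ins, not Palace's implementation.

```python
import numpy as np


def assemble(omega, K, M, C):
    """Hypothetical damped system matrix A(omega) = K + i*omega*C - omega^2*M."""
    return K + 1j * omega * C - omega**2 * M


def adaptive_sweep(K, M, C, b, omegas, tol=1e-8, max_samples=None):
    """Greedy reduced-basis sweep. Returns the sampled frequencies, the
    orthonormal basis V, and the largest relative residual over `omegas`."""
    n = K.shape[0]
    max_samples = n if max_samples is None else max_samples
    V = np.zeros((n, 0), dtype=complex)
    samples = []

    def add_snapshot(V, omega):
        # Full-order solve at one frequency (the expensive step in practice).
        x = np.linalg.solve(assemble(omega, K, M, C), b)
        # Orthogonalize against the current basis (two passes for stability).
        for _ in range(2):
            x = x - V @ (V.conj().T @ x)
        return np.hstack([V, (x / np.linalg.norm(x))[:, None]])

    for omega in (omegas[0], omegas[-1]):  # start from the band edges
        V = add_snapshot(V, omega)
        samples.append(omega)

    while True:
        residuals = np.empty(len(omegas))
        for i, omega in enumerate(omegas):
            A = assemble(omega, K, M, C)
            # Galerkin projection: a small k x k solve instead of an n x n one.
            y = np.linalg.solve(V.conj().T @ A @ V, V.conj().T @ b)
            residuals[i] = np.linalg.norm(A @ (V @ y) - b) / np.linalg.norm(b)
        worst = int(np.argmax(residuals))
        if residuals[worst] < tol or V.shape[1] >= max_samples:
            return samples, V, float(residuals[worst])
        V = add_snapshot(V, omegas[worst])  # greedy: sample where error is largest
        samples.append(omegas[worst])
```

On a small spring-chain model driven at one end (for example, 40 unknowns swept over 801 frequencies), the loop reaches the tolerance with far fewer full solves than sweep points. That difference is the motivation for adaptive sampling over a dense uniform sweep.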

We consider models of increasing complexity starting from a single unit-cell (see Figure 5 below), with 242.2 million degrees of freedom, and increasing to 21 unit-cells, with 1.4 billion degrees of freedom. The complexity of simulating this device comes from geometric features over a large range of length scales, with trace widths of 2 μm relative to an overall model length of 2 cm. The number of EC2 instances used for the simulation is increased with the number of metamaterial unit-cells, to maintain a constant number of degrees of freedom per processor.
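Holding the degrees of freedom per processor constant as the model grows is a weak-scaling setup. The sketch below shows how an instance count could be chosen under that rule. It takes 64 cores per c6gn.16xlarge (consistent with the 200 instances and 12,800 cores quoted for the largest case) and derives the per-core load from that same case. The function and constant names are hypothetical, and the instance counts it produces for smaller models are illustrative rather than the counts used in these runs.

```python
import math

# c6gn.16xlarge: 64 cores per instance (200 instances = 12,800 cores in the text).
CORES_PER_INSTANCE = 64

# Per-core load implied by the largest case: about 1.4e9 DOFs on 12,800 cores.
TARGET_DOFS_PER_CORE = 1.4e9 / 12_800  # roughly 109,000 DOFs per core


def instances_for(dofs: float,
                  dofs_per_core: float = TARGET_DOFS_PER_CORE,
                  cores_per_instance: int = CORES_PER_INSTANCE) -> int:
    """Smallest instance count keeping the load at or below dofs_per_core."""
    return math.ceil(dofs / (dofs_per_core * cores_per_instance))
```

Under these assumptions, the 1.4 billion DOF model maps to 200 instances, and the 242.2 million DOF single unit-cell model would map to 35 instances.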

Figure 5 shows the metamaterial waveguide geometry for the 1, 4, and 21 unit-cell simulation cases. Figure 6 plots the computed filter responses for each case, showing the frequency response becoming more complex as the number of unit-cell repetitions increases. The solution computed using the adaptive fast frequency sweep algorithm is checked against solutions computed at a few uniformly sampled frequencies, and the two show good agreement over the entire frequency band.
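One simple way to quantify this kind of agreement is the largest transmission difference, in dB, between the finely sampled sweep and the spot-check solutions. The sketch below assumes the transmission is available as a complex coefficient (called S21 here) on each frequency grid and interpolates the sweep linearly in dB. The function name and the synthetic single-resonance test data are hypothetical, not Palace output.

```python
import numpy as np


def max_deviation_db(f_sweep, s21_sweep, f_check, s21_check):
    """Largest |S21| difference in dB between a finely sampled sweep and
    spot-check solutions at a few frequencies (linear interpolation in dB)."""
    sweep_db = 20.0 * np.log10(np.abs(s21_sweep))
    check_db = 20.0 * np.log10(np.abs(s21_check))
    interp_db = np.interp(f_check, f_sweep, sweep_db)  # f_sweep must be increasing
    return float(np.max(np.abs(interp_db - check_db)))
```

For a 4 to 8 GHz sweep at 1 MHz resolution (4001 points) and 17 evenly spaced check frequencies that fall on the sweep grid, identical data gives a deviation of zero. Any nonzero value measures the disagreement between the two methods.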

Figure 5: 1, 4, and 21 unit-cell repetition models for the metamaterial waveguide simulation, with engineered tapers at both ends. The 4 unit-cell repetition (bottom) visualizes the electric field energy density, scaled by the maximum over the entire computational domain, from the solution computed by Palace at 6 GHz.


Figure 6: Simulated transmission in the range of 4 to 8 GHz for the 1 (top-left), 4 (top-right), and 21 unit-cell repetition models. The empty circles denote the automatically sampled frequencies used by Palace’s adaptive fast frequency sweep algorithm. The solution for the frequency response computed at a few uniformly spaced frequencies is also plotted to demonstrate the accuracy of the adaptive fast frequency sweep.

Figure 7 plots the wall-clock time required to run the simulations on an increasing number of cores as the models become more complex. All models are simulated on c6gn.16xlarge instances, with the largest case using 200 instances, or 12,800 cores. Wall-clock simulation time is higher when using the adaptive fast frequency sweep, but only because the uniform sweep computes the frequency response at just 17 frequencies, while the adaptive fast frequency sweep provides a much higher resolution of 4001 points. For the 21 unit-cell model, the uniform frequency sweep would take roughly 27 days to achieve the same fine 1 MHz resolution if each frequency point were sampled sequentially.
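The arithmetic behind these figures can be reproduced from the numbers in the text. A 1 MHz grid from 4 to 8 GHz, including both endpoints, has 4001 points, and 200 instances of 64 cores give 12,800 cores. The 27-day estimate for a sequential uniform sweep then implies a full-model solve of roughly 9.7 minutes per frequency. The helper names below are hypothetical, and the per-point time is inferred from the text rather than separately measured.

```python
def n_sweep_points(f_start_mhz: int, f_stop_mhz: int, step_mhz: int) -> int:
    """Number of points in a uniform sweep, including both endpoints."""
    return (f_stop_mhz - f_start_mhz) // step_mhz + 1


def implied_minutes_per_point(total_days: float, n_points: int) -> float:
    """Average per-frequency solve time implied by a sequential sweep estimate."""
    return total_days * 24 * 60 / n_points


def sequential_days(n_points: int, minutes_per_point: float) -> float:
    """Wall-clock days to solve n_points frequencies one after another."""
    return n_points * minutes_per_point / (24 * 60)


points = n_sweep_points(4000, 8000, 1)             # 4 to 8 GHz at 1 MHz -> 4001
per_point = implied_minutes_per_point(27, points)  # about 9.7 minutes
cores = 200 * 64                                   # 12,800 cores
```

Because each frequency solve in a uniform sweep is independent, the same arithmetic also shows how the points could be spread across many concurrent jobs instead of run one after another.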


Figure 7: Simulation wall-clock time for the metamaterial waveguide example as the number of unit-cell repetitions and correspondingly the number of cores used are increased.

As a final comment, the increase in simulation wall-clock time as the model complexity grows, even though the number of degrees of freedom per core is kept roughly constant, is attributed to the larger number of linear solver iterations required for convergence, as the linear system of equations assembled at each frequency becomes more difficult to solve. Likewise, as the number of unit-cells in the model increases, the adaptive frequency sweep requires more frequency samples, and hence more wall-clock time, to maintain the specified error tolerance.

Conclusion and next steps

This blog post introduced Palace, a newly released, open-source finite element code for computational electromagnetics simulations. Palace is licensed under the Apache 2.0 license and is free for anyone in the broader numerical simulation and HPC community to use. Additional information can be found in the Palace GitHub repository, where you can file issues, learn about contributing to the project, or read the project documentation. The documentation includes a full suite of tutorial problems that guide you through setting up and running simulations with Palace.

We have also presented the results of a few example applications, running Palace on a variety of EC2 instance types and numbers of cores. The simplest way to get started with Palace on Amazon Web Services is to use Amazon Web Services ParallelCluster. You can follow the self-paced Spack on Amazon Web Services ParallelCluster tutorial and then install Palace using the Spack package manager.