Polaris paves the way for Aurora exascale supercomputer at Argonne
Argonne National Laboratory announced Wednesday that it will pave the way for its first exascale supercomputer, Aurora, with a testbed system called Polaris.
The “stepping stone” supercomputer will afford staff and users early access to hardware and technologies that will be available when Aurora delivery begins in 2022, giving them time to prepare, Ti Leggett, Polaris project director, told FedScoop.
Because Aurora will be one of the first large-scale deployments of yet-to-be-released Intel technology, Polaris will give users a similar architecture to work with in the meantime.
“It’s a different architecture than our systems currently run, that our users have been familiar with for many years,” Leggett said. “Polaris will fill the gap for hybrid computing with GPUs, the Slingshot, as well as preparing their data on the Eagle and Grand file systems.”
Like Aurora, Polaris will be a hybrid central processing unit-graphics processing unit (CPU-GPU) system built around Hewlett Packard Enterprise’s (HPE) Slingshot interconnect. Both supercomputers will share the same HPE Cray programming environment and system software stack.
The new architecture used by HPE Cray exascale supercomputers is designed to handle massive modern analytics and artificial intelligence workloads.
Although both systems sport a unified memory architecture, their biggest difference will be in how their GPUs connect to one another: Polaris will use NVIDIA’s NVLink, while Aurora will use Intel’s Xe Link. In both cases, CPUs will still connect to GPUs via PCIe, and the GPUs will use high-bandwidth memory.
The first of the Argonne Leadership Computing Facility’s (ALCF) users to benefit from Polaris, and later Aurora, will be those on Early Science Program research teams.
“We work with key projects with aspirations for running on the next machine as soon as possible and help us stress the machine and give feedback to the vendors on the design,” Leggett said. “So some of the first users we wanted Polaris to support were those Early Science Program users.”
Aurora will support not only ModSim science codes but artificial intelligence, machine learning and data-intensive workflows, and Polaris will be able to do that with the same architecture.
The Department of Energy, which oversees Argonne, also runs the Exascale Computing Project (ECP), which is preparing scientific codes for the new architectures.
“We’ll be working with them to get them early access as well and continued access on Polaris through the end of the ECP project, in order for their codes and applications to make their deliverables,” Leggett said.
Argonne runs both the Polaris and Aurora system acquisitions as formal projects, each with its own director, for budget and delivery purposes. Susan Coghlan directs the Aurora project, and Leggett is her deputy.
Both projects will go through an acceptance period, in which the hardware is stress-tested and vetted for functionality, performance and stability, followed by Critical Decision 4, the project closeout date. Leggett declined to give specific dates.
“Those dates have been set and we’re marching to those and we should have the system deployed and accepted ahead of that,” he said.
The expectation is that Polaris will be available to its first users early in the first quarter of 2022, before going into production for the wider INCITE and ASCR Leadership Computing Challenge (ALCC) allocation programs soon after. While the Frontier exascale computer at Oak Ridge National Lab is being installed now, Aurora is next up, with delivery in 2022 and production in 2023. Lawrence Livermore National Lab’s El Capitan exascale computer is also due for delivery in 2022.
Polaris will use all of ALCF’s existing infrastructure, namely the Eagle and Grand file systems deployed last year. Together they provide 200 petabytes of storage and 1.5 terabytes per second of bandwidth, which will be used for campaign storage.
The Eagle file system also supports community data sharing via Globus, which will remain available so Polaris users can share the data they generate with the wider research community.
All systems will be supported by an InfiniBand storage area network (SAN) fabric and backend infrastructure that includes Theta, the current many-core Intel Knights Landing CPU system that Aurora is replacing.
Once Aurora is on the ALCF floor, Polaris will remain a production resource for ML, AI and data-intensive workflows.
Argonne also wants to use Polaris to explore integration with other experimental facilities, starting with several on site like the Advanced Photon Source and Center for Nanoscale Materials that can be incorporated “rather easily,” Leggett said.
“We’re going to use Polaris as a way to explore how we can connect these traditionally ModSim, high-performance computing resources with more on-demand, urgent computing that is required for those kinds of experimental facilities,” Leggett said.