6:00 pm
Reception hosted by Multicore World
8:15 am
Opening Remarks
Scientific Instruments and Custom Architectures: Co-Design with Open-Source Tools
As transistor scaling slows and data rates from scientific instruments rise, data movement has become the primary performance bottleneck across high-performance computing (HPC) and instrumentation. On-chip data reduction, dataflow computing, and specialized hardware accelerators are increasingly necessary to improve system efficiency and enable real-time data handling. However, hardware specialization demands significant expertise and accessible resources, which remain limited. Open-source tools like Chisel, Verilator, cocotb, FireSim, and Mosaic, alongside standards like RISC-V, help lower barriers, foster co-design, and drive innovation in scientific computing and instruments.
Exploration of Coarse-Grained Reconfigurable Architectures for HPC and AI Workloads
In the Processor Research Team at RIKEN R-CCS, we have been researching reconfigurable architectures for HPC and AI workloads. The high efficiency of dataflow computing for parallel processing and data movement has led us to focus on the Coarse-Grained Reconfigurable Array (CGRA) as a dataflow computing architecture suited to HPC and AI. A CGRA enables high-throughput pipelined computation for various dataflow graphs of computing kernels, which are mapped onto the array to exploit operator-level parallelism. In this talk, I will introduce our research on the RIKEN CGRA, including its baseline design for HPC, design space exploration for computing and routing resources, and opportunities to extend it so computing resources can be shared between HPC and AI workloads.
Under the Hood of OpenFPGA
In this talk, we will introduce the OpenFPGA framework, whose aim is to generate highly customizable Field Programmable Gate Array (FPGA) fabrics and their supporting EDA flows. Following in the footsteps of the RISC-V initiative, OpenFPGA brings reconfigurable logic into the open-source community and closes the performance gap with commercial products. OpenFPGA incorporates physical design automation at its core and enables generation of FPGA fabrics with 100k+ look-up tables, from specification to layout, in less than 24 hours of effort by a single engineer.
Democratizing silicon with composable chiplets
Chiplets present a compelling approach to reducing the cost and time of chip design by raising the abstraction level to the die. In this talk, I will share my decade-long experience with chiplets and examine the current challenges blocking effective chiplet-based design for startups, government, and academia. Additionally, I will introduce recent work on composable (“interchangeable”) chiplets that has the potential to completely disrupt chip design by removing the need for tapeouts altogether. Composable chiplets allow O(M^N) unique silicon systems to be assembled from N chiplets arbitrarily selected from a library of size M. To illustrate the potential of this approach, consider a 10-chiplet system built from a library of 10 chiplets, enabling the creation of 10^10 unique configurations. In contrast, fewer than 10^3 unique chip tapeouts occur worldwide each year. Standing up a practical composable chiplet platform on par with the existing SoC design ecosystem will require enormous investments, but if done right it has the potential to fundamentally disrupt the semiconductor industry.
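As a quick sanity check on the combinatorics above (an editorial illustration, not part of the talk), the configuration count can be reproduced in a couple of lines of Python; the library and system sizes are just the illustrative figures from the abstract:

```python
# Each of N chiplet sites can be filled by any of M library chiplets,
# so the number of distinct assemblies is M**N (the O(M^N) figure above).
M, N = 10, 10                      # illustrative library size and system size
assemblies = M ** N
print(f"{assemblies:,} possible assemblies")     # 10,000,000,000 = 10^10
print("versus roughly 10^3 unique chip tapeouts per year worldwide")
```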
10:30 am
Coffee Break
Bridging Python to Silicon: The SODA Toolchain
The SODA Synthesizer is an open-source, modular, end-to-end hardware compiler framework. The SODA frontend, developed in MLIR, performs system-level design, code partitioning, and high-level optimizations to prepare the specifications for hardware synthesis. The backend is based on Bambu, a state-of-the-art high-level synthesis tool, and generates the final hardware design. The backend can interface with logic synthesis tools for field programmable gate arrays or with commercial and open-source logic synthesis tools (e.g., OpenROAD) for application-specific integrated circuits. This talk discusses opportunities and challenges in integrating with commercial and open-source tools at both the frontend and the backend, and highlights the role that an end-to-end compiler framework like SODA can play in an open-source hardware design ecosystem.
11:30 am
Open Chiplet, Hardware Prototyping, and Testbeds Panel
Panelists: Pierre-Emmanuel Gaillardon (University of Utah), Andreas Olofsson (ZeroASIC), Antonino Tumeo (PNNL), Kentaro Sano (RIKEN), Kaz Yoshii (ANL)
12:30 pm
Lunch Break
Lokelani 2
The Interconnected Science Ecosystem (INTERSECT): Building Scientific Laboratories of the Future
Recent advances in edge computing, automation, and artificial intelligence (AI) are accelerating the development of “smart” autonomous laboratories and user facilities across the scientific enterprise. These smart laboratories integrate experimental instruments, edge computing, and high-performance computing (HPC) into complex, AI-driven workflows—enabling new levels of automation, adaptability, and scientific throughput. The INTERSECT initiative at Oak Ridge National Laboratory is building the foundational infrastructure to interconnect these smart labs by developing scalable edge-to-HPC data and control pathways that support real-time coordination across facilities, disciplines, and autonomous systems. INTERSECT fosters cross-facility collaboration and domain interoperability through a unified infrastructure approach that enables scalable, AI-driven autonomous experimentation. This talk will highlight the development of several end-to-end autonomous workflows, with a particular focus on the integration of AI agents into domain-specific projects spanning materials science, chemistry, and advanced manufacturing.
Quantum-enhanced interferometric imaging: A step toward quantum-enhanced very-long-baseline interferometry for astronomy
We report a laboratory demonstration of interferometric imaging using a path-entangled single-photon state as a reference field distributed to spatially separated receivers to measure the spatial distribution of an extended incoherent source. The use of distributed entanglement between the receiving stations in this protocol allows measurements without requiring direct interference of the collected light and provides a route to larger baseline separations that could enhance the precision of astronomical telescopes.
Adapting HPC Systems for Integrated Research Infrastructure: Challenges, Opportunities, and Emerging Technologies
Integrated Research Infrastructure (IRI) is a framework that provides a new approach to science. It seamlessly connects experimental workflows from real-world data collection to computational analysis and modeling/simulation on high-performance computing (HPC) systems. As IRI frameworks become more widely adopted, they introduce unique challenges for HPC environments, including multi-tenancy, real-time feedback loops, and dynamic resource allocation across different workloads. This talk will explore these challenges, focusing on how they interact with critical components such as data management and provenance, security and privacy, and the integration of AI-driven analysis. Additionally, I will examine the role of emerging technologies in addressing these challenges, including advanced communication frameworks like MPI and NCCL, and high-speed networking and interconnect solutions such as NVLink, Ultra Ethernet, and SmartNICs. By examining and eventually addressing these issues, we can better understand how HPC systems must adapt to support next-generation research infrastructures, transforming how scientific research is conducted and scaled.
Integrated Data Analytics Needs in ESGF2-US
The Earth System Grid Federation (ESGF) is a global peer-to-peer network of data nodes that manage, archive, and distribute output from Earth system model simulations and their related input, observational, and reanalysis data. The DOE-funded ESGF2-US project is a major contributor to ESGF by collaborating on development and deployment of front-end and back-end software and by providing storage and analysis platforms for data-proximate computing. ESGF2-US is currently collaborating on the development of new user interfaces for data discovery, server-side computing capabilities for generating value added products, and JupyterHub resources for supporting large-scale data analytics. Leveraging tools and capabilities developed on commercial cloud platforms, the project is exploring software and hardware solutions that enable greater productivity for data managers, modelers, and analysts.
3:30 pm
Coffee Break
Developing HPC Interfaces for the US DOE Integrated Research Infrastructure program
The Integrated Research Infrastructure (IRI) program stands out with its innovative strategy to more effectively enable science across the DOE user facilities at scale. It is set to radically accelerate discovery and innovation within the DOE with a unique approach that empowers scientists to seamlessly and securely combine DOE research tools, infrastructure, and user facilities into their orchestrated workflows. A key component of this approach is the introduction of collaborative interfaces for users and orchestration tools, enabling workflows that run seamlessly across multiple user facilities. Today, despite the use of common toolchains and similar HPC providers, the landscape of novel HPC interfaces is still rather fragmented, with no common interface emerging that functions across all user facilities. This is a major hurdle for scientists aiming to build resilient workflows with multiple facilities in mind, locking them into one specific computing and data environment and exposing them to risks like major interruptions. Hence, the success of IRI hinges on the delivery of collaborative interfaces that scientists can use to build resilient and performant cross-facility workflows. This presentation will cover the scope and current activities of the DOE IRI program, with a focus on the activities of the interfaces subcommittee.
4:30 pm
Interconnected Research Infrastructure Panel
Panelists: Bjoern Enders (LBNL), Shane Cannon (LBNL), Ben Mintz (ORNL), Forrest Hoffman (ORNL), Erhan Saglamyurek (ESNet), Matthew Dosanjh (SNL), Brian Smith (University of Oregon)
6:30 pm
Dinner
Mauna Loa-lima Ballroom
Accelerating Data Science in the Pacific
The Department of Defense implements its mission to provide national defense across a broad spectrum of activities. The Air Force Research Laboratory seeks to enhance those activities, or capabilities, to create US technical advantage over our adversaries. Through research, development, and demonstration, the Maui High Performance Computing Center accelerates the delivery of those novel capabilities to trial by fire. Dr. Scott Pierce will provide insight into the variety of data leveraged and required to improve performance in the areas of space domain awareness, cyber defense, and mission planning.
High-Resolution Fire Spread Modeling of the 2023 Lahaina Fires
In August 2023, the Lahaina Fire devastated the historic town of Lahaina, driven by an intense downslope wind storm. To better understand this catastrophic event, we conducted high-resolution modeling of the windstorm and subsequent fire spread using advanced numerical techniques. Employing nested domains at horizontal resolutions of 900 m and 100 m, with the inner domain configured for large-eddy simulations (LES), we explicitly resolved the dominant turbulent scales governing the wind dynamics. These detailed wind fields were then used to drive an urban-scale fire spread model, which incorporated all structures and vegetation in Lahaina at a fine spatial (10 m) and temporal (5-minute) resolution. To validate and enhance our simulations, we actively engaged the local community through a dedicated website, collecting critical eyewitness observations and data. We present detailed hour-by-hour comparisons between our simulations and observed data, highlighting the accuracy and utility of high-performance computing (HPC) modeling. Finally, we discuss the potential of these advanced modeling approaches to inform and guide resilient rebuilding efforts in Lahaina.
Advancing Space Weather from Hawaiʻi: Data, Models, and Forecasting
The new Haleakalā Neutron Monitor (NM) on Maui provides Hawaiʻi’s first high-altitude stream of cosmic-ray and solar neutron data for space-weather research. We combine these measurements with data from the Alpha Magnetic Spectrometer on the International Space Station, which has been operating since 2011 and will continue through 2030. Together, these datasets capture a unique energy spectrum of solar and galactic radiation reaching Earth and the Hawaiian Islands. The real-time data stream is integrated with measurements from the continental U.S. Simpson Neutron Monitor network and the global Neutron Monitor Database (NMDB), and is sent directly to NOAA’s Space Weather Prediction Center to support operational awareness. Dedicated computing nodes on Oʻahu are used to develop and run models that simulate the transport of galactic cosmic rays through the solar system and into Earth’s atmosphere. These models aim to improve our understanding and forecasting of space radiation environments. The project also offers students in Hawaiʻi training opportunities in space weather, radiation modeling, and scientific data analysis while building new research collaborations across the Islands.
10:30 am
Coffee Break
Geothermal Resources Across the State of Hawaiʻi
Despite the fact that electricity costs in the State of Hawaii are two to three times the national average, and statewide geothermal resource assessments published in 1985 and 2015 suggest the possibility of extensive geothermal resources, there remains today very little data to constrain the size and scale of Hawaii's prospective geothermal resources. The research that needs to be conducted to more fully constrain Hawaii's resources is proven, safe, and can be expected to cost a few hundred million USD (over a decade or so). Finding such funds has proven very difficult. This talk will outline what is known but much more of what is not known about geothermal resources across the State of Hawaii, and outline a path forward. In general, very little is known about Hawaii's "deep" (below 2km) subsurface due to a unique lack of oil and gas exploration in the state, and so any geothermal resource exploration in Hawaii will be inextricably linked to groundwater and carbon storage resource exploration.
High-Performance PDE Simulations with PISALE: Addressing Technical Challenges in Energy Earthshots, Fusion, and National Security
We present applications of the computational framework PISALE, leveraging advanced numerical methods and high-performance computing to address critical challenges identified by the U.S. Department of Energy's Energy Earthshots™ Initiative—an ambitious effort to accelerate breakthroughs toward abundant, affordable, and reliable energy solutions within the decade. PISALE combines Arbitrary Lagrangian Eulerian (ALE) and Adaptive Mesh Refinement (AMR) methods to efficiently solve complex Partial Differential Equation (PDE) systems, enabling unprecedented accuracy and adaptability in simulations. We highlight PISALE's versatility and data-driven capabilities through targeted Earthshot applications, including enhanced geothermal energy, floating offshore wind, and the development of advanced materials relevant to multiple Earthshot goals. Additionally, we illustrate the broader applicability of PISALE through separate examples such as high-speed impact phenomena and cutting-edge X-ray Free Electron Laser (XFEL) experiments. Led by the University of Hawaii in collaboration with Lawrence Berkeley Laboratory (LBL) and Lawrence Livermore National Laboratory (LLNL), our team emphasizes student engagement and mutually beneficial partnerships, advancing energy innovation and workforce development both locally and nationally.
12:00 pm
Science and Data on the Islands, Challenges and Opportunities Panel
Panelists: Scott Pierce (MHPCC), David Eder (University of Hawaiʻi), Veronica Bindi (University of Hawaiʻi), Nicole Lautze (Hawaiʻi Groundwater and Geothermal Resource Center)
12:30 pm
Lunch Break
Lokelani 2
Opportunities, Risks, and Frontiers for Frontier Models on Frontier Science
Large (Language) Models have evolved from a highly successful architectural innovation for machine translation into a powerful, broadly applicable tool. In this talk, I will trace the pace of this change over the past year, with an emphasis on where and how these tools have been adopted in the sciences, including spotlights on opportunities for automation and acceleration in the near future.
Photonic networks of quantum processing units: A path to scalable quantum computers
State-of-the-art quantum computers offer up to a few hundred physical qubits with error rates small enough for fault-tolerant quantum information processing. However, the millions of such qubits needed for utility-scale machines remain out of reach with current technologies. One promising route to this regime is to scale up via photonic interconnects between small-scale processing modules, leading to distributed quantum computing (DQC).
Dynamical System Units for AI and Scientific Computing
Recently, dynamical system-based analog architectures have emerged as a disruptive computing technology, showing promise for a variety of key problems such as machine learning, optimization, and solving partial differential equations. In this talk, I will walk through our recent research on dynamical system units, including designs for continuous variables and non-linear functions, mapping to AI workloads, and hardware scaling.
Power of Machine Learning and Optimization in Quantum Computer Design
Toward developing a large-scale quantum computer, design automation and associated optimization become increasingly critical. In this talk, I will review four areas of quantum computer design and operation where machine learning and advanced optimization can help improve accuracy and performance by magnitudes beyond the best existing human experts. The four areas are quantum architecture design, quantum control optimization and quantum metrology, quantum circuit optimization, and quantum error correction. I will conclude by outlining some open areas where design automation and machine learning can become critical in pushing beyond existing technological barriers toward scalable quantum computer design.
3:30 pm
Report Planning and Collaborative Discussions
Quantum hardware-enabled molecular dynamics via transfer learning
The ability to perform ab initio molecular dynamics simulations using potential energies calculated on quantum computers would allow virtually exact dynamics for chemical and biochemical systems, with substantial impacts on the fields of catalysis and biophysics. However, noisy hardware, the costs of computing gradients, and the number of qubits required to simulate large systems present major challenges to realizing the potential of dynamical simulations using quantum hardware. Here, we demonstrate that some of these issues can be mitigated by recent advances in machine learning. By combining transfer learning with techniques for building machine-learned potential energy surfaces, we propose a new path forward for molecular dynamics simulations on quantum hardware.
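As a rough, self-contained illustration of the transfer-learning idea (a toy sketch, not the authors' method, models, or data), the snippet below pretrains a surrogate on a cheap classical potential and then fits only a small correction using a handful of points standing in for expensive quantum-computed energies:

```python
import numpy as np

# Toy 1D potential energy surfaces (illustrative stand-ins, not real chemistry):
# a "cheap" classical reference versus a more accurate "quantum" target.
r = np.linspace(0.7, 2.5, 200)
cheap  = 0.5 * 8.0 * (r - 1.0) ** 2                   # harmonic guess
target = 2.0 * (1 - np.exp(-2.0 * (r - 1.0))) ** 2    # Morse-like "quantum" PES

# Step 1: "pretrain" a surrogate on the cheap surface (dense, inexpensive data).
base = np.polynomial.Polynomial.fit(r, cheap, deg=8)

# Step 2: transfer step -- fit only a low-order correction on a handful of
# expensive points (here 6 samples standing in for quantum-computed energies).
r_q = np.linspace(0.8, 2.2, 6)
target_q = 2.0 * (1 - np.exp(-2.0 * (r_q - 1.0))) ** 2
delta = np.polynomial.Polynomial.fit(r_q, target_q - base(r_q), deg=3)

corrected = base(r) + delta(r)
print("max |error| before transfer:", np.max(np.abs(cheap - target)).round(3))
print("max |error| after  transfer:", np.max(np.abs(corrected - target)).round(3))
```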
The Advanced Quantum Testbed: a collaborative research laboratory for superconducting circuit quantum information science
The Advanced Quantum Testbed (AQT) at Berkeley Lab is a collaborative research program launched in 2018. AQT provides a full-stack, white-box quantum computing platform with superconducting qubits, enabling cutting-edge research on quantum algorithms, hardware, and error mitigation techniques. Through deep collaboration with users from national laboratories, academia, and industry, AQT fosters innovation and advances quantum computing capabilities for impactful scientific applications. Here, I describe the technologies accessible at the testbed, as well as several scientific research projects spanning the full superconducting quantum stack.
From NISQ to Quantum Accelerated Supercomputing
2024 marked a turning point for quantum computing: credible demonstrations of quantum error correction (QEC) across multiple qubit modalities and the first glimpses of scalable fault-tolerant architectures. As industry moves beyond noisy intermediate-scale quantum (NISQ) devices, tight integration with high-performance computing (HPC) and AI is becoming indispensable. This talk will provide an overview of how NVIDIA is partnering across the quantum ecosystem to make this transition possible and solve the challenges on the way to realizing Quantum Accelerated Supercomputing.
Certified randomness using a trapped-ion quantum processor
Although quantum computers can perform a wide range of practically important tasks beyond the abilities of classical computers, realizing this potential remains a challenge. An example is to use an untrusted remote device to generate random bits that can be certified to contain a certain amount of entropy. Certified randomness has many applications but is impossible to achieve solely by classical computation. Here we demonstrate the generation of certifiably random bits using the 56-qubit Quantinuum H2-1 trapped-ion quantum computer accessed over the Internet. Our protocol leverages the classical hardness of recent random circuit sampling demonstrations: a client generates quantum ‘challenge’ circuits using a small randomness seed, sends them to an untrusted quantum server to execute and verifies the results of the server. We analyze the security of our protocol against a restricted class of realistic near-term adversaries. Using classical verification with measured combined sustained performance of 1.1 × 10^18 floating-point operations per second across multiple supercomputers, we certify 71,313 bits of entropy under this restricted adversary and additional assumptions. Our results demonstrate a step towards the practical applicability of present-day quantum computers. See paper for more details: https://www.nature.com/articles/s41586-025-08737-1
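To make the shape of such a client-server protocol concrete, here is a heavily simplified classical toy sketch (an editorial illustration, not the Quantinuum protocol or random circuit sampling itself): a client derives a challenge from a small seed, an untrusted server returns samples, and the client scores them with a cross-entropy-style statistic before accepting them as entropy:

```python
import random
import secrets

def make_challenge(seed, n_qubits=8):
    """Client side: derive a pseudo-random 'challenge' from a small seed.
    Here the challenge is just an ideal output distribution standing in for
    the output probabilities of a hard-to-simulate random circuit."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(2 ** n_qubits)]
    total = sum(weights)
    return [w / total for w in weights]

def untrusted_server(probs, shots=200):
    """Server side: return sampled bitstrings (an honest server samples the
    ideal distribution; a cheating one would return something else)."""
    return random.choices(range(len(probs)), weights=probs, k=shots)

def xeb_score(probs, samples):
    """Cross-entropy-benchmarking style score: near 0 for uniform guessing,
    noticeably positive when samples follow the challenge distribution."""
    d = len(probs)
    return d * sum(probs[s] for s in samples) / len(samples) - 1.0

seed = secrets.randbits(64)          # small private random seed
probs = make_challenge(seed)         # challenge sent to the server
samples = untrusted_server(probs)    # bitstrings returned by the server
score = xeb_score(probs, samples)
print(f"XEB-style score: {score:.3f}; accept the samples as entropy "
      "only if the score clears a preset threshold")
```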
10:30 am
Coffee Break
Quantum simulations of large lattice models and chemistry beyond the existence of exact solutions
In the last decade, variational algorithms have been the tool of choice for researchers and practitioners of quantum computing to tackle ground state problems on pre-fault-tolerant quantum processors. Currently, a number of practical and theoretical issues prevent scalability of variational algorithms to large system sizes. In this talk, I will discuss two quantum diagonalization methods, based on subspaces obtained from quantum computers, which overcome the scaling limitations of variational algorithms: Krylov quantum diagonalization, which allowed us to perform quantum ground state calculations for lattice models of up to 50 spins, and sample-based quantum diagonalization, which enabled realistic chemistry computations of up to 77 qubits on a quantum-centric supercomputing architecture, using a Heron quantum processor and the supercomputer Fugaku.
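As a purely classical illustration of the subspace idea (a numerical toy sketch; the quantum algorithm instead builds its subspace from states prepared and measured on hardware, e.g., via real-time evolution), one can project a Hamiltonian onto a small Krylov basis and solve a regularized generalized eigenvalue problem:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k = 64, 12                      # toy Hilbert-space dimension and subspace size
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (A + A.conj().T) / 2             # random Hermitian "Hamiltonian"

# Krylov basis {|v0>, H|v0>, ..., H^(k-1)|v0>} from a random reference state
v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
v /= np.linalg.norm(v)
basis = [v]
for _ in range(k - 1):
    w = H @ basis[-1]
    basis.append(w / np.linalg.norm(w))
V = np.column_stack(basis)

# Project into the subspace and solve H_eff c = E S c with regularization,
# discarding nearly linearly dependent directions (small eigenvalues of S)
H_eff, S = V.conj().T @ H @ V, V.conj().T @ V
s_vals, s_vecs = np.linalg.eigh(S)
keep = s_vals > 1e-10
P = s_vecs[:, keep] / np.sqrt(s_vals[keep])
E = np.linalg.eigvalsh(P.conj().T @ H_eff @ P)
print("subspace ground-state estimate:", E[0])
print("exact ground-state energy:     ", np.linalg.eigvalsh(H)[0])
```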
Recent results on Quantinuum hardware
Quantinuum currently provides the highest performing quantum computing hardware commercially available. Enabled by the highest fidelity qubits and gates and an advanced feature set including mid-circuit measurement, qubit reuse, conditional logic, and arbitrary angle two-qubit gates, users routinely execute the deepest circuits and achieve remarkable results. This talk will highlight recent demonstrations of what this hardware can accomplish.
12:00 pm
Disruptive Technologies and Quantum Information Science Panel
Panelists: Kevin Cronk (PsiQuantum), Chris Langer (Quantinuum), Erhan Saglamyurek (LBNL), Sam Stanwyck (NVIDIA), Antonio Mezzacapo (IBM), Murphy Yuezhen Niu (UCSB)
1:00 pm
Lunch Break
Lokelani 2
Balanced and Elastic End-to-end Training of Dynamic LLMs
To reduce computational and memory costs in Large Language Models (LLMs), dynamic workload reduction schemes like Mixture of Experts (MoEs), parameter pruning, layer freezing, sparse attention, early token exit, and Mixture of Depths (MoDs) have emerged. However, these methods introduce severe workload imbalances, limiting their practicality for large-scale distributed training. We propose DynMo, an autonomous dynamic load balancing solution that ensures optimal compute distribution when using pipeline parallelism in training dynamic models. DynMo adaptively balances workloads, dynamically packs tasks into fewer workers to free idle resources, and supports both multi-GPU single-node and multi-node systems. Compared to static training methods (Megatron-LM, DeepSpeed), DynMo accelerates training by up to 1.23× (MoEs), 3.18× (pruning), 2.23× (layer freezing), 4.02× (sparse attention), 4.52× (early exit), and 1.17× (MoDs).
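To illustrate the kind of rebalancing described above (a simplified sketch under assumed per-layer costs, not the DynMo implementation), the snippet below repartitions pipeline stages after some layers shrink due to pruning or freezing and packs the work onto fewer GPUs:

```python
import math

def partition_layers(costs, n_stages):
    """Greedily partition per-layer costs into at most n_stages contiguous
    stages, closing a stage once it reaches the average (target) load."""
    target = sum(costs) / n_stages
    stages, current, acc = [], [], 0.0
    for i, c in enumerate(costs):
        if current and acc >= target and len(stages) < n_stages - 1:
            stages.append(current)
            current, acc = [], 0.0
        current.append(i)
        acc += c
    stages.append(current)
    return stages

def plan_pipeline(costs, gpu_capacity):
    """Pick the fewest pipeline stages whose average load fits gpu_capacity,
    then rebalance layers across them; shrinking workloads free whole GPUs."""
    n_stages = max(1, math.ceil(sum(costs) / gpu_capacity))
    return partition_layers(costs, n_stages)

dense  = [10.0] * 32                                   # balanced transformer layers
pruned = [10.0 if i % 4 else 3.0 for i in range(32)]   # every 4th layer pruned/frozen
for name, costs in (("dense ", dense), ("pruned", pruned)):
    stages = plan_pipeline(costs, gpu_capacity=90.0)
    loads = [round(sum(costs[l] for l in stage), 1) for stage in stages]
    print(f"{name}: GPUs used = {len(stages)}, stage loads = {loads}")
```

Running this shows the pruned model fitting on three GPUs instead of four with roughly even stage loads, which is the effect the abstract describes at toy scale.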
Advancing Science Through AI Accelerators
Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are considered a promising approach to address some challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications are contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications. In this talk, I will present a benchmarking study evaluating the performance of LLMs on various AI accelerators, covering both training and inference capabilities, along with an orthogonal comparison across implementation frameworks. I will present our findings and analyses of the models’ performance to better understand the intrinsic capabilities of AI accelerators.
Performance Modeling and System Design Insights for AI Foundation Models
Generative AI, in particular large transformer models, is increasingly driving HPC system design in science and industry. The presentation will discuss analyses of the performance characteristics of such transformer models and their sensitivity to the transformer type, parallelization strategy, and HPC system features (accelerators and interconnects). The analysis is carried out through a performance model that allows us to explore this complex design space; the talk will highlight its key components. We show results demonstrating the varying needs of foundation models in science and industry.
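To give a flavor of what such a performance model looks like (an editorial back-of-the-envelope sketch, not the model presented in the talk, with all numbers purely illustrative), one can combine a compute term and a communication term per training step:

```python
def step_time_estimate(params, tokens_per_step, aggregate_flops, dp_degree,
                       per_gpu_bw, bytes_per_grad=2, overlap=0.5):
    """Rough per-step time for dense-transformer training.

    compute: ~6 FLOPs per parameter per token (forward + backward),
             divided by the aggregate sustained throughput of all GPUs
    comm:    per-GPU traffic of a ring all-reduce over the gradients,
             divided by the per-GPU interconnect bandwidth
    """
    compute = 6 * params * tokens_per_step / aggregate_flops
    allreduce_bytes = 2 * params * bytes_per_grad * (dp_degree - 1) / dp_degree
    comm = allreduce_bytes / per_gpu_bw
    return compute + (1 - overlap) * comm   # a fraction of comm overlaps compute

# Illustrative numbers only: 70B parameters, 4M tokens per global batch,
# 512 GPUs at 300 sustained TFLOP/s each, 100 GB/s effective all-reduce bandwidth
t = step_time_estimate(params=70e9, tokens_per_step=4e6,
                       aggregate_flops=512 * 300e12, dp_degree=512,
                       per_gpu_bw=100e9)
print(f"estimated training step time: {t:.1f} s")
```

Sweeping such a model over parallelization strategies and interconnect bandwidths is one simple way to expose the sensitivities the abstract refers to.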
3:30 pm
Coffee Break
Eco-Driven AI-HPC: Workflows, data & modeling
Modern workflows illustrate the convergence of AI and HPC in computational sciences. They now connect experimental and computing facilities together, transport and process large volumes of data at high velocity, train surrogate models of first-principles simulations, and can dynamically adapt their structure to enable and automate scientific discovery. The energy-efficient execution of such workflows raises several challenges related to resource allocation and scheduling, data management, and performance modeling, which we will explore in this talk.
4:30 pm
Efficient AI for Science Panel
Panelists: Daniel Freeman (Anthropic), Mohamed Wahib (RIKEN), Murali Emani (ANL), Fred Suter (ORNL), Shashank Subramanian (LBNL)
5:30 pm
Adjourn