Architecture, Networks, and Storage
  Microarchitecture of a Configurable High-radix Router for Exascale Interconnect
  BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs
  Lessons Learned from Accelerating Quicksilver on Programmable Integrated Unified Memory Architecture (PIUMA) and How that’s Different from CPU
  A Hierarchical Task Scheduler for Heterogeneous Computing

Machine Learning, AI, and Emerging Technologies
  Auto-Precision Scaling for Distributed Deep Learning
  FPGA Acceleration of Number Theoretic Transform
  Designing a ROCm-aware MPI Library for AMD GPUs: Early Experiences
  A Tunable Implementation of Quality-of-Service Classes for HPC Networks
  Scalability of Streaming Anomaly Detection in an Unbounded Key Space using Migrating Threads
  HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads
  Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems

HPC Algorithms and Applications
  COSTA: Communication-Optimal Shuffle and Transpose Algorithm with Process Relabeling
  Enabling AI-Accelerated Multiscale Modeling of Thrombogenesis at Millisecond and Molecular Resolutions on Supercomputers
  Evaluation of the NEC Vector Engine for Legacy CFD Codes
  Distributed Sparse Block Grids on GPUs
  iPUG: Accelerating Breadth-First Graph Traversals using Manycore Graphcore IPUs

Performance Modeling, Evaluation, and Analysis
  Optimizing GPU-enhanced HPC System and Cloud Procurements for Scientific Workloads
  A Performance Analysis of Modern Parallel Programming Models Using a Compute-Bound Application
  Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact
  Performance of the Supercomputer Fugaku for Breadth-First Search in Graph500 Benchmark
  Under the Hood of SYCL - An Initial Performance Analysis With an Unstructured-mesh CFD Application
  Characterizing Containerized HPC Application Performance at Petascale on CPU and GPU Architectures
  Ubiquitous Performance Analysis

Programming Environments and Systems Software
  Artemis: Automatic Runtime Tuning of Parallel Execution Parameters Using Machine Learning