Euro-Par 2021: Parallel Processing: 27th International Conference on Parallel and Distributed Computing, Lisbon, Portugal, September 1-3, 2021, Procee
Compilers, Tools and Environments
  ALONA: Automatic Loop Nest Approximation with Reconstruction and Space Pruning
  Automatic low-overhead load-imbalance detection in MPI applications

Performance and Power Modeling, Prediction and Evaluation
  Trace-driven Workload Generation and Execution
  Update on the Asymptotic Optimality of LPT
  E2EWatch: An End-to-end Anomaly Diagnosis Framework for Production HPC Systems

Scheduling and Load Balancing
  Collaborative GPU Preemption via Spatial Multitasking for Efficient GPU Sharing
  A Fixed-Parameter Algorithm for Scheduling Unit dependent Tasks with Unit Communication Delays
  Plan-based Job Scheduling for Supercomputers with Shared Burst Buffers
  Taming Tail Latency in Key-Value Stores: a Scheduling Perspective
  A log-linear (2+5/6)-approximation algorithm for parallel machine scheduling with a single orthogonal resource
  An MPI-Parallel Algorithm for Mapping Complex Networks onto Hierarchical Architectures
  Pipelined Model Parallelism: Complexity Results and Memory Considerations

Data Management, Analytics and Machine Learning
  Efficient and Systematic Partitioning of Large and Deep Neural Networks for Parallelization
  A GPU Architecture Aware Fine-Grain Pruning Technique for Deep Neural Networks
  Towards Flexible and Compiler-Friendly Layer Fusion for CNNs on Multicore CPUs
  Smart Distributed Data Sets for Stream Processing

Cluster, Cloud and Edge Computing
  Colony: Parallel Functions as a Service on the Cloud-Edge Continuum
  Horizontal Scaling in Cloud using Contextual Bandits
  Geo-Distribute Cloud Application at the Edge
  A Fault Tolerant and Deadline Constrained Sequence Alignment Application on Cloud-based Spot GPU Instances
  Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach

Theory and Algorithms for Parallel and Distributed Processing
  Algorithm design for Tensor Units
  A Scalable Approximation Algorithm for Weighted Longest Common Subsequence
  TSLQueue: An Efficient Lock-free Design for Priority Queues
  G-Morph: Induced Subgraph Isomorphism Search of Labeled Graphs on a GPU

Parallel and Distributed Programming, Interfaces, and Languages
  Accelerating Graph Applications Using Phased Transactional Memory
  Efficient GPU Computation using Task Graph Parallelism
  Towards High Performance Resilience using Performance Portable Abstractions
  Enhancing Load-Balancing of MPI Applications with Workshare
  Particle-In-Cell Simulation using Asynchronous Tasking

Multicore and Manycore Parallelism
  Exploiting co-execution with oneAPI: heterogeneity from a modern perspective

Parallel Numerical Methods and Applications
  Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems
  Fault-tolerant LU factorization is low cost
  Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs
  Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method
  GPU Accelerated Mahalanobis-average Hierarchical Clustering Analysis

High Performance Architectures and Accelerators
  PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy
  Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware