Introduction – Learn what awaits you inside the book
What this book is about
Who should read this book
Notation and conventions
How to read this book
Overview
Parallel computer
Intraprocessor parallelism
Interprocessor parallelism
Exercises
MPI standard
MPI history
Related standards
Exercises
MPI subsetting
Motivation
Typical examples
Implementation practice
Exercises
Shared memory – Learn how to create a simple MPI subset capable of basic blocking point-to-point and collective operations over shared memory
Subset definition
General assumptions
Blocking point-to-point communication
Blocking collective operations
Exercises
Communication mechanisms
Basic communication
Intraprocess performance
Interprocess performance
Exercises
Startup and termination
Process creation
Two processes
More processes
Connection establishment
Process termination
Exercises
Blocking point-to-point communication
Limited message length
Blocking protocol
Unlimited message length
Double buffering
Eager protocol
Rendezvous protocol
Exercises
Blocking collective operations
Naive algorithms
Barrier
Broadcast
Reduce and Allreduce
Exercises
Sockets – Learn how to create an MPI subset capable of all point-to-point and blocking collective operations over Ethernet and other IP-capable networks
Subset definition
General assumptions
Blocking point-to-point communication
Nonblocking point-to-point operations
Blocking collective operations
Exercises
Communication mechanisms
Basic communication
Intranode performance
Internode performance
Exercises
Synchronous progress engine
Communication establishment
Data transfer
Exercises
Startup and termination
Process creation
Startup command
Process daemon
Out-of-band communication
Host name resolution
Connection establishment
At startup (eager)
On request (lazy)
Process termination
Exercises
Blocking point-to-point communication
Source and tag matching
Unexpected messages
Exercises
Nonblocking point-to-point communication
Request management
Exercises
Blocking collective operations
Communication context
Basic algorithms
Tree-based algorithms
Circular algorithms
Hypercube algorithms
Exercises
OFA libfabric – Learn how to create an MPI subset capable of all point-to-point and collective operations over InfiniBand and upcoming networks
Subset definition
General assumptions
Point-to-point operations
Collective operations
Exercises
Communication mechanisms
Basic communication
Intranode performance
Internode performance
Exercises
Startup and termination
Process creation
Credential exchange
Connection establishment
Process termination
Exercises
Point-to-point communication
Blocking communication
Nonblocking communication
Exercises
Collective operations
Advanced algorithms
Blocking operations
Nonblocking operations
Exercises
Advanced features – Learn how to add advanced MPI features including but not limited to heterogeneity, one-sided communication, file I/O, and language bindings
Communication modes
Standard
Buffered
Synchronous
Heterogeneity
Basic datatypes
Simple datatypes
Derived datatypes
Exercises
Groups, communicators, topologies
Group management
Communicator management
Process topologies
Exercises
One-sided communication
Mapped implementation
Native implementation
Exercises
File I/O
Standard I/O
MPI file I/O
Exercises
Language bindings
Fortran
C++
Java
Python
Exercises
Optimization – Learn how to optimize MPI internally by using advanced implementation techniques and available special hardware
Direct data transfer
Direct memory access
Remote direct memory access
Exercises
Threads
Thread support level
Threads as MPI processes
Shared memory extensions
Exercises
Multiple fabrics
Synchronous progress engine
Asynchronous progress engine
Hybrid progress engine
Exercises
Dedicated hardware
Synchronization
Special memory
Auxiliary networks
Exercises
Look ahead – Learn to recognize MPI advantages and drawbacks to better assess its future
MPI axioms
Reliable data transfer
Ordered message delivery
Dense process rank sequence
Exercises
MPI-4 en route
Fault tolerance
Exercises
Beyond MPI
Exascale challenge
Exercises
References – Learn about books that may further extend your knowledge
Appendices
MPI Families – Learn about major MPI implementation families, their genesis, architecture and relative performance
MPICH
Genesis
Architecture
Details
MPICH
MVAPICH
Intel MPI
…
Exercises
Open MPI
Genesis
Architecture
Details
Exercises
Comparison
Market
Features
Performance
Exercises
Alternative interfaces – Learn about other popular interfaces that are used to implement MPI
DAPL
…
Exercises
SHMEM
…
Exercises
GASNet
…
Exercises
Portals
…
Exercises
Solutions to all exercises – Learn how to answer all those questions