1. Introduction to Multimodal Scene Understanding
   Michael Ying Yang, Bodo Rosenhahn and Vittorio Murino

2. Multi-Modal Deep Learning for Multi-Sensory Data Fusion
   Asako Kanezaki, Ryohei Kuga, Yusuke Sugano and Yasuyuki Matsushita

3. Multi-Modal Semantic Segmentation: Fusion of RGB and Depth Data in Convolutional Neural Networks
   Zoltan Koppanyi, Dorota Iwaszczuk, Bing Zha, Can Jozef Saul, Charles K. Toth and Alper Yilmaz

4. Learning Convolutional Neural Networks for Object Detection with Very Little Training Data
   Christoph Reinders, Hanno Ackermann, Michael Ying Yang and Bodo Rosenhahn

5. Multi-Modal Fusion Architectures for Pedestrian Detection
   Dayan Guan, Jiangxin Yang, Yanlong Cao, Michael Ying Yang and Yanpeng Cao

6. ThermalGAN: Multimodal Color-to-Thermal Image Translation for Person Re-Identification in Multispectral Dataset
   Vladimir A. Knyaz and Vladimir V. Kniaz

7. A Review and Quantitative Evaluation of Direct Visual-Inertial Odometry
   Lukas von Stumberg, Vladyslav Usenko and Daniel Cremers

8. Multimodal Localization for Embedded Systems: A Survey
   Imane Salhi, Martyna Poreba, Erwan Piriou, Valerie Gouet-Brunet and Maroun Ojail

9. Self-Supervised Learning from Web Data for Multimodal Retrieval
   Raul Gomez, Lluis Gomez, Jaume Gibert and Dimosthenis Karatzas

10. 3D Urban Scene Reconstruction and Interpretation from Multi-Sensor Imagery
    Hai Huang, Andreas Kuhn, Mario Michelini, Matthias Schmitz and Helmut Mayer

11. Decision Fusion of Remote Sensing Data for Land Cover Classification
    Arnaud Le Bris, Nesrine Chehata, Walid Ouerghemmi, Cyril Wendl, Clement Mallet, Tristan Postadjian and Anne Puissant

12. Cross-Modal Learning by Hallucinating Missing Modalities in RGB-D Vision
    Nuno Garcia, Pietro Morerio and Vittorio Murino