Sentiment-aware Multi-modal Recommendation on Tourist Attractions.- SCOD: Dynamical Spatial Constraints for Object Detection.- STMP: Spatial Temporal Multi-level Proposal Network for Activity Detection.- Hierarchical Vision-Language Alignment for Video Captioning.- Task-Driven Biometric Authentication of Users in Virtual Reality (VR) Environments.- Deep Neural Network Based 3D Articulatory Movement Prediction Using Both Text and Audio Inputs.- Subjective Visual Quality Assessment of Immersive 3D Media Compressed by Open-Source Static 3D Mesh Codecs.- Joint EPC and RAN Caching of Tiled VR Videos for Mobile Networks.- Foveated Ray Tracing for VR Headsets.- Preferred Model of Adaptation to Dark for Virtual Reality Headsets.- From Movement to Events: Improving Soccer Match Annotations.- Multimodal Video Annotation for Retrieval and Discovery of Newsworthy Video in a News Verification Scenario.- Integration of Exploration and Search: A Case Study of the M^3 Model.- Face Swapping for Solving Collateral Privacy Issues in Multimedia Analytics.- Exploring the Impact of Training Data Bias on Automatic Generation of Video Captions.- Fashion Police: Towards Semantic Indexing of Clothing Information In Surveillance Data.- CNN-Based Non-Contact Detection of Food Level in Bottles from RGB Images.- Personalized Recommendation of Photography Based on Deep Learning.- Two-level Attention with Multi-task Learning for Facial Emotion Estimation.- User Interaction for Visual Lifelog Retrieval in a Virtual Environment.- Query-by-Dancing: A Dance Music Retrieval System Based on Body-Motion Similarity.- Joint Visual-Textual Sentiment Analysis Based on Cross-modality Attention Mechanism.- Deep Hashing with Triplet Labels and Unification Binary Code Selection for Fast Image Retrieval.- Incremental Training for Face Recognition.- Character Prediction in TV Series via a Semantic Projection Network.- A Test Collection for Interactive Lifelog Retrieval.- SEPHLA: Challenges and Opportunities 
within Environment – Personal Health Archives.- Athens Urban Soundscape (ATHUS): A dataset for urban soundscape quality recognition.- V3C - a Research Video Collection.- Image Aesthetics Assessment using Fully Convolutional Neural Networks.- Detecting tampered videos with multimedia forensics and deep learning.- Improving Robustness of Image Tampering Detection for Compression.- Audiovisual annotation procedure for multi-view field recordings.- A Robust Multi-Athlete Tracking Algorithm by Exploiting Discriminant Features and Long-Term Dependencies.- Early Identification of Oil Spills in Satellite Images Using Deep CNNs.- Point Cloud Colorization Based on Densely Annotated 3D Shape Dataset.- evolve2vec: Learning Network Representations Using Temporal Unfolding.- The Impact of Packet Loss and Google Congestion Control on QoE for WebRTC-based Mobile Multiparty Audiovisual Telemeetings.- Hierarchical Temporal Pooling for Efficient Online Action Recognition.- Generative Adversarial Networks with Enhanced Symmetric Residual Units for Single Image Super-Resolution.- 3D ResNets for 3D object classification.- Four Models for Automatic Recognition of Left and Right Eye in Fundus Images.- On the unsolved problem of Shot Boundary Detection for Music Videos.- Enhancing Scene Text Detection via Fused Semantic Segmentation Network with Attention.- Exploiting Incidence Relation Between Subgroups for Improving Clustering-Based Recommendation Model.- Hierarchical Bayesian Network based Incremental Model for Flood Prediction.- A New Female Body Segmentation and Feature Localisation Method for Image-based Anthropometry.- Greedy Salient Dictionary Learning For Activity Video Summarization.- Accelerating Topic Detection on Web for a Large-Scale Data Set via Stochastic Poisson Deconvolution.- Automatic Segmentation of Brain Tumor Images Based on Region Growing with Co-constraint.- Proposal of an Annotation Method for Integrating Musical Technique Knowledge using a GTTM Time-Span Tree.- A 
hierarchical level set approach for RGBD image matting.- A Genetic Programming Approach to Integrate Multilayer CNN Features for Image Classification.- Improving Micro-Expression Recognition Accuracy using Twofold Feature Extraction.- An effective dual-fisheye lens stitching method based on feature points.- 3D Skeletal Gesture Recognition via Sparse Coding of Time-Warping Invariant Riemannian Trajectories.- Efficient Graph based Multi-View Learning.- DANTE speaker recognition module. An efficient and robust automatic speaker searching solution for terrorism-related scenarios.