Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion based Classification.- Multi-scale Dilated Attention Graph Convolutional Network for Skeleton-Based Action Recognition.- Auto-Learning-GCN: An Ingenious Framework for Skeleton-based Action Recognition.- Skeleton-based Action Recognition with Combined Part-wise Topology Graph Convolutional Networks.- Segmenting Key Clues to Induce Human-Object Interaction Detection.- Lightweight Multispectral Skeleton and Multi-Stream Graph Attention Networks for Enhanced Action Prediction with Multiple Modalities.- Spatio-temporal Self-supervision for Few-shot Action Recognition.- A Fuzzy Error based Fine-tune Method for Spatio-temporal Recognition Model.- Temporal-Channel Topology Enhanced Network for Skeleton-Based Action Recognition.- HFGCN-Based Action Recognition System for Figure Skating.- Image Priors Assisted Pre-training for Point Cloud Shape Analysis.- AMM-GAN: Attribute-Matching Memory for Person Text-to-Image Generation.- RecFormer: Recurrent Multi-modal Transformer with History-aware Contrastive Learning for Visual Dialog.- KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing.- Enhancing Text-Image Person Retrieval through Nuances Varied Sample.- Unsupervised Prototype Adapter for Vision-Language Models.- Multimodal Causal Relations Enhanced CLIP for Image-to-Text Retrieval.- Exploring Cross-Modal Inconsistency in Entities and Emotions for Multimodal Fake News Detection.- Deep Consistency Preserving Network for Unsupervised Cross-modal Hashing.- Learning Adapters for Text-guided Portrait Stylization with Pretrained Diffusion Models.- EdgeFusion: Infrared and Visible Image Fusion Algorithm in Low Light.- An Efficient Momentum Framework for Face-Voice Association Learning.- Multi-modal Instance Refinement for Cross-domain Action Recognition.- Modality Interference Decoupling and Representation Alignment for Caricature-Visual Face Recognition.- Plugging Stylized Controls in 
Open-Stylized Image Captioning.- MGT: Modality-Guided Transformer for Infrared and Visible Image Fusion.- Multimodal Rumor Detection by Using Additive Angular Margin with Class-aware Attention for Hard Samples.- An Effective Dynamic Reweighting Method for Unbiased Scene Graph Generation.- Multi-modal Graph and Sequence Fusion Learning for Recommendation.- Co-attention guided local-global feature fusion for aspect-level multimodal sentiment analysis.- Discovering Multimodal Hierarchical Structures with Graph Neural Networks for Multi-modal and Multi-hop Question Answering.- Enhancing Recommender System with Multi-modal Knowledge Graph.- Location Attention Knowledge Embedding Model for Image-Text Matching.- Pedestrian Attribute Recognition Based on Multimodal Transformer.- RGB-D Road Segmentation Based on Geometric Prior Information.- Contrastive Perturbation Network for Weakly Supervised Temporal Sentence Grounding.- MLDF-Net: Metadata Based Multi-level Dynamic Fusion Network.- Efficient Adversarial Training with Membership Inference Resistance.- Enhancing Image Comprehension for Computer Science Visual Question Answering.- Cross-Modal Attentive Recalibration and Dynamic Fusion for Multispectral Pedestrian Detection.