RSID: A remote sensing image dehazing network.- ContextNet: Learning Context Information for Texture-less Light Field Depth Estimation.- An Efficient Way for Active None-line-of-sight: End-to-end Learned Compressed NLOS Imaging.- DFAR-Net: Dual-Input Three-Branch Attention Fusion Reconstruction Network for Polarized Non-Line-of-Sight Imaging.- EVCPP: Example-driven Virtual Camera Pose Prediction for cloud performing arts scenes.- RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution.- WDU-Net: Wavelet-Guided Deep Unfolding Network for Image Compressed Sensing Reconstruction.- Memory-Augmented Spatial-Temporal Consistency Network for Video Anomaly Detection.- Frequency and Spatial Domain Filter Network for Visual Object Tracking.- Enhancing Feature Representation for Anomaly Detection via Local-and-Global Temporal Relations and a Multi-Stage Memory.- DFAformer: A Dual Filtering Auxiliary Transformer for Efficient Online Action Detection in Streaming Videos.- Relation-guided Multi-stage Feature Aggregation Network for Video Object Detection.- Multimodal Local Feature Enhancement Network for Video Summarization.- Asymmetric Attention Fusion for Unsupervised Video Object Segmentation.- Flow-Guided Diffusion Autoencoder for Unsupervised Video Anomaly detection.- Prototypical Transformer for Weakly Supervised Action Segmentation.- Unimodal-Multimodal Collaborative Enhancement for Audio-Visual Event Localization.- Dual-memory feature aggregation for video object detection.- Going Beyond Closed Sets: A Multimodal Perspective for Video Emotion Analysis.- Temporal-Semantic Context Fusion for Robust Weakly Supervised Video Anomaly Detection.- A Survey: the Sensor-based Method for Sign Language Recognition.- Utilizing Video Word Boundaries and Feature-based Knowledge Distillation Improving Sentence-level Lip Reading.- Denoised Temporal Relation Network for Temporal Action Segmentation.- 3D Lightweight Spatial-Spectral Attention Network for Hyperspectral Image
Classification.- Deepfake Detection via Fine-Grained Classification and Global-Local Information Fusion.- Unsupervised Image-to-Image Translation with Style Consistency.- SemanticCrop: Boosting Contrastive Learning via Semantic-cropped Views.- Transformer-based multi-object tracking in unmanned aerial vehicles.- HEI-GAN: A Human-Environment Interaction based GAN for Multimodal Human Trajectory Prediction.- CenterMatch: A Center Matching Method for Semi-supervised Facial Expression Recognition.- Cross-Dataset Distillation with Multi-Tokens for Image Quality Assessment.- Quality-Aware CLIP for Blind Image Quality Assessment.- Multi-Agent Perception via Co-Attentive Communication Mechanism.- DBRNet: Dual-Branch Real-Time Segmentation NetWork For Metal Defect Detection.- MaskDiffuse: Text-guided Face Mask Removal based on Diffusion Models.- Image Generation Based Intra-class Variance Smoothing for Fine-grained Visual Classification.- Cross-Domain Soft Adaptive Teacher for Syn2Real Object Detection.- Dynamic Graph-Driven Heat Diffusion: Enhancing Industrial Semantic Segmentation.- EKGRL: Entity-based Knowledge Graph Representation Learning for Fact-based Visual Question Answering.- Disentangled Attribute Features Vision Transformer for Pedestrian Attribute Recognition.- A high-resolution network based on feature redundancy reduction and attention mechanism.