Publications
Listed by category in reverse chronological order (#1 indicates Co-First Author, * indicates Corresponding Author).
2024
-
Test-Time Degradation Adaptation for Open-Set Image Restoration
Yuanbiao Gou,
Haiyu Zhao,
Boyun Li,
Xinyan Xiao,
Xi Peng*
International Conference on Machine Learning (ICML), 2024. (Spotlight ~3.5%)
[Abstract]
[Bib]
[Code]
[HTML]
[PDF]
In contrast to closed-set scenarios that restore images from a predefined set of degradations, open-set image restoration aims to handle unknown degradations that were unforeseen during the pretraining phase, a problem that, as far as we know, remains less touched. This work studies this challenging problem and reveals its essence as unidentified distribution shifts between the test and training data. Recently, test-time adaptation has emerged as a fundamental method to address such inherent disparities. Inspired by it, we propose a test-time degradation adaptation framework for open-set image restoration, which consists of three components, i.e., i) a pre-trained and degradation-agnostic diffusion model for generating clean images, ii) a test-time degradation adapter that adapts to the unknown degradation based on the input image during the testing phase, and iii) adapter-guided image restoration that guides the model through the adapter to produce the corresponding clean image. Through experiments on multiple degradations, we show that our method achieves comparable or even better performance than task-specific methods. The code is available at https://github.com/XLearning-SCU/2024-ICML-TAO.
@InProceedings{TAO,
title={Test-Time Degradation Adaptation for Open-Set Image Restoration},
author={Yuanbiao Gou and Haiyu Zhao and Boyun Li and Xinyan Xiao and Xi Peng},
booktitle={Forty-first International Conference on Machine Learning},
year={2024}
}
2023
-
Rethinking Image Super Resolution from Long-Tailed Distribution Learning Perspective
Yuanbiao Gou,
Peng Hu,
Jiancheng Lv,
Hongyuan Zhu,
Xi Peng*
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Abstract]
[Bib]
[Code]
[HTML]
[PDF]
Existing studies have empirically observed that the resolution of the low-frequency region is easier to enhance than that of the high-frequency one. Although plentiful works have been devoted to alleviating this problem, little understanding has been given to explain it. In this paper, we try to give a feasible answer from a machine learning perspective, i.e., the twin fitting problem caused by the long-tailed pixel distribution in natural images. With this explanation, we reformulate image super resolution (SR) as a long-tailed distribution learning problem and solve it by bridging the gap between its formulations in low- and high-level vision tasks. As a result, we design a long-tailed distribution learning solution that rebalances the gradients from the pixels in the low- and high-frequency regions by introducing a static and a learnable structure prior. The learned SR model achieves a better balance in fitting the low- and high-frequency regions, so that the overall performance is improved. In the experiments, we evaluate the solution on four CNN- and one Transformer-based SR models w.r.t. six datasets and three tasks, and the experimental results demonstrate its superiority.
@InProceedings{FPL,
author = {Gou, Yuanbiao and Hu, Peng and Lv, Jiancheng and Zhu, Hongyuan and Peng, Xi},
title = {Rethinking Image Super Resolution From Long-Tailed Distribution Learning Perspective},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {14327-14336}
}
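To make the gradient-rebalancing idea above concrete, here is a minimal PyTorch sketch, assuming a single-channel (luminance) setting: a Laplacian edge map stands in for the static structure prior, a tiny convolutional head for the learnable one, and the per-pixel L1 loss is weighted by their sum. All names and design details are illustrative assumptions, not the authors' implementation.

# A minimal sketch of pixel-wise gradient rebalancing (illustrative only,
# not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def static_prior(hr):
    # Normalized edge magnitude: large on the rare high-frequency pixels.
    edges = F.conv2d(hr, LAPLACIAN, padding=1).abs()
    return edges / (edges.amax(dim=(2, 3), keepdim=True) + 1e-8)

class LearnablePrior(nn.Module):
    # Predicts an extra per-pixel weight in [0, 1] from the SR output.
    def __init__(self, channels=1):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(channels, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, sr):
        return self.head(sr)

def rebalanced_l1(sr, hr, learnable_prior):
    # Up-weight the hard high-frequency pixels so their gradients are not
    # swamped by the abundant, easy-to-fit low-frequency pixels.
    weight = 1.0 + static_prior(hr) + learnable_prior(sr)
    return (weight * (sr - hr).abs()).mean()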
-
Comprehensive and Delicate: An Efficient Transformer for Image Restoration
Haiyu Zhao#1,
Yuanbiao Gou#1,
Boyun Li,
Dezhong Peng,
Jiancheng Lv,
Xi Peng*
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[Abstract]
[Bib]
[Code]
[HTML]
[PDF]
Vision Transformers have shown promising performance in image restoration, usually conducting window- or channel-based attention to avoid intensive computation. Although promising performance has been achieved, such designs go against the biggest success factor of Transformers to a certain extent by capturing local instead of global dependencies among pixels. In this paper, we propose a novel efficient image restoration Transformer that first captures the superpixel-wise global dependency and then transfers it into each pixel. Such a coarse-to-fine paradigm is implemented through two neural blocks, i.e., the condensed attention neural block (CA) and the dual adaptive neural block (DA). In brief, CA employs feature aggregation, attention computation, and feature recovery to efficiently capture the global dependency at the superpixel level. To embrace the pixel-wise global dependency, DA takes a novel dual-way structure to adaptively encapsulate the globality from superpixels into pixels. Thanks to the two neural blocks, our method achieves comparable performance while taking only 6% of the FLOPs of SwinIR.
@InProceedings{CODE,
author = {Zhao, Haiyu and Gou, Yuanbiao and Li, Boyun and Peng, Dezhong and Lv, Jiancheng and Peng, Xi},
title = {Comprehensive and Delicate: An Efficient Transformer for Image Restoration},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {14122-14132}
}
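The coarse-to-fine pattern of CA (aggregate, attend, recover) can be illustrated with a short PyTorch sketch, given as a hypothetical stand-in rather than the released code: attention runs over average-pooled "superpixel" tokens, which is cheap because there are far fewer tokens than pixels, and the globally mixed result is broadcast back onto the pixel grid.

# Illustrative only: attention over pooled superpixel tokens, then
# broadcast back to pixels (a stand-in for CA, not the authors' block).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CondensedAttentionSketch(nn.Module):
    def __init__(self, dim, pool=8, heads=4):  # heads must divide dim
        super().__init__()
        self.pool = pool
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Aggregate: every pool x pool region becomes one superpixel token.
        tokens = F.adaptive_avg_pool2d(x, (h // self.pool, w // self.pool))
        t = tokens.flatten(2).transpose(1, 2)  # (B, N, C), N << H*W
        # Attend: global dependency at the superpixel level.
        t, _ = self.attn(t, t, t)
        # Recover: upsample the mixed tokens and inject them into pixels.
        g = t.transpose(1, 2).reshape(b, c, h // self.pool, w // self.pool)
        g = F.interpolate(g, size=(h, w), mode='nearest')
        return x + g

For a 256x256 feature map with pool=8, attention runs over 1,024 tokens instead of 65,536 pixels, which is where the FLOPs saving comes from.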
2022
-
Multi-Scale Adaptive Network for Single Image Denoising
Yuanbiao Gou,
Peng Hu,
Jiancheng Lv,
Joey Tianyi Zhou,
Xi Peng*
Neural Information Processing Systems (NeurIPS), 2022.
[Abstract]
[Bib]
[Code]
[HTML]
[PDF]
Multi-scale architectures have shown effectiveness in a variety of tasks thanks to their appealing cross-scale complementarity. However, existing architectures treat features at different scales equally without considering the scale-specific characteristics, i.e., the within-scale characteristics are ignored in the architecture design. In this paper, we reveal this missing piece of multi-scale architecture design and accordingly propose a novel Multi-Scale Adaptive Network (MSANet) for single image denoising. Specifically, MSANet simultaneously embraces the within-scale characteristics and the cross-scale complementarity thanks to three novel neural blocks, i.e., the adaptive feature block (AFeB), the adaptive multi-scale block (AMB), and the adaptive fusion block (AFuB). In brief, AFeB is designed to adaptively preserve image details and filter noise, which is highly desirable for features with mixed details and noise. AMB enlarges the receptive field and aggregates multi-scale information, which meets the need for contextually informative features. AFuB adaptively samples and transfers features from one scale to another, fusing the multi-scale features with varying characteristics from coarse to fine. Extensive experiments on three real and six synthetic noisy image datasets show the superiority of MSANet compared with 12 methods. The code is available at https://github.com/XLearning-SCU/2022-NeurIPS-MSANet.
@inproceedings{MSANet,
title={Multi-Scale Adaptive Network for Single Image Denoising},
author={Yuanbiao Gou and Peng Hu and Jiancheng Lv and Joey Tianyi Zhou and Xi Peng},
booktitle={Advances in Neural Information Processing Systems},
year={2022}
}
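As a toy illustration of "enlarging the receptive field and aggregating multi-scale information" (the role the abstract assigns to AMB), here is a small PyTorch block built from parallel dilated convolutions; it is our illustrative stand-in, not a block from the MSANet code.

# Illustrative stand-in for the AMB idea: parallel dilated convolutions
# grow the receptive field and aggregate multi-scale context.
import torch
import torch.nn as nn

class ToyMultiScaleBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Same kernel size, increasing dilation -> growing receptive field.
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4))
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        ctx = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(ctx)  # residual multi-scale aggregation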
-
Dual Contrastive Prediction for Incomplete Multi-view Representation Learning
Yijie Lin,
Yuanbiao Gou,
Xiaotian Liu,
Jinfeng Bai,
Jiancheng Lv,
Xi Peng*
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022. (Highly Cited Paper, top 1%)
[Abstract]
[Bib]
[Code]
[HTML]
[PDF]
In this article, we propose a unified framework to solve the following two challenging problems in incomplete multi-view representation learning: i) how to learn a consistent representation unifying different views, and ii) how to recover the missing views. To address the challenges, we provide an information theoretical framework under which the consistency learning and data recovery are treated as a whole. With the theoretical framework, we propose a novel objective function which jointly solves the aforementioned two problems and achieves a provably sufficient and minimal representation. In detail, the consistency learning is performed by maximizing the mutual information of different views through contrastive learning, and the missing views are recovered by minimizing the conditional entropy through dual prediction. To the best of our knowledge, this is one of the first works to theoretically unify cross-view consistency learning and data recovery for representation learning. Extensive experimental results show that the proposed method remarkably outperforms 20 competitive multi-view learning methods on six datasets in terms of clustering, classification, and human action recognition. The code could be accessed from https://pengxi.me.
@article{DCP,
title={Dual Contrastive Prediction for Incomplete Multi-View Representation Learning},
author={Lin, Yijie and Gou, Yuanbiao and Liu, Xiaotian and Bai, Jinfeng and Lv, Jiancheng and Peng, Xi},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2022},
pages={1-14}
}
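Read as a single objective, the two components described in the abstract (consistency via contrastive learning, recovery via dual prediction) take roughly the following schematic form, where Z^{(1)} and Z^{(2)} denote the two view-specific representations and \lambda balances the terms. This is our paraphrase of the abstract, not the paper's exact loss:

\max_{\theta}\; I\bigl(Z^{(1)};\, Z^{(2)}\bigr) \;-\; \lambda \Bigl[ H\bigl(Z^{(1)} \mid Z^{(2)}\bigr) + H\bigl(Z^{(2)} \mid Z^{(1)}\bigr) \Bigr]

Maximizing the mutual information enforces cross-view consistency, while minimizing each conditional entropy makes one view predictable from the other, which is what allows the missing views to be recovered.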
2021
-
COMPLETER: Incomplete Multi-View Clustering via Contrastive Prediction
Yijie Lin,
Yuanbiao Gou,
Zitao Liu,
Boyun Li,
Jiancheng Lv,
Xi Peng*
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[Abstract]
[Bib]
[Code]
[HTML]
[PDF]
In this paper, we study two challenging problems in incomplete multi-view clustering analysis, namely, i) how to learn an informative and consistent representation among different views without the help of labels, and ii) how to recover the missing views from data. To this end, we propose a novel objective that incorporates representation learning and data recovery into a unified framework from the view of information theory. To be specific, the informative and consistent representation is learned by maximizing the mutual information across different views through contrastive learning, and the missing views are recovered by minimizing the conditional entropy of different views through dual prediction. To the best of our knowledge, this could be the first work to provide a theoretical framework that unifies consistent representation learning and cross-view data recovery. Extensive experimental results show that the proposed method remarkably outperforms 10 competitive multi-view clustering methods on four challenging datasets. The code is available at https://pengxi.me.
@inproceedings{COMPLETER,
title={COMPLETER: Incomplete Multi-View Clustering via Contrastive Prediction},
author={Lin, Yijie and Gou, Yuanbiao and Liu, Zitao and Li, Boyun and Lv, Jiancheng and Peng, Xi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={11174--11183},
year={2021}
}
-
You Only Look Yourself: Unsupervised and Untrained Single Image Dehazing Neural Network
Boyun Li#1,
Yuanbiao Gou#1,
Shuhang Gu,
Jerry Zitao Liu,
Joey Tianyi Zhou,
Xi Peng*
International Journal of Computer Vision (IJCV), 2021. (Highly Cited Paper, top 1%)
[Abstract]
[Bib]
[Code]
[HTML]
[PDF]
In this paper, we study two challenging and less-touched problems in single image dehazing, namely, how to make deep learning achieve image dehazing without training on ground-truth clean images (unsupervised) and without an image collection (untrained). An unsupervised model avoids the intensive labor of collecting hazy-clean image pairs, and an untrained model is a “real” single image dehazing approach that removes haze based only on the observed hazy image, with no extra images used. Motivated by layer disentanglement, we propose a novel method, called You Only Look Yourself (YOLY), which could be one of the first unsupervised and untrained neural networks for image dehazing. In brief, YOLY employs three joint subnetworks to separate the observed hazy image into several latent layers, i.e., a scene radiance layer, a transmission map layer, and an atmospheric light layer. After that, the three layers are composed back into the hazy image in a self-supervised manner. Thanks to its unsupervised and untrained characteristics, our method bypasses the conventional paradigm of training deep models on hazy-clean pairs or a large-scale dataset, thus avoiding the labor-intensive data collection and the domain shift issue. Besides, our method also provides an effective learning-based haze transfer solution thanks to its layer disentanglement mechanism. Extensive experiments show the promising performance of our method in image dehazing compared with 14 methods on six databases. The code could be accessed at www.pengxi.me.
@article{YOLY,
title={You Only Look Yourself: Unsupervised and Untrained Single Image Dehazing Neural Network},
author={Li, Boyun and Gou, Yuanbiao and Gu, Shuhang and Liu, Jerry Zitao and Zhou, Joey Tianyi and Peng, Xi},
journal={International Journal of Computer Vision},
volume={129},
number={5},
pages={1754--1767},
year={2021}
}
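The three latent layers correspond to the terms of the classic atmospheric scattering model that underlies most dehazing work (the same decomposition also underlies ZID below). In the standard notation, which we add here for context:

I(x) = J(x)\, t(x) + A\,\bigl(1 - t(x)\bigr)

where I(x) is the observed hazy image, J(x) the scene radiance (haze-free image), t(x) the transmission map, and A the global atmospheric light. YOLY's three subnetworks estimate J, t, and A, then recompose them into the input image as the self-supervision signal.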
2020
-
CLEARER: Multi-Scale Neural Architecture Search for Image Restoration
Yuanbiao Gou,
Boyun Li,
Zitao Liu,
Songfan Yang,
Xi Peng*
Neural Information Processing Systems (NeurIPS), 2020.
[Abstract]
[Bib]
[Code]
[HTML]
[PDF]
Multi-scale neural networks have shown effectiveness in image restoration tasks, yet they are usually designed and integrated in a handcrafted manner. Different from the existing labor-intensive handcrafted architecture design paradigm, we present a novel method, termed multi-sCaLe nEural ARchitecture sEarch for image Restoration (CLEARER), a neural architecture search (NAS) method specifically designed for image restoration. Our contributions are twofold. On the one hand, we design a multi-scale search space that consists of three task-flexible modules: 1) a parallel module that connects multi-resolution neural blocks in parallel while preserving the channels and spatial resolution of each neural block; 2) a transition module that retains the existing multi-resolution features while extending them to a lower resolution; and 3) a fusion module that integrates multi-resolution features by passing the features of the parallel neural blocks to the current neural blocks. On the other hand, we present novel losses that 1) balance the tradeoff between model complexity and performance, which is highly desirable for image restoration, and 2) relax the discrete architecture parameters into a continuous distribution that approximates either 0 or 1. As a result, a differentiable strategy can be employed to search for when to fuse or extract multi-resolution features, while the discretization issue faced by gradient-based NAS is alleviated. CLEARER can find a promising architecture within two GPU hours. Extensive experiments show the promising performance of our method compared with nine image denoising methods and eight image deraining approaches in quantitative and qualitative evaluations. The code is available at https://github.com/limit-scu.
@article{CLEARER,
title={CLEARER: Multi-Scale Neural Architecture Search for Image Restoration},
author={Gou, Yuanbiao and Li, Boyun and Liu, Zitao and Yang, Songfan and Peng, Xi},
journal={Advances in Neural Information Processing Systems},
volume={33},
year={2020}
}
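Regarding the second loss, a common way to push a relaxed architecture parameter \alpha_i \in [0, 1] toward a near-binary value is a penalty of the following shape, which we give purely as an illustration of the idea; the paper's exact formulation may differ:

\mathcal{L}_{\text{bin}} = \sum_i \alpha_i \bigl(1 - \alpha_i\bigr)

This term is zero exactly when every \alpha_i is 0 or 1, so minimizing it alongside the task loss drives the continuous relaxation toward a discrete architecture and eases the discretization issue mentioned above.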
-
Zero-Shot Image Dehazing
Boyun Li,
Yuanbiao Gou,
Jerry Zitao Liu,
Hongyuan Zhu,
Joey Tianyi Zhou,
Xi Peng*
IEEE Transactions on Image Processing, vol. 29, pp. 8457-8466, 2020.
[Abstract]
[Bib]
[Code]
[HTML]
[PDF]
In this article, we study two less-touched challenging problems in single image dehazing neural networks, namely, how to remove haze from a given image in an unsupervised and zero-shot manner. To these ends, we propose a novel method, Zero-shot Image Dehazing (ZID), based on the idea of layer disentanglement, viewing a hazy image as the entanglement of several “simpler” layers, i.e., a haze-free image layer, a transmission map layer, and an atmospheric light layer. The major advantages of ZID are two-fold. First, it is an unsupervised method that does not use any clean images, including hazy-clean pairs, as ground truth. Second, ZID is a “zero-shot” method that uses only the observed single hazy image to perform learning and inference. In other words, it does not follow the conventional paradigm of training a deep model on a large-scale dataset. These two advantages enable our method to avoid the labor-intensive data collection and the domain shift issue caused by using synthetic hazy images to address real-world images. Extensive comparisons show the promising performance of our method compared with 15 approaches in qualitative and quantitative evaluations. The source code could be found at http://www.pengxi.me.
@article{ZID,
title={Zero-shot Image Dehazing},
author={Li, Boyun and Gou, Yuanbiao and Liu, Jerry Zitao and Zhu, Hongyuan and Zhou, Joey Tianyi and Peng, Xi},
journal={IEEE Transactions on Image Processing},
volume={29},
pages={8457--8466},
year={2020}
}
-
Interpretable Neural Networks: From the Differentiable Programming Perspective
Yuanbiao Gou,
Boyun Li,
Zhenyu Huang,
Xi Peng*
Communications of the CCF, vol. 15, no. 11, pp. 42-47, 2019. (in Chinese)
[PDF]