HIGH-QUALITY AND HIGH-EFFICIENCY IMAGE/VIDEO MATTING

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "HIGH-QUALITY AND HIGH-EFFICIENCY IMAGE/VIDEO MATTING"

By

Miss Yanan SUN


Abstract:

Matting has long been a primary technique for image/video editing. Traditional
matting methods outlined the matting problem and made preliminary explorations,
but their performance is limited by low-level image features. This issue has
been addressed to a considerable extent with the introduction of deep neural
networks. However, the vigorous upgrading of the multimedia industry in recent
years has posed more challenges, including diverse media content and
application scenarios, commodity-level devices with limited resources, and the
popularity of HD/UHD display screens. To overcome these challenges, this thesis
explores the matting task from four different perspectives: accuracy of image
matting, temporal coherence of video matting, efficiency of image and video
matting, and instance-level matting.

The first study improves image matting performance by utilizing semantic
information in alpha mattes. We propose Semantic Image Matting (SIM), which
reasons about the underlying causes of matting across various foreground objects and
incorporates semantic classification of matting regions to obtain better alpha
mattes. The method extends the conventional trimap to a semantic trimap, learns a
multi-class discriminator to regularize alpha prediction at the semantic level,
and learns content-sensitive weights to balance different regularization
losses. The method outperforms other methods, achieving competitive
state-of-the-art performance on multiple benchmarks.

The second study proposes a deep learning-based video matting framework (DVM)
that uses a spatio-temporal feature aggregation module (ST-FAM) to address the
inherent technical challenges in reasoning about the temporal domain. ST-FAM aligns
and aggregates temporal information in high dimension across multiple frames
through deformable convolution to overcome the unreliability of optical flow
estimation within matting regions. The study also introduces a lightweight
trimap propagation network to eliminate frame-by-frame trimap annotations.

The third study proposes SparseMat, a computationally efficient approach for
ultrahigh-resolution (UHR) image/video matting. It is infeasible to process UHR
images at full resolution using existing matting algorithms without running out
of memory on consumer-level computational platforms. SparseMat uses spatial and
temporal sparsity to address general UHR matting and reduce computation
redundancy. The method generates high-quality alpha mattes for UHR images and
videos at the original high resolution in a single pass.

The last study proposes the new task of instance matting (IM), requiring
precise alpha matte prediction for each instance. To solve instance matting,
the study introduces InstMatt to tackle technical challenges such as mingled
colors and overlapping boundaries. InstMatt includes a novel mutual guidance
strategy and a multi-instance refinement module to delineate multi-instance
relationships. InstMatt produces high-quality instance-level alpha mattes
and can be adapted to different classes.


Date:                   Friday, 18 August 2023

Time:                   10:00am - 12:00noon

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Prof. Rongrong ZHOU (MARK)

Committee Members:      Prof. Chi Keung TANG (Supervisor)
                        Prof. Pedro SANDER
                        Prof. Dan XU
                        Prof. Weichuan YU (ECE)
                        Prof. Jinwei GU (CUHK)


**** ALL are Welcome ****