LEARNING VISUAL CORRESPONDENCES FOR GEOMETRY RECOVERY

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "LEARNING VISUAL CORRESPONDENCES FOR GEOMETRY RECOVERY"

By

Mr. Hongkai CHEN


Abstract:

Identifying robust and accurate visual correspondences across images, also
known as image matching, has been a long-standing topic in computer vision
research. In particular, image matching serves as a fundamental step in
reconstructing real-world geometry from multi-view photos, which has received
widespread attention from a range of industrial applications, including the
metaverse, AR/VR and autonomous driving. Traditionally, image matching involves
a series of discrete steps and hand-crafted algorithms. Although proven
effective in general cases, their manually designed features and matching
strategies are often insufficient to cope with challenging matching scenarios,
such as low-texture regions, large perspective changes or very low-overlap
pairs. In this thesis, we are dedicated to further improving the accuracy and
robustness of image matching algorithms, particularly through the utilization
of deep learning techniques.

We first propose a graph neural network (GNN), which inherits the traditional
keypoint-based matching scheme, to regularize matching costs by jointly
reasoning about visual similarity and matching consensus. Specifically, to
avoid exhaustive interaction among image keypoints, we leverage a small set of
pre-selected, relatively reliable matches, referred to as seed matches, to
guide the matching of the whole keypoint set. By integrating seed matches with
a series of efficient attentive operations, we show that even a very limited
set of seeds can provide strong clues to assist the matching of other
keypoints. Through comprehensive experiments, we demonstrate that our approach
achieves competitive performance compared with state-of-the-art GNN-based
matchers while maintaining modest computational costs.
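The efficiency argument above can be sketched in a few lines: if every keypoint attends only to a small set of seeds (and seeds attend to all keypoints), the interaction cost drops from O(N^2) to O(N*S). The following is a minimal, hypothetical simplification of such seed-guided message passing, not the thesis's actual network; all function names and the two-round scheme are illustrative assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(queries, keys, values):
    """Scaled dot-product attention: each query aggregates the values,
    weighted by its similarity to the keys."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

def seeded_message_passing(keypoint_feats, seed_idx):
    """One illustrative round of seed-guided attention: seeds first pool
    context from all N keypoints, then every keypoint queries only the S
    seeds -- O(N*S) interactions instead of O(N^2)."""
    seeds = [keypoint_feats[i] for i in seed_idx]
    seeds = attend(seeds, keypoint_feats, keypoint_feats)  # seeds gather global context
    return attend(keypoint_feats, seeds, seeds)            # keypoints read from seeds
```

In a real matcher the attention would of course use learned projections and operate on descriptors from both images; the sketch only shows why a handful of reliable seeds can relay information to the full keypoint set cheaply.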

Moving beyond keypoint-based matching, we then present an end-to-end
Transformer-based matcher that works directly on raw image pairs and skips the
step of keypoint detection. To tackle the quadratic complexity that dense
operation on images incurs in a vanilla Transformer, we propose a global-local
attention framework that ensures both global long-range interaction and local
fine-level interaction. Specifically, instead of fixing the local attention
span to a preset size, we adjust it according to learned matching uncertainty,
which balances matching coverage and interaction granularity in an adaptive
way. Through comprehensive evaluation, we show that our attention framework
significantly improves the quality of the obtained matches and boosts the
accuracy of camera pose estimation. In particular, we outperform counterparts
that also adopt efficient Transformer designs by a large margin.
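The adaptive-span idea can be illustrated with a toy rule: map a matching-uncertainty score to a window size, then crop that window around the coarse match as the region a fine-level query attends to. The mapping below (linear, clamped to odd sizes between 3 and 9) is an assumed stand-in, not the thesis's learned parameterization.

```python
def adaptive_span(uncertainty, min_span=3, max_span=9):
    """Map a matching-uncertainty score in [0, 1] to an odd window size:
    confident matches get a small, fine-grained span, uncertain ones a
    larger span for better coverage (illustrative rule only)."""
    span = min_span + round(uncertainty * (max_span - min_span))
    return span if span % 2 == 1 else span + 1

def local_window(grid, cy, cx, span):
    """Crop a span x span neighbourhood (clamped at image borders) around
    a coarse match location -- the region local attention would cover."""
    h, w = len(grid), len(grid[0])
    half = span // 2
    return [[grid[r][c] for c in range(max(0, cx - half), min(w, cx + half + 1))]
            for r in range(max(0, cy - half), min(h, cy + half + 1))]
```

For example, a confident match (uncertainty 0.0) yields a tight 3x3 window, while a doubtful one (uncertainty 1.0) yields a 9x9 window, trading interaction granularity for coverage exactly as the paragraph describes.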

Finally, taking one step further from our previous work, we propose a
geometry-aware deformable attention mechanism to enhance local attention in
Transformer-based matchers. Towards better modeling of the ubiquitous local
deformations caused by viewpoint changes, we estimate patchwise parametric
deformation fields from intermediate matching results, which are then used to
shape the local attention pattern. Through this design, we embed deformation
priors into the matching process in a principled and intuitive manner.
Experiments show that our design considerably improves the effectiveness of
the global-local attention framework and produces high-quality visual
correspondences for both two-view pose estimation and visual localization.
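The shaping of the attention pattern by a parametric deformation can be sketched as warping a regular grid of sampling offsets. Below, a 2x2 affine matrix stands in for the patchwise deformation field estimated from intermediate matches; both function names and the affine choice are illustrative assumptions, not the thesis's exact formulation.

```python
def regular_offsets(span):
    """Regular span x span grid of sampling offsets around a query location."""
    half = span // 2
    return [(dx, dy) for dy in range(-half, half + 1)
                     for dx in range(-half, half + 1)]

def deformed_offsets(base_offsets, A):
    """Warp attention sampling offsets by a patchwise 2x2 affine matrix A
    (a stand-in for the estimated parametric deformation field), so the
    local attention pattern follows the geometric deformation between
    views instead of staying axis-aligned."""
    return [(A[0][0] * dx + A[0][1] * dy, A[1][0] * dx + A[1][1] * dy)
            for dx, dy in base_offsets]
```

Under the identity matrix the pattern is the usual square window; under, say, a shear induced by a viewpoint change, the same offsets tilt to cover the deformed local neighbourhood.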

Through intensive investigation and innovation, we aspire to push the
performance boundaries of image matching further and empower a wider range of
2D and 3D applications.


Date:                   Tuesday, 22 August 2023

Time:                   2:00pm - 4:00pm

Venue:                  Room 4475
                        Lifts 25/26

Chairman:               Prof. Lixin WU (MATH)

Committee Members:      Prof. Long QUAN (Supervisor)
                        Prof. Chi Keung TANG
                        Prof. Dan XU
                        Prof. Weichuan YU (ECE)
                        Prof. Tien Tsin WONG (CUHK)


**** ALL are Welcome ****