Assessing the Reliability of Deep Learning Applications

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Assessing the Reliability of Deep Learning Applications"

By

Mr. Yongqiang TIAN


Abstract:

Deep Learning (DL) applications are widely deployed in diverse areas, such
as image classification, natural language processing, and autonomous
driving systems. Although these applications achieve outstanding
performance on metrics such as accuracy, developers have raised strong
concerns about their reliability, since the logic of DL applications is a
black box to humans. Specifically, DL applications learn their logic
during stochastic training and encode it in the high-dimensional weights
of DL models. Unlike the source code of conventional software, such
weights are infeasible for humans to directly interpret, examine, and
validate. As a result, reliability issues in DL applications are not easy
to detect and may cause catastrophic accidents in safety-critical
missions. Therefore, it is critical to adequately assess the reliability
of DL applications.

This thesis aims to help software developers assess the reliability of DL 
applications from the following three perspectives.

The first study proposes object-relevancy, a property that reliable
DL-based image classifiers should comply with, i.e., the classification
result should be based on the features relevant to the target object in a
given image, rather than on irrelevant features such as the background.
This study further proposes an automatic approach based on two metamorphic
relations to assess whether this property is violated by image
classifiers. The evaluation shows that the proposed approach can
effectively detect unreliable inferences that violate the object-relevancy
property, with average precisions of 64.1% and 96.4% for the two
relations, respectively. A subsequent empirical study reveals that such
unreliable inferences are prevalent in the real world and that existing
training strategies cannot effectively tackle this issue.
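
As an illustration, the Python sketch below shows one plausible form of
such a metamorphic check: masking out the background, which is irrelevant
to the target object, should not flip the predicted label. The model
interface, image representation, and object mask used here are
hypothetical; the actual relations are defined in the thesis.

    import numpy as np

    def keep_object_only(image, object_mask, fill_value=0):
        # Hypothetical mutation: blank out background pixels, keep the object.
        mutated = image.copy()
        mutated[~object_mask] = fill_value
        return mutated

    def violates_object_relevancy(model, image, object_mask):
        # If the label changes once the irrelevant background is removed,
        # the original inference likely relied on object-irrelevant features.
        original_label = model.predict(image)
        mutated_label = model.predict(keep_object_only(image, object_mask))
        return original_label != mutated_label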

The second study concentrates on the reliability issues induced by DL
model compression. Model compression can significantly reduce the size of
Deep Neural Network (DNN) models and thus facilitates the deployment of
sophisticated, sizable DNN models. However, the prediction results of a
compressed model may deviate from those of its original model, resulting
in unreliable DL applications after deployment. To help developers
thoroughly assess the impact of model compression, it is essential to test
compressed models and find any deviated behaviors before dissemination.
This study proposes DFLARE, a novel search-based, black-box testing
technique. The evaluation shows that DFLARE consistently outperforms the
baseline in both efficacy and efficiency. More importantly, the triggering
inputs found by DFLARE can be used to repair up to 48.48% of the deviated
behaviors.
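
The core oracle behind such testing is differential: an input on which the
compressed model disagrees with its original model exposes a deviated
behavior. The minimal sketch below illustrates only this oracle with an
unguided random search; DFLARE's actual search strategy is more
sophisticated, and the model and mutation interfaces are hypothetical.

    def find_deviated_behavior(original_model, compressed_model, seed_input,
                               mutate, max_iterations=10000):
        # Repeatedly mutate the seed until the two models disagree.
        x = seed_input
        for _ in range(max_iterations):
            if original_model.predict(x) != compressed_model.predict(x):
                return x  # triggering input exposing a deviated behavior
            x = mutate(x)
        return None  # no deviation found within the budget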

The third study reveals the unreliable assessment of DL-based Program
Generators (DLGs) in compiler testing. To test compilers effectively, DLGs
have been proposed to automatically generate massive numbers of test
programs. However, after a thorough analysis of the characteristics of
DLGs, this study finds that the assessment of these DLGs is unfair and
unreliable, since the chosen baselines, i.e., Language-Specific Program
Generators (LSGs), differ from DLGs in many aspects. This study therefore
proposes Kitten, a simple, fair, non-DL-based baseline for DLGs. The
experiments show that DLGs cannot even compete with such a simple baseline
and that their claimed advantages are likely due to the biased selection
of baselines. Specifically, in 72 hours of testing on GCC, Kitten triggers
1,750 hang bugs and 34 distinct crashes, whereas the state-of-the-art DLG
triggers only 3 hang bugs and 1 distinct crash. Moreover, the code
coverage achieved by Kitten is at least twice that achieved by the
state-of-the-art DLG.
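
For context, a compiler-fuzzing harness typically classifies each
generated program by how the compiler reacts to it: hangs are detected via
a timeout and crashes via abnormal termination. The sketch below is a
hypothetical harness of this kind, not the actual experimental setup used
in the thesis.

    import subprocess

    def classify_compiler_run(compiler, source_file, timeout_seconds=60):
        # Compile one generated program and classify the outcome.
        try:
            result = subprocess.run(
                [compiler, "-O3", "-c", source_file, "-o", "/dev/null"],
                capture_output=True, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            return "hang"   # compiler did not terminate within the budget
        if result.returncode < 0 or b"internal compiler error" in result.stderr:
            return "crash"  # killed by a signal or reported an ICE
        return "ok"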


Date:                   Friday, 14 July 2023

Time:                   9:00am - 11:00am

Venue:                  Room 6538
                        Lifts 27/28

Chairman:               Prof. Hui SU (CIVL)

Committee Members:      Prof. Shing Chi CHEUNG (Supervisor)
                        Prof. Chengnian SUN (Supervisor, U of Waterloo)
                        Prof. Raymond WONG
                        Prof. Ross MURCH (ECE)
                        Prof. Meng XU (U of Waterloo)
                        Prof. Zhi JIN (Peking University)


**** ALL are Welcome ****