Optimizing the Inference Efficiency of Deep Neural Networks on the Graph and the Operator Level

PhD Thesis Proposal Defence


Title: "Optimizing the Inference Efficiency of Deep Neural Networks on the 
Graph and the Operator Level"

by

Miss Jingzhi FANG


Abstract:

Deep neural networks (DNNs) have achieved great success in many areas, such as 
computer vision and natural language processing. This success is largely driven 
by increasingly large and computationally intensive models. The increased model 
size makes the training and inference of DNNs time-consuming, which severely 
hinders the development and application of DNNs. It is therefore important to 
reduce the execution time of DNNs. One way to achieve this goal is to optimize 
the implementation of a DNN without changing its outputs. The best 
implementation of a DNN depends on its model architecture, its input workload, 
and the hardware it runs on. Therefore, each DNN should be optimized 
individually, taking its runtime information into account. However, the 
optimization space of DNN implementations is often huge, making it hard to 
search for the best implementation. Furthermore, the optimization process may 
need to be conducted multiple times in practice, e.g., when designing the model 
architecture or when the runtime information is dynamic (such as a dynamic 
input workload), so a long optimization time can be unaffordable. As a result, 
the efficiency of optimizing the DNN implementation is also of great 
importance. In this thesis, we introduce two techniques that accelerate the 
optimization of DNN implementations while maintaining good optimization 
effectiveness. Specifically, since a DNN can be represented as a computation 
graph, where each node corresponds to an operator in the model (e.g., matrix 
multiplication) and each edge corresponds to a data dependency between 
operators, our techniques optimize the DNN implementation at the graph level 
and the operator level, respectively. The graph-level optimization method 
searches for an equivalent computation graph for a DNN by iteratively applying 
transformations to its original computation graph. The operator-level 
optimization method searches for equivalent low-level code for an operator by 
applying transformations to a naive low-level implementation of that operator.
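
To make the graph-level idea concrete, below is a minimal, purely illustrative 
Python sketch (not taken from the thesis): a toy computation graph whose nodes 
are operators and whose edges are data dependencies, plus one 
equivalence-preserving transformation that removes a redundant pair of 
transposes. The graph representation, the rewrite rule, and all names are 
assumptions chosen for illustration only, not the data structures or 
transformations proposed in the work.

    # Illustrative sketch only: a toy computation graph and one
    # equivalence-preserving graph-level rewrite. All structures here are
    # assumptions for illustration, not the thesis's actual design.
    import numpy as np

    # Each node: name -> (operator, list of input node names).
    graph = {
        "x":  ("input", []),
        "t1": ("transpose", ["x"]),
        "t2": ("transpose", ["t1"]),   # transpose(transpose(x)) == x
        "y":  ("relu", ["t2"]),
    }

    def evaluate(graph, output, feeds):
        """Evaluate a node by recursively evaluating its inputs."""
        op, inputs = graph[output]
        if op == "input":
            return feeds[output]
        vals = [evaluate(graph, name, feeds) for name in inputs]
        if op == "transpose":
            return vals[0].T
        if op == "relu":
            return np.maximum(vals[0], 0)
        raise ValueError(f"unknown op: {op}")

    def remove_double_transpose(graph):
        """Graph rewrite: rewire consumers of transpose(transpose(v)) to v."""
        def resolve(name):
            op, inputs = graph[name]
            if op == "transpose":
                inner_op, inner_inputs = graph[inputs[0]]
                if inner_op == "transpose":
                    return resolve(inner_inputs[0])
            return name
        return {name: (op, [resolve(i) for i in inputs])
                for name, (op, inputs) in graph.items()}

    # The rewritten graph produces the same output with fewer operators.
    feeds = {"x": np.random.randn(3, 4)}
    rewritten = remove_double_transpose(graph)
    assert np.allclose(evaluate(graph, "y", feeds),
                       evaluate(rewritten, "y", feeds))

In the same spirit, the operator-level method transforms the low-level code of 
an individual operator rather than the structure of the graph.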


Date:                   Friday, 22 March 2024

Time:                   4:00pm - 6:00pm

Venue:                  Room 5501
                        Lifts 25/26

Committee Members:      Prof. Lei Chen (Supervisor)
                        Prof. Raymond Wong (Chairperson)
                        Prof. Qiong Luo
                        Prof. Ke Yi