Towards High-performance Datacenter Systems with Application-oriented Optimizations

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Towards High-performance Datacenter Systems with Application-oriented
Optimizations"

By

Mr. Chaoliang ZENG


Abstract:

In recent decades, we have witnessed the extensive construction of datacenters
and the widespread deployment of various applications. As Internet services and
cloud computing rise rapidly while Moore's law and Dennard scaling slow down,
a gap is growing between expanding application requirements and the slow
evolution of general-purpose processors. Therefore, it is critical to build
high-performance datacenter systems with application-oriented optimizations.

This thesis describes my research efforts in building high-performance
datacenter systems with careful exploitation of application-specific
characteristics and hardware architectures. Specifically, we explore three
application-oriented datacenter systems.

First, we present Herald, a runtime embedding scheduler for efficient
cache-enabled recommendation model training. Herald fully exploits the
predictability and occasionality of embedding cache access to reduce
embedding transmissions between caches and parameter servers (PS) during
training. We believe that the scheduling philosophy of Herald can be extended
to the training of embedding models in general.

Second, we study the embedding-based retrieval algorithm from first
principles and derive a practically ideal architecture for optimal performance.
Based on the derived architecture, we propose FAERY for high-performance
embedding-based retrieval running on FPGA. FAERY leverages appropriate parallel
techniques to orchestrate key operators in embedding-based retrieval, so that
FAERY can outperform CPU- and GPU-based approaches. Although FAERY is a
domain-specific accelerator for retrieval in recommendation systems, we believe
similar optimization techniques can be applied to other memory- and
computation-bound systems.

Third, we design Tiara, a three-tier hardware architecture to accelerate
stateful layer-4 load balancing. Tiara makes the best use of heterogeneous
hardware by decoupling the load balancing function. As a result, Tiara can
provide high performance with cost, energy, and space efficiency. We believe
Tiara's three-tier architecture is generic and can benefit more datacenter
gateway functions.


Date:                   Tuesday, 18 July 2023

Time:                   2:00pm - 4:00pm

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Prof. Shiheng WANG (ACCT)

Committee Members:      Prof. Kai CHEN (Supervisor)
                        Prof. Gary CHAN
                        Prof. Dan XU
                        Prof. Jun ZHANG (ECE)
                        Prof. Hong XU (CUHK)


**** ALL are Welcome ****