The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "TOWARDS USABLE, EFFICIENT SERVERLESS COMPUTING SYSTEMS"

By

Mr. Minchen YU


Abstract:

Serverless computing has gained much popularity as a new cloud computing
paradigm due to its high scalability and fine-grained, pay-per-use billing
model. In contrast to traditional serverful cloud offerings such as VMs,
serverless computing offers high-level abstractions in the form of stateless
functions, making it easy for users to develop and execute cloud applications.
However, these advancements also present new challenges, such as application
state management and restricted function communication, which can significantly
impact the usability and applicability of serverless clouds.

This dissertation aims to make serverless computing more usable and efficient,
exploring two scenarios: general-purpose and application-specific serverless
systems. We first discuss general-purpose serverless platforms, which provide a
function abstraction and support diverse applications. Current serverless
platforms typically deploy an application as a function workflow, orchestrating
and triggering its functions according to invocation dependencies. However,
this design is oblivious to the underlying data exchanges between functions,
making it inefficient and cumbersome to orchestrate complex applications. We
therefore propose a novel data-centric approach to function orchestration,
which easily and effectively supports complex workflow patterns by making data
consumption explicit and allowing it to trigger functions. Following this
data-centric design, we present Pheromone, an efficient serverless platform
that enables low-latency function interactions and data exchanges and is easy
to use for orchestrating a wide range of applications.

We next focus on enabling efficient ML model inference on serverless computing,
i.e., application-specific serverless systems. Serverless computing is well
suited to model inference, as it supports fast autoscaling to handle dynamic,
bursty inference requests at low cost. However, current serverless functions
are limited in CPU and memory resources and do not support GPUs, which hinders
efficient model inference. To address these limitations, we present two
serverless systems. First, we propose Gillis, a serverless model inference
system that tackles the resource limitations of individual functions. Gillis
automatically partitions and parallelizes an ML model across multiple
functions, leading to faster inference and a reduced memory footprint per
function. It supports large models that cannot be accommodated within a single
function and effectively meets request-level latency Service Level Objectives
(SLOs) through its model partitioning algorithms. Second, we propose Torpor, a
GPU-enabled serverless platform for low-latency, resource-efficient model
inference. Torpor enables fine-grained GPU sharing among inference functions
and supports efficient model swapping between host memory and GPUs, which
reduces function keep-alive costs and achieves load balancing across GPUs.
With its model swapping and request scheduling algorithms, Torpor effectively
meets per-function latency SLOs while achieving high GPU utilization.


Date:                   Thursday, 3 August 2023

Time:                   10:30am - 12:30pm

Venue:                  Room 5501
                        Lifts 25/26

Chairman:               Prof. Jun ZHANG (ECE)

Committee Members:      Prof. Wei WANG (Supervisor)
                        Prof. Gary CHAN
                        Prof. Shuai WANG
                        Prof. Fengbin TU (ECE)
                        Prof. Chuan WU (HKU)


**** ALL are Welcome ****