Spring 2006 HKUST Database Seminar

Date

Speaker

Talk

Comments

March 24, 2006

13:00-14:00

Wei Feng

Title

Holistic Schema Matching for Web Query Interfaces

 Abstract

One significant part of today's Web is Web databases, which can dynamically provide information in

response to user queries. To help users submit queries to different Web databases, the query interface

matching problem needs to be addressed. To solve this problem, we propose a new complex schema matching

approach, Holistic Schema Matching (HSM). By examining the query interfaces of real Web databases, we

observe that attribute matchings can be discovered from attribute-occurrence patterns. For example,

First Name often appears together with Last Name while it is rarely co-present with Author in the Books

domain. Thus, we design a count-based greedy algorithm to identify which attributes are more likely to

be matched in the query interfaces. In particular, HSM can identify both simple matching, i.e., 1:1

matching, and complex matching, i.e., 1:n or m:n matching, between attributes. Our experiments show

that HSM can discover both simple and complex matchings accurately and efficiently on real data sets.

March 31st, 2006

13:00-14:00

Dr. Yan Liu

Title:

Automatic Feature Selection in Video Understanding

  Abstract:

The rapid growth and wide applications of digital video data have led to a significant need for video

understanding. Unfortunately, the gap between high-level concepts and low-level features and the high

time cost of video analysis are two important obstacles of efficient video data management. Feature

selection, which can improve the prediction performance of the predictors, provide faster and more

cost-effective predictors, and provide better understanding of the data, is introduced to address these

two problems. But applying existing automatic feature selection algorithms to video data is impractical

because of the unrealistic amount of computer time. So far, most feature selection technologies in

video applications are based on researchers' intuition although human interaction can't satisfy the

dramatic increase of video data and the multiple requirements of different users.

 In this talk, I propose and demonstrate four low time cost feature selection algorithms to address the

problem of high dimensionality, complex hypotheses, and sparse training data in classification, with

special applications in video segmentation, categorization and retrieval. Compared to existing methods,

 in both artificial datasets and real problems in several different domains, these methods meet or

exceed existing methods' accuracy while running at several orders of magnitude faster.

Bio:

Dr. Liu, Yan is an assistant professor in

the Department of Computing at The Hong

 Kong Polytechnic University. Her research

 interests include machine learning,

multimedia understanding and E-learning.

 She received her PhD in Computer Science

 Department from Columbia University and

 her M.S. in Business School from Nanjing

University.

April 21^st, 2006

13:00-14:00

Dr. Yufei Tao

Title:

  Privacy Preserving Publication with Generalization

  Abstract:

  Privacy preservation is a serious concern in publication of personal data, and has attracted

considerable attention from the database community.  In the first part of the talk, we will review "k-

anonymous generalization",  an important methodology underlying the existing solutions in the

literature. We also explain the drawbacks of this methodology. In particular, k-anonymous

generalization exerts the same amount of privacy preservation for all persons, without catering for

 their concrete needs. The consequence is that we may be offering insufficient protection to a subset

 of people, while applying excessive privacy control to another subset.

The rest of the talk will be devoted to introducing a novel generalization ramework based on the

concept of "personalized anonymity". This technique performs the minimum generalization for satisfying

 everybody's requirements, and thus, retains the largest amount of information from the

microdata. We also present the results of our theoretical study that provide valuable insight into the

 behavior of alternative solutions. These results mathematically reveal the circumstances where the

previous work fails to protect privacy, and establish the superiority of the proposed technique.

The content of the talk will appear in SIGMOD 2006.

Bio:

Dr. Yufei Tao is an Assistant Professor in

 the Department of Computer Science,

the City University of Hong Kong. He holds

 a PhD from the Hong Kong University of

Science and Technology, and did his post

doc at the Carnegie Mellon University,

 USA. Yufei received the Hong Kong Young

 Scientist Award in 2002. He enjoys

 solving database problems, particularly

 those related to privacy preservation,

 and spatial/temporal databases.

April 28^th, 2006

13:00-14:00

David Yang

Title:

An Efficient Approach to Support Querying Secure Outsourced XML Information

Abstract:

Data security is well-recognized a vital issue in an information system that is supported in an

 outsource environment. However, most of conventional XML encryption proposals treat confidential parts

of an XML document as whole blocks of text and apply encryption algorithms directly on them. As a

result, queries involving the encrypted part cannot be efficiently processed.

This talk focuses on XQEnc, a novel approach to support querying encrypted XML. XQEnc is based on two

important techniques of vectorization and skeleton compression. Essentially, vectorization, which is a

generalization ofcolumns of a relational table, makes use the basic path of an XML tree to label the

data values. Skeleton compression collapses the redundant paths into a multiplicity attribute. Our

analysis and experimental study shows that XQEnc achieves both better query efficiency and more robust

security compared with conventional methods. As an application, we show how XQEnc can be realized with

relational techniques to enable secure XML data outsourcing.

July 28^th, 2006

13:00-14:00

Prof. Ben Choi

Title:

Building a Better Search Engine

Abstract:

This presentation will outline several new techniques for building a better search engine. New algorithms have been developed for

automatically classifying and clustering Web pages to allow users to quickly find information on the Web. A new text indexing scheme has

been developed for speeding up keyword search. And, a new parallel and distributed architecture has been developed to handle massive

search requests. Several techniques under development will also be outlined, including Web crawling methods for finding and updating Web

pages and page ranking methods for ordering search results. These techniques have been implemented and tested on an object-oriented

database, the experience of which will also be shared. In addition, two patent applications have been filed based on some the new techniques.

Bio:

Prof. Ben Choi, Ph.D. & Pilot: He is an

 Associate Professor in Computer Science

 at Louisiana Tech University and is also

a pilot of airplanes and helicopters. He

 worked in the computer industry as a

System Performance Engineer at Lucent

 Technologies and as a Computer

 Consultant. He got his Bachelor, Master,

 and Ph.D. degrees all from The Ohio State

 University. His areas were Electrical

 Engineering, Computer Engineering, and

 Computer Science. His works included

 associative memory, parallel computing,

 and machine learning. His research

 interests are developing software and

hardware methods for building intelligent

machines and abstracting the Universe as

a Computer.

August 4^th, 2006

13:00-14:00

Yiping Ke

Title:

Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach

Abstract:

Existing research on mining quantitative databases mainly focuses on mining associations. However,

 mining associations is too expensive to be practical in many cases. In this paper, we study mining

correlations from quantitative databases and show that it is a more effective approach than mining

associations. We propose a new notion of Quantitative Correlated Patterns (QCPs), which is founded on

two formal concepts, mutual information and all-confidence. We first devise a normalization on mutual

information and apply it to QCP mining to capture the dependency between the attributes. We further

adopt all-confidence as a quality measure to control, at a finer granularity, the dependency between

 the attributes with specific quantitative intervals. We also propose a supervised method to combine

the consecutive intervals of the quantitative attributes based on mutual information, such that the

interval combining is guided by the dependency between the attributes. We develop an algorithm,

QCoMine, to efficiently mine QCPs by utilizing normalized mutual information and all-confidence to

perform a two-level pruning. Our experiments verify the efficiency of QCoMine and the quality of the

QCPs.