Fall 2006 HKUST Database Seminar

Sept. 29th, 2006

13:00-14:00

Jing Zhao

Title:

Improving Meta-search Quality Using Distributed Latent Semantic Indexing

Abstract:

The trade-off between improving search quality and minimizing search engine summarization cost has been one of the most important issues in meta-search systems. We proposed a search engine descriptor based on distributed latent semantic indexing. The descriptor represented the semantics of a search engine's document collection, which are characterized by term co-occurrences across documents. Furthermore, we divided a document collection into clusters by indexed terms and performed latent semantic indexing within clusters so as to improve the effectiveness and efficiency of search engine summarization. We also proposed a result merging algorithm based on latent semantic indexing. Our experimental results over the TREC collection showed that our server selection method significantly improved search precision with a low storage and update cost. In addition, our result merging algorithm was verified as effective, especially when a small number of top-ranked results were examined.

Oct. 20th, 2006

13:00-14:00

Room: 3530

Su WeiFeng

Title:

Automatic Hierarchical Classification of Structured DeepWeb Databases

Abstract:

We present a method that automatically classifies structured deepWeb databases according to a pre-defined topic hierarchy. We assume that there are some manually classified databases, i.e., training databases, in every node of the topic hierarchy. Each training database is probed using queries constructed from the node titles of the topic hierarchy, and the query result counts reported by the database are used to represent the content of the database. Hence, when a new database is added, it can be probed by the same set of queries and classified to the node whose training databases are most similar to the new one. Specifically, a support vector machine classifier is trained on each internal node of the topic hierarchy with these training databases, and the new database can be classified into the hierarchy top-down, level by level. A feature extension method is proposed to create discriminative features. Experiments run on real structured Web databases collected from the Internet show that this classification method is quite accurate.
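
The probing and top-down classification steps described in this abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the abstract names a support vector machine at each internal node, while this sketch swaps in a simpler nearest-centroid rule with cosine similarity to stay dependency-free. The topic hierarchy, probe queries, and result counts are all made up for illustration.

```python
import math

# Hypothetical topic hierarchy: each internal node maps to its children.
HIERARCHY = {
    "root": ["Books", "Autos"],
    "Books": ["Fiction", "Textbooks"],
}

# Probe queries built from node titles (assumed here to be the titles themselves).
PROBES = ["Books", "Autos", "Fiction", "Textbooks"]

# Made-up result counts (one per probe) from manually classified training databases.
TRAINING = {
    "Fiction": [[900, 5, 700, 60], [800, 2, 650, 40]],
    "Textbooks": [[850, 8, 50, 720], [780, 4, 30, 690]],
    "Autos": [[10, 950, 3, 6], [20, 880, 1, 9]],
}


def normalize(counts):
    """Scale a count vector to unit length so database size does not dominate."""
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def leaves_under(node):
    """Return all leaf topics in the subtree rooted at node."""
    children = HIERARCHY.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves_under(child)]


def centroid(node):
    """Average the normalized vectors of all training databases below node."""
    vectors = [normalize(v) for leaf in leaves_under(node) for v in TRAINING[leaf]]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(counts, node="root"):
    """Descend the hierarchy level by level, picking the most similar child."""
    vector = normalize(counts)
    while node in HIERARCHY:
        node = max(HIERARCHY[node], key=lambda child: cosine(vector, centroid(child)))
    return node


# A new database is probed with the same queries and then classified.
new_database_counts = [700, 3, 40, 600]
predicted_topic = classify(new_database_counts)
```

Normalizing the count vectors is a design choice of this sketch, so that a large database and a small one with the same topical profile look alike.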

Nov. 24th, 2006

13:00-14:00

Room: 3464

Gabriel Ghinita

Ph.D. Candidate

Department of Computer Science, NUS

Title:

Privacy-Preserving Spatial Queries in Location Based Services

Abstract:

The emerging trend of mobile devices with embedded positioning capabilities (e.g., GPS) facilitates the widespread use of Location Based Services. For such applications to succeed, query privacy and confidentiality are of paramount importance. Conventional privacy-preserving techniques include encryption, which safeguards communication channels, and the use of pseudonyms, which hide user identities. Nevertheless, the contents of spatial queries may disclose the physical location and identity of users, compromising their privacy. We present a framework for preserving the privacy of users who issue spatial queries to Location Based Services. We propose transformations based on the well-established k-anonymity paradigm to compute exact answers for Range and Nearest Neighbor queries without revealing the identity of the query source. Our proposed techniques provide guarantees on user privacy and can be employed both in the centralized setting and in decentralized/P2P systems. Extensive experimental studies show that our methods are applicable to real-life scenarios with numerous mobile users.
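
As a rough illustration of the k-anonymity idea for range queries, the sketch below replaces a user's exact location with a rectangle that covers at least k users, lets the server answer for the whole rectangle, and then filters the candidates down to the exact result on the trusted side. The grouping rule (the k-1 nearest other users), the function names, and the sample coordinates are assumptions made for this example. They are not the transformations proposed in the talk, and this naive grouping may not provide the same privacy guarantees.

```python
import math


def cloak(users, source, k):
    """Return a rectangle (xmin, ymin, xmax, ymax) covering the source user
    and its k-1 nearest other users, so at least k users share the region."""
    origin = users[source]
    others = sorted(
        (uid for uid in users if uid != source),
        key=lambda uid: math.dist(users[uid], origin),
    )
    group = [source] + others[: k - 1]
    xs = [users[uid][0] for uid in group]
    ys = [users[uid][1] for uid in group]
    return (min(xs), min(ys), max(xs), max(ys))


def server_range_candidates(pois, region, radius):
    """Server side: return every point of interest within `radius` of the
    rectangle. This is a superset of the answer for any point inside it."""
    xmin, ymin, xmax, ymax = region
    candidates = []
    for name, (x, y) in pois.items():
        dx = max(xmin - x, 0, x - xmax)  # horizontal gap to the rectangle
        dy = max(ymin - y, 0, y - ymax)  # vertical gap to the rectangle
        if math.hypot(dx, dy) <= radius:
            candidates.append(name)
    return candidates


def exact_range(pois, candidates, location, radius):
    """Trusted side: filter the candidates down to the exact range result."""
    return sorted(n for n in candidates if math.dist(pois[n], location) <= radius)


# Made-up user positions and points of interest.
users = {"u1": (1, 1), "u2": (2, 3), "u3": (4, 2), "u4": (9, 9)}
pois = {"cafe": (2, 2), "bank": (5, 2), "gym": (8, 8)}

region = cloak(users, "u1", k=3)
candidates = server_range_candidates(pois, region, radius=1.5)
answer = exact_range(pois, candidates, users["u1"], radius=1.5)
```

The server only ever sees the rectangle, which is shared by at least k users, while the final filtering step keeps the answer exact.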

Dec. 18th, 2006

14:00-15:00

Room: 3530

Huiming Qu
Ph.D. Candidate
Department of Computer Science
University of Pittsburgh

and

Jimeng Sun
Ph.D. Candidate
Computer Science Department
Carnegie Mellon University

Title:

Preference-Aware Query and Update Scheduling for Web-Databases

By Ms. Huiming Qu

Abstract:

Web-database systems are nowadays an integral part of everybody's life. In general, users expect short response times and low staleness. However, it may be extremely hard to apply all updates on time, i.e., keep zero staleness, and also get fast response times, especially in periods of bursty traffic. In this work, we present the concept of Quality Contracts (QCs), which combines the two incomparable performance metrics: response time or Quality of Service (QoS), and staleness or Quality of Data (QoD). QCs allow individual users to express their preferences for the expected QoS and QoD of their queries by assigning "profit" values. To maximize the total profit from submitted QCs, we propose an adaptive algorithm, called QUTS. QUTS addresses the problem of prioritizing the scheduling of updates over queries using a meta scheduling scheme that dynamically allocates CPU resources to updates and queries according to user preferences. We present the results of an extensive experimental study using real data (taken from a stock information web site), where we show that QUTS performs better than baseline algorithms under the entire spectrum of QCs; QUTS also adapts quickly to changing workloads.
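
One way to picture a Quality Contract is as a pair of profit values attached to a query, one for QoS and one for QoD. The sketch below assumes simple step functions (full profit if the threshold is met, nothing otherwise). The abstract only states that users assign profit values, so the thresholds, field names, and numbers here are hypothetical, and the QUTS scheduler itself is not modeled.

```python
from dataclasses import dataclass


@dataclass
class QualityContract:
    """Hypothetical QC: profit for meeting a response-time and a staleness bound."""
    qos_profit: float        # earned if the query answers fast enough
    max_response_ms: float   # response-time threshold (assumed step function)
    qod_profit: float        # earned if the data is fresh enough
    max_staleness: int       # tolerated number of unapplied updates

    def profit(self, response_ms, staleness):
        """Profit earned by one executed query under this contract."""
        earned = 0.0
        if response_ms <= self.max_response_ms:
            earned += self.qos_profit
        if staleness <= self.max_staleness:
            earned += self.qod_profit
        return earned


def total_profit(outcomes):
    """Sum the profit over (contract, response_ms, staleness) triples."""
    return sum(qc.profit(response_ms, staleness) for qc, response_ms, staleness in outcomes)


# A user who mostly cares about speed, and one who mostly cares about freshness.
speed_first = QualityContract(qos_profit=5.0, max_response_ms=100, qod_profit=2.0, max_staleness=0)
fresh_first = QualityContract(qos_profit=1.0, max_response_ms=50, qod_profit=4.0, max_staleness=1)

earned = total_profit([(speed_first, 80, 2), (fresh_first, 40, 1)])
```

A scheduler that maximizes total profit would compare such sums across alternative orderings of queries and updates.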

Title:

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams

By Mr. Jimeng Sun

Abstract:

Data stream values are often associated with multiple aspects. For example, each value from environmental sensors may have an associated type (e.g., temperature, humidity, etc.) as well as location. Aside from timestamp, type and location are the two additional aspects. How to model such streams? How to simultaneously find patterns within and across the multiple aspects? How to do it incrementally in a streaming fashion? In this paper, all these problems are addressed through a general data model, tensor streams, and an effective algorithmic framework, window-based tensor analysis (WTA). Two variations of WTA, independent-window tensor analysis (IW) and moving-window tensor analysis (MW), are presented and evaluated extensively on real datasets. Finally, we illustrate one important application, Multi-Aspect Correlation Analysis (MACA), which uses WTA, and we demonstrate its effectiveness on an environmental monitoring application.

Remark: a similar talk will be given at ICDM06.

Previous Seminar Archives

Spring 2006