Fall 2006 HKUST Database Seminar

Sept. 29th, 2006


Jing Zhao

Improving Meta-search Quality Using Distributed Latent Semantic Indexing
The trade-off between improving search quality and minimizing search engine summarization cost has been
one of the most important issues in meta-search systems. We proposed a search engine descriptor based
on distributed latent semantic indexing. The descriptor represented the semantics of a search engine's
document collection, which are characterized by term co-occurrences across documents. Furthermore, we
divided a document collection into clusters by indexed terms and performed latent semantic indexing
within clusters so as to improve the effectiveness and efficiency of search engine summarization. We
also proposed a result merging algorithm based on latent semantic indexing.
Our experimental results over the TREC collection showed that our server selection method significantly
improved search precision with a low storage and update cost. In addition, our result merging
algorithm was verified as effective, especially when a small number of top-ranked results were examined.

Oct 20th, 2006


Room: 3530

Su WeiFeng

Automatic Hierarchical Classification of Structured DeepWeb Databases
We present a method that automatically classifies structured deep Web databases according to a
predefined topic hierarchy. We assume that there are some manually classified databases, i.e., training
databases, in every node of the topic hierarchy. Each training database is probed using queries
constructed from the node titles of the topic hierarchy, and the query result counts reported by the
database are used to represent the content of the database. Hence, a new database can be probed with
the same set of queries and classified to the node whose training databases are most similar to
the new one. Specifically, a support vector machine classifier is trained on each internal node of the
topic hierarchy with these training databases, and the new database is classified into the hierarchy
top-down, level by level. A feature extension method is proposed to create discriminative features.
Experiments run on real structured Web databases collected from the Internet show that this
classification method is quite accurate.
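
The probing and top-down classification steps described above can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: the hierarchy, probe queries, and result counts are invented, training databases are attached to leaf nodes only, and a nearest-centroid rule stands in for the per-node support vector machine. The feature extension step is not shown.

```python
from math import sqrt

# Hypothetical topic hierarchy: each internal node lists its children.
HIERARCHY = {"Root": ["Books", "Jobs"], "Books": ["Fiction", "Textbooks"]}

# Hypothetical probe queries, one per node title (order fixes the feature order).
PROBES = ["books", "jobs", "fiction", "textbooks"]

# Hypothetical training databases: leaf node -> result-count vectors, one
# count per probe query, as reported by each manually classified database.
TRAINING = {
    "Fiction": [[900, 5, 700, 40], [800, 2, 650, 30]],
    "Textbooks": [[850, 10, 30, 600], [700, 8, 20, 500]],
    "Jobs": [[3, 950, 1, 2], [10, 800, 0, 5]],
}


def normalize(counts):
    """Turn raw result counts into proportions so database size cancels out."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def leaves_under(node):
    """All leaf nodes in the subtree rooted at `node`."""
    children = HIERARCHY.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves_under(child)]


def centroid(node):
    """Mean feature vector of every training database below `node`."""
    vectors = [normalize(v) for leaf in leaves_under(node) for v in TRAINING[leaf]]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(counts, node="Root"):
    """Descend level by level, picking the closest child at each internal node.

    Nearest-centroid is a stand-in for the SVM trained at each internal node.
    """
    features = normalize(counts)
    path = [node]
    while node in HIERARCHY:
        node = min(HIERARCHY[node], key=lambda child: distance(features, centroid(child)))
        path.append(node)
    return path
```

For example, `classify([780, 4, 20, 560])` probes a new database with the four queries and walks it down the hierarchy one level at a time.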

Nov 24th, 2006


Room: 3464

Gabriel Ghinita

Ph.D. Candidate

Department of Computer Science, NUS


Privacy-Preserving Spatial Queries in Location Based Services


The emerging trend of mobile devices with embedded positioning capabilities (e.g., GPS) facilitates the widespread use of Location Based Services. For such applications to succeed, query privacy and confidentiality are of paramount importance. Conventional privacy-preserving techniques include encryption, which safeguards communication channels, and the use of pseudonyms, which hide user identities. Nevertheless, the contents of spatial queries may disclose the physical location and identity of users, compromising their privacy. We present a framework for preserving the privacy of users who issue spatial queries to Location Based Services. We propose transformations based on the well-established k-anonymity paradigm to compute exact answers for Range and Nearest Neighbor queries without revealing the identity of the query source. Our proposed techniques provide guarantees on user privacy and can be employed both in centralized settings and in decentralized/P2P systems. Extensive experimental studies show that our methods are applicable to real-life scenarios with numerous mobile users.
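
As a rough illustration of how a k-anonymous region can still yield an exact answer to a range query, consider the sketch below. It is not the speaker's transformation: grouping a user with its k-1 nearest peers is a simplifying assumption (and on its own does not give the guarantees the abstract mentions), and all function names and coordinates are hypothetical.

```python
from math import hypot


def cloak(user, others, k):
    """Axis-aligned rectangle covering `user` and its k-1 nearest peers.

    Simplified grouping rule, assumed for illustration only.
    """
    if len(others) < k - 1:
        raise ValueError("not enough users to form a group of size k")
    nearest = sorted(others, key=lambda p: hypot(p[0] - user[0], p[1] - user[1]))[: k - 1]
    group = [user] + nearest
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    return (min(xs), min(ys), max(xs), max(ys))


def server_range_query(region, radius, pois):
    """Server side: sees only the cloaked region, never the exact location.

    Returns every point of interest inside the region expanded by `radius`,
    a superset of the true answer for any user located inside the region.
    """
    x1, y1, x2, y2 = region
    return [
        p for p in pois
        if x1 - radius <= p[0] <= x2 + radius and y1 - radius <= p[1] <= y2 + radius
    ]


def client_filter(user, radius, candidates):
    """Client side: recover the exact answer from the candidate superset."""
    return [p for p in candidates if hypot(p[0] - user[0], p[1] - user[1]) <= radius]


user = (5, 5)
region = cloak(user, [(6, 5), (5, 7), (20, 20), (0, 0)], k=3)
candidates = server_range_query(region, 1.5, [(5, 6), (8, 5), (7, 8), (30, 30)])
answer = client_filter(user, 1.5, candidates)
```

The server answers for the whole rectangle, so it cannot tell which of the k users asked, while the final filtering step on the client keeps the result exact.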



Dec 18th, 2006


Room: 3530


Huiming Qu
Ph.D. candidate
Department of Computer Science
University of Pittsburgh


Jimeng Sun
Ph.D. candidate
Computer Science Department
Carnegie Mellon University


Preference-Aware Query and Update Scheduling for Web-Databases
By Ms. Huiming Qu

Web-database systems are nowadays an integral part of everybody's life. In general, users expect short response times and low staleness. However, it may be extremely hard to apply all updates on time, i.e., keep zero staleness, and also get fast response times, especially in periods of bursty traffic. In this work, we present the concept of Quality Contracts (QCs), which combine two incomparable performance metrics: response time, or Quality of Service (QoS), and staleness, or Quality of Data (QoD). QCs allow individual users to express their preferences for the expected QoS and QoD of their queries by assigning "profit" values. To maximize the total profit from submitted QCs, we propose an adaptive algorithm called QUTS. QUTS addresses the problem of prioritizing the scheduling of updates over queries using a meta-scheduling scheme that dynamically allocates CPU resources to updates and queries according to user preferences. We present the results of an extensive experimental study using real data (taken from a stock information web site), where we show that QUTS performs better than baseline algorithms under the entire spectrum of QCs; QUTS also adapts fast to changing workloads.
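
To make the idea of a Quality Contract concrete, here is a minimal sketch that assumes a simple step-shaped contract: the full QoS profit is earned if the response time meets a deadline, and the full QoD profit is earned if the number of unapplied updates stays under a bound. The abstract does not give the exact shape of a QC, so the class, its fields, and the toy two-way choice in `best_order` are all hypothetical; `best_order` is not the QUTS algorithm.

```python
from dataclasses import dataclass


@dataclass
class QualityContract:
    """Hypothetical step-shaped contract attached to a single query."""
    qos_profit: float       # earned if the response time meets the deadline
    max_response_ms: float  # response-time deadline
    qod_profit: float       # earned if the data is fresh enough
    max_staleness: int      # tolerated number of unapplied updates

    def profit(self, response_ms, staleness):
        earned = 0.0
        if response_ms <= self.max_response_ms:
            earned += self.qos_profit
        if staleness <= self.max_staleness:
            earned += self.qod_profit
        return earned


def best_order(qc, pending_updates, update_ms, query_ms):
    """Toy choice between answering now and applying all updates first."""
    # Answer immediately: fast response, but the result misses pending updates.
    query_first = qc.profit(query_ms, pending_updates)
    # Apply every update first: fresh result, but the query waits for them.
    updates_first = qc.profit(pending_updates * update_ms + query_ms, 0)
    if query_first >= updates_first:
        return ("query_first", query_first)
    return ("updates_first", updates_first)


speed_lover = QualityContract(qos_profit=5, max_response_ms=50, qod_profit=3, max_staleness=0)
fresh_lover = QualityContract(qos_profit=1, max_response_ms=50, qod_profit=3, max_staleness=0)
```

With four pending updates of 20 ms each and a 10 ms query, the two contracts favour opposite orders, which is the kind of per-user preference a scheduler would have to balance.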

Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams
By Mr. Jimeng Sun

Data stream values are often associated with multiple aspects. For example, each value from environmental sensors may have an associated type (e.g., temperature, humidity, etc.) as well as location. Aside from timestamp, type and location are the two additional aspects. How to model such streams? How to simultaneously find patterns within and across the multiple aspects? How to do it incrementally in a streaming fashion? In this paper, all these problems are addressed through a general data model, tensor streams, and an effective algorithmic framework, window-based tensor analysis (WTA). Two variations of WTA, independent-window tensor analysis (IW) and moving-window tensor analysis (MW), are presented and evaluated extensively on real datasets. Finally, we illustrate one important application, Multi-Aspect Correlation Analysis (MACA), which uses WTA, and we demonstrate its effectiveness on an environmental monitoring application.
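
The sketch below illustrates windowing over a stream of type-by-location slices. It assumes, since the abstract does not spell it out, that the two variants differ in whether each window is processed from scratch or updated from the previous window. Only a per-window location-by-location Gram matrix is computed; the tensor decomposition itself is not shown, and all function names and data are hypothetical.

```python
def gram(slice_):
    """Location-by-location Gram matrix X^T X of one type x location slice."""
    n = len(slice_[0])
    return [[sum(row[i] * row[j] for row in slice_) for j in range(n)] for i in range(n)]


def add(a, b, sign=1):
    """Element-wise a + sign * b for two equally sized matrices."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def independent_windows(stream, w):
    """Recompute the aggregate of every length-w window from scratch."""
    if not 1 <= w <= len(stream):
        raise ValueError("window length must be between 1 and len(stream)")
    results = []
    for end in range(w, len(stream) + 1):
        total = gram(stream[end - w])
        for slice_ in stream[end - w + 1:end]:
            total = add(total, gram(slice_))
        results.append(total)
    return results


def moving_windows(stream, w):
    """Slide the window: add the newest slice and subtract the expired one."""
    if not 1 <= w <= len(stream):
        raise ValueError("window length must be between 1 and len(stream)")
    total = gram(stream[0])
    for slice_ in stream[1:w]:
        total = add(total, gram(slice_))
    results = [total]
    for t in range(w, len(stream)):
        total = add(add(total, gram(stream[t])), gram(stream[t - w]), sign=-1)
        results.append(total)
    return results


# Four timestamps, each a 2 (types) x 2 (locations) slice of readings.
stream = [
    [[1, 2], [3, 4]],
    [[0, 1], [1, 0]],
    [[2, 2], [1, 1]],
    [[5, 0], [0, 5]],
]
from_scratch = independent_windows(stream, 2)
incremental = moving_windows(stream, 2)
```

Both functions produce the same per-window matrices; the incremental version touches only two slices per step instead of the whole window.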

Remark: a similar talk will be given at ICDM06.




Previous Seminar Archives


Spring 2006