Latent Tree Models

 

News:

o    20-10-2011:  Model-based multidimensional clustering of categorical data published on Artificial Intelligence.

o    20-10-2011:  Software packages EAST (for learning LTMs) and Lantern (for inspecting/manipulating LTMs) available at end of this page.

 

The Concept: Latent tree models (LTMs) are tree-structured probabilistic graphical models or Bayesian networks. An example is shown in Fig. 1. The leave nodes (the Xi's) represent observed variables or attributes in data, while the internal nodes (the Yi's) represent latent variables. All the variables are discrete. Each node is associated with a conditional multinomial distribution of the node given its parent, characterizing how it depends on its parent.

Description: Description: I:\admin\web\ltm\index.4.jpg

 

Algorithms: LTMs were introduced in (Zhang 2002, 2004), where they were called hierarchical latent class models. To learn an LTM, one starts with a data set on observed variables, and determines (1) the number of latent variables, (2) the connection among them, (3) the number of possible states, called cardinality, for each latent variable, and (4) the probability distributions.

 

An concept-test algorithm called DHC for learning LTM was proposed in  Zhang (2002, 2004). The algorithm is search-based and is capable of handling data sets with only about half a dozen attributes. Two other search-based algorithms HSHC  (Zhang and Kocka 2004) and EAST (Chen et al. 2008). Those two algorithms gain efficiency by reducing the number of candidate models examined during search and by reducing the time spent in evaluating the candidate models. Between the two, EAST has a more principled method for evaluating candidate models and adopts a more intelligent search strategy. It is more efficient than HSHC and is less likely to get trapped at local maxima.

 

Two other algorithms for learning LTMs are recently proposed by other authors. The algorithm by Harmeling and Williams (2010) is faster than HSHC and EAST, but is restricted to produce only binary trees. The algorithm by Choi (2010) has performance guarantees under some conditions, but requires the cardinalities of the latent variables be given in advance.

o   N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. AAAI-02, 230-237.

o   N. L. Zhang (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5(6):697--723, 2004.

o   N. L. Zhang and T. Kocka (2004). Efficient Learning of Hierarchical Latent Class Models. Proc. of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2004), Boca Raton, Florida,  November 15-17.

o   Tao Chen, N. L. Zhang, and Yi Wang (2008). Efficient Model Evaluation in the Search-Based Approach to Latent Structure Discovery. In Proceedings of the Fourth European Workshop on Probabilistic Graphical Models (PGM-08), 57-64.

o   S. Harmeling and C.K. I. Williams. Greedy learning of binary latent trees (2011). IEEE Transactions on Pattern Analysis and Machine Intel ligence, 33(6), 1087-1097.

o    Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, and Alan S. Willsky 2011. Learning latent tree graphical models. Journal of Machine Learning Research 1 (2011) 1-48.

 

Multidimensional Clustering: Clustering is a data analysis approach where the objective is to find `naturally occurring' groups. Early research work on clustering usually assumed that there was one true clustering of data. This assumption does not hold for complex data which are typically multifaceted and can be meaningfully clustered in many different ways. There is a growing interest in methods that produce multiple partitions of data with each partition being based on a different subset of attributes. We call such methods multi-partition clustering methods. (It is not to be confused with multi-view clustering which makes use of independence among different views of data to improve the quality of one single partition.) The trend of multi-partition clustering is evident from recent machine learning and data mining conferences.

Description: Description: I:\admin\web\ltm\index.2.jpg

Analyzing data using LTMs can result in multiple discrete latent variables, each representing a soft partition of data. So LTMs can be used for multi-partition clustering. This potential was first pointed out in (Zhang 2002, 2004). Because latent variables can be viewed as latent attributes of data, we sometimes call LTM-based cluster analysis  multidimensional clustering}. In (Zhang et al. 2008), we analyzed a marketing data set from COIL Challenge 2000. In (Zhang et al. 2010), we analyzed a survey data set for social science. In both cases, interesting latent structures and  meaningful clusters were discovered. In (Zhang et al. 2010) , we also discussed how to present the results of multidimensional clustering so that domain experts can easily and accurately grasp the meanings of the clusters found.

o   N. L. Zhang, Y. Wang and T. Chen (2008). Discovery of latent structures. Experience with the COIL Challenge 2000 dataJournal of Systems Science and Complexity. 21: 1-12.

o   T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence. 176(1), 2246-2269.

 

Pouch LTMs: LTMs are for discrete data. In (Poon et al. 2010), we studied a similar class of models, called pouch latent tree models (PLTMs), for continuous data. An example is shown in Fig. 2. In a PLTM, continuous attributes are first divided into pouches and then the pouches are attached to the latent variables. In Fig. 2, for instance, the doubleton pouch [X1X2] and the singleton pouch [X3] are both attached to the latent variable Y2. The pouches are associated with conditional Gaussian distributions P(X1;X2jY2) and P(X3jY2). PLTMs are a generalization of Gaussian mixture models (GMMs). As a matter of fact, an GMM is a PLTM where there is only one latent variable, and one pouch that consists of all the attributes. In [30], PLTMs were used to analyze an array of data sets. The results indicate PLTMs are capable of identifying natural facets of data and discover meaningful clusterings along each facet.

Description: Description: I:\admin\web\ltm\index.5.jpg

 

o   L. K. M. Poon, N. L. Zhang, T. Chen, and Y. Wang (2010). Variable selection in model-based clustering: To do or to facilitate. ICML-10.

 

Application in TCM: An important motivation for our work is to meet the need of traditional Chinese medicine (TCM) research. In (Zhang et al. 2008), we analyzed a TCM data set using LTMs. The latent structures obtained match the relevant TCM theories. This is exciting and has attracted a lot of attentions because it provides, for first time, objective justification for the ancient theories (see Fig. 3). Since then we have been working with collaborators trying to identify natural clusters in various TCM data sets and use them as an objective and quantitative basis for the establishment of TCM diagnosis standards. The effort is worthwhile because TCM diagnosis is mostly based on observed symptoms rather than laboratory tests, and is severely influenced by subjective factors and might vary significantly from one doctor to another.

Click here for a separate web page on this topic.

Description: Description: I:\admin\web\ltm\index.6.jpg

 

o   N. L. Zhang,  S. H. Yuan, T. Chen and  Y. Wang (2008).  Statistical Validation of TCM Theories. Journal of Alternative and Complementary Medicine, 14(5):583-7

o   N. L. Zhang, S. H. Yuan, T. Chen and Y. Wang (2008). Latent tree models and diagnosis in traditional Chinese medicine. Artificial Intelligence in Medicine. 42: 229-245.

 

Application: Density Estimation and Classification: In (Wang et al 2008a, 2008b), we study the use of LTMs for density estimation. To estimate the joint distribution of some random variables, we build an LTM with those variables as manifest variables. LTMs are computationally simple to deal with. At the same time, they can achieve good approximation accuracy because they can represent complex relationships among the manifest variables. A new algorithm for learning LTMs is developed specifically for this purpose. Called LTAB, the algorithm first computes the mutual information between each pair of manifest variables and then constructs a binary tree structure for LTM through hierarchical clustering of variables. LTAB is much faster than EAST. It does not obtain interesting model structures. Good approximation to the true distribution is achieved by setting the cardinalities (i.e., the numbers of states) of the latent variables high.

HNB models were introduced in (Zhang et al. 2004) as a framework for discovering latent structures in supervised learning. In that paper, we developed an algorithm for learning HNB models and tested it on some small data sets. The algorithm corresponds to the DHC algorithm for LTM.  This work was fellowed up by Langseth andNielsen (2006, 2009).

o   Y. Wang, N. L. Zhang. T. Chen (2008a). Latent Tree Models and Approximate Inference in Bayesian Networks, AAAI-08.

o   Y. Wang, N. L. Zhang and T. Chen (2008b). Latent tree models and approximate inference in Bayesian networks. Journal of Artificial Intelligence Research, 32, 879-900. 

o   N. L. Zhang, T. D. Nielsen, and F. V. Jensen (2004). Latent variable discovery in  classification models. Artificial Intelligence in Medicine, 30(3): 283-299.

o   H. Langseth and T. D. Nielsen (2009): Latent Classification models for Binary dataPattern Recognition 42:2724-2736, 2009.

o   H. Langseth and T. D. Nielsen (2006): Classification using Hierarchical Naive Bayes models. Machine Learning 63(2), pp. 135-159, 2006.

 

Other Applications: In (Mourad et al 2010),  LTMs are applied to bioinfomatics.

Raphaël Mourad, Christine Sinoquet, Philippe Leray (2010). A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics 2011, 12:16doi:10.1186/1471-2105-12-16.

 

Softwares

 Send feedbacks to lzhang@cse.ust.hk  

 

Lantern 

 GUI for LTMs: http://www.cse.ust.hk/~lzhang/ltm/softwares/Lantern2.0-beta.exe

 

EAST

An implementation of the EAST algorithm: http://www.cse.ust.hk/~lzhang/ltm/softwares/EAST.zip

 

HLCM

Implementation of HSHC algorithm and utilities:

http://www.cse.ust.hk/~lzhang/ltm/softwares/hlcm-distribute.zip

http://www.cse.ust.hk/~lzhang/ltm/softwares/toolBox.zip 

 

PLTM-EAST
An implementation of the EAST algorithm for Pouch Latent Tree :
http://www.cse.ust.hk/~lzhang/ltm/softwares/PLTM-EAST.tar.gz

 

 

Datasets

Model, Structure, BIC Score, Contributor, Date

CoIL Data (train, test)

1. Model, structure, -51,465, N. L. Zhang, 10-06

2. future better models (let me know if you find one)

TCM Kidney Data

1. Model, structure, -73,680, N. L. Zhang, 10-06

2. future better models