Latent Tree Models
|
News: o 20-10-2011: Model-based
multidimensional clustering of categorical data published on Artificial
Intelligence. o 20-10-2011:
Software packages EAST (for learning LTMs) and Lantern (for
inspecting/manipulating LTMs) available at end of this page. |
|
The Concept: Latent tree models (LTMs) are
tree-structured probabilistic graphical models or Bayesian networks. An
example is shown in Fig. 1. The leave nodes (the Xi's) represent observed
variables or attributes in data, while the internal nodes (the Yi's)
represent latent variables. All the variables are discrete. Each node is
associated with a conditional multinomial distribution of the node given its
parent, characterizing how it depends on its parent. |
|
|
Algorithms: LTMs
were introduced in (Zhang 2002, 2004), where they were called hierarchical
latent class models. To learn an LTM, one starts with a data set on observed variables,
and determines (1) the number of latent variables, (2) the connection among
them, (3) the number of possible states, called cardinality, for each
latent variable, and (4) the probability distributions. An concept-test algorithm called DHC for learning LTM was
proposed in Zhang (2002, 2004). The algorithm is search-based and is
capable of handling data sets with only about half a dozen attributes. Two
other search-based algorithms HSHC (Zhang and Kocka 2004) and EAST
(Chen et al. 2008). Those two algorithms gain efficiency by reducing
the number of candidate models examined during search and by reducing the
time spent in evaluating the candidate models. Between the two, EAST has a
more principled method for evaluating candidate models and adopts a more
intelligent search strategy. It is more efficient than HSHC and is less
likely to get trapped at local maxima. Two other algorithms for learning LTMs are recently
proposed by other authors. The algorithm by Harmeling and Williams (2010) is
faster than HSHC and EAST, but is restricted to produce only binary trees.
The algorithm by Choi (2010) has performance guarantees under some
conditions, but requires the cardinalities of the latent variables be given
in advance. |
|
o N. L. Zhang (2002). Hierarchical latent class
models for cluster analysis. AAAI-02, 230-237. o N. L. Zhang (2004).
Hierarchical latent class
models for cluster analysis. Journal of Machine Learning
Research, 5(6):697--723, 2004. o N. L. Zhang and T.
Kocka (2004). Efficient Learning of
Hierarchical Latent Class Models. Proc. of the 16th IEEE
International Conference on Tools with Artificial Intelligence (ICTAI-2004),
Boca Raton, Florida, November 15-17. o Tao Chen, N. L.
Zhang, and Yi Wang (2008). Efficient
Model Evaluation in the Search-Based Approach to Latent Structure Discovery.
In Proceedings of the Fourth European Workshop on Probabilistic Graphical
Models (PGM-08), 57-64. o S. Harmeling and o Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar,
and Alan S. Willsky (2011). Learning
latent tree graphical models. Journal of Machine
Learning Research 1 (2011) 1-48. |
|
Multidimensional Clustering: Clustering is a data analysis approach where the objective
is to find `naturally occurring' groups. Early research work on clustering
usually assumed that there was one true clustering of data. This assumption
does not hold for complex data which are typically multifaceted and can be
meaningfully clustered in many different ways. There is a growing interest in
methods that produce multiple partitions of data with each partition being
based on a different subset of attributes. We call such methods multi-partition clustering methods. (It is not to
be confused with multi-view clustering which
makes use of independence among different views of data to improve the
quality of one single partition.) The trend of multi-partition clustering is
evident from recent machine learning and data mining conferences. |
|
|
Analyzing data using LTMs can result in multiple discrete
latent variables, each representing a soft partition of data. So LTMs can be
used for multi-partition clustering. This
potential was first pointed out in (Zhang 2002, 2004). Because latent
variables can be viewed as latent attributes of data, we sometimes call
LTM-based cluster analysis multidimensional
clustering}. In (Zhang et al. 2008), we analyzed a marketing
data set from COIL
Challenge 2000. In (Zhang et al. 2010), we analyzed a survey data
set for social science. In both cases, interesting latent structures
and meaningful clusters were discovered. In (Zhang et al. 2010)
, we also discussed how to present the results of multidimensional clustering
so that domain experts can easily and accurately grasp the meanings of the
clusters found. |
|
|
o N. L. Zhang, Y.
Wang and T. Chen (2008). Discovery
of latent structures. Experience with the COIL Challenge 2000 data.
Journal of Systems Science and Complexity. 21: 1-12. o T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L.
K. M. Poon (2011). Model-based
multidimensional clustering of categorical data. Artificial
Intelligence. 176(1), 2246-2269. |
|
|
Pouch LTMs: LTMs are for discrete data. In
(Poon et al. 2010), we studied a similar class of models, called pouch
latent tree models (PLTMs), for continuous data. An example is shown in Fig.
2. In a PLTM, continuous attributes are first divided into pouches and then
the pouches are attached to the latent variables. In Fig. 2, for instance,
the doubleton pouch [X1X2] and the singleton pouch [X3] are both attached to
the latent variable Y2. The pouches are associated with conditional Gaussian
distributions P(X1;X2jY2) and P(X3jY2). PLTMs are a generalization of
Gaussian mixture models (GMMs). As a matter of fact, an GMM is a PLTM where
there is only one latent variable, and one pouch that consists of all the
attributes. In [30], PLTMs were used to analyze an array of data sets. The
results indicate PLTMs are capable of identifying natural facets of data and
discover meaningful clusterings along each facet. |
|
|
o
L. K. M. Poon, N. L. Zhang, T. Chen, and Y. Wang
(2010). Variable
selection in model-based clustering: To do or to facilitate. ICML-10. |
|
Application in TCM: An important motivation for our work is to meet the need
of traditional Chinese medicine (TCM) research. In (Zhang et al.
2008), we analyzed a TCM data set using LTMs. The latent structures obtained
match the relevant TCM theories. This is exciting and has attracted a lot of
attentions because it provides, for first time, objective justification for
the ancient theories (see Fig. 3). Since then we have been working with
collaborators trying to identify natural clusters in various TCM data sets
and use them as an objective and quantitative basis for the establishment of
TCM diagnosis standards. The effort is worthwhile because TCM diagnosis is
mostly based on observed symptoms rather than laboratory tests, and is
severely influenced by subjective factors and might vary significantly from
one doctor to another. Click here for a separate web page on this topic. |
|
|
o N. L. Zhang,
S. H. Yuan, T. Chen and Y. Wang (2008). Statistical
Validation of TCM Theories.
Journal
of Alternative and Complementary Medicine, 14(5):583-7. o N. L. Zhang, S. H.
Yuan, T. Chen and Y. Wang (2008). Latent tree models and
diagnosis in traditional Chinese medicine. Artificial
Intelligence in Medicine. 42: 229-245. |
|
Application: Density Estimation and
Classification: In (Wang et al 2008a, 2008b), we
study the use of LTMs for density estimation. To estimate the joint
distribution of some random variables, we build an LTM with those variables
as manifest variables. LTMs are computationally simple to deal with. At the
same time, they can achieve good approximation accuracy because they can
represent complex relationships among the manifest variables. A new algorithm
for learning LTMs is developed specifically for this purpose. Called LTAB,
the algorithm first computes the mutual information between each pair of
manifest variables and then constructs a binary tree structure for LTM
through hierarchical clustering of variables. LTAB is much faster than EAST.
It does not obtain interesting model structures. Good approximation to the
true distribution is achieved by setting the cardinalities (i.e., the numbers
of states) of the latent variables high. HNB
models were introduced in (Zhang et al. 2004) as a framework for
discovering latent structures in supervised learning. In that paper, we
developed an algorithm for learning HNB models and tested it on some small
data sets. The algorithm corresponds to the DHC algorithm for LTM. This
work was fellowed up by Langseth andNielsen (2006, 2009). |
|
o Y. Wang, N. L. Zhang. T. Chen (2008a). Latent
Tree Models and Approximate Inference in Bayesian Networks, AAAI-08.
o Y. Wang, N. L.
Zhang and T. Chen (2008b). Latent tree models and approximate inference in Bayesian
networks. Journal of Artificial Intelligence Research, 32,
879-900. o
N. L. Zhang, T. D.
Nielsen, and F. V. Jensen (2004). Latent variable discovery
in classification models. Artificial Intelligence in
Medicine, 30(3): 283-299. o
H. Langseth
and T. D. Nielsen (2009): Latent
Classification models for Binary data. Pattern Recognition
42:2724-2736, 2009. o
H. Langseth
and T. D. Nielsen (2006):
Classification using Hierarchical Naive Bayes models. Machine Learning 63(2),
pp. 135-159, 2006. |
|
Other Applications: In
(Mourad
et al 2010), LTMs are applied to bioinfomatics. |
Raphaël Mourad, Christine Sinoquet,
Philippe Leray (2010). A
hierarchical Bayesian network approach for linkage disequilibrium modeling
and data-dimensionality reduction prior to genome-wide association studies.
BMC Bioinformatics 2011, 12:16doi:10.1186/1471-2105-12-16.
|
|
Softwares |
Send
feedbacks to lzhang@cse.ust.hk |
|
|
Lantern |
GUI
for LTMs: http://www.cse.ust.hk/~lzhang/ltm/softwares/Lantern2.0-beta.exe |
|
|
EAST |
An implementation of the EAST algorithm: http://www.cse.ust.hk/~lzhang/ltm/softwares/EAST.zip |
|
|
HLCM |
Implementation of HSHC algorithm and utilities: http://www.cse.ust.hk/~lzhang/ltm/softwares/hlcm-distribute.zip |
|
PLTM-EAST |
An implementation of the EAST algorithm for Pouch Latent Tree :http://www.cse.ust.hk/~lzhang/ltm/softwares/PLTM-EAST.tar.gz |
|
|
Datasets |
Model, Structure,
BIC Score, Contributor, Date |
|
1. Model, structure, -51,465, N. L.
Zhang, 10-06 2. future better
models (let me know if you find one) |
|
|
1. Model, structure, -73,680, N. L. Zhang, 10-06 2. future better
models |