Dekai WU
Associate Professor of Computer Science and Engineering, HKUST
Human Language Technology Center
Department of Computer Science and Engineering
The Hong Kong University of Science & Technology
HKUST, Clear Water Bay, Kowloon, Hong Kong
tel +852 2358-6989 · fax +852 2358-1477 · room 3539 (lifts 25/26)
lab +852 2358-8831 · room 2580 (lifts 27/28 and 29/30)
dekai@cs.ust.hk · http://www.cs.ust.hk/~dekai
I received my PhD in Computer Science from the University of California at Berkeley, and was a postdoctoral fellow at the University of Toronto (Ontario, Canada) prior to joining HKUST in its founding year in 1992. Other degrees include an Executive MBA from Kellogg (Northwestern University) and HKUST in 2002, and a BS in Computer Engineering from the University of California at San Diego (Revelle College departmental award, cum laude, Phi Beta Kappa) in 1984. I was a visiting researcher at Columbia University in 1995-96, at Bell Laboratories in 1995, and at the Technische Universität München (Munich, Germany) during 1986-87.
In December 2011, I was selected by the Association for Computational Linguistics as one of only 17 scientists worldwide to be awarded the honor of founding ACL Fellow, with a citation for "significant contributions to machine translation and the development of inversion transduction grammar" which pioneered the integration of syntactic and semantic models into statistical machine translation paradigms.
I serve as Associate Editor of AI Journal and on the Editorial Board of Journal of Natural Language Engineering. I am serving as a Chair of IWSLT 2012 and of the ongoing SSST workshop series from 2007 to 2013. I have also served as Co-Chair of EMNLP-2004; on the Editorial Boards of Computational Linguistics and of Machine Translation; as Associate Editor of ACM Transactions on Speech and Language Processing; on the Organizing Committees of ACL-2000 and WVLC-5 (SIGDAT 1997); and on the Executive Committee of the Association for Computational Linguistics (ACL).
Research interests
Computational linguistics; natural language processing; machine translation; cognitive models of human language and communication; machine learning and data mining; multilingual computing; language modeling; speech recognition; language acquisition; dialog systems; information retrieval; Internet information processing; knowledge management; computational musicology; computer music.
Specifically, for machine learning of the relationships between languages, milestone successes pioneered by my statistical machine translation (SMT) research group include:
- the first unstructured SMT models on very different languages
  - 1993: Chinese/English alignment
  - 1994: Chinese/English statistical machine translation
  - 1994: Chinese/English phrase/collocation translation learning
- the first syntactic and tree-structured SMT models
  - 1995: inversion transduction grammar (ITG; any synchronous context-free grammar that is binary, ternary, or inverting)
  - 1995: bracketing ITG (BTG or BITG)
  - 1995: stochastic ITG parameter estimation (EM training)
  - 1996: phrasal SMT (segmental ITG)
  - 1997: projection of monolingual constraints (bilingual constraint transfer; coercion)
  - 1998: SMT with linguistic ITG transduction rules
  - 2005: comparable corpora mining BITG
  - 2009: linear inversion transduction grammar (LITG)
  - 2009: linear transduction grammar (LTG)
  - 2011: preterminalized linear inversion transduction grammar (PLITG)
- the first semantic SMT models
  - 2005: word sense disambiguation for SMT (WSD for SMT)
  - 2007: phrase sense disambiguation for SMT (PSD)
  - 2007: semantic role labeling for SMT training (SRL for SMT)
  - 2009: semantic role labeling for SMT decoding (SRL for SMT)
- the first semantic MT evaluation models
  - 2010: human semantic MT evaluation with SRL-for-MTE (HMEANT)
  - 2012: automatic semantic MT evaluation with SRL-for-MTE (MEANT)
- some of this is surveyed in my 2010 chapters
  - "Alignment" in CRC Press' Handbook of Natural Language Processing
  - "Lexical Semantics for Statistical Machine Translation" in DARPA's Handbook of Natural Language Processing and Machine Translation
Our active research projects are internationally funded by the US DARPA BOLT and GALE programs, the European Union EU-BRIDGE project, and the Hong Kong RGC.
Human Language Technology Center (HLTC)
Activities
- SSST-7, Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation (NAACL-HLT 2013 Workshop), June 2013, Atlanta, Georgia
- IWSLT 2012, International Workshop on Spoken Language Translation, 6-7 December 2012, Hong Kong
- SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (ACL 2012 Workshop), July 2012, Jeju, South Korea
- SSST-5, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation (ACL HLT 2011 Workshop), 23 June 2011, Portland, Oregon
- SSST-4, Fourth Workshop on Syntax and Structure in Statistical Translation (COLING 2010 Workshop), 28 August 2010, Beijing
- SSST-3, Third Workshop on Syntax and Structure in Statistical Translation (NAACL HLT 2009 Workshop), 5 June 2009, Boulder, Colorado
- SSST-2, Second Workshop on Syntax and Structure in Statistical Translation (ACL-08: HLT Workshop), 20 June 2008, Columbus, Ohio
- SSST-1, Syntax and Structure in Statistical Translation (NAACL-HLT 2007 Workshop), 26 April 2007, Rochester, New York
- CLSP Workshop 2005, Translation by Parsing, July-August 2005, Johns Hopkins University, Center for Language and Speech Processing
- EMNLP-2004, 2004 Conference on Empirical Methods in Natural Language Processing (at ACL-04), 25-26 July 2004, Barcelona, Spain
- ACL-2000, 38th Annual Meeting of the Association for Computational Linguistics, 1-8 October 2000, Hong Kong
- WVLC-5, Fifth Workshop on Very Large Corpora, 18/20 August 1997, Tsinghua/HKUST
Teaching
- COMP4211 (Machine Learning), Spring 2013
- COMP3031 (Introduction to Programming Languages), Fall 2012
- COMP4221 (Fundamentals of Artificial Intelligence), Spring 2012
- COMP3031 (Introduction to Programming Languages), Fall 2011
- COMP300H (Introduction to Natural Language Processing), Spring 2011
- COMP221 (Fundamentals of Artificial Intelligence), Fall 2010
- COMP300H (Introduction to Natural Language Processing), Spring 2010
- COMP221 (Fundamentals of Artificial Intelligence), Fall 2009
- CSIT523 (Knowledge Management), Summer 2009
- COMP151H (Object-Oriented Programming, Honors Study Track), Spring 2009
- COMP526 (Natural Language Processing), Fall 2008
- CSIT600G (Knowledge Management), Summer 2008
- COMP151H (Object-Oriented Programming, Honors Study Track), Spring 2008
- COMP251 (Introduction to Programming Languages), Fall 2007
- CSIT600G (Knowledge Management), Summer 2007
- COMP151 (Object-Oriented Programming), Spring 2007
- COMP526 (Natural Language Processing), Fall 2006
- COMP251 (Introduction to Programming Languages), Fall 2006
- COMP621N (Advanced Topics in AI), Spring 2006
- COMP151 (Object-Oriented Programming), Spring 2006
- CSIT600G (Knowledge Management), Fall 2005
- COMP621M (Advanced Topics in AI: Structural Statistical Machine Translation), Fall 2005
- COMP271 (Design and Analysis of Algorithms), Spring 2005
- COMP151 (Object-Oriented Programming), Fall 2004
- COMP621J (Advanced Topics in AI: Statistical Machine Translation), Spring 2004
- COMP526 (Natural Language Processing), Fall 2003
- COMP621H (Advanced Topics in AI: Machine Translation), Fall 2003
- COMP151 (Object-Oriented Programming), Spring 2003
- COMP171 (Data Structures and Algorithms), Fall 2002
Current/recent research students
- Markus SAERS (Sweden) PhD 2011 co-advised with Uppsala Universitet
- Marine CARPUAT (France) PhD 2008
- Jackie LO Chi Kiu (HK) PhD (MPhil 2009)
- Tyler BARTH (USA) MPhil
- Karteek ADDANKI (India) MPhil
- Ken LEE Wing Kuen (HK) MPhil 2005
Selected publications
- Markus SAERS, Karteek ADDANKI, and Dekai WU.
"From Finite-State to Inversion Transductions: Toward Unsupervised Bilingual Grammar Induction".
24th International Conference on Computational Linguistics (Coling-2012), 2325-2340. Mumbai: Dec 2012.
We report a wide range of comparative experiments establishing for the first time contrastive foundations for a completely unsupervised approach to bilingual grammar induction that is cognitively oriented toward early category formation and phrasal chunking in the bootstrapping process up the expressiveness hierarchy from finite-state to linear to inversion transduction grammars. We show a consistent improvement in terms of cross-entropy throughout the bootstrapping process, as well as promising decoding experiments using the learned grammars. Rather than relying on external resources such as parses, POS tags or dictionaries, our method is fully unsupervised (in the way this term is typically understood in the machine translation community). This means that the bootstrapping can only rely on information gathered during the previous step, which necessitates some strategy for expanding the expressiveness of the grammars. We present principled approaches for moving from finite-state to linear transduction grammars as well as from linear to inversion transduction grammars. It is our belief that early, integrated category formation and phrasal chunking in this unsupervised bootstrapping process is better aligned to child language acquisition. Finally, we also report exploratory decoding results using some of the learned grammars. This is the first step towards an end-to-end grammar-based statistical machine translation system.
- Anand Karthik TUMULURU, Chi-kiu LO and Dekai WU.
"Accuracy and robustness in measuring the lexical similarity of semantic role fillers for automatic semantic MT evaluation".
26th Pacific Asia Conference on Language, Information and Computation (PACLIC-26), 574-581. Bali: Nov 2012.
We present larger-scale evidence overturning previous results, showing that among the many alternative phrasal lexical similarity measures based on word vectors, the Jaccard coefficient most increases the robustness of MEANT, the recently introduced, fully-automatic, state-of-the-art semantic MT evaluation metric. MEANT critically depends on phrasal lexical similarity scores in order to automatically determine which semantic role fillers should be aligned between reference and machine translations. The robustness experiments were conducted across various data sets following NIST MetricsMATR protocols, showing higher Kendall correlation with human adequacy judgments than BLEU, METEOR (with and without synsets), WER, PER, TER and CDER. The Jaccard coefficient is shown to be more discriminative and robust than cosine similarity, the Min/Max metric with mutual information, Jensen-Shannon divergence, or Dice's coefficient. We also show that with the Jaccard coefficient as the phrasal lexical similarity metric, individual word token scores are best aggregated into phrasal segment similarity scores using the geometric mean, rather than either the arithmetic mean or competitive linking style word alignments. Furthermore, we show empirically that a context window size of 5 captures the optimal amount of information for training the word vectors. The combined results suggest a new formulation of MEANT with significantly improved robustness across data sets.
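To make the aggregation step concrete, here is a minimal Python sketch of scoring two role fillers with a Jaccard coefficient per token pair and a geometric mean over tokens. It rests on assumptions not stated in the abstract: each word is represented by a plain set of context words (the toy `contexts` dictionary is hypothetical), and each token is scored against its best-matching token in the other phrase. The paper's word vectors and exact aggregation may differ.

```python
import math

def jaccard(a, b):
    """Set-based Jaccard coefficient: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def phrase_similarity(phrase_a, phrase_b, contexts):
    """Geometric mean, over tokens of phrase_a, of the best Jaccard score
    each token achieves against any token of phrase_b.

    `contexts` maps a word to its set of context words (an assumed,
    simplified stand-in for a trained word vector).
    """
    scores = []
    for tok_a in phrase_a:
        best = max(
            (jaccard(contexts.get(tok_a, set()), contexts.get(tok_b, set()))
             for tok_b in phrase_b),
            default=0.0,
        )
        scores.append(best)
    # A geometric mean is zero as soon as any token has no match at all.
    if not scores or any(s == 0.0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Hypothetical toy context sets, for illustration only.
contexts = {
    "cat": {"pet", "fur", "meow"},
    "kitten": {"pet", "fur", "small"},
    "sat": {"chair", "down"},
    "rested": {"chair", "down"},
}
similarity = phrase_similarity(["cat", "sat"], ["kitten", "rested"], contexts)
```

Unlike an arithmetic mean, the geometric mean penalizes a filler heavily when even one of its tokens has a weak match, which is one plausible reading of why it behaves differently in aggregation.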
- Chi-kiu LO and Dekai WU.
"Unsupervised vs. supervised weight estimation for semantic MT evaluation metrics".
Proceedings of SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (at ACL 2012). Jeju, South Korea: Jul 2012.
We present an unsupervised approach to estimate the appropriate degree of contribution of each semantic role type for semantic translation evaluation, yielding a semantic MT evaluation metric whose correlation with human adequacy judgments is comparable to that of recent supervised approaches but without the high cost of a human-ranked training corpus. Our new unsupervised estimation approach is motivated by an analysis showing that the weights learned from supervised training are distributed in a similar fashion to the relative frequencies of the semantic roles. Empirical results show that even without a training corpus of human adequacy rankings against which to optimize correlation, using instead our relative frequency weighting scheme to approximate the importance of each semantic role type leads to a semantic MT evaluation metric that correlates with human adequacy judgments comparably to previous metrics that require far more expensive human rankings of adequacy over a training corpus. As a result, the cost of semantic MT evaluation is greatly reduced.
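As a rough illustration of the relative-frequency idea, the following Python sketch derives one weight per semantic role type from how often that role occurs in a collection of role labels. The function name and the toy label list are hypothetical, and the published estimation procedure may include details not captured here.

```python
from collections import Counter

def relative_frequency_weights(role_labels):
    """Weight each role type by its share of all observed role labels."""
    counts = Counter(role_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {role: count / total for role, count in counts.items()}

# Hypothetical PropBank-style labels gathered from some annotated text.
labels = ["ARG0", "ARG1", "ARG1", "ARGM-TMP"]
weights = relative_frequency_weights(labels)
```

No human adequacy rankings are consulted anywhere in this computation, which is the source of the cost saving described above.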
- Ondřej BOJAR and Dekai WU.
"Towards a Predicate-Argument Evaluation for MT".
Proceedings of SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (at ACL 2012). Jeju, South Korea: Jul 2012.
HMEANT (Lo and Wu, 2011a) is a manual MT evaluation technique that focuses on predicate-argument structure of the sentence. We relate HMEANT to an established linguistic theory, highlighting the possibilities of reusing existing knowledge and resources for interpreting and automating HMEANT. We apply HMEANT to a new language, Czech in particular, by evaluating a set of English-to-Czech MT systems. HMEANT proves to correlate with manual rankings at the sentence level better than a range of automatic metrics. However, the main contribution of this paper is the identification of several issues of HMEANT annotation and our proposal on how to resolve them.
- Chi-kiu LO, Anand Karthik TUMULURU and Dekai WU.
"Fully Automatic Semantic MT Evaluation".
7th Workshop on Statistical Machine Translation (at NAACL 2012). Montreal: Jun 2012.
We introduce the first fully automatic, fully semantic frame based MT evaluation metric, MEANT, that outperforms all other commonly used automatic metrics in correlating with human judgment on translation adequacy. Recent work on HMEANT, which is a human metric, indicates that machine translation can be better evaluated via semantic frames than other evaluation paradigms, requiring only minimal effort from monolingual humans to annotate and align semantic frames in the reference and machine translations. We propose a surprisingly effective Occam's razor automation of HMEANT that combines standard shallow semantic parsing with a simple maximum weighted bipartite matching algorithm for aligning semantic frames. The matching criterion is based on lexical similarity scoring of the semantic role fillers through a simple context vector model which can readily be trained using any publicly available large monolingual corpus. Sentence level correlation analysis, following standard NIST MetricsMATR protocol, shows that this fully automated version of HMEANT achieves significantly higher Kendall correlation with human adequacy judgments than BLEU, NIST, METEOR, PER, CDER, WER, or TER. Furthermore, we demonstrate that performing the semantic frame alignment automatically actually tends to be just as good as performing it manually. Despite its high performance, fully automated MEANT is still able to preserve HMEANT's virtues of simplicity, representational transparency, and inexpensiveness.
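The alignment step can be pictured with a small Python sketch of maximum weighted bipartite matching between reference and MT role fillers. This version is a brute-force search over assignments, chosen only because it is short and self-contained; it is not the implementation used in MEANT, and a polynomial-time method such as the Hungarian algorithm would be the usual choice at scale. The similarity matrix is hypothetical toy data, and weights are assumed to be non-negative.

```python
from itertools import permutations

def max_weight_matching(sim):
    """Return (total_weight, pairs) for the best one-to-one alignment.

    sim[i][j] is the (non-negative) similarity between reference filler i
    and MT filler j. Every filler on the smaller side gets a partner,
    which is optimal when no weight is negative.
    """
    n_ref = len(sim)
    n_mt = len(sim[0]) if sim else 0
    if n_ref == 0 or n_mt == 0:
        return 0.0, []
    if n_ref > n_mt:
        # Transpose so the outer loop always runs over the smaller side.
        transposed = [[sim[i][j] for i in range(n_ref)] for j in range(n_mt)]
        total, pairs = max_weight_matching(transposed)
        return total, sorted((i, j) for j, i in pairs)
    best_total, best_pairs = -1.0, []
    for cols in permutations(range(n_mt), n_ref):
        pairs = list(zip(range(n_ref), cols))
        total = sum(sim[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_total, best_pairs

# Hypothetical similarities: 2 reference fillers vs. 3 MT fillers.
sim = [
    [0.9, 0.1, 0.3],
    [0.8, 0.7, 0.2],
]
total, pairs = max_weight_matching(sim)
```

Greedy matching would also pair reference filler 0 with MT filler 0 here, but in general only a global search of this kind guarantees the highest total similarity.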
- Karteek ADDANKI, Chi-kiu LO, Markus SAERS, and Dekai WU.
"LTG vs. ITG Coverage of Cross-Lingual Verb Frame Alternations".
16th Annual Conference of the European Association for Machine Translation (EAMT-2012). Trento, Italy: May 2012.
We show in an empirical study that not only did all cross-lingual alternations of verb frames across Chinese–English translations fall within the reordering capacity of Inversion Transduction Grammars, but more surprisingly, about 97% of the alternations were expressible by the far more restrictive Linear Transduction Grammars. Also, about 71% of the cross-lingual verb frame alternations turn out to be monotonic even for diverse language pairs such as Chinese–English. We also observe that a source verb frame alternation pattern translates into a small subset of the possible target verb frame alternation patterns, based on the construction of the source sentence and the frame set definitions. As a part of our evaluation, we also present a novel linear time algorithm to determine whether a particular syntactic alignment falls within the expressiveness of Linear Transduction Grammars. To our knowledge, this is the first study that attempts to analyze the cross-lingual alternation behavior of semantic frames and the extent of their coverage under syntax-based machine translation formalisms.
- Simon SHI, Pascale FUNG, Emmanuel PROCHASSON, Chi-kiu LO, and Dekai WU.
"Mining Parallel Documents Using Low Bandwidth and High Precision CLIR from the Heterogeneous Web".
5th International Joint Conference on Natural Language Processing (IJCNLP 2011). Chiang Mai, Thailand: Nov 2011.
We propose a content-based approach to mine parallel resources from the entire web using cross lingual information retrieval (CLIR) with search query relevance score (SQRS). Our method improves mining recall by going beyond URL matching to find parallel documents from non-parallel sites. We introduce SQRS to improve the precision of mining. Our method makes use of search engines to query for target document given each source document and therefore does not require downloading target language documents in batch mode, reducing computational cost on the local machines and bandwidth consumption. We obtained a very high mining precision (88%) on the parallel documents by the pure CLIR approach. After extracting parallel sentences from the mined documents and using them to train an SMT system, we found that the SMT performance, with 29.88 BLEU score, is comparable to that obtained with high quality manually translated parallel sentences with 29.54 BLEU score, illustrating the excellent quality of the mined parallel material.
- Markus SAERS, Dekai WU, and Chris QUIRK.
"On the Expressivity of Linear Transductions".
Machine Translation Summit XIII (MT Summit 2011). Xiamen, China: Sep 2011.
We investigate the formal expressivity properties of linear transductions, the class of transductions generated by linear transduction grammars, linear inversion transduction grammars and preterminalized linear inversion transduction grammars. While empirical results such as those in previous work are of course an ultimate test of modeling adequacy for machine translation applications, it is equally important to understand the formal theoretical properties of any such new representation. An important part of the expressivity of a transduction is the possibility to align tokens between the two languages generated. We refer to the number of different alignments that are allowed under a transduction as its weak alignment capacity. This aspect of expressivity is quantified for linear transductions using preterminalized linear inversion transduction grammars, and compared to the expressivity of finite-state transductions, inversion transductions and syntax-directed transductions.
- Markus SAERS and Dekai WU.
"Linear Transduction Grammars and Zipper Finite-State Transducers".
Recent Advances in Natural Language Processing (RANLP 2011). Hissar, Bulgaria: Sep 2011.
We examine how the recently explored class of linear transductions relates to finite-state models. Historically neglected, linear transductions are gaining interest in statistical machine translation modeling, due to recent empirical studies demonstrating that their attractive balance of generative capacity and complexity characteristics leads to improved accuracy and speed in learning alignment and translation models. Such work has until now characterized the class of linear transductions in terms of either (a) linear inversion transduction grammars (LITGs) which are linearized restrictions of inversion transduction grammars or (b) linear transduction grammars (LTGs) which are bilingualized generalizations of linear grammars. In this paper, we offer a new alternative characterization of linear transductions, as relating four finite-state languages to each other. In other words, linear transductions are finite-state in four dimensions. We introduce the devices of zipper finite-state automata (ZFSAs) and zipper finite-state transducers (ZFSTs) in order to construct the bridge between linear transductions and finite-state models.
- Markus SAERS, Dekai WU, Chi-kiu LO, and Karteek ADDANKI.
"Speech Translation with Grammar Driven Probabilistic Phrasal Bilexica Extraction".
12th Annual Conference of the International Speech Communication Association (Interspeech 2011). Florence, Italy: Aug 2011.
We introduce a new type of transduction grammar that allows for learning of probabilistic phrasal bilexica, leading to a significant improvement in spoken language translation accuracy. The current state-of-the-art in statistical machine translation relies on a complicated and crude pipeline to learn probabilistic phrasal bilexica, the very core of any speech translation system. In this paper, we present a more principled approach to learning probabilistic phrasal bilexica, based on stochastic transduction grammar learning applicable to speech corpora.
- Chi-kiu LO and Dekai WU.
"SMT vs. AI redux: How semantic frames evaluate MT more accurately".
22nd International Joint Conference on Artificial Intelligence (IJCAI-11). Barcelona: Jul 2011.
We argue for an alternative paradigm in evaluating machine translation quality that is strongly empirical but more accurately reflects the utility of translations, by returning to a representational foundation based on AI oriented lexical semantics, rather than the superficial flat n-gram and string representations recently dominating the field. Driven by such metrics as BLEU and WER, current SMT frequently produces unusable translations where the semantic event structure is mistranslated: who did what to whom, when, where, why, and how? We argue that it is time for a new generation of more “intelligent” automatic and semi-automatic metrics, based clearly on getting the structure right at the lexical semantics level. We show empirically that it is possible to use simple PropBank style semantic frame representations to surpass all currently widespread metrics' correlation to human adequacy judgments, including even HTER. We also show that replacing human annotators with automatic semantic role labeling still yields much of the advantage of the approach. We combine the best of both worlds: from an SMT perspective, we provide superior yet low-cost quantitative objective functions for translation quality; and yet from an AI perspective, we regain the representational transparency and clear reflection of semantic utility of structural frame-based knowledge representations.
- Chi-kiu LO and Dekai WU.
"MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility via semantic frames".
49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL HLT 2011). Portland, Oregon: Jun 2011.
We introduce a novel semi-automated metric, MEANT, that assesses translation utility by matching semantic role fillers, producing scores that correlate with human judgment as well as HTER but at much lower labor cost. As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. But more accurate, non-automatic adequacy-oriented MT evaluation metrics like HTER are highly labor-intensive, which bottlenecks the evaluation cycle. We first show that when using untrained monolingual readers to annotate semantic roles in MT output, the non-automatic version of the metric HMEANT achieves a 0.43 correlation coefficient with human adequacy judgments at the sentence level, far superior to BLEU at only 0.20, and equal to the far more expensive HTER. We then replace the human semantic role annotators with automatic shallow semantic parsing to further automate the evaluation metric, and show that even the semi-automated evaluation metric achieves a 0.34 correlation coefficient with human adequacy judgment, which is still about 80% as closely correlated as HTER despite an even lower labor cost for the evaluation procedure. The results show that our proposed metric is significantly better correlated with human judgment on adequacy than current widespread automatic evaluation metrics, while being much more cost effective than HTER.
- Markus SAERS and Dekai WU.
"Reestimation of Reified Rules in Semiring Parsing and Biparsing".
Proceedings of SSST-5, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation (at ACL 2011). Portland, Oregon: Jun 2011.
We show that reifying the rules from hyperedge weights to first-class graph nodes automatically gives us rule expectations in any kind of grammar expressible as a deductive system, without any explicit algorithm for calculating rule expectations (such as the inside-outside algorithm). This gives us expectation maximization training for any grammar class with a parsing algorithm that can be stated as a deductive system, for free. Having such a framework in place accelerates turn-over time for experimenting with new grammar classes and parsing algorithms: to implement a grammar learner, only the parse forest construction has to be implemented.
- Chi-kiu LO and Dekai WU.
"Structured vs. Flat Semantic Role Representations for Machine Translation Evaluation".
Proceedings of SSST-5, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation (at ACL 2011). Portland, Oregon: Jun 2011.
We argue that failing to capture the degree of contribution of each semantic frame in a sentence explains puzzling results in recent work on the MEANT family of semantic MT evaluation metrics, which have disturbingly indicated that dissociating semantic roles and fillers from their predicates actually improves correlation with human adequacy judgments even though, intuitively, properly segregating event frames should more accurately reflect the preservation of meaning. Our analysis finds that both properly structured and flattened representations fail to adequately account for the contribution of each semantic frame to the overall sentence. We then show that the correlation of HMEANT, the human variant of MEANT, can be greatly improved by introducing a simple length-based weighting scheme that approximates the degree of contribution of each semantic frame to the overall sentence. The new results also show that, without flattening the structure of semantic frames, weighting the degree of each frame's contribution gives HMEANT higher correlations than the previously best-performing flattened model, as well as HTER.
- Markus SAERS and Dekai WU.
"Principled Induction of Phrasal Bilexica".
15th Annual Conference of the European Association for Machine Translation (EAMT-2011). Leuven, Belgium: May 2011.
We aim to replace the long and complicated pipeline employed to produce probabilistic phrasal bilexica with a theoretically principled, grammar-based approach. To this end, we introduce a phrasal generalization of linear transduction grammars (LTGs), and an iterative induction method that works on raw corpora. Surface-based statistical machine translation (SMT) systems rely heavily on capturing the immediate context of words to be able to translate them accurately. It would be desirable to bring this power into structured SMT systems, but this is far from a trivial problem. Our immediate aim is to build a probabilistic bilexicon, which means that we would like to have a grammar where the entries constitute a natural probability distribution. Since this is not easily achievable with LTGs or linear inversion transduction grammars (LITGs), we introduce the class of preterminalized LITGs, which are equivalent to both LTGs and LITGs in terms of generative capacity, and which have the desired property of separating the lexical rules into one category whose probability distribution maps naturally to the bilexicon's. As a proof of concept, we show that phrasal bilexica, induced in this manner, can be used to improve the performance of a traditional phrase-based SMT system.
- Dekai WU. Alignment. In Nitin INDURKHYA and Fred DAMERAU (editors), CRC Handbook of Natural Language Processing, Second Edition. 367-408. CRC Press. 2010.
In this chapter we discuss the work done on automatic alignment of parallel texts for various purposes. Fundamentally, an alignment algorithm accepts as input a bitext, and produces as output a bisegmentation relation that identifies corresponding segments between the texts. Bitext alignment fundamentally lies at the heart of all data-driven machine translation methods, and the rapid research progress on alignment since 1990 reflects the advent of statistical machine translation (SMT) and example-based machine translation (EBMT) approaches. Yet the importance of alignment extends as well to many other practical applications for translators, bilingual lexicographers, and even ordinary readers. A wide variety of techniques now exist, ranging from the most simple (counting characters or words) to the more sophisticated, sometimes involving linguistic data (lexicons) which may or may not have been automatically induced themselves. Techniques have been developed for aligning passages of various granularities: documents, paragraphs, sentences, constituents, collocations or phrases, words, and characters. Some techniques work on precisely translated parallel corpora, while others work on noisy, comparable, or non-parallel corpora. Some techniques make use of apparent morphological features, while others rely on cognates and loan-words; of particular interest is work done on languages which do not have a common writing system. Some techniques align only shallow, flat chunks, while others align compositional, hierarchical structures. The robustness and generality of different techniques have generated much discussion.
- Dekai WU, Pascale FUNG, Marine CARPUAT, Chi-kiu LO, Yongsheng YANG, and Zhaojun WU. Lexical Semantics for Statistical Machine Translation. In Joseph Olive, Caitlin Christianson, and John McCary (editors), Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation. Springer. 2010.
We present efforts toward moving statistical machine translation toward incorporating semantic modeling. The most glaring types of errors made by current systems appear to be prime targets for lexical semantics models, which have heretofore been largely absent from statistical machine translation models. Although sense disambiguation and semantic roles both appear highly relevant to translation accuracy, experience suggests that simply dropping in the existing models is unlikely to improve translation accuracy; rather, adaptations will be necessary. We discuss (1) a new Phrase Sense Disambiguation model that successfully improves statistical phrase-based translation for the first time by making three critical adaptations to traditional word sense disambiguation configurations, and (2) a series of empirical studies that illuminate more precisely the likely contribution of semantic roles in improving statistical machine translation accuracy.
- Markus SAERS, Joakim NIVRE, and Dekai WU.
"A Systematic Comparison between Inversion Transduction Grammar and Linear Transduction Grammar for Word Alignment".
Proceedings of SSST-4, Fourth Workshop on Syntax and Structure in Statistical Translation (at COLING 2010). Beijing: Aug 2010.
We present two contributions to grammar driven translation. First, since both Inversion Transduction Grammars and Linear Inversion Transduction Grammars have been shown to produce better alignments than the standard word alignment tool, we investigate how the trade-off between speed and end-to-end translation quality extends to the choice of grammar formalism. Second, we prove that Linear Transduction Grammars (LTGs) generate the same transductions as Linear Inversion Transduction Grammars, and present a scheme for arriving at LTGs by bilingualizing Linear Grammars. We also present a method for obtaining Inversion Transduction Grammars from Linear (Inversion) Transduction Grammars, which can speed up grammar induction from parallel corpora dramatically.
- Chi-kiu LO and Dekai WU.
"Semantic vs. Syntactic vs. N-gram Structure for Machine Translation Evaluation".
Proceedings of SSST-4, Fourth Workshop on Syntax and Structure in Statistical Translation (at COLING 2010). Beijing: Aug 2010.
We present results of an empirical study on evaluating the utility of the machine translation output, by assessing the accuracy with which human readers are able to complete the semantic role annotation templates. Unlike the widely-used lexical and n-gram based or syntactic based MT evaluation metrics which are fluency-oriented, our results show that using semantic role labels to evaluate the utility of MT output achieves higher correlation with human judgments on adequacy. In this study, human readers were employed to identify the semantic role labels in the translation. For each role, the filler is considered an accurate translation if it expresses the same meaning as that annotated in the gold standard reference translation. Our SRL based f-score evaluation metric has a 0.41 correlation coefficient with the human judgement on adequacy, while in contrast BLEU has only a 0.25 correlation coefficient and the syntactic based MT evaluation metric STM has only a 0.32 correlation coefficient with the human judgement on adequacy. Our results strongly indicate that using semantic role labels for MT evaluation can be significantly more effective and better correlated with human judgement on adequacy than BLEU and STM.
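As a rough illustration of the kind of score described above, the following minimal sketch computes an f-score from counts of role fillers that human readers judged accurate. The function name, its arguments, and the unweighted combination of precision and recall are assumptions made for illustration only; the abstract does not specify how the published metric weights roles or predicates.

```python
def srl_f_score(num_correct, num_in_mt, num_in_ref):
    """Illustrative f-score over semantic role fillers (not the published metric).

    num_correct: fillers in the MT output judged to match the reference meaning
    num_in_mt:   fillers annotated in the MT output
    num_in_ref:  fillers annotated in the gold standard reference translation
    """
    if num_in_mt == 0 or num_in_ref == 0:
        return 0.0
    precision = num_correct / num_in_mt
    recall = num_correct / num_in_ref
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 3 accurate fillers, 4 in the MT output, 6 in the reference.
score = srl_f_score(3, 4, 6)
```

With these hypothetical counts, precision is 0.75 and recall is 0.5, so the harmonic mean is 0.6.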
- Markus SAERS, Joakim NIVRE, and Dekai WU.
"Linear Inversion Transduction Grammar Alignments as a Second Translation Path".
Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (at COLING 2010). Uppsala: Jul 2010.
We explore the possibility of using Stochastic Bracketing Linear Inversion Transduction Grammars for a full-scale German–English translation task, both on their own and in conjunction with alignments induced with GIZA++. The rationale for transduction grammars, the details of the system and some results are presented.
- Markus SAERS, Joakim NIVRE, and Dekai WU.
"Word Alignment with Stochastic Bracketing Linear Inversion Transduction Grammar".
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2010). Los Angeles: Jun 2010.
The class of Linear Inversion Transduction Grammars (LITGs) is briefly introduced, and used to induce a word alignment over a parallel corpus. We show that alignment via Stochastic Bracketing LITGs is considerably faster than Stochastic Bracketing ITGs, while yielding alignments superior to the widely-used heuristic of intersecting bidirectional IBM alignments. Performance is measured as the translation quality of a phrase-based machine translation system built upon the word alignments.
- Chi-kiu LO and Dekai WU.
"Evaluating Machine Translation Utility via Semantic Role Labels".
Seventh International Conference on Language Resources and Evaluation (LREC-2010). Malta: May 2010.
We present the methodology that underlies new metrics for semantic machine translation evaluation that we are developing. Unlike widely-used lexical and n-gram based MT evaluation metrics, the aim of semantic MT evaluation is to measure the utility of translations. We discuss the design of empirical studies to evaluate the utility of machine translation output by assessing the accuracy for key semantic roles. Such roles can be annotated using Propbank-style PRED and ARG labels. Recent work by Wu and Fung (2009) introduced methods based on automatic semantic role labeling into statistical machine translation, to enhance the quality of MT output. However, semantic SMT approaches have so far still only been evaluated using lexical and n-gram based SMT evaluation metrics such as BLEU, which are not aimed at evaluating the utility of MT output. Direct data analysis is still needed to understand how semantic models can be leveraged to evaluate the utility of MT output. In this paper, we discuss a new methodology for evaluating the utility of the machine translation output, by assessing the accuracy with which human readers are able to match the Propbank annotation frames.
- Dekai WU.
"Toward Machine Translation with Statistics and Syntax and Semantics".
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2009). Merano, Italy: Dec 2009.
In this paper, we survey some central issues in the historical, current, and future landscape of statistical machine translation (SMT) research, taking as a starting point an extended three-dimensional MT model space. We posit a socio-geographical conceptual disparity hypothesis, that aims to explain why language pairs like Chinese-English have presented MT with so much more difficulty than others. The evolution from simple token-based to segment-based to tree-based syntactic SMT is sketched. For tree-based SMT, we consider language bias rationales for selecting the degree of compositional power within the hierarchy of expressiveness for transduction grammars (or synchronous grammars). This leads us to inversion transductions and the ITG model prevalent in current state-of-the-art SMT, along with the underlying ITG hypothesis, which posits a language universal. Against this backdrop, we enumerate a set of key open questions for syntactic SMT. We then consider the more recent area of semantic SMT. We list principles for successful application of sense disambiguation models to semantic SMT, and describe early directions in the use of semantic role labeling for semantic SMT.
- Anders SØGAARD and Dekai WU.
"Empirical lower bounds on translation unit error rate for the full class of inversion transduction grammars".
11th International Conference on Parsing Technologies (IWPT'09). Paris: Oct 2009. 33-36.
Empirical lower bounds studies, in which the frequency of alignment configurations that cannot be induced by a particular formalism is estimated, have been important for the development of syntax-based machine translation formalisms. The formalism that has received most attention has been inversion transduction grammars (ITGs) (Wu, 1997). All previous work on the coverage of ITGs, however, concerns parse failure rates (PFRs) or sentence level coverage, which is not directly related to any of the evaluation measures used in machine translation. Søgaard and Kuhn (2009) induce lower bounds on translation unit error rates (TUERs) for a number of formalisms, incl. normal form ITGs, but not for the full class of ITGs. Many of the alignment configurations that cannot be induced by normal form ITGs can be induced by unrestricted ITGs, however. This paper estimates the difference and shows that the average reduction in lower bounds on TUER is 2.48 in absolute difference (16.01 in average parse failure rate).
- Markus SAERS, Joakim NIVRE, and Dekai WU.
"Learning Stochastic Bracketing Inversion Transduction Grammars with a Cubic Time Biparsing Algorithm".
11th International Conference on Parsing Technologies (IWPT'09). Paris: Oct 2009. 29-32.
We present a biparsing algorithm for Stochastic Bracketing Inversion Transduction Grammars that runs in O(bn^3) time instead of O(n^6). Transduction grammars learned via an EM estimation procedure based on this biparsing algorithm are evaluated directly on the translation task, by building a phrase-based statistical MT system on top of the alignments dictated by Viterbi parses under the induced bigrammars. Translation quality at different levels of pruning is compared, showing improvements over a conventional word aligner even at heavy pruning levels.
- Dekai WU and David CHIANG (editors). Proceedings of SSST-3, Third Workshop on Syntax and Structure in Statistical Translation. NAACL HLT 2009. Boulder, Colorado: Jun 2009. [website]
- Markus SAERS and Dekai WU. "Improving Phrase-Based Translation via Word Alignments from Stochastic Inversion Transduction Grammars".
Proceedings of SSST-3, Third Workshop on Syntax and Structure in Statistical Translation. NAACL HLT 2009. Boulder, Colorado: Jun 2009. 28-36.
We argue that learning word alignments through a compositionally-structured, joint process yields higher phrase-based translation accuracy than the conventional heuristic of intersecting conditional models. Flawed word alignments can lead to flawed phrase translations that damage translation accuracy. Yet the IBM word alignments usually used today are known to be flawed, in large part because IBM models (1) model reordering by allowing unrestricted movement of words, rather than constrained movement of compositional units, and therefore must (2) attempt to compensate via directed, asymmetric distortion and fertility models. The conventional heuristics for attempting to recover from the resulting alignment errors involve estimating two directed models in opposite directions and then intersecting their alignments – to make up for the fact that, in reality, word alignment is an inherently joint relation. A natural alternative is provided by Inversion Transduction Grammars, which estimate the joint word alignment relation directly, eliminating the need for any of the conventional heuristics. We show that this alignment ultimately produces superior translation accuracy on BLEU, NIST, and METEOR metrics over three distinct language pairs.
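To make the conventional heuristic mentioned above concrete, here is a minimal sketch of intersecting two directed word alignments. The function name and the toy link sets are hypothetical; real systems would obtain the two directed alignments from trained models rather than hard-coded sets.

```python
def intersect_alignments(src_to_tgt, tgt_to_src):
    """Keep only the links proposed by both directed alignment models.

    src_to_tgt: set of (source_index, target_index) links
    tgt_to_src: set of (target_index, source_index) links
    """
    # Flip the reverse-direction links so both sets use (source, target) order.
    flipped = {(s, t) for (t, s) in tgt_to_src}
    return src_to_tgt & flipped


# Toy example with made-up links for a four-word sentence pair.
forward = {(0, 0), (1, 2), (2, 1), (3, 3)}
backward = {(0, 0), (2, 1), (1, 2), (3, 2)}  # stored as (target, source)
links = intersect_alignments(forward, backward)
```

Here the two models disagree about source word 3, so the intersection keeps only the three links they share. A joint model avoids this step because it never produces two separate directed alignments to reconcile.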
- Dekai WU and Pascale FUNG. "Semantic Roles for SMT: A Hybrid Two-Pass Model".
Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2009). Boulder, Colorado: Jun 2009.
We present results on a novel hybrid semantic SMT model that incorporates the strengths of both semantic role labeling and phrase-based statistical machine translation. The approach avoids major complexity limitations via a two-pass architecture. The first pass is performed using a conventional phrase-based SMT model. The second pass is performed by a re-ordering strategy guided by shallow semantic parsers that produce both semantic frame and role labels. Evaluation on a Wall Street Journal newswire genre test set showed the hybrid model to yield an improvement of roughly half a point in BLEU score over a strong pure phrase-based SMT baseline – to our knowledge, the first successful application of semantic role labeling to SMT.
- Dekai WU and Pascale FUNG. "Can Semantic Role Labeling Improve SMT?".
13th Annual Conference of the European Association for Machine Translation (EAMT 2009). Barcelona: May 2009. 218-225.
We present a series of empirical studies aimed at illuminating more precisely the likely contribution of semantic roles in improving statistical machine translation accuracy. The experiments reported study several aspects key to success: (1) the frequencies of types of SMT errors where semantic parsing and role labeling could help, and (2) if and where semantic roles offer more accurate guidance to SMT than merely syntactic annotation, and (3) the potential quantitative impact of realistic semantic role guidance to SMT systems, in terms of BLEU and METEOR scores.
- David CHIANG and Dekai WU (editors). Proceedings of SSST-2, Second Workshop on Syntax and Structure in Statistical Translation. ACL-08: HLT, Columbus, Ohio: Jun 2008. [website]
- Marine CARPUAT and Dekai WU. "Evaluation of Context-dependent Phrasal Translation Lexicons for Statistical Machine Translation". Sixth International Conference on Language Resources and Evaluation (LREC-2008). Marrakech: May 2008.
We present new direct data analysis showing that dynamically-built context-dependent phrasal translation lexicons are more useful resources for phrase-based statistical machine translation (SMT) than conventional static phrasal translation lexicons, which ignore all contextual information. After several years of surprising negative results, recent work suggests that context-dependent phrasal translation lexicons are an appropriate framework to successfully incorporate Word Sense Disambiguation (WSD) modeling into SMT. However, this approach has so far only been evaluated using automatic translation quality metrics, which are important, but aggregate many different factors. A direct analysis is still needed to understand how context-dependent phrasal translation lexicons impact translation quality, and whether the additional complexity they introduce is really necessary. In this paper, we focus on the impact of context-dependent translation lexicons on lexical choice in phrase-based SMT and show that context-dependent lexicons are more useful to a phrase-based SMT system than a conventional lexicon. A typical phrase-based SMT system makes use of more and longer phrases with context modeling, including phrases that were not seen very frequently in training. Even when the segmentation is identical, the context-dependent lexicons yield translations that match references more often than conventional lexicons.
- Dekai WU. "WSD for Semantic SMT: Phrase Sense Disambiguation". Second Symposium on Innovations in Machine Translation Technologies (IMTT-2008). Tokyo: Mar 2008.
- Yihai SHEN, Chi-kiu LO, Marine CARPUAT and Dekai WU. "HKUST Statistical Machine Translation Experiments for IWSLT 2007". Fourth International Workshop on Spoken Language Translation (IWSLT 2007). Trento: Oct 2007. 84-88.
This paper describes experiments conducted at HKUST in the IWSLT 2007 evaluation campaign on spoken language translation. Our primary objective was to compare the open-source phrase-based statistical machine translation toolkit Moses against the closed-source Pharaoh. We focused on Chinese to English translation, but we also report results on the Arabic to English, Italian to English, and Japanese to English tasks.
- Marine CARPUAT and Dekai WU. "Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation". Machine Translation Summit XI. Copenhagen: Sep 2007.
Most current statistical machine translation (SMT) systems make very little use of contextual information to select a translation candidate for a given input language phrase. However, despite evidence that rich context features are useful in stand-alone translation disambiguation tasks, recent studies reported that incorporating context-rich approaches from Word Sense Disambiguation (WSD) methods directly into classic word-based SMT systems, surprisingly, did not yield the expected improvements in translation quality. We argue here that, instead, it is necessary to design a context-dependent lexicon that is specifically matched to a given phrase-based SMT model, rather than simply incorporating an independently built and tested WSD module. In this approach, the baseline SMT phrasal lexicon, which uses translation probabilities that are independent of context, is augmented with a context-dependent score, defined using insights from standalone translation disambiguation evaluations. This approach reliably improves performance on both IWSLT and NIST Chinese-English test sets, producing consistent gains on all eight of the most commonly used automated evaluation metrics. We analyze the behavior of the model along a number of dimensions, including an analysis confirming that the most important context features are not available in conventional phrase-based SMT models.
- Marine CARPUAT and Dekai WU. "How Phrase Sense Disambiguation outperforms Word Sense Disambiguation for Statistical Machine Translation". 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007). Skövde, Sweden: Sep 2007. 43-52.
We present comparative empirical evidence arguing that a generalized phrase sense disambiguation approach better improves statistical machine translation than ordinary word sense disambiguation, along with a data analysis suggesting the reasons for this. Standalone word sense disambiguation, as exemplified by the Senseval series of evaluations, typically defines the target of disambiguation as a single word. But in order to be useful in statistical machine translation, our studies indicate that word sense disambiguation should be redefined to move beyond the particular case of single word targets, and instead to generalize to multi-word phrase targets. We investigate how and why the phrase sense disambiguation approach---in contrast to recent efforts to apply traditional word sense disambiguation to SMT---is able to yield statistically significant improvements in translation quality even under large data conditions, and consistently improve SMT across both IWSLT and NIST Chinese-English text translation tasks. We discuss architectural issues raised by this change of perspective, and consider the new model architecture necessitated by the phrase sense disambiguation approach.
- Pascale FUNG, Zhaojun WU, Yongsheng YANG and Dekai WU. "Learning Bilingual Semantic Frames: Shallow Semantic Parsing vs. Semantic Role Projection". 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007). Skövde, Sweden: Sep 2007. 75-84.
To explore the potential application of semantic roles in structural machine translation, we propose to study the automatic learning of English-Chinese bilingual predicate argument structure mapping. We describe ARG_ALIGN, a new model for learning bilingual semantic frames that employs monolingual Chinese and English semantic parsers to learn bilingual semantic role mappings with 72.45% F-score, given an unannotated parallel corpus. We show that, contrary to a common preconception, our ARG_ALIGN model is superior to a semantic role projection model, SYN_ALIGN, which reaches only a 46.63% F-score by assuming semantic parallelism in bilingual sentences. We present experimental data explaining that this is due to cross-lingual mismatches between argument structures in English and Chinese 17.24% of the time. This suggests that, in any potential application to enhance machine translation with semantic structural mapping, it may be preferable to employ independent automatic semantic parsers on source and target languages, rather than assuming semantic role parallelism.
- Marine CARPUAT and Dekai WU. "Improving Statistical Machine Translation using Word Sense Disambiguation". 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007). Prague: Jun 2007. 61-72.
We show for the first time that incorporating the predictions of a word sense disambiguation system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing statistically significant improvements on the larger NIST Chinese-English MT task---and moreover never hurts performance on any test set, according not only to BLEU but to all eight most commonly used automatic evaluation metrics. Recent work has challenged the assumption that word sense disambiguation (WSD) systems are useful for SMT. Yet SMT translation quality still obviously suffers from inaccurate lexical choice. In this paper, we address this problem by investigating a new strategy for integrating WSD into an SMT system, that performs fully phrasal multi-word disambiguation. Instead of directly incorporating a Senseval-style WSD system, we redefine the WSD task to match the exact same phrasal translation disambiguation task faced by phrase-based SMT systems. Our results provide the first known empirical evidence that lexical semantics are indeed useful for SMT, despite claims to the contrary.
- Dekai WU and David CHIANG (editors). Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation. Rochester, New York: Apr 2007. [website]
- Dekai WU. "MT model space: Statistical vs. compositional vs. example-based machine translation".
Machine Translation (2005) 19: 213-227. Springer Online: http://dx.doi.org/10.1007/s10590-006-9009-3. Berlin: Springer.
We offer a perspective on EBMT from a statistical MT standpoint, by developing a three-dimensional MT model space based on three pairs of definitions: (1) logical versus statistical MT, (2) schema-based versus example-based MT, and (3) lexical versus compositional MT. Within this space we consider the interplay of three key ideas in the evolution of transfer, example-based, and statistical approaches to machine translation. We depict how all translation models face these issues in one way or another, regardless of the school of thought, and suggest where the real questions for the future may lie.
- Dekai WU, Marine CARPUAT, and Yihai SHEN. "Inversion Transduction Grammar Coverage of Arabic-English Word Alignment for Tree-Structured Statistical Machine Translation".
IEEE/ACL 2006 Workshop on Spoken Language Technology (SLT 2006). Aruba: Dec 2006.
We present the first known direct measurement of word alignment coverage on an Arabic-English parallel corpus using inversion transduction grammar constraints. While direct measurements have been reported for several European and Asian languages, to date no results have been available for Arabic or any Semitic language despite much recent activity on Arabic-English spoken language and text translation. Many recent syntax based statistical MT models operate within the domain of ITG expressiveness, often for efficiency reasons, so it has become important to determine the extent to which the ITG constraint assumption holds. Our results on Arabic provide further evidence that ITG expressiveness appears largely sufficient for core MT models.
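One simplified way to picture the ITG constraint is to restrict attention to one-to-one alignments, which can be written as permutations. The sketch below checks whether such a permutation can be built from nested straight and inverted combinations of adjacent spans, using a shift-reduce pass. This is an illustrative assumption and not the measurement procedure used in the paper, which the abstract does not describe; the function name is hypothetical, and real word alignments also contain unaligned and many-to-many links that this sketch ignores.

```python
def is_itg_permutation(perm):
    """Return True if perm (a permutation of 0..n-1) is binarizable under ITG.

    perm[i] is the target position aligned to source position i. Source words
    are read left to right; the two topmost spans on the stack are merged
    whenever they are adjacent on the target side, in straight or inverted order.
    """
    stack = []  # each entry is a (low, high) range of target positions
    for pos in perm:
        stack.append((pos, pos))
        while len(stack) >= 2:
            lo2, hi2 = stack[-1]
            lo1, hi1 = stack[-2]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                # Adjacent on the target side: combine into one larger span.
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) <= 1


inside = is_itg_permutation([1, 0, 3, 2])   # nested inversions
outside = is_itg_permutation([1, 3, 0, 2])  # the "inside-out" pattern
```

The first permutation reduces to a single span, while the second leaves four unmergeable spans on the stack; the inside-out patterns are the well-known four-word cases that fall outside ITG expressiveness.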
- Pascale FUNG, Zhaojun WU, Yongsheng YANG, and Dekai WU. "Automatic learning of Chinese-English semantic structure mapping".
IEEE/ACL 2006 Workshop on Spoken Language Technology (SLT 2006). Aruba: Dec 2006.
We present twin results on Chinese semantic parsing, with application to English-Chinese cross-lingual verb frame acquisition. First, we describe two new state-of-the-art Chinese shallow semantic parsers leading to an F-score of 82.01 on simultaneous frame and argument boundary identification and labeling. Subsequently, we propose a model that applies the separate Chinese and English semantic parsers to learn cross-lingual semantic verb frame argument mappings with 89.3% accuracy. The only training data needed by this cross-lingual learning model is a pair of non-parallel monolingual Propbanks, plus an unannotated parallel corpus. We also present the first reported controlled comparison of maximum entropy and SVM approaches to shallow semantic parsing, using the Chinese data.
- Marine CARPUAT, Yihai SHEN, Xiaofeng YU, and Dekai WU. "Toward Integrating Word Sense and Entity Disambiguation into Statistical Machine Translation".
Third International Workshop on Spoken Language Translation (IWSLT 2006). Kyoto: Nov 2006. [slides]
We describe a machine translation approach being designed at HKUST to integrate semantic processing into statistical machine translation, beginning with entity and word sense disambiguation. We show how integrating the semantic modules consistently improves translation quality across several data sets. We report results on five different IWSLT 2006 speech translation tasks, representing HKUST's first participation in the IWSLT spoken language translation evaluation campaign. We translated both read and spontaneous speech transcriptions from Chinese to English, achieving reasonable performance despite the fact that our system is essentially text-based and therefore not designed and tuned to tackle the challenges of speech translation. We also find that the system achieves reasonable results on a wide range of languages, by evaluating on read speech transcriptions from Arabic, Italian, and Japanese into English.
- Xiaofeng YU, Marine CARPUAT, and Dekai WU. "Boosting for Chinese Named Entity Recognition".
Fifth SIGHAN Workshop of the Special Interest Group for Chinese Language Processing (SIGHAN5) at COLING/ACL 2006. Sydney, Australia: Jul 2006.
We report an experiment in which a high-performance boosting based NER model originally designed for multiple European languages is instead applied to the Chinese named entity recognition task of the third SIGHAN Chinese language processing bakeoff. Using a simple character-based model along with a set of features that are easily obtained from the Chinese input strings, the system described employs boosting, a promising and theoretically well-founded machine learning method to combine a set of weak classifiers together into a final system. Even though we did no other Chinese-specific tuning, and used only one-third of the MSRA and CityU corpora to train the system, reasonable results are obtained. Our evaluation results show that 75.07 and 80.51 overall F-measures were obtained on MSRA and CityU test sets respectively.
- Dekai WU and Ken Wing Kuen LEE. "A grammatical approach to understanding textual tables using two-dimensional SCFGs".
21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006). Sydney, Australia: Jul 2006.
We present an elegant and extensible model that is capable of providing semantic interpretations for an unusually wide range of textual tables in documents. Unlike the few existing table analysis models, which largely rely on relatively ad hoc heuristics, our linguistically-oriented approach is systematic and grammar based, which allows our model (1) to be concise and yet (2) recognize a wider range of data models than others, and (3) disambiguate to a significantly finer extent the underlying semantic interpretation of the table in terms of data models drawn from relational database theory. To accomplish this, the model introduces Viterbi parsing under two-dimensional stochastic CFGs. The cleaner grammatical approach facilitates not only greater coverage, but also grammar extension and maintenance, as well as a more direct and declarative link to semantic interpretation, for which we also introduce a new, cleaner data model. In disambiguation experiments on recognizing relevant data models of unseen web tables from different domains, a blind evaluation of the model showed 60% precision and 80% recall.
- Dekai WU. "Textual Entailment Recognition Based on Inversion Transduction Grammars".
In Joaquin QUIÑONERO CANDELA, Ido DAGAN, Bernardo MAGNINI, and Florence d'ALCHÉ-BUC (editors), "Machine Learning Challenges, Evaluating Predictive Uncertainty, Visual Object Classification and Recognizing Textual Entailment", Lecture Notes in Computer Science (2006) 3944: 299-308.
Springer Online: http://dx.doi.org/10.1007/11736790_17. Berlin: Springer.
The PASCAL Challenge's textual entailment recognition (RTE) task presents intriguing opportunities to test various implications of the strong language universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. The ITG Hypothesis provides a strong inductive bias, and has been repeatedly shown empirically to yield both efficiency and accuracy gains for numerous language acquisition tasks. Since the RTE challenge abstracts over many tasks, it invites meaningful analysis of the ITG Hypothesis across tasks including information retrieval, comparable documents, reading comprehension, question answering, information extraction, machine translation, and paraphrase acquisition. We investigate two new models for the RTE problem that employ simple generic Bracketing ITGs. Experimental results show that, even in the absence of any thesaurus to accommodate lexical variation between the Text and the Hypothesis strings, surprisingly strong results for a number of the task subsets are obtainable from the Bracketing ITG's structure matching bias alone.
- Dekai WU and Pascale FUNG. "Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora".
Second International Joint Conference on Natural Language Processing (IJCNLP-2005). Jeju, South Korea: Oct 2005.
We present a new implication of Wu's (1997) Inversion Transduction Grammar (ITG) Hypothesis, on the problem of retrieving truly parallel sentence translations from large collections of highly non-parallel documents. Our approach leverages a strong language universal constraint posited by the ITG Hypothesis, that can serve as a strong inductive bias for various language learning problems, resulting in both efficiency and accuracy gains. The task we attack is highly practical since non-parallel multilingual data exists in far greater quantities than parallel corpora, but parallel sentences are a much more useful resource. Our aim here is to mine truly parallel sentences, as opposed to comparable sentence pairs or loose translations as in most previous work. The method we introduce exploits Bracketing ITGs to produce the first known results for this problem. Experiments show that it obtains large accuracy gains on this task compared to the expected performance of state-of-the-art models that were developed for the less stringent task of mining comparable sentence pairs.
- Marine CARPUAT and Dekai WU. "Evaluating the Word Sense Disambiguation Performance of Statistical Machine Translation".
Second International Joint Conference on Natural Language Processing (IJCNLP-2005). Jeju, South Korea: Oct 2005.
We present the first known empirical test of an increasingly common speculative claim, by evaluating a representative Chinese-to-English SMT model directly on word sense disambiguation performance, using standard WSD evaluation methodology and datasets from the Senseval-3 Chinese lexical sample task. Much effort has been put into designing and evaluating dedicated word sense disambiguation (WSD) models, in particular with the Senseval series of workshops. At the same time, the recent improvements in the BLEU scores of statistical machine translation (SMT) suggest that SMT models are good at predicting the right translation of the words in source language sentences. Surprisingly however, the WSD accuracy of SMT models has never been evaluated and compared with that of the dedicated WSD models. We present controlled experiments showing the WSD accuracy of current typical SMT models to be significantly lower than that of all the dedicated WSD models considered. This tends to support the view that despite recent speculative claims to the contrary, current SMT models do have limitations in comparison with dedicated WSD models, and that SMT should benefit from the better predictions made by the WSD models.
- Marine CARPUAT and Dekai WU. "Word Sense Disambiguation vs. Statistical Machine Translation". 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005). Ann Arbor, MI: Jun 2005.
We directly investigate a subject of much recent debate: do word sense disambiguation models help statistical machine translation quality? We present empirical results casting doubt on this common, but unproved, assumption. Using a state-of-the-art Chinese word sense disambiguation model to choose translation candidates for a typical IBM statistical MT system, we find that word sense disambiguation does not yield significantly better translation quality than the statistical machine translation system alone. Error analysis suggests several key factors behind this surprising finding, including inherent limitations of current statistical MT architectures.
- Dekai WU. "Recognizing
Paraphrases and Textual Entailment using Inversion Transduction Grammars".
ACL-2005 Workshop on Empirical Modeling of Semantic Equivalence and Entailment. Ann Arbor, MI: Jun 2005.
We present first results using paraphrase as well as textual entailment data to test the language universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. In machine translation and alignment, the ITG Hypothesis provides a strong inductive bias, and has been shown empirically across numerous language pairs and corpora to yield both efficiency and accuracy gains for various language acquisition tasks. Monolingual paraphrase and textual entailment recognition datasets, however, potentially facilitate closer tests of certain aspects of the hypothesis than bilingual parallel corpora, which simultaneously exhibit many irrelevant dimensions of cross-lingual variation. We investigate this using simple generic Bracketing ITGs containing no language-specific linguistic knowledge. Experimental results on the MSR Paraphrase Corpus show that, even in the absence of any thesaurus to accommodate lexical variation between the paraphrases, an uninterpolated average precision of at least 76% is obtainable from the Bracketing ITG's structure matching bias alone. This is consistent with experimental results on the Pascal Recognising Textual Entailment Challenge Corpus, which show surprisingly strong results for a number of the task subsets.
- Dekai WU. "Textual
Entailment Recognition Based on Inversion Transduction Grammars".
Pattern Analysis, Statistical Modelling and Computational Learning
(PASCAL Challenges Workshop - Recognising Textual Entailment
Challenge). Southampton, UK: Apr 2005.
Also in Joaquin QUIÑONERO CANDELA, Ido DAGAN, Bernardo MAGNINI, and Florence d'ALCHÉ-BUC (editors), Machine Learning Challenges, Lecture Notes in Computer Science 3944, MLCW 2005, 2006. Heidelberg: Springer-Verlag.
The PASCAL Challenge's textual entailment recognition (RTE) task presents intriguing opportunities to test various implications of the strong language universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. The ITG Hypothesis provides a strong inductive bias, and has been repeatedly shown empirically to yield both efficiency and accuracy gains for numerous language acquisition tasks. Since the RTE challenge abstracts over many tasks, it invites meaningful analysis of the ITG Hypothesis across tasks including information retrieval, comparable documents, reading comprehension, question answering, information extraction, machine translation, and paraphrase acquisition. We investigate two new models for the RTE problem that employ simple generic Bracketing ITGs. Experimental results show that, even in the absence of any thesaurus to accommodate lexical variation between the Text and the Hypothesis strings, surprisingly strong results for a number of the task subsets are obtainable from the Bracketing ITG's structure matching bias alone.
- Pascale FUNG, LIU Yi, YANG Yongsheng, Yihai SHEN, and Dekai WU.
"A
Grammar-Based Chinese to English Speech Translation System for Portable
Devices". 8th International Conference on Spoken Language
Processing (INTERSPEECH 2004 - ICSLP). Jeju, South Korea: Oct 2004.
Portable devices such as PDA phones and smart phones are increasingly popular. Many of these devices already have voice dialing capability. The next step is to offer more powerful personal-assistant features such as speech translation. In this paper, we propose a system that can translate speech commands in Chinese into English, in real-time, on small, portable devices with limited memory and computational power. We address the various computational and platform issues of speech recognition and translation on portable devices. We propose fixed-point computation, discrete front-end speech features, bi-phone acoustic models, grammar-based speech decoding, and unambiguous inversion transduction grammars for transfer-based translation. As a result, our speech translation system requires only 500k memory and a 200MHz CPU.
- Dekai WU, Grace NGAI, and Marine CARPUAT. "Why Nitpicking
Works: Evidence for Occam's Razor in Error Correctors". 20th
International Conference on Computational Linguistics (COLING-2004).
Geneva: Aug 2004.
Empirical experience and observations have shown us that when powerful and highly tunable classifiers such as maximum entropy classifiers, boosting and SVMs are applied to language processing tasks, it is possible to achieve high accuracies, but eventually their performances all tend to plateau out at around the same point. To further improve performance, various error correction mechanisms have been developed, but in practice, most of them cannot be relied on to predictably improve performance on unseen data; indeed, depending upon the test set, they are as likely to degrade accuracy as to improve it. This problem is especially severe if the base classifier has already been finely tuned. In recent work, we introduced N-fold Templated Piped Correction, or NTPC ("nitpick"), an intriguing error corrector that is designed to work in these extreme operating conditions. Despite its simplicity, it consistently and robustly improves the accuracy of existing highly accurate base models. This paper investigates some of the more surprising claims made by NTPC, and presents experiments supporting an Occam's Razor argument that more complex models are damaging or unnecessary in practice.
- Weifeng SU, Marine CARPUAT, and Dekai WU. "Semi-Supervised
Training of a Kernel PCA-Based Model for Word Sense Disambiguation".
20th International Conference on Computational Linguistics
(COLING-2004). Geneva: Aug 2004.
In this paper, we introduce a new semi-supervised learning model for word sense disambiguation based on Kernel Principal Component Analysis (KPCA), with experiments showing that it can further improve accuracy over supervised KPCA models that have achieved WSD accuracy superior to the best published individual models. Although empirical results with supervised KPCA models demonstrate significantly better accuracy compared to the state-of-the-art achieved by either naive Bayes or maximum entropy models on Senseval-2 data, we identify specific sparse data conditions under which supervised KPCA models deteriorate to essentially a most-frequent-sense predictor. We discuss the potential of KPCA for leveraging unannotated data for partially-unsupervised training to address these issues, leading to a composite model that combines both the supervised and semi-supervised models.
- Dekai WU, Weifeng SU, and Marine CARPUAT. "A Kernel PCA Method for
Superior Word Sense Disambiguation". 42nd Annual Meeting of the
Association for Computational Linguistics (ACL-2004). Barcelona: Jul
2004.
We introduce a new method for disambiguating word senses that exploits a nonlinear Kernel Principal Component Analysis (KPCA) technique to achieve accuracy superior to the best published individual models. We present empirical results demonstrating significantly better accuracy compared to the state-of-the-art achieved by either naive Bayes or maximum entropy models, on Senseval-2 data. We also contrast against another type of kernel method, the support vector machine (SVM) model, and show that our KPCA-based model outperforms the SVM-based model. It is hoped that these highly encouraging first results on KPCA for natural language processing tasks will inspire further development of these directions.
- Dekai WU and Yihai SHEN. "An Efficient
Algorithm to Induce Minimum Average Lookahead Grammars for Incremental LR
Parsing". ACL-2004 Workshop on Incremental Parsing: Bringing
Engineering and Cognition Together. Barcelona: Jul 2004.
We define a new learning task, minimum average lookahead grammar induction, with strong potential implications for incremental parsing in NLP and cognitive models. Our thesis is that a suitable learning bias for grammar induction is to minimize the degree of lookahead required, on the underlying tenet that language evolution drove grammars to be efficiently parsable in incremental fashion. The input to the task is an unannotated corpus, plus a nondeterministic constraining grammar that serves as an abstract model of environmental constraints confirming or rejecting potential parses. The constraining grammar typically allows ambiguity and is itself poorly suited for an incremental parsing model, since it gives rise to a high degree of nondeterminism in parsing. The learning task, then, is to induce a deterministic LR(k) grammar under which it is possible to incrementally construct one of the correct parses for each sentence in the corpus, such that the average degree of lookahead needed to do so is minimized. This is a significantly more difficult optimization problem than merely compiling LR(k) grammars, since k is not specified in advance. Clearly, naive approaches to this optimization can easily be computationally infeasible. However, by making combined use of GLR ancestor tables and incremental LR table construction methods, we obtain an O(n3+2m) greedy approximation algorithm for this task that is quite efficient in practice.
- Marine CARPUAT, Weifeng SU, and Dekai WU. "Augmenting Ensemble
Classification for Word Sense Disambiguation with a Kernel PCA
Model". Third International Workshop on the Evaluation of Systems
for the Semantic Analysis of Text (Senseval-3). ACL-2004 Workshop.
Barcelona: Jul 2004.
The HKUST word sense disambiguation systems benefit from a new nonlinear Kernel Principal Component Analysis (KPCA) based disambiguation technique. We discuss and analyze results from the Senseval-3 English, Chinese, and Multilingual Lexical Sample data sets. Among an ensemble of four different kinds of voted models, the KPCA-based model, along with the maximum entropy model, outperforms the boosting model and naive Bayes model. Interestingly, while the KPCA-based model typically achieves close or better accuracy than the maximum entropy model, nevertheless a comparison of predicted classifications shows that it has a significantly different bias. This characteristic makes it an excellent voter, as confirmed by results showing that removing the KPCA-based model from the ensemble generally degrades performance.
- Grace NGAI, Dekai WU, Marine CARPUAT, Chi-Shing WANG, and
Chi-Yung WANG. "Semantic Role
Labeling with Boosting, SVMs, Maximum Entropy, SNOW, and Decision
Lists". Third International Workshop on the Evaluation of Systems
for the Semantic Analysis of Text (Senseval-3). ACL-2004 Workshop.
Barcelona: Jul 2004.
This paper describes the HKPolyU-HKUST systems which were entered into the Semantic Role Labeling task in Senseval-3. Results show that these systems, which are based upon common machine learning algorithms, all manage to achieve good performances on the non-restricted Semantic Role Labeling task.
- Richard WICENTOWSKI, Grace NGAI, Dekai WU, Marine CARPUAT, Emily
THOMFORDE, and Adrian PACKEL. "Joining
forces to resolve lexical ambiguity: East meets West in Barcelona".
Third International Workshop on the Evaluation of Systems for the
Semantic Analysis of Text (Senseval-3). ACL-2004 Workshop.
Barcelona: Jul 2004.
This paper describes the component models and combination model built as a joint effort between Swarthmore College, Hong Kong Poly U, and HKUST. Though other models described elsewhere contributed to the final combination model, this paper focuses solely on the joint contributions to the "Swat-HK" effort.
- Dekai WU, Grace NGAI, and Marine CARPUAT. "Raising the Bar:
Stacked Conservative Error Correction Beyond Boosting". Fourth
International Conference on Language Resources and Evaluation
(LREC-2004). Lisbon: May 2004.
We introduce a conservative error correcting model, Stacked TBL, that is designed to improve the performance of even high-performing models like boosting, with little risk of accidentally degrading performance. Stacked TBL is particularly well suited for corpus-based natural language applications involving high-dimensional feature spaces, since it leverages the characteristics of the TBL paradigm that we appropriate. We consider here the task of automatically annotating named entities in text corpora. The task does pose a number of challenges for TBL, to which there are some simple yet effective solutions. We discuss the empirical behavior of Stacked TBL, and consider evidence that despite its simplicity, more complex and time-consuming variants are not generally required.
- Lufeng ZHAI, Pascale FUNG, Richard SCHWARTZ, Marine CARPUAT and
Dekai WU. "Using
N-best Lists for Named Entity Recognition from Chinese Speech".
Human Language Technology Conference of the North American Chapter of
the Association for Computational Linguistics (HLT/NAACL-2004).
Boston: May 2004.
We present the first known result for named entity recognition (NER) in realistic large-vocabulary spoken Chinese. We establish this result by applying a maximum entropy model, currently the single best known approach for textual Chinese NER, to the recognition output of the BBN LVCSR system on Chinese Broadcast News utterances. Our results support the claim that transferring NER approaches from text to spoken language is a significantly more difficult task for Chinese than for English. We propose re-segmenting the ASR hypotheses as well as applying post-classification to improve the performance. Finally, we introduce a method of using n-best hypotheses that yields a small but nevertheless useful improvement in NER accuracy. We use acoustic, phonetic, language model, NER and other scores as confidence measures. Experimental results show an average of 6.7% relative improvement in precision and 1.7% relative improvement in F-measure.
- Dekai WU, Grace NGAI, and Marine CARPUAT. "N-fold Templated
Piped Correction". First International Joint Conference on
Natural Language Processing (IJCNLP-2004). Hainan, China: Mar 2004.
We describe a broadly-applicable conservative error correcting model, N-fold Templated Piped Correction (NTPC), that consistently improves the accuracy of existing high-accuracy base models. Under circumstances where most obvious approaches actually reduce accuracy more than they improve it, NTPC nevertheless comes with little risk of accidentally degrading performance. NTPC is particularly well suited for natural language applications involving high-dimensional feature spaces, such as bracketing and disambiguation tasks, since its easily customizable template-driven learner allows efficient search over the kind of complex feature combinations that have typically eluded the base models. We show empirically that NTPC yields small but consistent accuracy gains on top of even high-performing models like boosting. We also give evidence that the various extreme design parameters in NTPC are indeed necessary for the intended operating range, even though they diverge from usual practice.
- Dekai WU. "The HKUST leading
question translation system". Machine Translation Summit IX. New Orleans:
Sep 2003.
Slides from "Have we found the Holy Grail?" (Panel with Ed Hovy, Elliot Macklovitch (chair), Hermann Ney, Steve Richardson, and Dekai Wu.)
- Dekai WU, Grace NGAI, and Marine CARPUAT. "A stacked, voted,
stacked model for named entity recognition". Computational
Natural Language Learning (CoNLL-2003), at Human Language
Technology Conference of the North American Chapter of the Association of
Computational Linguistics (HLT/NAACL-2003). Edmonton, Canada: May
2003.
This paper investigates stacking and voting methods for combining strong classifiers like boosting, SVM, and TBL, on the named-entity recognition task. We demonstrate several effective approaches, culminating in a model that achieves error rate reductions on the development and test sets of 63.6% and 55.0% (English) and 47.0% and 51.7% (German) over the CoNLL-2003 standard baseline respectively, and 19.7% over a strong AdaBoost baseline model from CoNLL-2002.
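The error rate reductions quoted above are relative figures. As a rough illustration of how such a figure is usually computed, the sketch below assumes that error is measured as 100 minus a percentage score such as F-measure. The function name and the input scores are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_score, system_score):
    """Relative error reduction in percent, given two percentage scores.

    Assumption: error = 100 - score (e.g. 100 minus F-measure).
    """
    baseline_error = 100.0 - baseline_score
    system_error = 100.0 - system_score
    if baseline_error <= 0:
        raise ValueError("baseline has no error left to reduce")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical scores: a baseline at 60.0 and a system at 85.0.
# Error falls from 40.0 to 15.0, i.e. by 25.0 out of 40.0.
print(relative_error_reduction(60.0, 85.0))
```

Because the reduction is relative to the baseline error, the same absolute gain gives a larger figure when the baseline is already strong.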
- Dekai WU, Grace NGAI, Marine CARPUAT, Jeppe LARSEN, and Yongshen
YANG. "Boosting for named
entity recognition". Computational Natural Language Learning
(CoNLL-2002), at 19th International Conference on Computational
Linguistics (Coling-2002), 195-198. Taipei: Sep 2002.
This paper presents a system that applies boosting to the task of named-entity identification. The CoNLL-2002 shared task, for which the system is designed, is language-independent named-entity recognition. Using a set of features which are easily obtainable for almost any language, the presented system uses boosting to combine a set of weak classifiers into a final system that performs significantly better than that of an off-the-shelf maximum entropy classifier.
- Robert WILENSKY, David N CHIN, Marc LURIA, James MARTIN, James
MAYFIELD, and Dekai WU. "The Berkeley UNIX Consultant Project". In
Stephen J HEGNER, Paul McKEVITT, Peter NORVIG, and Robert WILENSKY
(editors), Intelligent Help Systems for Unix. 49-94. Dordrecht:
Kluwer. ISBN 0-7923-6641-7. May 2001.
Also in Artificial Intelligence Review 14(1-2): 43-88 (2000).
UC (UNIX Consultant) is an intelligent, natural-language interface that allows naive users to learn about the UNIX operating system. UC was undertaken because the task was thought to be both a fertile domain for Artificial Intelligence research and a useful application of AI work in planning, reasoning, natural language processing, and knowledge representation.
- Dekai WU. "Bracketing and
aligning words and constituents in parallel text using Stochastic
Inversion Transduction Grammars". In Jean VERONIS (editor),
Parallel Text Processing: Alignment and Use of Translation
Corpora. Dordrecht: Kluwer. ISBN 0-7923-6546-1. Aug 2000.
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well-suited to model ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
- Dekai WU. "Alignment". In Robert DALE, Hermann MOISL, and Harold
SOMERS (editors), Handbook of Natural Language Processing.
415-458. New York: Marcel Dekker. ISBN 0-8247-9000-6. Jul 2000.
In this chapter we discuss the work done on automatic alignment of parallel texts for various purposes. Fundamentally, an alignment algorithm accepts as input a bitext, and produces as output a map that identifies corresponding passages between the texts. A rapidly-growing body of research on bitext alignment, beginning around 1990, attests to the importance of alignment to translators, bilingual lexicographers, adaptive machine translation systems, and even ordinary readers. A wide variety of techniques now exist, ranging from the most simple (counting characters or words) to the more sophisticated, sometimes involving linguistic data (lexicons) which may or may not have been automatically induced themselves. Techniques have been developed for aligning passages of various granularities: paragraphs, sentences, constituents, collocations, and words. Some techniques make use of apparent morphological features. Others rely on cognates and loan-words. Of particular interest is work done on languages which do not have a common writing system. The robustness and generality of different techniques has generated much discussion.
- SUI Zhifang, ZHAO Jun, and Dekai WU. "An
Information-Theory-Based Feature Type Analysis for the Modelling of
Statistical Parsing". ACL-2000. Hong Kong: Oct 2000.
The paper proposes an information-theory-based method for feature type analysis in probabilistic evaluation modelling for statistical parsing. The basic idea is that we use entropy and conditional entropy to measure whether a feature type grasps some of the information for syntactic structure prediction. Our experiment quantitatively analyzes several feature types' power for syntactic structure prediction and draws a series of interesting conclusions.
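The entropy-based idea in this abstract can be sketched on toy data. The following is only an illustration, not the paper's evaluation model: the function names, the feature values, and the labels are all hypothetical. It measures how much a feature type reduces uncertainty about a predicted structure label, as H(Y) - H(Y|X).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(Y) in bits of a list of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(features, labels):
    """H(Y | X): entropy of the labels within each feature value, weighted by frequency."""
    total = len(labels)
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(f, []).append(y)
    return sum((len(ys) / total) * entropy(ys) for ys in groups.values())


def information_gain(features, labels):
    """Reduction in label uncertainty from knowing the feature: H(Y) - H(Y | X)."""
    return entropy(labels) - conditional_entropy(features, labels)


# Hypothetical toy data: constituent labels and two candidate feature types.
labels = ["NP", "NP", "VP", "VP"]
informative = ["DT", "DT", "VB", "VB"]   # determines the label exactly
uninformative = ["a", "b", "a", "b"]     # independent of the label

print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

On this toy data the first feature type removes all uncertainty about the label, while the second removes none.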
- Yanlei DIAO, Hongjun LU, and Dekai WU. "A comparative study of
classification based personal e-mail filtering". Fourth
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD
2000): 408-419. Kyoto: Apr 2000.
This paper addresses personal E-mail filtering by casting it in the framework of text classification. Modeled as semi-structured documents, E-mail messages consist of a set of fields with predefined semantics and a number of variable length free-text fields. While most work on classification either concentrates on structured data or free text, the work in this paper deals with both of them. To perform classification, a naive Bayesian classifier was designed and implemented, and a decision tree based classifier was implemented. The design considerations and implementation issues are discussed. Using a relatively large amount of real personal E-mail data, a comprehensive comparative study was conducted using the two classifiers. The importance of different features is reported. Results of other issues related to building an effective personal E-mail classifier are presented and discussed. It is shown that both classifiers can perform filtering with reasonable accuracy. While the decision tree based classifier outperforms the Bayesian classifier when features and training size are selected optimally for both, a carefully designed naive Bayesian classifier is more robust.
- Dekai WU, SUI Zhifang, and ZHAO Jun. "An information-based method for
selecting feature types for word prediction". Sixth European
Conference on Speech Communication and Technology (EUROSPEECH'99).
Budapest: Sep 1999.
This paper uses an information-based approach to conduct feature type selection for language modeling in a systematic manner. We describe a quantitative analysis of the information gain and the information redundancy for various combinations of feature types inspired by both dependency structure and bigram structure through analyzing an English treebank corpus and taking word prediction as the object. The experiments yield several conclusions on the predictive value of several feature types and feature type combinations for word prediction, which are expected to provide reliable reference for feature type selection in language modeling.
- Aboy WONG and Dekai WU. "Learning a lightweight robust
deterministic parser". Sixth European Conference on Speech
Communication and Technology (EUROSPEECH'99). Budapest: Sep 1999.
We describe a method for automatically learning a parser from labeled, bracketed corpora that results in a fast, robust, lightweight parser that is suitable for real-time dialog systems and similar applications. Unlike ordinary parsers, all grammatical knowledge is captured in the learned decision trees, so no explicit phrase-structure grammar is needed. Another characteristic of the architecture is robustness, since the input need not fit pre-specified productions. The runtime architecture is very slim and references two learned decision trees that allow the parser to operate in a "strictly deterministic" manner in Marcus' (1977) sense. Even without using specific lexical features, we have achieved respectable labeled bracket accuracies of about 81% precision and 82% recall. Processing speed is more than 500 words per CPU second. We keep the parameter space small (in comparison to other statistically learned parsers) by using only part-of-speech tags and constituent labels as features for learning the decision trees. Without any optimization, the decision trees consume only 6M of memory, making it possible to run on platforms with limited memory. The learning method is readily applicable to other languages. Preliminary experiments on a Chinese corpus (which contains about 3000 sentences from Chinese primary school text) have yielded results comparable to those for English.
- Vincent CHOW and Dekai WU. "On the use of right context in
sense-disambiguating language models". Sixth European Conference
on Speech Communication and Technology (EUROSPEECH'99). Budapest:
Sep 1999.
We investigate the utility of right-context (look-ahead information) in incremental left-to-right language models with word sense disambiguation, and discover somewhat unexpectedly that using right-context in addition to left-context (history) may actually reduce accuracy. We describe a left-to-right incremental naive-Bayes sense disambiguator, and then experimentally evaluate three apparently well-motivated extensions to take into account right-context information. The results argue that the contribution of right-context is limited, and that using it would probably necessitate sacrificing pure left-to-right processing.
- Shuwu ZHANG, Harald SINGER, Dekai WU, Yoshinori SAGISAKA. "Improving n-gram modeling using
distance-related unit association maximum entropy language modeling".
Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99). Budapest: Sep 1999.
In this paper, a distance-related unit association maximum entropy (DUAME) language modeling approach is proposed. This approach can model an event (unit subsequence) using the co-occurrence of full distance unit association (UA) features so that it is able to pursue a functional approximation to higher order N-gram with a significantly smaller memory requirement. A smoothing strategy related to this modeling will also be discussed. Preliminary experimental results have shown that DUAME modeling is comparable to conventional N-gram modeling in perplexity.
- Daniel CHAN Ka-Leung and Dekai WU. "Automatically merging
lexicons that have incompatible part-of-speech categories". Joint
SIGDAT Conference on Empirical Methods in Natural Language Processing and
Very Large Corpora (EMNLP/VLC-99). Maryland: Jun 1999.
We present a new method to automatically merge lexicons that employ different incompatible POS categories. Such incompatibilities have hindered efforts to combine lexicons to maximize coverage with reasonable human effort. Given an "original lexicon", our method is able to merge lexemes from an "additional lexicon" into the original lexicon, converting lexemes from the additional lexicon with about 89% precision. This level of precision is achieved with the aid of a device we introduce called an anti-lexicon, which neatly summarizes all the essential information we need about the co-occurrence of tags and lemmas. Our model is intuitive, fast, easy to implement, and does not require heavy computational resources or a training corpus.
- Dekai WU, ZHAO Jun, and SUI Zhifang. "An
information-theoretic empirical analysis of dependency-based feature
types for word prediction models". Joint SIGDAT Conference on
Empirical Methods in Natural Language Processing and Very Large Corpora
(EMNLP/VLC-99). Maryland: Jun 1999.
Over the years, many proposals have been made to incorporate assorted types of features in language models. However, discrepancies between training sets, evaluation criteria, algorithms, and hardware environments make it difficult to compare the models objectively. In this paper, we take an information theoretic approach to select feature types in a systematic manner. We describe a quantitative analysis of the information gain and the information redundancy for various combinations of feature types inspired by both dependency structure and bigram structure, using a Chinese treebank and taking word prediction as the object. The experiments yield several conclusions on the predictive value of several feature types and feature type combinations for word prediction, which are expected to provide guidelines for feature type selection in language modeling.
- Aboy WONG and Dekai WU. "Are phrase structured
grammars useful in statistical parsing?". NLPRS 1999.
Beijing: Nov 1999.
In this paper, we argue: (1) To parse accurately, a grammar is not necessary. (2) It is possible to parse deterministically by not conforming to an explicit grammar. We support the above claims by presenting our parser, which is lightweight, grammar-less, deterministic and has the highest accuracy among tag based parsers. The speed of our parser is more than 500 words per CPU second and only 6M of memory is needed for loading the parsing model. In our architecture, the grammatical information is captured by the parsing model. Our parsing model differs from others in that extra information about how to group constituents is provided. Thus an explicit grammar is not needed in our algorithm.
- Michael CARL and Dekai WU. "Inferring maximally
invertible bi-grammars for example-based machine translation".
NLPRS 1999. Beijing: Nov 1999.
This paper discusses inference strategies of context-free bi-grammars for example-based machine translation (EBMT). The EBMT system EDGAR is discussed in detail. The notion of invertible context-free feature bi-grammar is introduced in order to provide a means to decide upon the degree of ambiguity of the inferred bi-grammar. It is claimed that a maximally invertible bi-grammar can enhance the precision of the bilingual alignment process, reduce the complexity of the inferred grammar, and uncover inconsistencies in bi-corpora. This paper describes preliminary reflections and thus no empirical evaluation of the method is provided.
- Dekai WU and Hongsing WONG. "Machine translation with a
stochastic grammatical channel". COLING-ACL'98. Montreal:
Aug 1998.
We introduce a stochastic grammatical channel model for machine translation, that synthesizes several desirable characteristics of both statistical and grammatical machine translation. As with the pure statistical translation model described by Wu (1996) (in which a bracketing transduction grammar models the channel), alternative hypotheses compete probabilistically, exhaustive search of the translation hypothesis space can be performed in polynomial time, and robustness heuristics arise naturally from a language-independent inversion-transduction model. However, unlike pure statistical translation models, the generated output string is guaranteed to conform to a given target grammar. The model employs only (1) a translation lexicon, (2) a context-free grammar for the target language, and (3) a bigram language model. The fact that no explicit bilingual translation rules are used makes the model easily portable to a variety of source languages. Initial experiments show that it also achieves significant speed gains over our earlier model.
- Dekai WU. "A position statement on Chinese segmentation". Presented at the Chinese Language Processing Workshop, University of Pennsylvania, Philadelphia, Jul 1998.
- Dekai WU. "Stochastic inversion
transduction grammars and bilingual parsing of parallel corpora".
Computational Linguistics 23(3):377-404, Sep 1997.
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well-suited to model ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
- Ciprian CHELBA, David ENGLE, Frederick JELINEK, Victor JIMENEZ, Sanjeev
KHUDANPUR, Lidia MANGU, Harry PRINTZ, Eric RISTAD, Ronald ROSENFELD,
Andreas STOLCKE, and Dekai WU. "Structure and performance of a
dependency language model". EUROSPEECH'97. Rhodes, Greece:
Sep 1997.
We present a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar. Such a grammar expresses the relations between words by a directed graph. Because the edges of this graph may connect words that are arbitrarily far apart in a sentence, this technique can incorporate the predictive power of words that lie outside of bigram or trigram range. We have built several simple dependency models, as we call them, and tested them in a speech recognition experiment. We report experimental results for these models here, including one that has a small but statistically significant advantage (p < .02) over a bigram language model.
- Pascale FUNG, Bertram SHI, Dekai WU, LAM Wai Bun, and WONG Shuen
Kong. "Dealing with
multilinguality in a spoken language query translator".
ACL/EACL-97 Workshop on Spoken Language Translation. Madrid: Jul
1997.
Robustness is an important issue for multilingual speech interfaces for spoken language translation systems. We have studied three aspects of robustness in such a system: accent differences, mixed language input, and the use of common feature sets for HMM-based speech recognizers for English and Cantonese. The results of our preliminary experiments show that accent differences cause recognizer performance to degrade. A rather surprising finding is that for mixed language input, a straightforward implementation of a mixed language model-based speech recognizer performs less well than the concatenation of pure language recognizers. Our experimental results also show that a common feature set, parameter set, and common algorithm lead to different performance output for Cantonese and English speech recognition modules.
- Dekai WU. "A
polynomial-time algorithm for statistical machine translation".
ACL-96: 34th Annual Meeting of the Assoc. for Computational
Linguistics. Santa Cruz, CA: Jun. 1996.
We introduce a polynomial-time algorithm for statistical machine translation. This algorithm can be used in place of the expensive, slow best-first search strategies in current statistical translation architectures. The approach employs the stochastic bracketing transduction grammar (SBTG) model we recently introduced to replace earlier word alignment channel models, while retaining a bigram language model. The new algorithm in our experience yields major speed improvement with no significant loss of accuracy.
- Xuanyin XIA and Dekai WU. "Parsing Chinese with an
almost-context-free grammar". EMNLP-96, Conference on Empirical
Methods in Natural Language Processing. Philadelphia: May 1996.
We describe a novel parsing strategy we are employing for Chinese. We believe progress in Chinese parsing technology has been slowed by the excessive ambiguity that typically arises in pure context-free grammars. This problem has inspired a modified formalism that enhances our ability to write and maintain robust large grammars, by constraining productions with left/right contexts and/or nonterminal functions. Parsing is somewhat more expensive than for pure context-free parsing, but is still efficient by both theoretical and empirical analyses. Encouraging experimental results with our current grammar are described.
- Dekai WU and Xuanyin XIA. "Large-scale automatic extraction of
an English-Chinese lexicon". Machine Translation
9(3-4): 285-313. 1995.
We report experimental results on automatic extraction of an English-Chinese translation lexicon, by statistical analysis of a large parallel corpus, using limited amounts of linguistic knowledge. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant vocabulary and corpus size. The learned vocabulary size is about 6,500 English words, achieving translation precision in the 86-96% range, with alignment proceeding at paragraph, sentence, and word levels.
Specifically, we report (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus, (2) experiments supporting the usefulness of restricted lexical cues for statistical paragraph and sentence alignment, and (3) experiments that question the role of hand-derived monolingual lexicons for automatic word translation acquisition.
Using a hand-derived monolingual lexicon, the learned translation lexicon averages 2.33 Chinese translations per English entry, with a manually-filtered precision of 95.1%, and an automatically-filtered weighted precision of 86.0%. We then introduce a fully automatic two-stage statistical methodology that is able to learn translations for collocations. A statistically-learned monolingual Chinese lexicon is first used to segment the Chinese text, before applying bilingual training to produce 6,429 English entries with 2.25 Chinese translations per entry. This method improves the manually-filtered precision to 96.0% and the automatically-filtered weighted precision to 91.0%, an error rate reduction of 35.7% from using a hand-derived monolingual lexicon.
- Dekai WU. "Stochastic
inversion transduction grammars, with application to segmentation,
bracketing, and alignment of parallel corpora". IJCAI-95: 14th
Intl. Joint Conf. on Artificial Intelligence, 1328-1335. Montreal:
Aug 1995.
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with potential application to a variety of parallel corpus analysis problems. The formalism combines three tactics against the constraints that render finite-state transducers less useful: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist, and we discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks.
- Dekai WU. "An algorithm
for simultaneously bracketing parallel texts by aligning words".
ACL-95: 33rd Annual Meeting of the Assoc. for Computational
Linguistics, 244-251. Cambridge, MA: Jun 1995.
We describe a grammarless method for simultaneously bracketing both halves of a parallel text and giving word alignments, assuming only a translation lexicon for the language pair. We introduce inversion-invariant transduction grammars which serve as generative models for parallel bilingual sentences with weak order constraints. Focusing on transduction grammars for bracketing, we formulate a normal form, and a stochastic version amenable to a maximum-likelihood bracketing algorithm. Several extensions and experiments are discussed.
- Dekai WU. "Trainable
coarse bilingual grammars for parallel text bracketing". WVLC-3:
3rd Annual Workshop on Very Large Corpora, 69-82. Cambridge, MA: Jun
1995.
Also in Susan ARMSTRONG, Kenneth W. CHURCH, Pierre ISABELLE, Sandra MANZI, Evelyne TZOUKERMANN, and David YAROWSKY (editors), Natural Language Processing Using Very Large Corpora. Dordrecht: Kluwer. ISBN 0-7923-6055-9. Nov 1999.
We describe two new strategies for automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised training of such grammars via EM (expectation-maximization). Both methods build upon a formalism we recently introduced called stochastic inversion transduction grammars. The first approach borrows a coarse monolingual grammar into our bilingual formalism, in order to transfer knowledge of one language's constraints to the task of bracketing the texts in both languages. The second approach generalizes the inside-outside algorithm to adjust the grammar parameters so as to improve the likelihood of a training corpus. Preliminary experiments on parallel English-Chinese text are supportive of these strategies.
- Dekai WU. "Grammarless
extraction of phrasal translation examples from parallel texts".
TMI-95, Sixth International Conference on Theoretical and
Methodological Issues in Machine Translation, v2, 354-372. Leuven,
Belgium: Jul 1995.
We describe a method for identifying subsentential phrasal translation examples in sentence-aligned parallel corpora, using only a probabilistic translation lexicon for the language pair. Our method differs from previous approaches in that (1) it is founded on a formal basis, making use of an inversion transduction grammar (ITG) formalism that we recently developed for bilingual language modeling, and (2) it requires no language-specific monolingual grammars for the source and target languages. Instead, we devise a generic, language-independent constituent-matching ITG with inherent expressiveness properties that correspond to a desirable level of matching flexibility. Bilingual parsing, in conjunction with a stochastic version of the ITG formalism, performs the phrasal translation extraction.
- Dekai WU and Cindy NG. "Using brackets to improve
search for statistical machine translation". PACLIC-10, 10th
Pacific Asia Conference on Language, Information and Computation.
Hong Kong: Dec 1995.
We propose a method to improve search time and space complexity in statistical machine translation architectures, by employing linguistic bracketing information on the source language sentence. It is one of the advantages of the probabilistic formulation that competing translations may be compared and ranked by a principled measure, but at the same time, optimizing likelihoods over the translation space dictates heavy search costs. To make statistical architectures practical, heuristics to reduce search computation must be incorporated. An experiment applying our method to a prototype Chinese-English translation system demonstrates substantial improvement.
- Pascale FUNG and Dekai WU. "Coerced Markov Models for
cross-lingual lexical tag relations". TMI-95, Sixth International
Conference on Theoretical and Methodological Issues in Machine
Translation, v1, 240-255. Leuven, Belgium: Jul 1995.
We introduce the Coerced Markov Model (CMM) to model the relationship between the lexical sequence of a source language and the tag sequence of a target language, with the objective of constraining search in statistical transfer-based machine translation systems. CMMs differ from standard hidden Markov models in that state sequence assignments can take on values coerced from external sources. Given a Chinese sentence, a CMM can be used to predict the corresponding English tag sequence, thus constraining the English lexical sequence produced by a translation model. The CMM can also be used to score competing translation hypotheses in N-best models. Three fundamental problems for CMM design are discussed. Their solutions lead to the training and testing stages of CMM.
- Eva FONG and Dekai WU. "Learning restricted
probabilistic link grammars". IJCAI-95 Workshop on New Approaches
to Learning for Natural Language Processing. Montreal: Aug 1995.
Also in Stefan WERMTER, Ellen RILOFF, Gabriele SCHELER (editors), Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 173-187. 1996. Berlin: Springer-Verlag.
We describe a language model employing a new headed-disjuncts formulation of Lafferty's (1992) probabilistic link grammar, together with (1) an EM training method for estimating the probabilities, and (2) a procedure for learning some simple lexicalized grammar structures. The model in its simplest form is a generalization of n-gram models, but in its general form possesses context-free expressiveness. Unlike the original experiments on probabilistic link grammars, we assume that no hand-coded grammar is initially available (as with n-gram models). We employ untyped links to concentrate the learning on lexical dependencies, and our formulation uses the lexical identities of heads to influence the structure of the parse graph. After learning, the language model consists of grammatical rules in the form of a set of simple disjuncts for each word, plus several sets of probability parameters. The formulation extends cleanly toward learning more powerful context-free grammars. Several issues relating to generalization bias, linguistic constraints, and parameter smoothing are considered. Preliminary experimental results on small artificial corpora are supportive of our approach.
- Dekai WU and Pascale FUNG. "Improving Chinese tokenization
with linguistic filters on statistical lexical acquisition".
ANLP-94: 4th Conference on Applied Natural Language Processing,
180-181. Stuttgart: Oct 1994.
The first step in Chinese NLP is to tokenize or segment character sequences into words, since the text contains no word delimiters. Recent heavy activity in this area has shown the biggest stumbling block to be words that are absent from the lexicon, since successful tokenizers to date have been based on dictionary lookup (e.g., Chang & Chen 1993, Chiang et al. 1992, Lin et al. 1993, Wu & Tseng 1993, Sproat et al. 1994).
We present empirical evidence for four points concerning tokenization of Chinese text:
(1) More rigorous "blind" evaluation methodology is needed to avoid inflated accuracy measurements; we introduce the nk-blind method.
(2) The extent of the unknown-word problem is far more serious than generally thought, when tokenizing unrestricted texts in realistic domains.
(3) Statistical lexical acquisition is a practical means to greatly improve tokenization accuracy with unknown words, reducing error rates as much as 32.0%.
(4) When augmenting the lexicon, linguistic constraints can provide simple inexpensive filters yielding significantly better precision, reducing error rates as much as 49.4%.
- Dekai WU and Xuanyin XIA. "Learning an English-Chinese
lexicon from a parallel corpus". AMTA-94: Assoc. for Machine
Translation, 206-213. Columbia, MD: Oct 1994.
We report experiments on automatic learning of an English-Chinese translation lexicon, through statistical training on a large parallel corpus. The learned vocabulary size is non-trivial at 6,517 English words averaging 2.33 Chinese translations per entry, with a manually-filtered precision of 95.1% and a single-most-probable precision of 91.2%. We then introduce a significance filtering method that is fully automatic, yet still yields a weighted precision of 86.0%. Learning of translations is adaptive to the domain. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant corpus size with a non-toy vocabulary.
- Pascale FUNG and Dekai WU. "Statistical augmentation of a
Chinese machine-readable dictionary". WVLC-2: 2nd Annual Workshop
on Very Large Corpora, 69-85. Kyoto: Aug 1994.
Also in Susan ARMSTRONG, Kenneth W. CHURCH, Pierre ISABELLE, Sandra MANZI, Evelyne TZOUKERMANN, and David YAROWSKY (editors), Natural Language Processing Using Very Large Corpora. Dordrecht: Kluwer. ISBN 0-7923-6055-9. Nov 1999.
We describe a method of using statistically-collected Chinese character groups from a corpus to augment a Chinese dictionary. The method is particularly useful for extracting domain-specific and regional words not readily available in machine-readable dictionaries. Output was evaluated both using human evaluators and against a previously available dictionary. We also evaluated performance improvement in automatic Chinese tokenization. Results show that our method outputs legitimate words, acronymic constructions, idioms, names and titles, as well as technical compounds, many of which were lacking from the original dictionary.
- Dekai WU. "Aligning a
parallel English-Chinese corpus statistically with lexical criteria".
ACL-94: 32nd Annual Meeting of the Assoc. for Computational
Linguistics, 80-87. Las Cruces, NM: Jun 1994.
We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improved statistical method that also incorporates domain-specific lexical cues.
- Dekai WU. Aligning parallel English-Chinese texts
statistically with lexical criteria. Technical Report HKUST-CS93-9.
We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improved statistical method that also incorporates domain-specific lexical cues.
- Graeme HIRST and Dekai WU. "Not all reflexive reasoning is deductive". Behavioral and Brain Sciences 16(3): 462-463. 1993.
- Dekai WU. "Approximating maximum-entropy
ratings for evidential parsing and semantic interpretation".
IJCAI-93: 13th Intl. Joint Conf. on Artificial Intelligence,
1290-1296. Chambéry, France: Aug 1993.
We consider the problem of assigning probabilistic ratings to hypotheses in a natural language interpretation system. To facilitate integrating syntactic, semantic, and conceptual constraints, we allow a fully compositional frame representation, which permits co-indexed syntactic constituents and/or semantic entities filling multiple roles. In addition the knowledge base contains probabilistic information encoded by marginal probabilities on frames. These probabilities are used to specify typicality of real-world scenarios on one hand, and conventionality of linguistic usage patterns on the other. Because the theoretical maximum-entropy solution is infeasible in the general case, we propose an approximate method. This method's strengths are (1) its ability to rate compositional structures, and (2) its flexibility with respect to the inputs chosen by the system it is embedded in. Arbitrary sets of hypotheses from the front-end processor can be accepted, as well as arbitrary subsets of constraints heuristically chosen from the long-term knowledge base.
- Dekai WU. "Estimating
probability distributions over hypotheses with variable unification".
AAAI-93: 11th National Conf. on Artificial Intelligence,
790-795. Washington, D.C.: Jul 1993.
We analyze the difficulties in applying Bayesian belief networks to language interpretation domains, which typically involve many unification hypotheses that posit variable bindings. As an alternative, we observe that the structure of the underlying hypothesis space permits an approximate encoding of the joint distribution based on marginal rather than conditional probabilities. This suggests an implicit binding approach that circumvents the problems with explicit unification hypotheses, while still allowing hypotheses with alternative unifications to interact probabilistically. The proposed method accepts arbitrary subsets of hypotheses and marginal probability constraints, is robust, and is readily incorporated into standard unification-based and frame-based models.
- Dekai WU. "An
image-schematic system of thematic roles". PACLING-93: 1st Conf.
of the Pacific Association for Computational Linguistics, 323-332.
Vancouver: Apr 1993.
We describe a system of thematic roles and frames designed to address a number of problems in semantic representations at the lexical semantic level. Our primary objective is broad expressiveness, so that real domains can practically be encoded. However, for both empirical and computational reasons we limit the number of role types to four, allocating this structure to the strongest associations. We show how the system incorporates image-schematic semantics to encode various schematization operations relating to scales and reification.
- Andreas STOLCKE and Dekai WU. "Tree matching with
recursive distributed representations". AAAI 1992 Workshop on
Integrating Neural and Symbolic Processes---The Cognitive Dimension.
San Jose, CA: Jul 1992. Also available as ICSI Technical Report TR-92-025.
We present an approach to the structure unification problem using distributed representations of hierarchical objects. Binary trees are encoded using the recursive auto-association method (RAAM), and a unification network is trained to perform the tree matching operation on the RAAM representations. It turns out that this restricted form of unification can be learned without hidden layers, while producing good generalization, if we allow the error signal from the unification task to modify both the unification network and the RAAM representations themselves.
- Dekai WU. "Active acquisition of
user models: Implications for decision-theoretic dialog planning and plan
recognition". User Modeling and User-Adapted Interaction
1(2): 149-172. 1991.
This article investigates the implications of active user model acquisition upon plan recognition, domain planning, and dialog planning in dialog architectures. A dialog system performs active user model acquisition by querying the user during the course of the dialog. Existing systems employ passive strategies that rely on inferences drawn from passive observation of the dialog. Though passive acquisition generally reduces unnecessary dialog, in some cases the system can effectively shorten the overall dialog length by selectively initiating subdialogs for acquiring information about the user.
We propose a theory identifying conditions under which the dialog system should adopt active acquisition goals. Active acquisition imposes a set of rationality requirements not met by current dialog architectures. To ensure rational dialog decisions, we propose significant extensions to plan recognition, domain planning, and dialog planning models, incorporating decision-theoretic heuristics for expected utility. The most appropriate framework for active acquisition is a multi-attribute utility model wherein plans are compared along multiple dimensions of utility. We suggest a general architectural scheme, and present an example from a preliminary implementation.
- Dekai WU. "A probabilistic approach to marker propagation". IJCAI 1989. Detroit, MI. 574-582.
- Dekai WU. "Review of Natural Language Understanding". AI Magazine 10(1): 88-90 (1989).
- Robert WILENSKY, David N CHIN, Marc LURIA, James H MARTIN, James H
MAYFIELD, and Dekai WU. "The Berkeley UNIX
Consultant Project". Computational Linguistics 14(3):
35-84 (1988). Also available as UC
Berkeley Technical Report CSD-89-520.
UC (UNIX Consultant) is an intelligent, natural-language interface that allows naive users to learn about the UNIX operating system. UC was undertaken because the task was thought to be both a fertile domain for Artificial Intelligence research and a useful application of AI work in planning, reasoning, natural language processing, and knowledge representation.
- Dekai WU. "Concretion inferences in natural language understanding". GWAI 1987. Springer-Verlag. 74-83.
- Robert WILENSKY, James MAYFIELD, Anthony ALBERT, David CHIN, Charles
COX, Marc LURIA, James H MARTIN, and Dekai WU. UC---A Progress
Report. UC
Berkeley Technical Report CSD-87-303.
UC is an intelligent, natural-language interface that allows naive users to learn about the UNIX operating system. UC was undertaken because the task was thought to be both a fertile domain for Artificial Intelligence research and a useful application of AI work in planning, reasoning, natural language processing, and knowledge representation.