Dekai WU
Associate Professor of Computer Science and Engineering, HKUST
Human Language Technology Center
Department of Computer Science and Engineering
The Hong Kong University of Science & Technology
HKUST, Clear Water Bay, Kowloon, Hong Kong
tel +852 2358-6989 · fax +852 2358-1477 · room 3539 (lifts 25/26)
lab +852 2358-8831 · room 2580 (lifts 27/28 and 29/30)
dekai@cs.ust.hk · http://www.cs.ust.hk/~dekai
Prof. Wu received his PhD in Computer Science from the University of California at Berkeley, and was a postdoctoral fellow at the University of Toronto (Ontario, Canada) prior to joining HKUST in 1992. He received his Executive MBA from Kellogg and HKUST in 2002, and a BS in Computer Engineering from the University of California at San Diego (Revelle College departmental award, cum laude, Phi Beta Kappa) in 1984. He was a visiting researcher at Columbia University in 1995-96, Bell Laboratories in 1995, and the Technische Universität München (Munich, Germany) during 1986-87. Prof. Wu serves on the Editorial Board of AI Journal, Machine Translation, Journal of Natural Language Engineering, and Communications of COLIPS. He has also served as Co-Chair for EMNLP-2004, on the Editorial Board of Computational Linguistics, as Associate Editor of ACM Transactions on Speech and Language Processing, on the Organizing Committee of ACL-2000 and WVLC-5 (SIGDAT 1997), and on the Executive Committee of the Association for Computational Linguistics (ACL).
Research interests
Statistical natural language processing; machine translation; cognitive models of human language and communication; machine learning and data mining; multilingual computing; language modeling; speech recognition; language acquisition; dialog systems; information retrieval; Internet information processing; knowledge management.
Human Language Technology Center (HLTC)
Current/recent activities
- SSST-3, Third Workshop on Syntax and Structure in Statistical Translation (NAACL HLT 2009 Workshop), 5 June 2009, Boulder, Colorado
- SSST-2, Second Workshop on Syntax and Structure in Statistical Translation (ACL-08: HLT Workshop), 20 June 2008, Columbus, Ohio
- SSST-1, Syntax and Structure in Statistical Translation (NAACL-HLT 2007 Workshop), 26 April 2007, Rochester, New York
- CLSP Workshop 2005, Translation by Parsing, July-August 2005, Johns Hopkins University, Center for Language and Speech Processing
- EMNLP-2004, 2004 Conference on Empirical Methods in Natural Language Processing (at ACL-04), 25-26 July 2004, Barcelona, Spain
- ACL-2000, 38th Annual Meeting of the Association for Computational Linguistics, 1-8 October 2000, Hong Kong
- WVLC-5, Fifth Workshop on Very Large Corpora, 18/20 August 1997, Tsinghua/HKUST
Teaching
- COMP326 (Introduction to Natural Language Processing), Spring 2010
- COMP221 (Fundamentals of Artificial Intelligence), Fall 2009
- CSIT523 (Knowledge Management), Summer 2009
- COMP151H (Object-Oriented Programming, Honors Study Track), Spring 2009
- COMP526 (Natural Language Processing), Fall 2008
- CSIT600G (Knowledge Management), Summer 2008
- COMP151H (Object-Oriented Programming, Honors Study Track), Spring 2008
- COMP251 (Introduction to Programming Languages), Fall 2007
- CSIT600G (Knowledge Management), Summer 2007
- COMP151 (Object-Oriented Programming), Spring 2007
- COMP526 (Natural Language Processing), Fall 2006
- COMP251 (Introduction to Programming Languages), Fall 2006
- COMP621N (Advanced Topics in AI), Spring 2006
- COMP151 (Object-Oriented Programming), Spring 2006
- CSIT600G (Knowledge Management), Fall 2005
- COMP621M (Advanced Topics in AI: Structural Statistical Machine Translation), Fall 2005
- COMP271 (Design and Analysis of Algorithms), Spring 2005
- COMP151 (Object-Oriented Programming), Fall 2004
- COMP621J (Advanced Topics in AI: Statistical Machine Translation), Spring 2004
- COMP526 (Natural Language Processing), Fall 2003
- COMP621H (Advanced Topics in AI: Machine Translation), Fall 2003
- COMP151 (Object-Oriented Programming), Spring 2003
- COMP171 (Data Structures and Algorithms), Fall 2002
Current research students
- Marine CARPUAT (PhD)
- Markus SAERS (PhD, co-advised with Uppsala Universitet, Sweden)
- Jackie LO Chi Kiu (MPhil/PhD)
- Tyler Barth (MPhil)
- Ken LEE Wing Kuen (MPhil 2005)
Selected publications
- Dekai WU. "Toward Machine Translation with Statistics and Syntax and Semantics". IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2009). Merano, Italy: Dec 2009.
In this paper, we survey some central issues in the historical, current, and future landscape of statistical machine translation (SMT) research, taking as a starting point an extended three-dimensional MT model space. We posit a socio-geographical conceptual disparity hypothesis, that aims to explain why language pairs like Chinese-English have presented MT with so much more difficulty than others. The evolution from simple token-based to segment-based to tree-based syntactic SMT is sketched. For tree-based SMT, we consider language bias rationales for selecting the degree of compositional power within the hierarchy of expressiveness for transduction grammars (or synchronous grammars). This leads us to inversion transductions and the ITG model prevalent in current state-of-the-art SMT, along with the underlying ITG hypothesis, which posits a language universal. Against this backdrop, we enumerate a set of key open questions for syntactic SMT. We then consider the more recent area of semantic SMT. We list principles for successful application of sense disambiguation models to semantic SMT, and describe early directions in the use of semantic role labeling for semantic SMT.
- Anders SØGAARD and Dekai WU. "Empirical lower bounds on translation unit error rate for the full class of inversion transduction grammars". 11th International Conference on Parsing Technologies (IWPT'09). Paris: Oct 2009. 33-36.
Empirical lower bounds studies, in which the frequency of alignment configurations that cannot be induced by a particular formalism is estimated, have been important for the development of syntax-based machine translation formalisms. The formalism that has received most attention has been inversion transduction grammars (ITGs) (Wu, 1997). All previous work on the coverage of ITGs, however, concerns parse failure rates (PFRs) or sentence level coverage, which is not directly related to any of the evaluation measures used in machine translation. Søgaard and Kuhn (2009) induce lower bounds on translation unit error rates (TUERs) for a number of formalisms, including normal form ITGs, but not for the full class of ITGs. Many of the alignment configurations that cannot be induced by normal form ITGs can be induced by unrestricted ITGs, however. This paper estimates the difference and shows that the average reduction in lower bounds on TUER is 2.48 in absolute difference (16.01 in average parse failure rate).
- Markus SAERS, Joakim NIVRE, and Dekai WU. "Learning Stochastic Bracketing Inversion Transduction Grammars with a Cubic Time Biparsing Algorithm". 11th International Conference on Parsing Technologies (IWPT'09). Paris: Oct 2009. 29-32.
We present a biparsing algorithm for Stochastic Bracketing Inversion Transduction Grammars that runs in O(bn^3) time instead of O(n^6). Transduction grammars learned via an EM estimation procedure based on this biparsing algorithm are evaluated directly on the translation task, by building a phrase-based statistical MT system on top of the alignments dictated by Viterbi parses under the induced bigrammars. Translation quality at different levels of pruning is compared, showing improvements over a conventional word aligner even at heavy pruning levels.
- Dekai WU and David CHIANG (editors). Proceedings of SSST-3, Third Workshop on Syntax and Structure in Statistical Translation. NAACL HLT 2009: Boulder, Colorado: Jun 2009. [website]
- Markus SAERS and Dekai WU. "Improving Phrase-Based Translation via Word Alignments from Stochastic Inversion Transduction Grammars". Proceedings of SSST-3, Third Workshop on Syntax and Structure in Statistical Translation. NAACL HLT 2009: Boulder, Colorado: Jun 2009. 28-36.
We argue that learning word alignments through a compositionally-structured, joint process yields higher phrase-based translation accuracy than the conventional heuristic of intersecting conditional models. Flawed word alignments can lead to flawed phrase translations that damage translation accuracy. Yet the IBM word alignments usually used today are known to be flawed, in large part because IBM models (1) model reordering by allowing unrestricted movement of words, rather than constrained movement of compositional units, and therefore must (2) attempt to compensate via directed, asymmetric distortion and fertility models. The conventional heuristics for attempting to recover from the resulting alignment errors involve estimating two directed models in opposite directions and then intersecting their alignments – to make up for the fact that, in reality, word alignment is an inherently joint relation. A natural alternative is provided by Inversion Transduction Grammars, which estimate the joint word alignment relation directly, eliminating the need for any of the conventional heuristics. We show that this alignment ultimately produces superior translation accuracy on BLEU, NIST, and METEOR metrics over three distinct language pairs.
- Dekai WU and Pascale FUNG. "Semantic Roles for SMT: A Hybrid Two-Pass Model". Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2009). Boulder, Colorado: Jun 2009.
We present results on a novel hybrid semantic SMT model that incorporates the strengths of both semantic role labeling and phrase-based statistical machine translation. The approach avoids major complexity limitations via a two-pass architecture. The first pass is performed using a conventional phrase-based SMT model. The second pass is performed by a re-ordering strategy guided by shallow semantic parsers that produce both semantic frame and role labels. Evaluation on a Wall Street Journal newswire genre test set showed the hybrid model to yield an improvement of roughly half a point in BLEU score over a strong pure phrase-based SMT baseline – to our knowledge, the first successful application of semantic role labeling to SMT.
- Dekai WU and Pascale FUNG. "Can Semantic Role Labeling Improve SMT?". 13th Annual Conference of the European Association for Machine Translation (EAMT 2009). Barcelona: May 2009. 218-225.
We present a series of empirical studies aimed at illuminating more precisely the likely contribution of semantic roles in improving statistical machine translation accuracy. The experiments reported study several aspects key to success: (1) the frequencies of types of SMT errors where semantic parsing and role labeling could help, (2) if and where semantic roles offer more accurate guidance to SMT than merely syntactic annotation, and (3) the potential quantitative impact of realistic semantic role guidance to SMT systems, in terms of BLEU and METEOR scores.
- David CHIANG and Dekai WU (editors). Proceedings of SSST-2, Second Workshop on Syntax and Structure in Statistical Translation. ACL-08: HLT, Columbus, Ohio: Jun 2008. [website]
- Marine CARPUAT and Dekai WU. "Evaluation of Context-dependent Phrasal Translation Lexicons for Statistical Machine Translation". Sixth International Conference on Language Resources and Evaluation (LREC-2008). Marrakech: May 2008.
We present new direct data analysis showing that dynamically-built context-dependent phrasal translation lexicons are more useful resources for phrase-based statistical machine translation (SMT) than conventional static phrasal translation lexicons, which ignore all contextual information. After several years of surprising negative results, recent work suggests that context-dependent phrasal translation lexicons are an appropriate framework to successfully incorporate Word Sense Disambiguation (WSD) modeling into SMT. However, this approach has so far only been evaluated using automatic translation quality metrics, which are important, but aggregate many different factors. A direct analysis is still needed to understand how context-dependent phrasal translation lexicons impact translation quality, and whether the additional complexity they introduce is really necessary. In this paper, we focus on the impact of context-dependent translation lexicons on lexical choice in phrase-based SMT and show that context-dependent lexicons are more useful to a phrase-based SMT system than a conventional lexicon. A typical phrase-based SMT system makes use of more and longer phrases with context modeling, including phrases that were not seen very frequently in training. Even when the segmentation is identical, the context-dependent lexicons yield translations that match references more often than conventional lexicons.
- Dekai WU. "WSD for Semantic SMT: Phrase Sense Disambiguation". Second Symposium on Innovations in Machine Translation Technologies (IMTT-2008). Tokyo: Mar 2008.
- Yihai SHEN, Chi-kiu LO, Marine CARPUAT and Dekai WU. "HKUST Statistical Machine Translation Experiments for IWSLT 2007". Fourth International Workshop on Spoken Language Translation (IWSLT 2007). Trento: Oct 2007. 84-88.
This paper describes experiments conducted at HKUST in the IWSLT 2007 evaluation campaign on spoken language translation. Our primary objective was to compare the open-source phrase-based statistical machine translation toolkit Moses against the closed-source Pharaoh. We focused on Chinese to English translation, but we also report results on the Arabic to English, Italian to English, and Japanese to English tasks.
- Marine CARPUAT and Dekai WU. "Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation". Machine Translation Summit XI. Copenhagen: Sep 2007.
Most current statistical machine translation (SMT) systems make very little use of contextual information to select a translation candidate for a given input language phrase. However, despite evidence that rich context features are useful in stand-alone translation disambiguation tasks, recent studies reported that incorporating context-rich approaches from Word Sense Disambiguation (WSD) methods directly into classic word-based SMT systems, surprisingly, did not yield the expected improvements in translation quality. We argue here that, instead, it is necessary to design a context-dependent lexicon that is specifically matched to a given phrase-based SMT model, rather than simply incorporating an independently built and tested WSD module. In this approach, the baseline SMT phrasal lexicon, which uses translation probabilities that are independent of context, is augmented with a context-dependent score, defined using insights from standalone translation disambiguation evaluations. This approach reliably improves performance on both IWSLT and NIST Chinese-English test sets, producing consistent gains on all eight of the most commonly used automated evaluation metrics. We analyze the behavior of the model along a number of dimensions, including an analysis confirming that the most important context features are not available in conventional phrase-based SMT models.
- Marine CARPUAT and Dekai WU. "How Phrase Sense Disambiguation outperforms Word Sense Disambiguation for Statistical Machine Translation". 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007). Skövde, Sweden: Sep 2007. 43-52.
We present comparative empirical evidence arguing that a generalized phrase sense disambiguation approach better improves statistical machine translation than ordinary word sense disambiguation, along with a data analysis suggesting the reasons for this. Standalone word sense disambiguation, as exemplified by the Senseval series of evaluations, typically defines the target of disambiguation as a single word. But in order to be useful in statistical machine translation, our studies indicate that word sense disambiguation should be redefined to move beyond the particular case of single word targets, and instead to generalize to multi-word phrase targets. We investigate how and why the phrase sense disambiguation approach---in contrast to recent efforts to apply traditional word sense disambiguation to SMT---is able to yield statistically significant improvements in translation quality even under large data conditions, and consistently improve SMT across both IWSLT and NIST Chinese-English text translation tasks. We discuss architectural issues raised by this change of perspective, and consider the new model architecture necessitated by the phrase sense disambiguation approach.
- Pascale FUNG, Zhaojun WU, Yongsheng YANG and Dekai WU. "Learning Bilingual Semantic Frames: Shallow Semantic Parsing vs. Semantic Role Projection". 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007). Skövde, Sweden: Sep 2007. 75-84.
To explore the potential application of semantic roles in structural machine translation, we propose to study the automatic learning of English-Chinese bilingual predicate argument structure mapping. We describe ARG_ALIGN, a new model for learning bilingual semantic frames that employs monolingual Chinese and English semantic parsers to learn bilingual semantic role mappings with 72.45% F-score, given an unannotated parallel corpus. We show that, contrary to a common preconception, our ARG_ALIGN model is superior to a semantic role projection model, SYN_ALIGN, which reaches only a 46.63% F-score by assuming semantic parallelism in bilingual sentences. We present experimental data explaining that this is due to cross-lingual mismatches between argument structures in English and Chinese 17.24% of the time. This suggests that, in any potential application to enhance machine translation with semantic structural mapping, it may be preferable to employ independent automatic semantic parsers on source and target languages, rather than assuming semantic role parallelism.
- Marine CARPUAT and Dekai WU. "Improving Statistical Machine Translation using Word Sense Disambiguation". 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007). Prague: Jun 2007. 61-72.
We show for the first time that incorporating the predictions of a word sense disambiguation system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing statistically significant improvements on the larger NIST Chinese-English MT task---and moreover never hurts performance on any test set, according not only to BLEU but to all eight most commonly used automatic evaluation metrics. Recent work has challenged the assumption that word sense disambiguation (WSD) systems are useful for SMT. Yet SMT translation quality still obviously suffers from inaccurate lexical choice. In this paper, we address this problem by investigating a new strategy for integrating WSD into an SMT system, that performs fully phrasal multi-word disambiguation. Instead of directly incorporating a Senseval-style WSD system, we redefine the WSD task to match the exact same phrasal translation disambiguation task faced by phrase-based SMT systems. Our results provide the first known empirical evidence that lexical semantics are indeed useful for SMT, despite claims to the contrary.
- Dekai WU and David CHIANG (editors). Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation. Rochester, New York: Apr 2007. [website]
- Dekai WU. "MT model space: Statistical vs. compositional vs. example-based machine translation". Machine Translation (2005) 19: 213-227. Springer Online: http://dx.doi.org/10.1007/s10590-006-9009-3. Berlin: Springer.
We offer a perspective on EBMT from a statistical MT standpoint, by developing a three-dimensional MT model space based on three pairs of definitions: (1) logical versus statistical MT, (2) schema-based versus example-based MT, and (3) lexical versus compositional MT. Within this space we consider the interplay of three key ideas in the evolution of transfer, example-based, and statistical approaches to machine translation. We depict how all translation models face these issues in one way or another, regardless of the school of thought, and suggest where the real questions for the future may lie.
- Dekai WU, Marine CARPUAT, and Yihai SHEN. "Inversion Transduction Grammar Coverage of Arabic-English Word Alignment for Tree-Structured Statistical Machine Translation". IEEE/ACL 2006 Workshop on Spoken Language Technology (SLT 2006). Aruba: Dec 2006.
We present the first known direct measurement of word alignment coverage on an Arabic-English parallel corpus using inversion transduction grammar constraints. While direct measurements have been reported for several European and Asian languages, to date no results have been available for Arabic or any Semitic language despite much recent activity on Arabic-English spoken language and text translation. Many recent syntax based statistical MT models operate within the domain of ITG expressiveness, often for efficiency reasons, so it has become important to determine the extent to which the ITG constraint assumption holds. Our results on Arabic provide further evidence that ITG expressiveness appears largely sufficient for core MT models.
- Pascale FUNG, Zhaojun WU, Yongsheng YANG, and Dekai WU. "Automatic learning of Chinese-English semantic structure mapping". IEEE/ACL 2006 Workshop on Spoken Language Technology (SLT 2006). Aruba: Dec 2006.
We present twin results on Chinese semantic parsing, with application to English-Chinese cross-lingual verb frame acquisition. First, we describe two new state-of-the-art Chinese shallow semantic parsers leading to an F-score of 82.01 on simultaneous frame and argument boundary identification and labeling. Subsequently, we propose a model that applies the separate Chinese and English semantic parsers to learn cross-lingual semantic verb frame argument mappings with 89.3% accuracy. The only training data needed by this cross-lingual learning model is a pair of non-parallel monolingual Propbanks, plus an unannotated parallel corpus. We also present the first reported controlled comparison of maximum entropy and SVM approaches to shallow semantic parsing, using the Chinese data.
- Marine CARPUAT, Yihai SHEN, Xiaofeng YU, and Dekai WU. "Toward Integrating Word Sense and Entity Disambiguation into Statistical Machine Translation". Third International Workshop on Spoken Language Translation (IWSLT 2006). Kyoto: Nov 2006. [slides]
We describe a machine translation approach being designed at HKUST to integrate semantic processing into statistical machine translation, beginning with entity and word sense disambiguation. We show how integrating the semantic modules consistently improves translation quality across several data sets. We report results on five different IWSLT 2006 speech translation tasks, representing HKUST's first participation in the IWSLT spoken language translation evaluation campaign. We translated both read and spontaneous speech transcriptions from Chinese to English, achieving reasonable performance despite the fact that our system is essentially text-based and therefore not designed and tuned to tackle the challenges of speech translation. We also find that the system achieves reasonable results on a wide range of languages, by evaluating on read speech transcriptions from Arabic, Italian, and Japanese into English.
- Xiaofeng YU, Marine CARPUAT, and Dekai WU. "Boosting for Chinese Named Entity Recognition". Fifth SIGHAN Workshop of the Special Interest Group for Chinese Language Processing (SIGHAN5) at COLING/ACL 2006. Sydney, Australia: Jul 2006.
We report an experiment in which a high-performance boosting based NER model originally designed for multiple European languages is instead applied to the Chinese named entity recognition task of the third SIGHAN Chinese language processing bakeoff. Using a simple character-based model along with a set of features that are easily obtained from the Chinese input strings, the system described employs boosting, a promising and theoretically well-founded machine learning method to combine a set of weak classifiers together into a final system. Even though we did no other Chinese-specific tuning, and used only one-third of the MSRA and CityU corpora to train the system, reasonable results are obtained. Our evaluation results show that 75.07 and 80.51 overall F-measures were obtained on MSRA and CityU test sets respectively.
- Dekai WU and Ken Wing Kuen LEE. "A grammatical approach to understanding textual tables using two-dimensional SCFGs". 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL 2006). Sydney, Australia: Jul 2006.
We present an elegant and extensible model that is capable of providing semantic interpretations for an unusually wide range of textual tables in documents. Unlike the few existing table analysis models, which largely rely on relatively ad hoc heuristics, our linguistically-oriented approach is systematic and grammar based, which allows our model (1) to be concise and yet (2) recognize a wider range of data models than others, and (3) disambiguate to a significantly finer extent the underlying semantic interpretation of the table in terms of data models drawn from relational database theory. To accomplish this, the model introduces Viterbi parsing under two-dimensional stochastic CFGs. The cleaner grammatical approach facilitates not only greater coverage, but also grammar extension and maintenance, as well as a more direct and declarative link to semantic interpretation, for which we also introduce a new, cleaner data model. In disambiguation experiments on recognizing relevant data models of unseen web tables from different domains, a blind evaluation of the model showed 60% precision and 80% recall.
- Dekai WU. "Textual Entailment Recognition Based on Inversion Transduction Grammars". In Joaquin QUIÑONERO CANDELA, Ido DAGAN, Bernardo MAGNINI, and Florence d'ALCHÉ-BUC (editors), "Machine Learning Challenges, Evaluating Predictive Uncertainty, Visual Object Classification and Recognizing Textual Entailment", Lecture Notes in Computer Science (2006) 3944: 299-308. Springer Online: http://dx.doi.org/10.1007/11736790_17. Berlin: Springer.
The PASCAL Challenge's textual entailment recognition (RTE) task presents intriguing opportunities to test various implications of the strong language universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. The ITG Hypothesis provides a strong inductive bias, and has been repeatedly shown empirically to yield both efficiency and accuracy gains for numerous language acquisition tasks. Since the RTE challenge abstracts over many tasks, it invites meaningful analysis of the ITG Hypothesis across tasks including information retrieval, comparable documents, reading comprehension, question answering, information extraction, machine translation, and paraphrase acquisition. We investigate two new models for the RTE problem that employ simple generic Bracketing ITGs. Experimental results show that, even in the absence of any thesaurus to accommodate lexical variation between the Text and the Hypothesis strings, surprisingly strong results for a number of the task subsets are obtainable from the Bracketing ITG's structure matching bias alone.
- Dekai WU and Pascale FUNG. "Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora". Second International Joint Conference on Natural Language Processing (IJCNLP-2005). Jeju, Korea: Oct 2005.
We present a new implication of Wu's (1997) Inversion Transduction Grammar (ITG) Hypothesis, on the problem of retrieving truly parallel sentence translations from large collections of highly non-parallel documents. Our approach leverages a strong language universal constraint posited by the ITG Hypothesis, that can serve as a strong inductive bias for various language learning problems, resulting in both efficiency and accuracy gains. The task we attack is highly practical since non-parallel multilingual data exists in far greater quantities than parallel corpora, but parallel sentences are a much more useful resource. Our aim here is to mine truly parallel sentences, as opposed to comparable sentence pairs or loose translations as in most previous work. The method we introduce exploits Bracketing ITGs to produce the first known results for this problem. Experiments show that it obtains large accuracy gains on this task compared to the expected performance of state-of-the-art models that were developed for the less stringent task of mining comparable sentence pairs.
- Marine CARPUAT and Dekai WU. "Evaluating the Word Sense Disambiguation Performance of Statistical Machine Translation". Second International Joint Conference on Natural Language Processing (IJCNLP-2005). Jeju, Korea: Oct 2005.
We present the first known empirical test of an increasingly common speculative claim, by evaluating a representative Chinese-to-English SMT model directly on word sense disambiguation performance, using standard WSD evaluation methodology and datasets from the Senseval-3 Chinese lexical sample task. Much effort has been put into designing and evaluating dedicated word sense disambiguation (WSD) models, in particular with the Senseval series of workshops. At the same time, the recent improvements in the BLEU scores of statistical machine translation (SMT) suggest that SMT models are good at predicting the right translation of the words in source language sentences. Surprisingly, however, the WSD accuracy of SMT models has never been evaluated and compared with that of the dedicated WSD models. We present controlled experiments showing the WSD accuracy of current typical SMT models to be significantly lower than that of all the dedicated WSD models considered. This tends to support the view that despite recent speculative claims to the contrary, current SMT models do have limitations in comparison with dedicated WSD models, and that SMT should benefit from the better predictions made by the WSD models.
- Marine CARPUAT and Dekai WU. "Word Sense Disambiguation
vs. Statistical Machine Translation". 43rd Annual Meeting of the
Association for Computational Linguistics (ACL-2005). Ann Arbor, MI:
Jun 2005.
We directly investigate a subject of much recent debate: do word sense disambiguation models help statistical machine translation quality? We present empirical results casting doubt on this common, but unproved, assumption. Using a state-of-the-art Chinese word sense disambiguation model to choose translation candidates for a typical IBM statistical MT system, we find that word sense disambiguation does not yield significantly better translation quality than the statistical machine translation system alone. Error analysis suggests several key factors behind this surprising finding, including inherent limitations of current statistical MT architectures.
- Dekai WU. "Recognizing
Paraphrases and Textual Entailment using Inversion Transduction Grammars".
ACL-2005 Workshop on Empirical Modeling of Semantic Equivalence and Entailment. Ann Arbor, MI: Jun 2005.
We present first results using paraphrase as well as textual entailment data to test the language universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. In machine translation and alignment, the ITG Hypothesis provides a strong inductive bias, and has been shown empirically across numerous language pairs and corpora to yield both efficiency and accuracy gains for various language acquisition tasks. Monolingual paraphrase and textual entailment recognition datasets, however, potentially facilitate closer tests of certain aspects of the hypothesis than bilingual parallel corpora, which simultaneously exhibit many irrelevant dimensions of cross-lingual variation. We investigate this using simple generic Bracketing ITGs containing no language-specific linguistic knowledge. Experimental results on the MSR Paraphrase Corpus show that, even in the absence of any thesaurus to accommodate lexical variation between the paraphrases, an uninterpolated average precision of at least 76% is obtainable from the Bracketing ITG's structure matching bias alone. This is consistent with experimental results on the Pascal Recognising Textual Entailment Challenge Corpus, which show surprisingly strong results for a number of the task subsets.
- Dekai WU. "Textual
Entailment Recognition Based on Inversion Transduction Grammars".
Pattern Analysis, Statistical Modelling and Computational Learning
(PASCAL Challenges Workshop - Recognising Textual Entailment
Challenge). Southampton, UK: Apr 2005.
Also in Joaquin QUIÑONERO CANDELA, Ido DAGAN, Bernardo MAGNINI, and Florence d'ALCHÉ-BUC (editors), Machine Learning Challenges, Lecture Notes in Computer Science 3944, MLCW 2005, 2006. Heidelberg: Springer-Verlag.
The PASCAL Challenge's textual entailment recognition (RTE) task presents intriguing opportunities to test various implications of the strong language universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. The ITG Hypothesis provides a strong inductive bias, and has been repeatedly shown empirically to yield both efficiency and accuracy gains for numerous language acquisition tasks. Since the RTE challenge abstracts over many tasks, it invites meaningful analysis of the ITG Hypothesis across tasks including information retrieval, comparable documents, reading comprehension, question answering, information extraction, machine translation, and paraphrase acquisition. We investigate two new models for the RTE problem that employ simple generic Bracketing ITGs. Experimental results show that, even in the absence of any thesaurus to accommodate lexical variation between the Text and the Hypothesis strings, surprisingly strong results for a number of the task subsets are obtainable from the Bracketing ITG's structure matching bias alone.
- Pascale FUNG, LIU Yi, YANG Yongsheng, Yihai SHEN, and Dekai WU.
"A
Grammar-Based Chinese to English Speech Translation System for Portable
Devices". 8th International Conference on Spoken Language
Processing (INTERSPEECH 2004 - ICSLP). Jeju, Korea: Oct 2004.
Portable devices such as PDA phones and smart phones are increasingly popular. Many of these devices already have voice dialing capability. The next step is to offer more powerful personal-assistant features such as speech translation. In this paper, we propose a system that can translate speech commands in Chinese into English, in real-time, on small, portable devices with limited memory and computational power. We address the various computational and platform issues of speech recognition and translation on portable devices. We propose fixed-point computation, discrete front-end speech features, bi-phone acoustic models, grammar-based speech decoding, and unambiguous inversion transduction grammars for transfer-based translation. As a result, our speech translation system requires only 500k memory and a 200MHz CPU.
- Dekai WU, Grace NGAI, and Marine CARPUAT. "Why Nitpicking
Works: Evidence for Occam's Razor in Error Correctors". 20th
International Conference on Computational Linguistics (COLING-2004).
Geneva: Aug 2004.
Empirical experience and observations have shown us that when powerful and highly tunable classifiers such as maximum entropy classifiers, boosting and SVMs are applied to language processing tasks, it is possible to achieve high accuracies, but eventually their performances all tend to plateau at around the same point. To further improve performance, various error correction mechanisms have been developed, but in practice, most of them cannot be relied on to predictably improve performance on unseen data; indeed, depending upon the test set, they are as likely to degrade accuracy as to improve it. This problem is especially severe if the base classifier has already been finely tuned. In recent work, we introduced N-fold Templated Piped Correction, or NTPC ("nitpick"), an intriguing error corrector that is designed to work in these extreme operating conditions. Despite its simplicity, it consistently and robustly improves the accuracy of existing highly accurate base models. This paper investigates some of the more surprising claims made by NTPC, and presents experiments supporting an Occam's Razor argument that more complex models are damaging or unnecessary in practice.
- Weifeng SU, Marine CARPUAT, and Dekai WU. "Semi-Supervised
Training of a Kernel PCA-Based Model for Word Sense Disambiguation".
20th International Conference on Computational Linguistics
(COLING-2004). Geneva: Aug 2004.
In this paper, we introduce a new semi-supervised learning model for word sense disambiguation based on Kernel Principal Component Analysis (KPCA), with experiments showing that it can further improve accuracy over supervised KPCA models that have achieved WSD accuracy superior to the best published individual models. Although empirical results with supervised KPCA models demonstrate significantly better accuracy compared to the state-of-the-art achieved by either naive Bayes or maximum entropy models on Senseval-2 data, we identify specific sparse data conditions under which supervised KPCA models deteriorate to essentially a most-frequent-sense predictor. We discuss the potential of KPCA for leveraging unannotated data for partially-unsupervised training to address these issues, leading to a composite model that combines both the supervised and semi-supervised models.
- Dekai WU, Weifeng SU, and Marine CARPUAT. "A Kernel PCA Method for
Superior Word Sense Disambiguation". 42nd Annual Meeting of the
Association for Computational Linguistics (ACL-2004). Barcelona: Jul
2004.
We introduce a new method for disambiguating word senses that exploits a nonlinear Kernel Principal Component Analysis (KPCA) technique to achieve accuracy superior to the best published individual models. We present empirical results demonstrating significantly better accuracy compared to the state-of-the-art achieved by either naive Bayes or maximum entropy models, on Senseval-2 data. We also contrast against another type of kernel method, the support vector machine (SVM) model, and show that our KPCA-based model outperforms the SVM-based model. It is hoped that these highly encouraging first results on KPCA for natural language processing tasks will inspire further development of these directions.
- Dekai WU and Yihai SHEN. "An Efficient
Algorithm to Induce Minimum Average Lookahead Grammars for Incremental LR
Parsing". ACL-2004 Workshop on Incremental Parsing: Bringing
Engineering and Cognition Together. Barcelona: Jul 2004.
We define a new learning task, minimum average lookahead grammar induction, with strong potential implications for incremental parsing in NLP and cognitive models. Our thesis is that a suitable learning bias for grammar induction is to minimize the degree of lookahead required, on the underlying tenet that language evolution drove grammars to be efficiently parsable in incremental fashion. The input to the task is an unannotated corpus, plus a nondeterministic constraining grammar that serves as an abstract model of environmental constraints confirming or rejecting potential parses. The constraining grammar typically allows ambiguity and is itself poorly suited for an incremental parsing model, since it gives rise to a high degree of nondeterminism in parsing. The learning task, then, is to induce a deterministic LR(k) grammar under which it is possible to incrementally construct one of the correct parses for each sentence in the corpus, such that the average degree of lookahead needed to do so is minimized. This is a significantly more difficult optimization problem than merely compiling LR(k) grammars, since k is not specified in advance. Clearly, naive approaches to this optimization can easily be computationally infeasible. However, by making combined use of GLR ancestor tables and incremental LR table construction methods, we obtain an O(n3+2m) greedy approximation algorithm for this task that is quite efficient in practice.
- Marine CARPUAT, Weifeng SU, and Dekai WU. "Augmenting Ensemble
Classification for Word Sense Disambiguation with a Kernel PCA
Model". Third International Workshop on the Evaluation of Systems
for the Semantic Analysis of Text (Senseval-3). ACL-2004 Workshop.
Barcelona: Jul 2004.
The HKUST word sense disambiguation systems benefit from a new nonlinear Kernel Principal Component Analysis (KPCA) based disambiguation technique. We discuss and analyze results from the Senseval-3 English, Chinese, and Multilingual Lexical Sample data sets. Among an ensemble of four different kinds of voted models, the KPCA-based model, along with the maximum entropy model, outperforms the boosting model and naive Bayes model. Interestingly, while the KPCA-based model typically achieves accuracy close to or better than that of the maximum entropy model, a comparison of predicted classifications shows that it has a significantly different bias. This characteristic makes it an excellent voter, as confirmed by results showing that removing the KPCA-based model from the ensemble generally degrades performance.
- Grace NGAI, Dekai WU, Marine CARPUAT, Chi-Shing WANG, and
Chi-Yung WANG. "Semantic Role
Labeling with Boosting, SVMs, Maximum Entropy, SNOW, and Decision
Lists". Third International Workshop on the Evaluation of Systems
for the Semantic Analysis of Text (Senseval-3). ACL-2004 Workshop.
Barcelona: Jul 2004.
This paper describes the HKPolyU-HKUST systems which were entered into the Semantic Role Labeling task in Senseval-3. Results show that these systems, which are based upon common machine learning algorithms, all manage to achieve good performances on the non-restricted Semantic Role Labeling task.
- Richard WICENTOWSKI, Grace NGAI, Dekai WU, Marine CARPUAT, Emily
THOMFORDE, and Adrian PACKEL. "Joining
forces to resolve lexical ambiguity: East meets West in Barcelona".
Third International Workshop on the Evaluation of Systems for the
Semantic Analysis of Text (Senseval-3). ACL-2004 Workshop.
Barcelona: Jul 2004.
This paper describes the component models and combination model built as a joint effort between Swarthmore College, Hong Kong Poly U, and HKUST. Though other models described elsewhere contributed to the final combination model, this paper focuses solely on the joint contributions to the "Swat-HK" effort.
- Dekai WU, Grace NGAI, and Marine CARPUAT. "Raising the Bar:
Stacked Conservative Error Correction Beyond Boosting". Fourth
International Conference on Language Resources and Evaluation
(LREC-2004). Lisbon: May 2004.
We introduce a conservative error correcting model, Stacked TBL, that is designed to improve the performance of even high-performing models like boosting, with little risk of accidentally degrading performance. Stacked TBL is particularly well suited for corpus-based natural language applications involving high-dimensional feature spaces, since it leverages the characteristics of the TBL paradigm that we appropriate. We consider here the task of automatically annotating named entities in text corpora. The task does pose a number of challenges for TBL, to which there are some simple yet effective solutions. We discuss the empirical behavior of Stacked TBL, and consider evidence that despite its simplicity, more complex and time-consuming variants are not generally required.
- Lufeng ZHAI, Pascale FUNG, Richard SCHWARTZ, Marine CARPUAT and
Dekai WU. "Using
N-best Lists for Named Entity Recognition from Chinese Speech".
Human Language Technology Conference of the North American Chapter of
the Association for Computational Linguistics (HLT/NAACL-2004).
Boston: May 2004.
We present the first known result for named entity recognition (NER) in realistic large-vocabulary spoken Chinese. We establish this result by applying a maximum entropy model, currently the single best known approach for textual Chinese NER, to the recognition output of the BBN LVCSR system on Chinese Broadcast News utterances. Our results support the claim that transferring NER approaches from text to spoken language is a significantly more difficult task for Chinese than for English. We propose re-segmenting the ASR hypotheses as well as applying post-classification to improve the performance. Finally, we introduce a method of using n-best hypotheses that yields a small but nevertheless useful improvement in NER accuracy. We use acoustic, phonetic, language model, NER and other scores as confidence measures. Experimental results show an average of 6.7% relative improvement in precision and 1.7% relative improvement in F-measure.
- Dekai WU, Grace NGAI, and Marine CARPUAT. "N-fold Templated
Piped Correction". First International Joint Conference on
Natural Language Processing (IJCNLP-2004). Hainan, China: Mar 2004.
We describe a broadly-applicable conservative error correcting model, N-fold Templated Piped Correction (NTPC), that consistently improves the accuracy of existing high-accuracy base models. Under circumstances where most obvious approaches actually reduce accuracy more than they improve it, NTPC nevertheless comes with little risk of accidentally degrading performance. NTPC is particularly well suited for natural language applications involving high-dimensional feature spaces, such as bracketing and disambiguation tasks, since its easily customizable template-driven learner allows efficient search over the kind of complex feature combinations that have typically eluded the base models. We show empirically that NTPC yields small but consistent accuracy gains on top of even high-performing models like boosting. We also give evidence that the various extreme design parameters in NTPC are indeed necessary for the intended operating range, even though they diverge from usual practice.
- Dekai WU. "The HKUST leading
question translation system". Machine Translation Summit IX. New Orleans:
Sep 2003.
Slides from Have we found the Holy Grail? (Panel with Ed Hovy, Elliot Macklovitch (chair), Hermann Ney, Steve Richardson, and Dekai Wu.)
- Dekai WU, Grace NGAI, and Marine CARPUAT. "A stacked, voted,
stacked model for named entity recognition". Computational
Natural Language Learning (CoNLL-2003), at Human Language
Technology Conference of the North American Chapter of the Association of
Computational Linguistics (HLT/NAACL-2003). Edmonton, Canada: May
2003.
This paper investigates stacking and voting methods for combining strong classifiers like boosting, SVM, and TBL, on the named-entity recognition task. We demonstrate several effective approaches, culminating in a model that achieves error rate reductions on the development and test sets of 63.6% and 55.0% (English) and 47.0% and 51.7% (German) over the CoNLL-2003 standard baseline respectively, and 19.7% over a strong AdaBoost baseline model from CoNLL-2002.
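As a rough illustration of the voting half of this idea, the following minimal sketch combines the per-token label sequences of several classifiers by simple majority vote. It is not the model from the paper: the function name `majority_vote`, the toy labels, and the tie-breaking rule (the earliest classifier's label wins a tie) are all assumptions made for the example.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine label sequences from several classifiers by majority vote.

    predictions: list of label sequences, one per classifier, all the same length.
    Ties are broken in favour of the label seen first (hypothetical choice).
    """
    voted = []
    for labels in zip(*predictions):
        # most_common keeps first-seen order among equal counts (Python 3.7+)
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

# Toy outputs standing in for three base classifiers (illustrative only)
boosting = ["PER", "O", "LOC"]
svm = ["PER", "ORG", "LOC"]
tbl = ["O", "ORG", "O"]
combined = majority_vote([boosting, svm, tbl])
```

A stacked model would instead feed the base classifiers' outputs as features to a further learner; that step is omitted here.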
- Dekai WU, Grace NGAI, Marine CARPUAT, Jeppe LARSEN, and Yongsheng
YANG. "Boosting for named
entity recognition". Computational Natural Language Learning
(CoNLL-2002), at 19th International Conference on Computational
Linguistics (Coling-2002), 195-198. Taipei: Sep 2002.
This paper presents a system that applies boosting to the task of named-entity identification. The CoNLL-2002 shared task, for which the system is designed, is language-independent named-entity recognition. Using a set of features which are easily obtainable for almost any language, the presented system uses boosting to combine a set of weak classifiers into a final system that performs significantly better than that of an off-the-shelf maximum entropy classifier.
- Robert WILENSKY, David N CHIN, Marc LURIA, James MARTIN, James
MAYFIELD, and Dekai WU. "The Berkeley UNIX Consultant Project". In
Stephen J HEGNER, Paul McKEVITT, Peter NORVIG, and Robert WILENSKY
(editors), Intelligent Help Systems for Unix. 49-94. Dordrecht:
Kluwer. ISBN 0-7923-6641-7. May 2001.
Also in Artificial Intelligence Review 14(1-2): 43-88 (2000).
UC (UNIX Consultant) is an intelligent, natural-language interface that allows naive users to learn about the UNIX operating system. UC was undertaken because the task was thought to be both a fertile domain for Artificial Intelligence research and a useful application of AI work in planning, reasoning, natural language processing, and knowledge representation.
- Dekai WU. "Bracketing and
aligning words and constituents in parallel text using Stochastic
Inversion Transduction Grammars". In Jean VERONIS (editor),
Parallel Text Processing: Alignment and Use of Translation
Corpora. Dordrecht: Kluwer. ISBN 0-7923-6546-1. Aug 2000.
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well-suited to model ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
- Dekai WU. "Alignment". In Robert DALE, Hermann MOISL, and Harold
SOMERS (editors), Handbook of Natural Language Processing.
415-458. New York: Marcel Dekker. ISBN 0-8247-9000-6. Jul 2000.
In this chapter we discuss the work done on automatic alignment of parallel texts for various purposes. Fundamentally, an alignment algorithm accepts as input a bitext, and produces as output a map that identifies corresponding passages between the texts. A rapidly-growing body of research on bitext alignment, beginning around 1990, attests to the importance of alignment to translators, bilingual lexicographers, adaptive machine translation systems, and even ordinary readers. A wide variety of techniques now exist, ranging from the most simple (counting characters or words) to the more sophisticated, sometimes involving linguistic data (lexicons) which may or may not have been automatically induced themselves. Techniques have been developed for aligning passages of various granularities: paragraphs, sentences, constituents, collocations, and words. Some techniques make use of apparent morphological features. Others rely on cognates and loan-words. Of particular interest is work done on languages which do not have a common writing system. The robustness and generality of different techniques has generated much discussion.
- SUI Zhifang, ZHAO Jun, and Dekai WU. "An
Information-Theory-Based Feature Type Analysis for the Modelling of
Statistical Parsing". ACL-2000. Hong Kong: Oct 2000.
The paper proposes an information-theory-based method for feature type analysis in probabilistic evaluation modelling for statistical parsing. The basic idea is that we use entropy and conditional entropy to measure whether a feature type captures some of the information needed for syntactic structure prediction. Our experiment quantitatively analyzes the predictive power of several feature types for syntactic structure prediction and draws a series of interesting conclusions.
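The entropy-based measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the toy feature values, and the toy structure labels are all hypothetical. It computes the entropy H(Y) of the structure labels, the conditional entropy H(Y|X) given a feature type, and their difference (the information gain).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(Y) in bits of a sequence of labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def conditional_entropy(features, labels):
    """H(Y|X): entropy of the labels within each feature value, weighted by frequency."""
    total = len(labels)
    groups = {}
    for x, y in zip(features, labels):
        groups.setdefault(x, []).append(y)
    return sum(len(ys) / total * entropy(ys) for ys in groups.values())

def information_gain(features, labels):
    """H(Y) - H(Y|X): how much the feature type reduces uncertainty about the label."""
    return entropy(labels) - conditional_entropy(features, labels)

# Toy data (illustrative only): one informative and one uninformative feature type
structures = ["NP", "NP", "VP", "VP"]
head_tag = ["noun", "noun", "verb", "verb"]
parity = ["even", "odd", "even", "odd"]
gain_informative = information_gain(head_tag, structures)
gain_uninformative = information_gain(parity, structures)
```

A feature type with a gain near zero carries little information for predicting the structure, while a gain near H(Y) means the feature type almost determines it.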
- Yanlei DIAO, Hongjun LU, and Dekai WU. "A comparative study of
classification based personal e-mail filtering". Fourth
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD
2000): 408-419. Kyoto: Apr 2000.
This paper addresses personal E-mail filtering by casting it in the framework of text classification. Modeled as semi-structured documents, E-mail messages consist of a set of fields with predefined semantics and a number of variable length free-text fields. While most work on classification concentrates on either structured data or free text, the work in this paper deals with both of them. To perform classification, a naive Bayesian classifier was designed and implemented, and a decision tree based classifier was implemented. The design considerations and implementation issues are discussed. Using a relatively large amount of real personal E-mail data, a comprehensive comparative study was conducted using the two classifiers. The importance of different features is reported. Results of other issues related to building an effective personal E-mail classifier are presented and discussed. It is shown that both classifiers can perform filtering with reasonable accuracy. While the decision tree based classifier outperforms the Bayesian classifier when features and training size are selected optimally for both, a carefully designed naive Bayesian classifier is more robust.
- Dekai WU, SUI Zhifang, and ZHAO Jun. "An information-based method for
selecting feature types for word prediction". Sixth European
Conference on Speech Communication and Technology (EUROSPEECH'99).
Budapest: Sep 1999.
This paper uses an information-based approach to conduct feature type selection for language modeling in a systematic manner. We describe a quantitative analysis of the information gain and the information redundancy for various combinations of feature types inspired by both dependency structure and bigram structure, by analyzing an English treebank corpus and taking word prediction as the objective. The experiments yield several conclusions on the predictive value of several feature types and feature type combinations for word prediction, which are expected to provide a reliable reference for feature type selection in language modeling.
- Aboy WONG and Dekai WU. "Learning a lightweight robust
deterministic parser". Sixth European Conference on Speech
Communication and Technology (EUROSPEECH'99). Budapest: Sep 1999.
We describe a method for automatically learning a parser from labeled, bracketed corpora that results in a fast, robust, lightweight parser that is suitable for real-time dialog systems and similar applications. Unlike ordinary parsers, all grammatical knowledge is captured in the learned decision trees, so no explicit phrase-structure grammar is needed. Another characteristic of the architecture is robustness, since the input need not fit pre-specified productions. The runtime architecture is very slim and references two learned decision trees that allow the parser to operate in a "strictly deterministic" manner in Marcus' (1977) sense. Even without using specific lexical features, we have achieved respectable labeled bracket accuracies of about 81% precision and 82% recall. Processing speed is more than 500 words per CPU second. We keep the parameter space small (in comparison to other statistically learned parsers) by using only part-of-speech tags and constituent labels as features for learning the decision trees. Without any optimization, the decision trees consume only 6M of memory, making it possible to run on platforms with limited memory. The learning method is readily applicable to other languages. Preliminary experiments on a Chinese corpus (which contains about 3000 sentences from Chinese primary school text) have yielded results comparable to those for English.
- Vincent CHOW and Dekai WU. "On the use of right context in
sense-disambiguating language models". Sixth European Conference
on Speech Communication and Technology (EUROSPEECH'99). Budapest:
Sep 1999.
We investigate the utility of right-context (look-ahead information) in incremental left-to-right language models with word sense disambiguation, and discover somewhat unexpectedly that using right-context in addition to left-context (history) may actually reduce accuracy. We describe a left-to-right incremental naive-Bayes sense disambiguator, and then experimentally evaluate three apparently well-motivated extensions to take into account right-context information. The results argue that the contribution of right-context is limited, and that using it would probably necessitate sacrificing pure left-to-right processing.
- Shuwu ZHANG, Harald SINGER, Dekai WU, Yoshinori SAGISAKA. "Improving n-gram modeling using
distance-related unit association maximum entropy language modeling".
Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99). Budapest: Sep 1999.
In this paper, a distance-related unit association maximum entropy (DUAME) language modeling approach is proposed. This approach can model an event (unit subsequence) using the co-occurrence of full distance unit association (UA) features, so that it can pursue a functional approximation to higher-order N-grams with a significantly smaller memory requirement. A smoothing strategy related to this modeling is also discussed. Preliminary experimental results show that DUAME modeling is comparable to conventional N-gram modeling in perplexity.
- Daniel CHAN Ka-Leung and Dekai WU. "Automatically merging
lexicons that have incompatible part-of-speech categories". Joint
SIGDAT Conference on Empirical Methods in Natural Language Processing and
Very Large Corpora (EMNLP/VLC-99). Maryland: Jun 1999.
We present a new method to automatically merge lexicons that employ different incompatible POS categories. Such incompatibilities have hindered efforts to combine lexicons to maximize coverage with reasonable human effort. Given an "original lexicon", our method is able to merge lexemes from an "additional lexicon" into the original lexicon, converting lexemes from the additional lexicon with about 89% precision. This level of precision is achieved with the aid of a device we introduce called an anti-lexicon, which neatly summarizes all the essential information we need about the co-occurrence of tags and lemmas. Our model is intuitive, fast, easy to implement, and requires neither heavy computational resources nor a training corpus.
- Dekai WU, ZHAO Jun, and SUI Zhifang. "An
information-theoretic empirical analysis of dependency-based feature
types for word prediction models". Joint SIGDAT Conference on
Empirical Methods in Natural Language Processing and Very Large Corpora
(EMNLP/VLC-99). Maryland: Jun 1999.
Over the years, many proposals have been made to incorporate assorted types of features in language models. However, discrepancies between training sets, evaluation criteria, algorithms, and hardware environments make it difficult to compare the models objectively. In this paper, we take an information theoretic approach to select feature types in a systematic manner. We describe a quantitative analysis of the information gain and the information redundancy for various combinations of feature types inspired by both dependency structure and bigram structure, using a Chinese treebank and taking word prediction as the objective. The experiments yield several conclusions on the predictive value of several feature types and feature type combinations for word prediction, which are expected to provide guidelines for feature type selection in language modeling.
- Aboy WONG and Dekai WU. "Are phrase structured
grammars useful in statistical parsing?". NLPRS 1999.
Beijing: Nov 1999.
In this paper, we argue: (1) To parse accurately, a grammar is not necessary. (2) It is possible to parse deterministically without conforming to an explicit grammar. We support the above claims by presenting our parser, which is lightweight, grammar-less, deterministic and has the highest accuracy among tag-based parsers. The speed of our parser is more than 500 words per CPU second and only 6M of memory is needed for loading the parsing model. In our architecture, the grammatical information is captured by the parsing model. Our parsing model differs from others in that extra information about how to group constituents is provided. Thus an explicit grammar is not needed in our algorithm.
- Michael CARL and Dekai WU. "Inferring maximally
invertible bi-grammars for example-based machine translation".
NLPRS 1999. Beijing: Nov 1999.
This paper discusses inference strategies of context-free bi-grammars for example-based machine translation (EBMT). The EBMT system EDGAR is discussed in detail. The notion of invertible context-free feature bi-grammar is introduced in order to provide a means to decide upon the degree of ambiguity of the inferred bi-grammar. It is claimed that a maximally invertible bi-grammar can enhance the precision of the bilingual alignment process, reduce the complexity of the inferred grammar, and uncover inconsistencies in bi-corpora. This paper describes preliminary reflections and thus no empirical evaluation of the method is provided.
- Dekai WU and Hongsing WONG. "Machine translation with a
stochastic grammatical channel". COLING-ACL'98. Montreal:
Aug 1998.
We introduce a stochastic grammatical channel model for machine translation, that synthesizes several desirable characteristics of both statistical and grammatical machine translation. As with the pure statistical translation model described by Wu (1996) (in which a bracketing transduction grammar models the channel), alternative hypotheses compete probabilistically, exhaustive search of the translation hypothesis space can be performed in polynomial time, and robustness heuristics arise naturally from a language-independent inversion-transduction model. However, unlike pure statistical translation models, the generated output string is guaranteed to conform to a given target grammar. The model employs only (1) a translation lexicon, (2) a context-free grammar for the target language, and (3) a bigram language model. The fact that no explicit bilingual translation rules are used makes the model easily portable to a variety of source languages. Initial experiments show that it also achieves significant speed gains over our earlier model.
- Dekai WU. "A position statement on Chinese segmentation". Presented at the Chinese Language Processing Workshop, University of Pennsylvania, Philadelphia, Jul 1998.
- Dekai WU. "Stochastic inversion
transduction grammars and bilingual parsing of parallel corpora".
Computational Linguistics 23(3):377-404, Sep 1997.
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well-suited to model ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
- Ciprian CHELBA, David ENGLE, Frederick JELINEK, Victor JIMENEZ, Sanjeev
KHUDANPUR, Lidia MANGU, Harry PRINTZ, Eric RISTAD, Ronald ROSENFELD,
Andreas STOLCKE, and Dekai WU. "Structure and performance of a
dependency language model". EUROSPEECH'97. Rhodes, Greece:
Sep 1997.
We present a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar. Such a grammar expresses the relations between words by a directed graph. Because the edges of this graph may connect words that are arbitrarily far apart in a sentence, this technique can incorporate the predictive power of words that lie outside of bigram or trigram range. We have built several simple dependency models, as we call them, and tested them in a speech recognition experiment. We report experimental results for these models here, including one that has a small but statistically significant advantage (p < .02) over a bigram language model.
- Pascale FUNG, Bertram SHI, Dekai WU, LAM Wai Bun, and WONG Shuen
Kong. "Dealing with
multilinguality in a spoken language query translator".
ACL/EACL-97 Workshop on Spoken Language Translation. Madrid: Jul
1997.
Robustness is an important issue for multilingual speech interfaces for spoken language translation systems. We have studied three aspects of robustness in such a system: accent differences, mixed language input, and the use of common feature sets for HMM-based speech recognizers for English and Cantonese. The results of our preliminary experiments show that accent differences cause recognizer performance to degrade. A rather surprising finding is that for mixed language input, a straightforward implementation of a mixed language model-based speech recognizer performs less well than the concatenation of pure language recognizers. Our experimental results also show that a common feature set, parameter set, and common algorithm lead to different performance output for Cantonese and English speech recognition modules.
- Dekai WU. "A
polynomial-time algorithm for statistical machine translation".
ACL-96: 34th Annual Meeting of the Assoc. for Computational
Linguistics. Santa Cruz, CA: Jun. 1996.
We introduce a polynomial-time algorithm for statistical machine translation. This algorithm can be used in place of the expensive, slow best-first search strategies in current statistical translation architectures. The approach employs the stochastic bracketing transduction grammar (SBTG) model we recently introduced to replace earlier word alignment channel models, while retaining a bigram language model. The new algorithm in our experience yields major speed improvement with no significant loss of accuracy.
- Xuanyin XIA and Dekai WU. "Parsing Chinese with an
almost-context-free grammar". EMNLP-96, Conference on Empirical
Methods in Natural Language Processing. Philadelphia: May 1996.
We describe a novel parsing strategy we are employing for Chinese. We believe progress in Chinese parsing technology has been slowed by the excessive ambiguity that typically arises in pure context-free grammars. This problem has inspired a modified formalism that enhances our ability to write and maintain robust large grammars, by constraining productions with left/right contexts and/or nonterminal functions. Parsing is somewhat more expensive than for pure context-free parsing, but is still efficient by both theoretical and empirical analyses. Encouraging experimental results with our current grammar are described.
- Dekai WU and Xuanyin XIA. "Large-scale automatic extraction of
an English-Chinese lexicon". Machine Translation
9(3-4): 285-313. 1995.
We report experimental results on automatic extraction of an English-Chinese translation lexicon, by statistical analysis of a large parallel corpus, using limited amounts of linguistic knowledge. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant vocabulary and corpus size. The learned vocabulary size is about 6,500 English words, achieving translation precision in the 86-96% range, with alignment proceeding at paragraph, sentence, and word levels.
Specifically, we report (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus, (2) experiments supporting the usefulness of restricted lexical cues for statistical paragraph and sentence alignment, and (3) experiments that question the role of hand-derived monolingual lexicons for automatic word translation acquisition.
Using a hand-derived monolingual lexicon, the learned translation lexicon averages 2.33 Chinese translations per English entry, with a manually-filtered precision of 95.1%, and an automatically-filtered weighted precision of 86.0%. We then introduce a fully automatic two-stage statistical methodology that is able to learn translations for collocations. A statistically-learned monolingual Chinese lexicon is first used to segment the Chinese text, before applying bilingual training to produce 6,429 English entries with 2.25 Chinese translations per entry. This method improves the manually-filtered precision to 96.0% and the automatically-filtered weighted precision to 91.0%, an error rate reduction of 35.7% from using a hand-derived monolingual lexicon.
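The 35.7% error rate reduction quoted above appears to correspond to the automatically-filtered weighted precision rising from 86.0% to 91.0%. The short check below is a sketch under that assumption; the function name `error_rate_reduction` is illustrative and does not come from the paper.

```python
def error_rate_reduction(precision_before: float, precision_after: float) -> float:
    """Relative reduction in error rate, given two precisions in percent."""
    error_before = 100.0 - precision_before  # e.g. 14.0 for 86.0% precision
    error_after = 100.0 - precision_after    # e.g. 9.0 for 91.0% precision
    return (error_before - error_after) / error_before


# Assumed pairing: automatically-filtered weighted precision, 86.0% -> 91.0%.
reduction = error_rate_reduction(86.0, 91.0)
print(f"{reduction:.1%}")  # 35.7%
```

Under this reading, the error falls from 14.0 to 9.0 percentage points, and 5.0 / 14.0 rounds to the 35.7% stated in the abstract.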
- Dekai WU. "Stochastic
inversion transduction grammars, with application to segmentation,
bracketing, and alignment of parallel corpora". IJCAI-95: 14th
Intl. Joint Conf. on Artificial Intelligence, 1328-1335. Montreal:
Aug 1995.
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with potential application to a variety of parallel corpus analysis problems. The formalism combines three tactics against the constraints that render finite-state transducers less useful: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist, and we discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks.
- Dekai WU. "An algorithm
for simultaneously bracketing parallel texts by aligning words".
ACL-95: 33rd Annual Meeting of the Assoc. for Computational
Linguistics, 244-251. Cambridge, MA: Jun 1995.
We describe a grammarless method for simultaneously bracketing both halves of a parallel text and giving word alignments, assuming only a translation lexicon for the language pair. We introduce inversion-invariant transduction grammars which serve as generative models for parallel bilingual sentences with weak order constraints. Focusing on transduction grammars for bracketing, we formulate a normal form, and a stochastic version amenable to a maximum-likelihood bracketing algorithm. Several extensions and experiments are discussed.
- Dekai WU. "Trainable
coarse bilingual grammars for parallel text bracketing". WVLC-3:
3rd Annual Workshop on Very Large Corpora, 69-82. Cambridge, MA: Jun
1995.
Also in Susan ARMSTRONG, Kenneth W. CHURCH, Pierre ISABELLE, Sandra MANZI, Evelyne TZOUKERMANN, and David YAROWSKY (editors), Natural Language Processing Using Very Large Corpora. Dordrecht: Kluwer. ISBN 0-7923-6055-9. Nov 1999.
We describe two new strategies for automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised training of such grammars via EM (expectation-maximization). Both methods build upon a formalism we recently introduced called stochastic inversion transduction grammars. The first approach borrows a coarse monolingual grammar into our bilingual formalism, in order to transfer knowledge of one language's constraints to the task of bracketing the texts in both languages. The second approach generalizes the inside-outside algorithm to adjust the grammar parameters so as to improve the likelihood of a training corpus. Preliminary experiments on parallel English-Chinese text are supportive of these strategies.
- Dekai WU. "Grammarless
extraction of phrasal translation examples from parallel texts".
TMI-95, Sixth International Conference on Theoretical and
Methodological Issues in Machine Translation, v2, 354-372. Leuven,
Belgium: Jul 1995.
We describe a method for identifying subsentential phrasal translation examples in sentence-aligned parallel corpora, using only a probabilistic translation lexicon for the language pair. Our method differs from previous approaches in that (1) it is founded on a formal basis, making use of an inversion transduction grammar (ITG) formalism that we recently developed for bilingual language modeling, and (2) it requires no language-specific monolingual grammars for the source and target languages. Instead, we devise a generic, language-independent constituent-matching ITG with inherent expressiveness properties that correspond to a desirable level of matching flexibility. Bilingual parsing, in conjunction with a stochastic version of the ITG formalism, performs the phrasal translation extraction.
- Dekai WU and Cindy NG. "Using brackets to improve
search for statistical machine translation". PACLIC-10, 10th
Pacific Asia Conference on Language, Information and Computation.
Hong Kong: Dec 1995.
We propose a method to improve search time and space complexity in statistical machine translation architectures, by employing linguistic bracketing information on the source language sentence. It is one of the advantages of the probabilistic formulation that competing translations may be compared and ranked by a principled measure, but at the same time, optimizing likelihoods over the translation space dictates heavy search costs. To make statistical architectures practical, heuristics to reduce search computation must be incorporated. An experiment applying our method to a prototype Chinese-English translation system demonstrates substantial improvement.
- Pascale FUNG and Dekai WU. "Coerced Markov Models for
cross-lingual lexical tag relations". TMI-95, Sixth International
Conference on Theoretical and Methodological Issues in Machine
Translation, v1, 240-255. Leuven, Belgium: Jul 1995.
We introduce the Coerced Markov Model (CMM) to model the relationship between the lexical sequence of a source language and the tag sequence of a target language, with the objective of constraining search in statistical transfer-based machine translation systems. CMMs differ from standard hidden Markov models in that state sequence assignments can take on values coerced from external sources. Given a Chinese sentence, a CMM can be used to predict the corresponding English tag sequence, thus constraining the English lexical sequence produced by a translation model. The CMM can also be used to score competing translation hypotheses in N-best models. Three fundamental problems for CMM design are discussed. Their solutions lead to the training and testing stages of the CMM.
- Eva FONG and Dekai WU. "Learning restricted
probabilistic link grammars". IJCAI-95 Workshop on New Approaches
to Learning for Natural Language Processing. Montreal: Aug 1995.
Also in Stefan WERMTER, Ellen RILOFF, Gabriele SCHELER (editors), Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 173-187. 1996. Berlin: Springer-Verlag.
We describe a language model employing a new headed-disjuncts formulation of Lafferty's (1992) probabilistic link grammar, together with (1) an EM training method for estimating the probabilities, and (2) a procedure for learning some simple lexicalized grammar structures. The model in its simplest form is a generalization of n-gram models, but in its general form possesses context-free expressiveness. Unlike the original experiments on probabilistic link grammars, we assume that no hand-coded grammar is initially available (as with n-gram models). We employ untyped links to concentrate the learning on lexical dependencies, and our formulation uses the lexical identities of heads to influence the structure of the parse graph. After learning, the language model consists of grammatical rules in the form of a set of simple disjuncts for each word, plus several sets of probability parameters. The formulation extends cleanly toward learning more powerful context-free grammars. Several issues relating to generalization bias, linguistic constraints, and parameter smoothing are considered. Preliminary experimental results on small artificial corpora are supportive of our approach.
- Dekai WU and Pascale FUNG. "Improving Chinese tokenization
with linguistic filters on statistical lexical acquisition".
ANLP-94: 4th Conference on Applied Natural Language Processing,
180-181. Stuttgart: Oct 1994.
The first step in Chinese NLP is to tokenize or segment character sequences into words, since the text contains no word delimiters. Recent heavy activity in this area has shown the biggest stumbling block to be words that are absent from the lexicon, since successful tokenizers to date have been based on dictionary lookup (e.g., Chang & Chen 1993, Chiang et al. 1992, Lin et al. 1993, Wu & Tseng 1993, Sproat et al. 1994).
We present empirical evidence for four points concerning tokenization of Chinese text:
(1) More rigorous "blind" evaluation methodology is needed to avoid inflated accuracy measurements; we introduce the nk-blind method.
(2) The extent of the unknown-word problem is far more serious than generally thought, when tokenizing unrestricted texts in realistic domains.
(3) Statistical lexical acquisition is a practical means to greatly improve tokenization accuracy with unknown words, reducing error rates as much as 32.0%.
(4) When augmenting the lexicon, linguistic constraints can provide simple inexpensive filters yielding significantly better precision, reducing error rates as much as 49.4%.
- Dekai WU and Xuanyin XIA. "Learning an English-Chinese
lexicon from a parallel corpus". AMTA-94: Assoc. for Machine
Translation, 206-213. Columbia, MD: Oct 1994.
We report experiments on automatic learning of an English-Chinese translation lexicon, through statistical training on a large parallel corpus. The learned vocabulary size is non-trivial at 6,517 English words averaging 2.33 Chinese translations per entry, with a manually-filtered precision of 95.1% and a single-most-probable precision of 91.2%. We then introduce a significance filtering method that is fully automatic, yet still yields a weighted precision of 86.0%. Learning of translations is adaptive to the domain. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant corpus size with a non-toy vocabulary.
- Pascale FUNG and Dekai WU. "Statistical augmentation of a
Chinese machine-readable dictionary". WVLC-2: 2nd Annual Workshop
on Very Large Corpora, 69-85. Kyoto: Aug 1994.
Also in Susan ARMSTRONG, Kenneth W. CHURCH, Pierre ISABELLE, Sandra MANZI, Evelyne TZOUKERMANN, and David YAROWSKY (editors), Natural Language Processing Using Very Large Corpora. Dordrecht: Kluwer. ISBN 0-7923-6055-9. Nov 1999.
We describe a method of using statistically-collected Chinese character groups from a corpus to augment a Chinese dictionary. The method is particularly useful for extracting domain-specific and regional words not readily available in machine-readable dictionaries. Output was evaluated both using human evaluators and against a previously available dictionary. We also evaluated performance improvement in automatic Chinese tokenization. Results show that our method outputs legitimate words, acronymic constructions, idioms, names and titles, as well as technical compounds, many of which were lacking from the original dictionary.
- Dekai WU. "Aligning a
parallel English-Chinese corpus statistically with lexical criteria".
ACL-94: 32nd Annual Meeting of the Assoc. for Computational
Linguistics, 80-87. Las Cruces, NM: Jun 1994.
We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improved statistical method that also incorporates domain-specific lexical cues.
- Dekai WU. Aligning parallel English-Chinese texts
statistically with lexical criteria. Technical Report HKUST-CS93-9.
We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improved statistical method that also incorporates domain-specific lexical cues.
- Graeme HIRST and Dekai WU. "Not all reflexive reasoning is deductive". Behavioral and Brain Sciences 16(3): 462-463. 1993.
- Dekai WU. "Approximating maximum-entropy
ratings for evidential parsing and semantic interpretation".
IJCAI-93: 13th Intl. Joint Conf. on Artificial Intelligence,
1290-1296. Chambéry, France: Aug 1993.
We consider the problem of assigning probabilistic ratings to hypotheses in a natural language interpretation system. To facilitate integrating syntactic, semantic, and conceptual constraints, we allow a fully compositional frame representation, which permits co-indexed syntactic constituents and/or semantic entities filling multiple roles. In addition the knowledge base contains probabilistic information encoded by marginal probabilities on frames. These probabilities are used to specify typicality of real-world scenarios on one hand, and conventionality of linguistic usage patterns on the other. Because the theoretical maximum-entropy solution is infeasible in the general case, we propose an approximate method. This method's strengths are (1) its ability to rate compositional structures, and (2) its flexibility with respect to the inputs chosen by the system it is embedded in. Arbitrary sets of hypotheses from the front-end processor can be accepted, as well as arbitrary subsets of constraints heuristically chosen from the long-term knowledge base.
- Dekai WU. "Estimating
probability distributions over hypotheses with variable unification".
AAAI-93: 11th National Conf. on Artificial Intelligence,
790-795. Washington, D.C.: Jul 1993.
We analyze the difficulties in applying Bayesian belief networks to language interpretation domains, which typically involve many unification hypotheses that posit variable bindings. As an alternative, we observe that the structure of the underlying hypothesis space permits an approximate encoding of the joint distribution based on marginal rather than conditional probabilities. This suggests an implicit binding approach that circumvents the problems with explicit unification hypotheses, while still allowing hypotheses with alternative unifications to interact probabilistically. The proposed method accepts arbitrary subsets of hypotheses and marginal probability constraints, is robust, and is readily incorporated into standard unification-based and frame-based models.
- Dekai WU. "An
image-schematic system of thematic roles". PACLING-93: 1st Conf.
of the Pacific Association for Computational Linguistics, 323-332.
Vancouver: Apr 1993.
We describe a system of thematic roles and frames designed to address a number of problems in semantic representations at the lexical semantic level. Our primary objective is broad expressiveness, so that real domains can practically be encoded. However, for both empirical and computational reasons we limit the number of role types to four, allocating this structure to the strongest associations. We show how the system incorporates image-schematic semantics to encode various schematization operations relating to scales and reification.
- Andreas STOLCKE and Dekai WU. "Tree matching with
recursive distributed representations". AAAI 1992 Workshop on
Integrating Neural and Symbolic Processes---The Cognitive Dimension.
San Jose, CA: Jul 1992. Also available as ICSI Technical Report TR-92-025.
We present an approach to the structure unification problem using distributed representations of hierarchical objects. Binary trees are encoded using the recursive auto-association method (RAAM), and a unification network is trained to perform the tree matching operation on the RAAM representations. It turns out that this restricted form of unification can be learned without hidden layers, and with good generalization, if we allow the error signal from the unification task to modify both the unification network and the RAAM representations themselves.
- Dekai WU. "Active acquisition of
user models: Implications for decision-theoretic dialog planning and plan
recognition". User Modeling and User-Adapted Interaction
1(2): 149-172. 1991.
This article investigates the implications of active user model acquisition upon plan recognition, domain planning, and dialog planning in dialog architectures. A dialog system performs active user model acquisition by querying the user during the course of the dialog. Existing systems employ passive strategies that rely on inferences drawn from passive observation of the dialog. Though passive acquisition generally reduces unnecessary dialog, in some cases the system can effectively shorten the overall dialog length by selectively initiating subdialogs for acquiring information about the user.
We propose a theory identifying conditions under which the dialog system should adopt active acquisition goals. Active acquisition imposes a set of rationality requirements not met by current dialog architectures. To ensure rational dialog decisions, we propose significant extensions to plan recognition, domain planning, and dialog planning models, incorporating decision-theoretic heuristics for expected utility. The most appropriate framework for active acquisition is a multi-attribute utility model wherein plans are compared along multiple dimensions of utility. We suggest a general architectural scheme, and present an example from a preliminary implementation.
- Dekai WU. "A probabilistic approach to marker propagation". IJCAI 1989. Detroit, MI. 574-582.
- Dekai WU. "Review of Natural Language Understanding". AI Magazine 10(1): 88-90 (1989).
- Robert WILENSKY, David N CHIN, Marc LURIA, James H MARTIN, James H
MAYFIELD, and Dekai WU. "The Berkeley UNIX
Consultant Project". Computational Linguistics 14(3):
35-84 (1988). Also available as UC
Berkeley Technical Report CSD-89-520.
UC (UNIX Consultant) is an intelligent, natural-language interface that allows naive users to learn about the UNIX operating system. UC was undertaken because the task was thought to be both a fertile domain for Artificial Intelligence research and a useful application of AI work in planning, reasoning, natural language processing, and knowledge representation.
- Dekai WU. "Concretion inferences in natural language understanding". GWAI 1987. Springer-Verlag. 74-83.
- Robert WILENSKY, James MAYFIELD, Anthony ALBERT, David CHIN, Charles
COX, Marc LURIA, James H MARTIN, and Dekai WU. UC---A Progress
Report. UC
Berkeley Technical Report CSD-87-303.
UC is an intelligent, natural-language interface that allows naive users to learn about the UNIX operating system. UC was undertaken because the task was thought to be both a fertile domain for Artificial Intelligence research and a useful application of AI work in planning, reasoning, natural language processing, and knowledge representation.