Using Semantic Role Labels to Reorder Statistical Machine Translation Output

MPhil Thesis Defence


Title: "Using Semantic Role Labels to Reorder Statistical Machine Translation Output"

By

Miss Chi-Kiu Lo


Abstract

In this thesis, we show that reordering Statistical Machine Translation (SMT) 
output to match its semantic roles with those of the input improves the 
translation quality.

Translation quality can be evaluated in terms of adequacy, fluency and 
fidelity. Current SMT systems attempt to tackle adequacy primarily by 
memorizing, in a bilingual lexicon, all word (or phrase) translation pairs that 
co-occur frequently in a training corpus, using various statistics in the hope 
of improving the accuracy of lexical choice. They model word order in the 
translation output as a statistical dependency problem, relying heavily on 
monolingual n-gram language models of the output language to compensate for 
weak bilingual models of word (or phrase) alignment and permutation. Since no 
semantic features are considered at any stage of training or translation, it 
is not surprising that serious semantic role confusion errors appear in SMT 
output. One approach to tackling this problem is to integrate semantic 
information into SMT.

First, we study in detail C-ASSERT, a state-of-the-art Chinese shallow semantic 
parser that consists of a Chinese word segmenter and a Chinese shallow 
syntactic parser. We carry out a set of controlled experiments using different 
Chinese word segmenters and Chinese shallow syntactic parsers. The best 
performance is obtained when both the Chinese word segmenter and the Chinese 
shallow syntactic parser are the maximum entropy models built by our research 
center at HKUST.

Then, to provide solid groundwork for our claim that using Semantic Role Labels 
(SRL) to reorder SMT output improves translation quality, we set up and 
optimize a strong SMT baseline.

We then devise an objective scoring function that quantifies how well the 
shallow semantic roles of the Chinese source input match those of the SMT 
output. Finally, we build an algorithm that reorders the SMT output using 
semantic role labels. Experimental results show that the algorithm returns 
better translations with fewer semantic role confusion errors.


Date:			Monday, 24 August 2009

Time:			2:00pm – 4:00pm

Venue:			Room 3301A
 			Lifts 17-18

Committee Members:	Dr. Dekai Wu (Supervisor)
 			Dr. Brian Mak (Chairperson)
 			Dr. Pascale Fung (ECE)


**** All are welcome ****