Meriem BELOUCIF مريم بلوصيف
PhD candidate at the Department of Computer Science and Engineering, HKUST
Human Language Technology Center
Department of Computer Science
The Hong Kong University of Science and Technology
HKUST, Clear Water Bay, Kowloon, Hong Kong
Lab: +852 2358-8831, Room 2602 (lifts 29/30)
mbeloucif (at) cs (dot) ust (dot) hk http://www.cs.ust.hk/~mbeloucif/
I am currently pursuing a Doctor of Philosophy (PhD) degree at the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology (HKUST), under the supervision of Professor Dekai Wu.
Before joining HKUST, I earned a Master's degree in Software Engineering at the Department of Computer Science, University of Science and Technology Houari Boumedienne, Algiers, Algeria.
I am currently doing research on statistical machine translation, with a particular focus on word alignment, semantic role labeling, machine translation evaluation, quality estimation, automatic post-editing, and semantic analysis of text.
Please feel free to check my curriculum vitae and my Google Scholar profile.
Statistical Machine Translation, Word Alignment, Semantic Role Labeling, SMT for Low Resource Languages, Quality Estimation, Natural Language Processing, Machine Learning.
- Study and implementation of a face authentication and recognition system based on the fusion of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) scores.
- Design and implementation of an information system for hardware monitoring for NAFTAL Algeria.
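The score-level PCA/LDA fusion behind the face recognition project above can be illustrated with a minimal sketch. The feature vectors, the min-max normalization, and the equal fusion weight below are all illustrative assumptions, not the original system's actual design:

```python
import numpy as np

def cosine_scores(probe, gallery):
    """Cosine similarity between one probe vector and every gallery vector."""
    probe = probe / np.linalg.norm(probe)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return gallery @ probe

def minmax_normalize(scores):
    """Map a score vector onto [0, 1] so PCA and LDA scores are comparable."""
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)

def fuse_scores(pca_scores, lda_scores, w=0.5):
    """Score-level fusion: weighted sum of the two normalized score vectors."""
    return w * minmax_normalize(pca_scores) + (1 - w) * minmax_normalize(lda_scores)

# Toy example: 3 gallery identities described in two hypothetical feature
# spaces (standing in for the PCA and LDA projections of face images).
pca_gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
lda_gallery = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
probe_pca = np.array([0.9, 0.1])
probe_lda = np.array([0.8, 0.2])

fused = fuse_scores(cosine_scores(probe_pca, pca_gallery),
                    cosine_scores(probe_lda, lda_gallery))
best_match = int(np.argmax(fused))  # index of the best-matching identity
```

In a real system the galleries would hold PCA- and LDA-projected face templates, and the fusion weight would be tuned on a validation set.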
Grants and Awards

Publications
- Meriem BELOUCIF and Dekai WU.
"Injecting a Semantic Objective Function into Early Stage Learning of Spoken Language Translation".
Sixth IEEE Workshop on Spoken Language Technology (SLT 2016). San Diego: Dec 2016.
We propose an approach in which we inject a crosslingual semantic frame based objective function directly into inversion transduction grammar (ITG) induction in order to semantically train spoken language translation systems. This approach represents a follow-up of our recent work on improving machine translation quality by tuning loglinear mixture weights using a semantic frame based objective function in the late, final stage of statistical machine translation training. In contrast, our new approach injects a semantic frame based objective function back into earlier stages of the training pipeline, during the actual learning of the translation model, biasing learning toward semantically more accurate alignments. Our work is motivated by the fact that ITG alignments have empirically been shown to fully cover crosslingual semantic frame alternations. We show that injecting a crosslingual semantic based objective function for driving ITG induction further sharpens the ITG constraints, leading to better performance than either the conventional ITG or the traditional GIZA++ based approaches.
- Meriem BELOUCIF, Markus SAERS and Dekai WU.
"Improving word alignment for low resource languages using English monolingual SRL".
Sixth Workshop on Hybrid Approaches to Translation (HyTra-6). Osaka, Japan: Dec 2016.
We introduce a new statistical machine translation approach specifically geared to learning translation from low resource languages, which exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that in contrast to conventional statistical machine translation (SMT) training methods, which rely heavily on phrase memorization, our approach focuses on learning bilingual correlations that help in translating low resource languages, by using the output language semantic structure to further narrow down ITG constraints. This approach is motivated by previous research which has shown that injecting a semantic frame based objective function while training SMT models improves the translation quality. We show that including a monolingual semantic objective function during the learning of the translation model leads towards a semantically driven alignment which is more efficient than simply tuning loglinear mixture weights against a semantic frame based evaluation metric in the final stage of statistical machine translation training. We test our approach with three different language pairs and demonstrate that our model biases the learning towards more semantically correct alignments. Both GIZA++ and ITG based techniques fail to capture the meaningful bilingual constituents required when trying to learn translation models for low resource languages. In contrast, our proposed model not only improves translation by injecting a monolingual objective function to learn bilingual correlations during early training of the translation model, but also helps to learn more meaningful correlations with a relatively small data set, leading to a better alignment compared to either conventional ITG or traditional GIZA++ based approaches.
- Meriem BELOUCIF and Dekai WU.
"Injecting a Semantic Objective Function into Early Stage Learning of Spoken Language Translation."
Oriental COCOSDA 2016, 19th International Conference of the Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA 2016). Bali, Indonesia: Oct 2016.
We describe a new approach for semantically training spoken language translation systems, in which we inject a crosslingual semantic frame based objective function directly into inversion transduction grammar (ITG) induction. This represents an ambitious jump from recent work on improving translation adequacy by using a semantic frame based objective function to drive the tuning of loglinear mixture weights in the final stage of statistical machine translation training. In contrast, our new approach propagates a semantic frame based objective function back into much earlier stages of the pipeline, during the actual learning of the translation model, biasing learning toward semantically more accurate alignments. This approach is motivated by the fact that ITG alignments have empirically been shown to fully cover crosslingual semantic frame alternations, even though they rule out an overwhelming majority of the space of possible alignments. We show that directly driving ITG induction with a crosslingual semantic based objective function not only helps to further sharpen the ITG constraints, but still avoids excising relevant portions of the search space, and leads to better performance than either conventional ITG or GIZA++ based approaches.
- Meriem BELOUCIF and Dekai WU.
"Driving inversion transduction grammar induction with semantic evaluation".
5th Joint Conference on Lexical and Computational Semantics (*SEM 2016), at ACL 2016, 55-63. Berlin: Aug 2016.
We describe a new technique for improving statistical machine translation training by adopting scores from a recent crosslingual semantic frame based evaluation metric, XMEANT, as outside probabilities in expectation-maximization based ITG (inversion transduction grammars) alignment. Our new approach strongly biases early-stage SMT learning towards semantically valid alignments. Unlike previous attempts that have proposed using semantic frame based evaluation metrics as the objective function for late-stage tuning of less than a dozen loglinear mixture weights, our approach instead applies the semantic metric at one of the earliest stages of SMT training, where it may impact millions of model parameters. The choice of XMEANT is motivated by empirical studies that have shown ITG constraints to cover almost all crosslingual semantic frame alternations, which resemble the crosslingual semantic frame matching measured by XMEANT. Our experiments purposely restrict training data to small amounts to show the technique's utility in the absence of a huge corpus, to study the effects of semantic generalizations while avoiding overreliance on memorization. Results show that directly driving ITG training with the crosslingual semantic frame based objective function not only helps to further sharpen the ITG constraints, but still avoids excising relevant portions of the search space, and leads to better performance than either conventional ITG or GIZA++ based approaches.
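The training-time weighting described above can be illustrated with a deliberately simplified sketch: a toy IBM-Model-1-style EM aligner stands in for the actual expectation-maximization based ITG induction, and the per-pair weights stand in for XMEANT scores. The bitext and weights are hypothetical:

```python
from collections import defaultdict

def weighted_em_step(bitext, weights, t):
    """One EM step of a toy IBM-Model-1-style aligner: each sentence
    pair's fractional counts are scaled by a per-pair semantic
    confidence, so pairs whose semantic frames match poorly contribute
    less to the re-estimated translation probabilities."""
    counts = defaultdict(float)
    totals = defaultdict(float)
    for (src, tgt), w in zip(bitext, weights):
        for e in tgt:
            z = sum(t[(f, e)] for f in src)      # normalize over source words
            for f in src:
                c = w * t[(f, e)] / z            # semantically weighted count
                counts[(f, e)] += c
                totals[f] += c
    return {fe: counts[fe] / totals[fe[0]] for fe in counts}

# Toy bitext: the second pair is a bad translation and gets a low
# hypothetical semantic score, so EM largely ignores it.
bitext = [(["la", "maison"], ["the", "house"]),
          (["la", "maison"], ["blue", "cheese"])]
weights = [1.0, 0.1]

# Uniform initialization over all cooccurring word pairs, then weighted EM.
t = {(f, e): 0.25 for src, tgt in bitext for f in src for e in tgt}
for _ in range(5):
    t = weighted_em_step(bitext, weights, t)
```

With the low weight on the mismatched pair, t[("maison", "house")] ends up roughly ten times larger than t[("maison", "blue")]; with uniform weights the two would be indistinguishable.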
- Meriem BELOUCIF and Dekai WU.
"A semantically confidence-weighted ITG induction algorithm".
3rd International Workshop on Semantic Machine Learning (SML 2016), at IJCAI 2016. New York: Jul 2016.
We propose a new algorithm to induce inversion transduction grammars, in which a crosslingual semantic frame based objective function is injected as confidence weighting in the early stages of statistical machine translation training. Unlike recent work on improving translation adequacy that uses a monolingual semantic frame based objective function to drive the tuning of loglinear mixture weights in the late stages of statistical machine translation training, our bilingual approach incorporates the semantic objective during the actual learning of the translation model's structure. Our approach assigns higher confidence to training examples in which the semantic frames in the input language more closely match the semantic frames of the output language, as predicted automatically by XMEANT, the crosslingual semantic frame based machine translation evaluation metric. We chose to apply this approach to induce inversion transduction grammars (ITGs), since ITG alignments prune a large majority of the space of possible alignments, while at the same time empirically fully covering all the crosslingual semantic frame alternations of the type we are using for confidence weighting. Results show that boosting semantically compatible training examples in ITG induction improves the translation performance compared to either traditional GIZA++ alignment or conventional ITG alignment based approaches for phrase based statistical machine translation.
- Meriem BELOUCIF, Markus SAERS, and Dekai WU. "Improving Semantic SMT via Soft Semantic Role Label Constraints on ITG Alignments". Machine Translation Summit XV, Miami, Florida: Oct 2015.
We show that applying semantic role label constraints to bracketing ITG alignment to train MT systems improves the quality of MT output in comparison to the conventional BITG and GIZA alignments. Moreover, we show that applying soft constraints to SRL-constrained BITG alignment leads to a better translation system compared to using hard constraints, which appear too harsh to produce meaningful biparses. We leverage previous work demonstrating that BITG alignments are able to fully cover cross-lingual semantic frame alternations, by using semantic role labeling to further narrow BITG constraints, in a soft fashion that avoids losing relevant portions of the search space. SRL-based evaluation metrics like MEANT have shown that tuning towards preserving the shallow semantic structure across translations robustly improves translation performance. Our approach brings the same intuition into the training phase. We show that our new alignment outperforms both conventional Moses and BITG alignment baselines in terms of the adequacy-oriented MEANT scores, while still producing comparable results in terms of edit distance metrics.
- Meriem BELOUCIF, Chi-kiu LO, and Dekai WU. "Improving MEANT Based Semantically Tuned SMT". 11th International Workshop on Spoken Language Translation (IWSLT 2014), 34-41. Lake Tahoe, California: Dec 2014.
We discuss various improvements to our MEANT tuned system, previously presented at IWSLT 2013. In our 2014 system, we incorporate this year's improved version of MEANT, improved Chinese word segmentation, Chinese named entity recognition and dedicated proper name translation, and number expression handling. This results in a significant performance jump compared to last year's system. We also ran preliminary experiments on tuning to IMEANT, our new ITG based variant of MEANT. The performance of tuning to IMEANT is comparable to tuning on MEANT (differences are statistically insignificant). We are presently investigating if tuning on IMEANT can produce even better results, since IMEANT was actually shown to correlate with human adequacy judgment more closely than MEANT. Finally, we ran experiments applying our new architectural improvements to a contrastive system tuned to BLEU. We observed a slightly higher jump in comparison to last year, possibly due to mismatches of MEANT's similarity models to our new entity handling.
- Dekai WU, Chi-kiu LO, Meriem BELOUCIF and Markus SAERS. "Better Semantic Frame Based MT Evaluation via Inversion Transduction Grammars". Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (at EMNLP 2014), 22-33. Doha: Oct 2014.
We introduce an inversion transduction grammar based restructuring of the MEANT automatic semantic frame based MT evaluation metric, which, by leveraging ITG language biases, is able to further improve upon MEANT's already-high correlation with human adequacy judgments. The new metric, called IMEANT, uses bracketing ITGs to biparse the reference and machine translations, but subject to obeying the semantic frames in both. Resulting improvements support the presumption that ITGs, which constrain the allowable permutations between compositional segments across the reference and MT output, score the phrasal similarity of the semantic role fillers more accurately than the simple word alignment heuristics (bag-of-word alignment or maximum alignment) used in previous versions of MEANT. The approach successfully integrates (1) the previously demonstrated extremely high coverage of cross-lingual semantic frame alternations by ITGs, with (2) the high accuracy of evaluating MT via weighted f-scores on the degree of semantic frame preservation.
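The weighted f-score over semantic frames that this family of metrics computes can be sketched as follows. The role labels, role weights, and bag-of-words filler similarity below are illustrative stand-ins, not the metric's actual models:

```python
def bag_similarity(ref_filler, hyp_filler):
    """Toy filler similarity: bag-of-words overlap f-score, standing in
    for the phrasal similarity model used by MEANT-style metrics."""
    ref, hyp = set(ref_filler), set(hyp_filler)
    if not ref or not hyp:
        return 0.0
    overlap = len(ref & hyp)
    p, r = overlap / len(hyp), overlap / len(ref)
    return 2 * p * r / (p + r) if p + r else 0.0

def weighted_frame_fscore(ref_frame, hyp_frame, weights):
    """Weighted precision/recall over the roles of one matched frame pair:
    precision normalizes by the roles present in the hypothesis, recall
    by the roles present in the reference."""
    sims = {role: bag_similarity(ref_frame[role], hyp_frame[role])
            for role in ref_frame.keys() & hyp_frame.keys()}
    num = sum(weights[role] * s for role, s in sims.items())
    prec_den = sum(weights[role] for role in hyp_frame)
    rec_den = sum(weights[role] for role in ref_frame)
    p = num / prec_den if prec_den else 0.0
    r = num / rec_den if rec_den else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical role weights (predicate and core arguments above adjuncts)
# and fillers for one reference/hypothesis frame pair.
weights = {"PRED": 1.0, "ARG0": 0.8, "ARG1": 0.8, "ARGM-TMP": 0.3}
ref = {"PRED": ["signed"], "ARG0": ["the", "president"],
       "ARG1": ["the", "bill"], "ARGM-TMP": ["yesterday"]}
hyp = {"PRED": ["signed"], "ARG0": ["the", "president"],
       "ARG1": ["a", "law"]}
score = weighted_frame_fscore(ref, hyp, weights)
```

The hypothesis is penalized both for the mistranslated ARG1 filler and, via the recall denominator, for the missing temporal adjunct.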
- Chi-kiu LO, Meriem BELOUCIF, Markus SAERS, and Dekai WU. "XMEANT: Better semantic MT evaluation without reference translations". 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014), 765-771. Baltimore, Maryland: Jun 2014.
We introduce XMEANT---a new cross-lingual version of the semantic frame based MT evaluation metric MEANT---which can correlate even more closely with human adequacy judgments than monolingual MEANT and eliminates the need for expensive human references. Previous work established that MEANT reflects translation adequacy with state-of-the-art accuracy, and optimizing MT systems against MEANT robustly improves translation quality. However, to go beyond tuning weights in the loglinear SMT model, a cross-lingual objective function that can deeply integrate semantic frame criteria into the MT training pipeline is needed. We show that cross-lingual XMEANT outperforms monolingual MEANT by (1) replacing the monolingual context vector model in MEANT with simple translation probabilities, and (2) incorporating bracketing ITG constraints.
- Chi-kiu LO, Meriem BELOUCIF, and Dekai WU. "Improving machine translation into Chinese by tuning against Chinese MEANT". 10th International Workshop on Spoken Language Translation (IWSLT 2013). Heidelberg, Germany: Dec 2013.
We present the first ever results showing that Chinese MT output is significantly improved by tuning an MT system against a semantic frame based objective function, MEANT, rather than an n-gram based objective function, BLEU, as measured across commonly used metrics and different test sets. Recent work showed that by preserving the meaning of the translations as captured by semantic frames in the training process, MT systems for translating into English on both formal and informal genres are constrained to produce more adequate translations by making more accurate choices on lexical output and reordering rules. In this paper we describe our experiments in IWSLT 2013 TED talk MT tasks on tuning MT systems against MEANT for translating into Chinese and English respectively. We show that the Chinese translation output benefits more from tuning an MT system against MEANT than the English translation output due to the ambiguous nature of word boundaries in Chinese. Our encouraging results show that using MEANT is a promising alternative to BLEU in both evaluating and tuning MT systems to drive the progress of MT research across different languages.
- Dekai WU, Karteek ADDANKI, Markus SAERS, and Meriem BELOUCIF. "Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation". 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 102-112. Seattle: Oct 2013.
We attack an inexplicably under-explored language genre of spoken language "lyrics in music" via completely unsupervised induction of an SMT-style stochastic transduction grammar for hip hop lyrics, yielding a fully-automatically learned challenge-response system that produces rhyming lyrics given an input. Unlike previous efforts, we choose the domain of hip hop lyrics, which is particularly unstructured and noisy. A novel feature of our approach is that it is completely unsupervised and requires no a priori linguistic or phonetic knowledge. In spite of the level of difficulty of the challenge, the model nevertheless produces fluent output as judged by human evaluators, and performs significantly better than widely used phrase-based SMT models upon the same task.
Last updated: 15th Feb 2016