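A Morpheme-Pivot-Based Method for Chinese-Mongolian Statistical Machine Translation
Published: 2018-05-02 07:59
Topic: pivot language + morpheme; Reference: Journal of Chinese Information Processing (《中文信息学报》), 2017, Issue 04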
[Abstract]: The morphological differences between Chinese and Mongolian, together with the small size of the available parallel corpora, limit improvements in Chinese-Mongolian statistical machine translation (SMT). This paper introduces Mongolian morphological information into Chinese-Mongolian SMT. Mongolian text is segmented into morphemes, and mappings are constructed between Chinese words and Mongolian morphemes and between Mongolian morphemes and Mongolian words, which compensates for the morphological asymmetry between the two languages. With morphemes serving as the pivot (intermediate) language, a Chinese-to-Mongolian-morpheme SMT system and a Mongolian-morpheme-to-Mongolian SMT system are trained, and a new phrase translation table and reordering model are built from them. These are integrated into Chinese-Mongolian SMT through multi-path decoding and multiple features. Experimental results show that adding the morpheme-pivot phrase translation table and reordering model to an existing SMT method yields a clear BLEU improvement over the baseline system and, to some extent, reduces the effects of data sparsity and morphological differences on Chinese-Mongolian SMT. The method is general: by combining information at the morpheme and phrase levels, it makes the two languages symmetric in morphological structure, so it applies not only to Chinese-Mongolian SMT but also to other morphologically asymmetric, low-resource language pairs.
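The abstract does not say how the two trained systems are combined into the new phrase translation table. One common way to do this in pivot-based SMT is triangulation, which marginalizes over the pivot phrase: p(t | s) = Σ_m p(t | m) · p(m | s). The sketch below illustrates that idea only and is not claimed to be the authors' procedure. The function name `triangulate`, the nested-dictionary table layout, and the placeholder tokens (`ZH_A`, `stem +sfx1`, `MN_X`, and so on) are all hypothetical.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Combine two phrase tables through a shared pivot side.

    src_to_pivot: {source_phrase: {pivot_phrase: p(pivot | source)}}
    pivot_to_tgt: {pivot_phrase: {target_phrase: p(target | pivot)}}
    Returns {source_phrase: {target_phrase: p(target | source)}}.
    """
    table = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for piv, p_piv in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt in pivot_to_tgt.get(piv, {}).items():
                table[src][tgt] += p_piv * p_tgt
    return {src: dict(tgts) for src, tgts in table.items()}


# Hypothetical toy tables: placeholder tokens, not real Chinese or Mongolian data.
zh_to_morph = {"ZH_A": {"stem +sfx1": 0.6, "stem +sfx2": 0.4}}
morph_to_mn = {
    "stem +sfx1": {"MN_X": 1.0},
    "stem +sfx2": {"MN_X": 0.5, "MN_Y": 0.5},
}

table = triangulate(zh_to_morph, morph_to_mn)
for target, prob in sorted(table["ZH_A"].items()):
    print(f"p({target} | ZH_A) = {prob:.2f}")
```

In this toy case both morpheme sequences can surface as `MN_X`, so their probability mass is pooled on one target entry. That pooling is the sense in which a pivot table can ease data sparsity: several segmentations contribute evidence to the same word form.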
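[Author affiliations]: Department of Automation, University of Science and Technology of China; Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei
[Funding]: National Natural Science Foundation of China (61502445, 61572462); Chinese Academy of Sciences Informatization Special Project (XXH12504-1-10)
[Classification number]: H085.3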
Article ID: 1833049
Article link: http://www.lk138.cn/wenyilunwen/yuyanyishu/1833049.html