Representative Literature
Pretrained Models and Evaluation Data for the Khmer Language
Abstract:
Trained on a large corpus, pretrained models (PTMs) can capture different levels of concepts in context and hence generate universal language representations, which greatly benefit downstream natural language processing (NLP) tasks. In recent years, PTMs have been widely used in most NLP applications, especially for high-resource languages, such as English and Chinese. However, scarce resources have discouraged the progress of PTMs for low-resource languages. Transformer-based PTMs for the Khmer language are presented in this work for the first time. We evaluate our models on two downstream tasks: part-of-speech tagging and news categorization. The dataset for the latter task is self-constructed. Experiments demonstrate the effectiveness of the Khmer models. In addition, we find that the current Khmer word segmentation technology does not aid performance improvement. We aim to release our models and datasets to the community in hopes of facilitating the future development of Khmer NLP applications.
Keywords:
Chinese Library Classification (CLC) number:
Authors:
Shengyi Jiang;Sihui Fu;Nankai Lin;Yingwen Fu
Affiliations:
School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510000, China; Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies, Guangzhou 510000, China
Source:
Citation format:
[1] Shengyi Jiang; Sihui Fu; Nankai Lin; Yingwen Fu. Pretrained Models and Evaluation Data for the Khmer Language[J]. Tsinghua Science and Technology, 2022(04): 709-718
Class A:
Pretrained,Khmer,Trained,discouraged
Class B:
Models,Evaluation,Data,Language,large,corpus,pretrained,models,PTMs,can,capture,different,levels,concepts,context,hence,generate,universal,representations,which,greatly,benefit,downstream,natural,processing,NLP,tasks,In,recent,years,have,been,widely,used,most,applications,especially,high,languages,such,English,Chinese,However,scarce,resources,progress,low,Transformer,are,presented,this,work,first,We,evaluate,two,Part,speech,tagging,news,categorization,latter,self,constructed,Experiments,demonstrate,effectiveness,addition,find,that,current,word,segmentation,technology,does,not,aid,performance,improvement,aim,release,datasets,community,hopes,facilitating,future,development
AB value:
0.580683