Featured article
Pretrained Models and Evaluation Data for the Khmer Language
Abstract:
Trained on a large corpus, pretrained models (PTMs) can capture different levels of concepts in context and hence generate universal language representations, which greatly benefit downstream natural language processing (NLP) tasks. In recent years, PTMs have been widely used in most NLP applications, especially for high-resource languages, such as English and Chinese. However, scarce resources have discouraged the progress of PTMs for low-resource languages. Transformer-based PTMs for the Khmer language are presented in this work for the first time. We evaluate our models on two downstream tasks: part-of-speech tagging and news categorization. The dataset for the latter task is self-constructed. Experiments demonstrate the effectiveness of the Khmer models. In addition, we find that the current Khmer word segmentation technology does not aid performance improvement. We aim to release our models and datasets to the community in hopes of facilitating the future development of Khmer NLP applications.
Authors:
Shengyi Jiang; Sihui Fu; Nankai Lin; Yingwen Fu
Affiliations:
School of Information Science and Technology, Guangdong University of Foreign Studies, Guangzhou 510000, China; Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies, Guangzhou 510000, China
Citation:
[1] Shengyi Jiang; Sihui Fu; Nankai Lin; Yingwen Fu. Pretrained Models and Evaluation Data for the Khmer Language [J]. Tsinghua Science and Technology, 2022(04): 709-718
Class A:
Pretrained, Khmer, Trained, discouraged
Class B:
Models, Evaluation, Data, Language, large, corpus, pretrained, models, PTMs, can, capture, different, levels, concepts, context, hence, generate, universal, representations, which, greatly, benefit, downstream, natural, processing, NLP, tasks, In, recent, years, have, been, widely, used, most, applications, especially, high, languages, such, English, Chinese, However, scarce, resources, progress, low, Transformer, are, presented, this, work, first, We, evaluate, two, Part, speech, tagging, news, categorization, latter, self, constructed, Experiments, demonstrate, effectiveness, addition, find, that, current, word, segmentation, technology, does, not, aid, performance, improvement, aim, release, datasets, community, hopes, facilitating, future, development
AB value:
0.580683
Similar articles
Efficient Visual Recognition: A Survey on Recent Advances and Brain-inspired Methodologies
Yang Wu; Ding-Heng Wang; Xiao-Tong Lu; Fan Yang; Man Yao; Wei-Sheng Dong; Jian-Bo Shi; Guo-Qi Li - Applied Research Center Laboratory, Tencent Platform and Content Group, Shenzhen 518057, China; School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China; School of Artificial Intelligence, Xidian University, Xi'an 710071, China; Division of Information Science, Nara Institute of Science and Technology, Nara 6300192, Japan; Peng Cheng Laboratory, Shenzhen 518000, China; Department of Computer and Information Science, University of Pennsylvania, Philadelphia PA 19104-6389, USA; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China