Featured paper
Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning
Abstract:
Training and testing different models in the field of text classification depend mainly on pre-classified text document datasets. Recently, seven datasets have emerged for Arabic text classification, including Single-Label Arabic News Articles Dataset (SANAD), Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, and Khaleej2004. This study investigates which of these datasets can provide significant training and fair evaluation for text classification (TC). In this investigation, well-known and accurate learning models are used, including naive Bayes (NB), random forest (RF), K-nearest neighbor (KNN), support vector machines (SVM), and logistic regression (LR) models. We present relevance and time measures of training the models with these datasets to enable Arabic language researchers to select an appropriate dataset on a solid basis of comparison. The performances of the five learning models across the seven datasets are measured and compared with the performances of the same models trained on a well-known English language dataset. The analysis of the relevance and time scores shows that training the SVM model on Khaleej and Arabiya obtained the most significant results in the shortest amount of time, with an accuracy of 82%.
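The comparison described in the abstract (train a model on a dataset, then record a quality score and the training time) can be sketched roughly as follows. This is a minimal illustration, not the paper's procedure. The `NaiveBayes` class and `evaluate` helper are hypothetical names introduced here, plain accuracy stands in for the paper's relevance measures (which the abstract does not define), and the toy documents are invented rather than taken from any of the seven datasets.

```python
import math
import time
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, docs):
        return [self._predict_one(d) for d in docs]

    def _predict_one(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in doc.split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(model, train, test):
    """Return (accuracy, training_seconds) for one model on one data split."""
    train_docs, train_labels = train
    test_docs, test_labels = test
    start = time.perf_counter()
    model.fit(train_docs, train_labels)
    elapsed = time.perf_counter() - start  # time spent on training only
    predictions = model.predict(test_docs)
    correct = sum(p == t for p, t in zip(predictions, test_labels))
    return correct / len(test_labels), elapsed


# Toy stand-in data; NOT drawn from any dataset named in the abstract.
train = (
    ["goal match team win", "team player goal",
     "market stock price", "bank stock trade"],
    ["sports", "sports", "economy", "economy"],
)
test = (["player goal win", "stock price bank"], ["sports", "economy"])

accuracy, seconds = evaluate(NaiveBayes(), train, test)
print(f"accuracy={accuracy:.2f} train_time={seconds:.6f}s")
```

Any other classifier exposing `fit` and `predict` could be passed to `evaluate` in the same way, which is how a loop over several models and several datasets would produce a comparison table of scores and training times.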
Keywords:
Author:
Ahmad Hussein Ababneh
Affiliation:
Computer Science Department, American University of Madaba, Madaba 2882
Source:
Citation:
[1] Ahmad Hussein Ababneh. Investigating the Relevance of Arabic Text Classification Datasets Based on Supervised Learning[J]. Journal of Electronic Science and Technology, 2022(02): 187-208
Class A terms:
SANAD, Khaleej, Arabiya, Akhbarona, KALIMAT, Waten2004, Khaleej2004
Class B terms:
Investigating, Relevance, Arabic, Text, Classification, Datasets, Based, Supervised, Learning, Training, testing, different, models, field, text, classification, mainly, depend, classified, document, datasets, Recently, seven, have, emerged, including, Single, Label, News, Articles, This, study, investigates, which, these, provide, significant, training, fair, evaluation, this, investigation, well, known, accurate, learning, used, naive, Bayes, NB, random, forest, RF, nearest, neighbor, KNN, support, vector, machines, logistic, regression, LR, We, present, relevance, measures, enable, language, researchers, select, appropriate, solid, basis, comparison, performances, five, across, measured, compared, same, trained, English, analysis, scores, shows, that, obtained, most, results, shortest, amount, accuracy
AB value:
0.521328