Comparing Arabic NLP tools for Hadith Classification

Kaouther Faidi, Raja Ayed, Ibrahim Bounhas, Bilel Elayeb

Abstract

Text classification is the process of assigning documents to a predefined set of categories based on their content. Because Arabic words can take more complicated forms than words in many other languages, choosing the indexing unit and removing affixes are challenging. In this paper we compare the performance of different techniques for classifying Al-Hadith Al-Shareef, which was analyzed with six Arabic tools (Al-Stem Darwish, Al-Stem Alex, Khoja’s stemmer, Quadrigrams, Trigrams and a disambiguation tool based on AraMorph). We also compare three classification techniques implemented in the WEKA toolkit, namely decision trees (DT), the Naïve Bayes algorithm (NB) and the Support Vector Machines algorithm (SVM). We used TF-IDF to weight each word according to its relative frequency in a particular document, and cross-validation to evaluate the results of the classifiers. Experimental results show that Khoja’s stemmer outperformed the other tools and that the SVM classifier achieved the highest accuracy, followed by the Naïve Bayes classifier and the decision tree classifier, in that order.
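As an illustration of two of the indexing steps named in the abstract, the following minimal Python sketch extracts character n-grams (trigrams for n = 3, quadrigrams for n = 4) and computes TF-IDF weights. It is not the authors' implementation, which relied on WEKA and existing Arabic tools. The function names, the handling of words shorter than n, and the specific TF-IDF variant (term count divided by document length, times the natural log of the inverse document frequency) are assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Return the overlapping character n-grams of a word.

    Assumption: a word shorter than n is kept whole as a single unit.
    """
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_terms(text, n):
    """Split a text on whitespace and index it by character n-grams."""
    terms = []
    for word in text.split():
        terms.extend(char_ngrams(word, n))
    return terms


def tf_idf(docs):
    """Compute TF-IDF weights for a list of documents.

    docs is a list of term lists; the result holds one {term: weight}
    dictionary per document. This variant (an assumption) uses
    tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs:
        doc_freq.update(set(terms))  # count each term once per document

    weights = []
    for terms in docs:
        counts = Counter(terms)
        total = len(terms)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

For instance, `tf_idf([index_terms(t, 3) for t in texts])` would produce trigram-based weights for a hypothetical list of strings `texts`. A term that occurs in every document receives a weight of zero under this variant, so only terms that help separate documents contribute to the feature vectors passed to a classifier.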

 

