Stylometric Authentication of an Uncredible Extra-Hadith Collection
Abstract
In this paper, we present a stylometric study of the authenticity of an uncredible extra dataset that is claimed to be part of the Hadith but that religious scholars have judged to be probably inauthentic (i.e., a fabricated or weak collection). The extra-Hadith collection is analyzed and compared with the genuine, certified Hadith book of Bukhari. For that purpose, we present a stylometric approach based on the authorial style of the Matn (i.e., the pure speech of the Prophet, PBUH). Two experiments are conducted and discussed: the first is an authorship attribution experiment on 19 text segments, and the second is an automatic document clustering experiment on 15 text segments. In the first experiment, we used character 4-grams and nearest-neighbor classification with the Manhattan distance. In the second experiment, we used hierarchical clustering with the Manhattan and Spearman distances. The results of both the classification and the clustering experiments show a difference in authorial style between the uncredible extra-Hadith collection (or at least a major part of it) and the genuine Bukhari Hadith. Although authentication is performed here at the subset level (i.e., text subsets of about 500 words each), the results provide scientific support for the Islamic religious scholars' assessment of the doubtful collection: the uncredible collection, or at least a major part of it, does not share the authorial style of the genuine Hadith.
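The abstract names the techniques but not their implementation details, so the following is only a minimal, dependency-free Python sketch under our own assumptions, not the authors' code. Every function name is hypothetical. We assume that each segment is represented by the relative frequencies of all overlapping character 4-grams, that the Spearman distance is 1 minus Spearman's rank correlation over the union of both segments' 4-grams, and that the hierarchical clustering uses average linkage. The paper does not state its normalization, n-gram vocabulary, or linkage criterion.

```python
from collections import Counter
from itertools import combinations


def char_ngrams(text, n=4):
    """Relative frequencies of all overlapping character n-grams in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()} if total else {}


def manhattan(p, q):
    """Manhattan (L1) distance between two sparse frequency profiles."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_distance(p, q):
    """Assumed definition: 1 - Spearman rank correlation over the union of n-grams."""
    keys = sorted(set(p) | set(q))
    rp = average_ranks([p.get(k, 0.0) for k in keys])
    rq = average_ranks([q.get(k, 0.0) for k in keys])
    mean = (len(keys) + 1) / 2  # mean rank, unchanged by tie averaging
    cov = sum((a - mean) * (b - mean) for a, b in zip(rp, rq))
    var_p = sum((a - mean) ** 2 for a in rp)
    var_q = sum((b - mean) ** 2 for b in rq)
    if var_p == 0 or var_q == 0:
        return 1.0  # correlation undefined; treat as uncorrelated
    return 1 - cov / (var_p * var_q) ** 0.5


def nearest_neighbor(text, training, n=4, distance=manhattan):
    """Experiment 1 sketch: label of the closest (label, text) training pair (1-NN)."""
    profile = char_ngrams(text, n)
    return min(training, key=lambda item: distance(profile, char_ngrams(item[1], n)))[0]


def agglomerative(profiles, distance=manhattan):
    """Experiment 2 sketch: naive hierarchical clustering, average linkage (assumed).

    Returns merges as (members_a, members_b, linkage_distance) in merge order.
    """
    d = {}
    for i, j in combinations(range(len(profiles)), 2):
        d[i, j] = d[j, i] = distance(profiles[i], profiles[j])
    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        # Average linkage: mean pairwise distance between members of two clusters.
        link, a, b = min(
            (sum(d[x, y] for x in clusters[a] for y in clusters[b])
             / (len(clusters[a]) * len(clusters[b])), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        merges.append((clusters[a], clusters[b], link))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

For example, `nearest_neighbor(segment, [("Bukhari", bukhari_text), ("extra", extra_text)])` returns the label of whichever reference segment has the closest 4-gram profile, and `agglomerative(profiles, distance=spearman_distance)` gives the Spearman variant of the clustering. The naive clustering loop is quadratic or worse, which is acceptable for about 15 segments. For larger corpora, a library routine such as SciPy's hierarchical clustering would be the more practical choice.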