Non-authentic Hadith Corpus: Design and Methodology

Taghreed Tarmom, Eric Atwell, Mohammad Alsalka

Abstract


The primary religious text of Islam is the Quran. The Hadith, the second source, refers to any action, saying, order or silent approval of the holy Prophet Muhammad that has been transmitted through a chain of narrators. Each Hadith has an Isnad (the chain of narrators) and a Matan (the act of the Prophet Muhammad). In contrast to the Quran, some Hadiths handed down over the centuries have been corrupted by narrators who were not competent in transmitting them. Hadith scholars classify these as non-authentic Hadith (NAH). Evaluating classifiers for the automatic classification of Arabic Hadith requires Arabic Hadith corpora that contain samples of both authentic and non-authentic Hadith for training and testing models. This paper aims to create a new NAH corpus consisting of 452,624 words from six different Hadith books. A subsequent aim is to annotate this corpus with Hadith features such as the Isnad, the Matan and the Hadith authenticity, and thereby to provide a ground truth.
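As an illustration only, an annotated corpus entry of the kind described above might be represented as follows. This is a minimal sketch: the `HadithRecord` class, its field names and the `word_count` helper are hypothetical and are not taken from the paper, and the sample text is a placeholder rather than corpus data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HadithRecord:
    """One annotated entry (hypothetical schema, not the authors' format)."""
    book: str        # source Hadith book the entry was taken from
    isnad: str       # chain of narrators
    matan: str       # reported act or saying
    authentic: bool  # ground-truth label; False marks a non-authentic Hadith


def word_count(records: List[HadithRecord]) -> int:
    """Count whitespace-separated tokens across the Isnad and Matan fields."""
    return sum(len(r.isnad.split()) + len(r.matan.split()) for r in records)


# Placeholder entry for illustration; not real corpus content.
sample = [
    HadithRecord(
        book="Book A",
        isnad="narrator1 from narrator2",
        matan="placeholder text of the report",
        authentic=False,
    ),
]
```

Calling `word_count(sample)` counts tokens by whitespace only; a real Arabic corpus would probably need a proper tokenizer and normalisation step, which this sketch leaves out.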

