Using an Islamic Question and Answer Knowledge Base to answer questions about the holy Quran

Bothaina Hamoud, Eric Atwell

Abstract


This paper presents QAEQAS, a Quranic Arabic/English Question Answering System that relies on a specialized search corpus and on data redundancy. The corpus is composed of questions paired with their answers. Each question is phrased in many different ways and in differing contexts to improve Question Answering (QA) performance. The Python Natural Language Toolkit (NLTK) was used to build a complete question answering solution: it processes the user question and implements the search engine that retrieves candidate results and extracts the best answer. The system accepts a Natural Language (NL) question in English or Arabic from the user through a GUI, matches it against the questions in the knowledge base, and returns the corresponding answer. A keyword-based search was used. First, the user question was tokenized and the stop words were removed. The remaining keywords were then used to search the corpus for matching questions. Finally, the system scored and ranked the candidates to find the best-matching question and returned its answer. QAEQAS handles a wide range of question types, including factual and definition questions. It produces both short and long answers, with a precision of 79% and a recall of 76% for the Arabic version, and a precision of 75% and a recall of 73% for the English version.
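The keyword-based pipeline described above (tokenize, remove stop words, match against stored questions, score and rank) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regex tokenizer and the small stop-word set stand in for NLTK resources, the three knowledge-base entries are hypothetical, and the overlap-count score is an assumed scoring rule, since the abstract does not specify one.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real system would
# likely use a fuller list such as the one shipped with NLTK).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "what", "who", "which",
              "how", "many", "in", "of", "to", "do", "does"}

# Hypothetical miniature knowledge base of (question, answer) pairs.
KNOWLEDGE_BASE = [
    ("How many chapters are in the Quran?", "114"),
    ("What is the longest chapter of the Quran?", "Al-Baqarah"),
    ("What is the meaning of Zakat?", "Obligatory almsgiving"),
]


def keywords(text):
    """Lowercase and tokenize the text, then drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return {token for token in tokens if token not in STOP_WORDS}


def best_answer(user_question, kb=KNOWLEDGE_BASE):
    """Return the answer of the stored question sharing the most keywords.

    The score is a plain keyword-overlap count (an assumed scoring rule).
    Returns None when no stored question shares any keyword.
    """
    query = keywords(user_question)
    scored = [(len(query & keywords(question)), answer)
              for question, answer in kb]
    if not scored:
        return None
    # Rank candidates by score; the stable sort keeps corpus order on ties.
    scored.sort(key=lambda item: item[0], reverse=True)
    top_score, top_answer = scored[0]
    return top_answer if top_score > 0 else None


if __name__ == "__main__":
    print(best_answer("Which chapter is the longest in the Quran?"))
```

Storing several paraphrases of each question, as the corpus does, raises the chance that at least one stored phrasing shares keywords with the user's wording, which is why data redundancy helps a simple overlap score like this one.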




