Towards Concept Extraction for Ontologies on Arabic language

Abeer Al-Arfaj, AbdulMalik Al-Salman

Abstract


Ontology is one of the most popular representation models used for knowledge representation, sharing, and reuse. The Arabic language has complex morphological, grammatical, and semantic aspects. Because of this complexity, automatic Arabic terminology extraction is difficult. In addition, concept extraction from Arabic documents has been a challenging research area because, as opposed to term extraction, concept extraction is more domain-related and more selective. Manual concept extraction is a time-consuming and subjective process. Automatic concept extraction methods often analyze a document to determine the important domain terms, each of which can be a single-word or multi-word term. In the literature, there are many approaches, techniques, and algorithms used for term extraction. In this paper, we deal with the fundamental layers involved in ontology construction from Arabic text: extracting the relevant domain terminology from a text and discovering domain concepts. Moreover, we study the problem of Arabic concept extraction from domain texts and provide a comparative review of the existing Arabic term extraction approaches, highlighting the challenges posed by the characteristics of the Arabic language. Despite efforts to combine methods for Arabic term extraction, the field is still open for new development. The paper also proposes a future study to address this issue.



