K-MEANS BASED ALGORITHM FOR ISLAMIC DOCUMENT CLUSTERING
Abstract
Document clustering is an unsupervised learning task. It is a form of data analysis that aims to group a set of objects into subsets, or clusters. In this paper, the target domain of the clustered documents is the Islamic religious domain. Islamic document clustering is an important task for obtaining more effective results from traditional information retrieval (IR) systems, web text organization, and text mining. Fast, high-quality document clustering can greatly help users navigate document collections, particularly on the Internet, where the number of available online documents increases rapidly every day. Thus, the religious domain has become an interesting and challenging area for Natural Language Processing (NLP). The aim of this paper is to evaluate the efficiency and accuracy of Arabic Islamic document clustering based on the K-means algorithm with three similarity/distance measures: Cosine similarity, Jaccard similarity, and Euclidean distance. Before applying the algorithms, we pre-process the data (documents). The pre-processing steps are necessary to eliminate noise and keep only useful information, which improves the performance of document clustering. Additionally, this research investigates the effect of stemming versus not stemming words on the accuracy of Arabic Islamic text clustering. Based on our experiments, we found that stemming yields better results than no stemming, and that K-means with the Cosine similarity measure achieves the highest performance score.
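The three measures named in the abstract can be illustrated with a minimal sketch. It assumes documents are already tokenized (and optionally stemmed) and represented as raw term-frequency vectors. The function names, the sample tokens, and the set-based form of Jaccard are illustrative assumptions, not the authors' implementation; the paper may use a different weighting scheme (for example TF-IDF) or a weighted Jaccard variant.

```python
import math
from collections import Counter


def term_freq(tokens):
    """Build a sparse term-frequency vector from a list of tokens."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Set-based Jaccard: shared terms divided by all distinct terms (assumed variant)."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


def euclidean_distance(a, b):
    """Straight-line distance between two term-frequency vectors (0.0 = identical)."""
    # Counter returns 0 for missing terms, so the union of keys covers both vectors.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))


def nearest_centroid(doc, centroids, similarity=cosine_similarity):
    """K-means assignment step: index of the centroid most similar to the document."""
    return max(range(len(centroids)), key=lambda i: similarity(doc, centroids[i]))


# Hypothetical, already pre-processed token lists.
doc_a = term_freq(["salah", "salah", "zakat"])
doc_b = term_freq(["salah", "hajj"])

print(cosine_similarity(doc_a, doc_b))
print(jaccard_similarity(doc_a, doc_b))
print(euclidean_distance(doc_a, doc_b))
print(nearest_centroid(doc_a, [doc_b, term_freq(["zakat", "salah"])]))
```

Cosine and Jaccard are similarities (higher means closer), while Euclidean is a distance (lower means closer), so an assignment step based on Euclidean distance would pick the minimum rather than the maximum.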