Faten Khalfallah Hammouda*, Abdelsalam Abdelhamid Almarimi**
*Faculty of Economy and Management, Sfax, Tunisia
**Higher Institute of Electronics, Bani Walid, Libya
Abstract
This paper proposes a system based on heuristic lemmatization for Arabic text indexation and classification. The system does not rely on any linguistic rules. The proposed method is limited to five domains: sports, medicine, politics, economics, and agriculture. The main idea is to collect texts related to the chosen domains and to study them by extracting their pertinent terms. Each input text first passes through a formatting stage that removes words and letters that do not contribute to its meaning. The average of the term frequencies is then calculated to classify the text into its related domain.
Keywords: Natural Language Processing, Indexation, Classification.
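The pipeline outlined in the abstract (formatting, heuristic lemmatization, and classification by average term frequency) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the stop-word list, the prefix and suffix lists, the tiny term sets for two of the five domains, and the function names `light_stem`, `preprocess`, `domain_scores`, and `classify`. The scoring rule (sum of the domain's term counts divided by the number of terms in that domain) is one plausible reading of "average frequency", not necessarily the authors' formula.

```python
from collections import Counter

# Hypothetical resources; the paper's actual lists are not given in this section.
STOP_WORDS = {"في", "من", "على", "إلى", "و"}
PREFIXES = ("وال", "بال", "ال")   # longest prefixes first
SUFFIXES = ("ون", "ين", "ات")

# Tiny illustrative term sets for two of the five domains.
DOMAIN_TERMS = {
    "sports": {"كرة", "مباراة", "لاعب", "فريق"},
    "medicine": {"طبيب", "مريض", "دواء", "مستشفى"},
}


def light_stem(word, min_len=3):
    """Strip at most one known prefix and one known suffix, keeping
    at least min_len letters. No linguistic rules are consulted."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(text):
    """Formatting stage: drop stop words, then reduce each token."""
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [light_stem(t) for t in tokens]


def domain_scores(text):
    """Average frequency of each domain's terms in the text."""
    counts = Counter(preprocess(text))
    return {
        domain: sum(counts[term] for term in terms) / len(terms)
        for domain, terms in DOMAIN_TERMS.items()
    }


def classify(text):
    """Return the best-scoring domain, or None if no term matched."""
    scores = domain_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `classify("اللاعبون في الفريق")` reduces the two content words to `لاعب` and `فريق` and assigns the text to `sports`. A realistic system would need much larger term lists per domain and a tie-breaking rule, both of which this sketch leaves out.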