Arabic Text Summarization Using Aggregate Similarity
ACIT'2025 will be held in Arab Academy for Science, Technology & Maritime Transport, December 16-18, 2025 - Alexandria, Egypt
Abstract
This paper proposes an Arabic text summarization approach based on the aggregate similarity method, which was originally proposed for Korean text. The approach relies mainly on nouns as indicators of sentence importance, so noun extraction is its central step. To summarize a document, the document is first segmented into sentences, and the sentences are then tokenized into words. Nouns are extracted using fourteen rules that distinguish nouns from non-noun words. Next, the frequency of each noun is computed in each sentence and in the whole document, and the similarity between the noun frequencies of a sentence and those of the document is calculated using the inner product measure. The sum of the similarities computed for a sentence is its aggregate similarity. The sentences with the highest aggregate similarity are selected as the summary, and the number of selected sentences is determined by a user-defined threshold. The approach is evaluated on a dataset of fifty documents using the Recall and Precision measures. The results obtained were 62% for Precision and 70% for Recall, with a compression rate of 14%. We consider these results acceptable given the nature of the Arabic language, which has a rich vocabulary and complex grammar rules.
Keywords: Arabic Language Processing, Information Retrieval, Text Summarization, Aggregate Similarity.
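The scoring and selection steps described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the fourteen noun extraction rules are not reproduced, so the nouns of each sentence are assumed to be given; the function names score_sentences and summarize are hypothetical; and the user-defined threshold is assumed here to be the fraction of sentences to keep.

```python
from collections import Counter


def score_sentences(sentence_nouns):
    """Score each sentence by the inner product of its noun-frequency
    vector with the noun-frequency vector of the whole document.

    sentence_nouns: one list of noun strings per sentence (assumed to be
    the output of a separate noun extraction step, not shown here).
    """
    # Document-level noun frequencies: counts over all sentences.
    doc_freq = Counter(n for nouns in sentence_nouns for n in nouns)
    scores = []
    for nouns in sentence_nouns:
        sent_freq = Counter(nouns)
        # Inner product over the nouns that occur in this sentence.
        scores.append(sum(count * doc_freq[n] for n, count in sent_freq.items()))
    return scores


def summarize(sentences, sentence_nouns, ratio=0.3):
    """Keep the highest-scoring sentences, returned in document order.

    ratio is an assumed interpretation of the user-defined threshold:
    the fraction of sentences to keep (at least one is always kept).
    """
    scores = score_sentences(sentence_nouns)
    keep = max(1, int(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


# Toy input with placeholder nouns instead of real Arabic text.
sentences = ["s0", "s1", "s2"]
sentence_nouns = [["a", "b"], ["a"], ["c"]]
print(score_sentences(sentence_nouns))
print(summarize(sentences, sentence_nouns, ratio=0.67))
```

Because the document frequency vector is the sum of the sentence frequency vectors, the inner product of a sentence with the document equals the sum of its inner products with every sentence in the document (itself included), which is one way to read the "sum of similarities" in the abstract.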