Dissimilarities Detections in Arabic and English Texts

Using n-grams, Histograms and Self Organizing Maps

Scholar's Press (2018-01-05)

€ 67,90

The main goal of our research is to apply mathematical methods to uncover anomalies and discrepancies in texts. English and Arabic texts were analyzed from the point of view of many statistical characteristics. We identified some basic statistical differences between the lengths of the words used in the two languages, and applied the results in heuristics for measuring the dissimilarity of text parts. In the research we prepared three methods for the analysis of texts: (1) Element n-gram profiles method: the method is based on the similarity or dissimilarity of n-gram occurrences in text parts compared with the full text. (2) Histogram method: histograms of text sequences are analyzed from a clustering point of view. If the cluster dispersion is not large, the text was probably written by a single author. If the cluster dispersion is large, the text is treated as critical: it is split into two or more parts, and the same analysis is repeated for each part. (3) Neural networks (systems of Self-Organizing Maps): the systems were trained on input sequences, and after training they identify text parts with anomalies using a cumulative error and some more complex analysis.
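As a rough illustration of the first method, the sketch below compares the character n-gram profile of each text part with the profile of the full text. It is not taken from the book: the function names, the window and step sizes, the choice of character trigrams, and the normalized squared-difference measure are all assumptions made for this example.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Relative frequencies of character n-grams in `text` (hypothetical helper)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {g: c / len(grams) for g, c in counts.items()}


def dissimilarity(part, full, n=3):
    """Score in [0, 1]: 0 means identical n-gram frequencies, 1 means no overlap.

    Assumed measure: mean of ((2 * (fp - ff)) / (fp + ff)) ** 2 over the
    n-grams of the part, divided by 4 so that the result stays in [0, 1].
    """
    p = ngram_profile(part, n)
    f = ngram_profile(full, n)
    if not p:
        return 0.0
    total = 0.0
    for gram, fp in p.items():
        ff = f.get(gram, 0.0)  # 0.0 when the n-gram never occurs in the full text
        total += (2 * (fp - ff) / (fp + ff)) ** 2
    return total / (4 * len(p))


def window_scores(text, window=200, step=100, n=3):
    """Dissimilarity of each sliding window of `text` against the whole text."""
    last_start = max(len(text) - window, 0)
    return [
        (start, dissimilarity(text[start:start + window], text, n))
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog " * 20
    for start, score in window_scores(sample):
        print(start, round(score, 3))
```

Windows whose score stands out from the rest would be candidates for closer inspection; where to place that threshold is left open here.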

Book Details:
ISBN-13:	978-620-2-30271-5
ISBN-10:	6202302712
EAN:	9786202302715
Book language:	English
By (author) :	Abdulwahed Almarimi
Number of pages:	128
Published on:	2018-01-05
Category:	Informatics, IT
