Dissimilarities Detections in Arabic and English Texts

Using n-grams, Histograms and Self Organizing Maps

Scholar's Press (2018-01-05)

€ 67,90

The main goal of our research is to apply mathematical methods to uncover anomalies and discrepancies in texts. English and Arabic texts were analyzed with respect to many statistical characteristics. We found some basic statistical differences between the lengths of the words used in the two languages, and applied the results in heuristics for measuring the dissimilarity of text parts. In the research we prepared three methods for the analysis of texts: (1) Element n-gram profiles method: the method is based on the similarity or dissimilarity of n-gram occurrences in text parts compared with the full text. (2) Histogram method: histograms of text sequences are analyzed from a clustering point of view. If the cluster dispersion is not large, the text was probably written by a single author. If the cluster dispersion is large, the text is considered critical; it is split into two or more parts and the same analysis is repeated for each part. (3) Neural networks (systems of Self-Organizing Maps): the systems are trained on input sequences, and after training they identify text parts with anomalies using a cumulative error and some more complex analysis.
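As a rough illustration of the first idea (comparing the n-gram profile of a text part with that of the full text), here is a minimal Python sketch. It is not the book's implementation: the character-level n-grams, the normalized squared-difference measure, the window and step sizes, and the function names `ngram_profile`, `dissimilarity` and `window_scores` are all assumptions chosen for illustration. Python strings are Unicode, so the same code applies to Arabic and English input.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Return relative frequencies of character n-grams in text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def dissimilarity(part, full, n=3):
    """Score in [0, 1]: 0 if the part's n-gram frequencies match the
    full text, 1 if none of the part's n-grams occur in the full text.
    (Assumed measure, not taken from the book.)"""
    p = ngram_profile(part, n)
    f = ngram_profile(full, n)
    if not p:
        return 0.0
    total = 0.0
    for gram, fp in p.items():
        ff = f.get(gram, 0.0)
        # fp > 0 for every gram in the part's profile, so no division by zero
        total += (2 * (fp - ff) / (fp + ff)) ** 2
    return total / (4 * len(p))


def window_scores(text, window=200, step=100, n=3):
    """Score each sliding window of the text against the whole text."""
    scores = []
    last_start = max(len(text) - window, 0)
    for start in range(0, last_start + 1, step):
        part = text[start:start + window]
        scores.append((start, dissimilarity(part, text, n)))
    return scores
```

Windows whose score is markedly higher than the rest would be candidates for closer inspection; where to set that threshold is left open here.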

Book Details:

ISBN-13: 978-620-2-30271-5
ISBN-10: 6202302712
EAN: 9786202302715
Book language: English
By (author): Abdulwahed Almarimi
Number of pages: 128
Published on: 2018-01-05
Category: Informatics, IT