What Does It Do?

- Tokenization: Breaking down Arabic text into individual words or tokens.
- Normalization: Standardizing text by converting characters to their base forms and removing non-Arabic characters. For example, the alef variants أ, إ, and آ are unified to ا, alef maqsura (ى) and ya (ي) are unified, ta marbuta (ة) and ha (ه) are unified, and diacritics and tatweel are stripped, so that الْعَرَبِيَّة and العـــربية both become العربية.
- Stop Word Removal: Eliminating common, less informative words such as articles and conjunctions. (You can also review the text file containing the stop words in the `src` directory and modify it as needed.)
- Stemming (التجذيع): Reducing words to their root forms to enhance text analysis and information retrieval.
These preprocessing steps are essential for enhancing the quality and usability of Arabic text data in various NLP
and machine learning tasks.
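The pipeline above can be sketched in a few lines of Python. This is an illustrative example only, not the repository's actual code: the function names, the tiny stop-word list, and the deliberately naive light stemmer are all stand-ins for the real implementation.

```python
import re

# Diacritics (fathatan .. sukun) and the tatweel (elongation) character.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652]")
TATWEEL = "\u0640"

# Tiny illustrative stop-word set; the repository ships a fuller list in src.
SAMPLE_STOP_WORDS = {"في", "من", "على", "و"}

def normalize(text: str) -> str:
    text = ARABIC_DIACRITICS.sub("", text)   # strip diacritics
    text = text.replace(TATWEEL, "")         # strip tatweel (العـــربية -> العربية)
    text = re.sub("[أإآ]", "ا", text)        # unify alef variants
    text = text.replace("ى", "ي")            # unify alef maqsura with ya
    text = text.replace("ة", "ه")            # unify ta marbuta with ha
    return text

def tokenize(text: str) -> list[str]:
    # Keep only runs of Arabic letters; punctuation and non-Arabic
    # characters are dropped in the process.
    return re.findall(r"[\u0621-\u064A]+", text)

def remove_stop_words(tokens: list[str], stop_words=SAMPLE_STOP_WORDS) -> list[str]:
    return [t for t in tokens if t not in stop_words]

def naive_stem(token: str) -> str:
    # Toy light stemmer: strip at most one common prefix and one common
    # suffix, keeping at least three letters. A real stemmer is far richer.
    for prefix in ("ال", "و", "ب", "ك", "ف"):
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in ("ها", "ات", "ون", "ين", "ه"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token

def preprocess(text: str) -> list[str]:
    tokens = tokenize(normalize(text))
    return [naive_stem(t) for t in remove_stop_words(tokens)]

print(preprocess("الْعَرَبِيَّة في العـــالم"))  # -> ['عربي', 'عالم']
```

Each stage is kept as a separate function so that, for instance, the stop-word list can be swapped for the one in `src` without touching the rest of the pipeline.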
Feel free to use this repository and contribute to it to advance Arabic text processing projects.