SssiiiSssiii/ArabicTextCleaner


ArabicTextCleaner

  • Tokenization: Breaking Arabic text down into individual words (tokens).
  • Normalization: Standardizing text by converting variant character forms to a single base form.
  • Stop Word Removal: Removing common, low-information words such as articles and conjunctions.
  • Stemming (التجذيع): Reducing words to their stems or roots to improve text analysis and information retrieval.

These preprocessing steps improve the quality and usability of Arabic text data for NLP and machine learning tasks.

Feel free to use this repository and contribute to it to advance Arabic text processing projects.
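
The four steps listed above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not this repository's actual implementation: the function names (`normalize`, `tokenize`, `remove_stop_words`, `light_stem`, `clean`), the tiny stop-word list, and the prefix/suffix tables are all assumptions made for the example. The stemmer here is a simple "light" affix stripper, not true root extraction.

```python
import re

# Diacritics (tashkeel, U+064B..U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

def normalize(text):
    """Map variant character forms to a single base form."""
    text = DIACRITICS.sub("", text)
    text = re.sub("[إأآ]", "ا", text)  # alef variants -> bare alef
    text = text.replace("ى", "ي")      # alef maqsura -> ya
    text = text.replace("ة", "ه")      # ta marbuta -> ha
    return text

def tokenize(text):
    """Split text into runs of Arabic letters (U+0621..U+064A)."""
    return re.findall(r"[\u0621-\u064A]+", text)

# Hypothetical, deliberately tiny stop-word list (illustration only).
# It is normalized so that it matches normalized tokens.
STOP_WORDS = {normalize(w) for w in ["في", "من", "على", "إلى", "عن", "و"]}

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

# Assumed affix tables for a light stemmer; longer prefixes are tried first.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال"]
SUFFIXES = ["ات", "ون", "ين", "ها"]

def light_stem(token):
    """Strip at most one prefix and one suffix, keeping at least 3 letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 3:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 3:
            token = token[:-len(s)]
            break
    return token

def clean(text):
    """Run the full pipeline: normalize, tokenize, drop stop words, stem."""
    tokens = tokenize(normalize(text))
    return [light_stem(t) for t in remove_stop_words(tokens)]

if __name__ == "__main__":
    print(clean("ذهب الطلاب إلى المدرسة"))
```

Normalizing before tokenizing keeps the token pattern simple, and normalizing the stop-word list with the same function ensures that a word such as إلى still matches after its characters have been mapped. A production pipeline would likely use a much larger stop-word list and a dedicated Arabic stemmer.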

Input

[Image: sample input]

Output

[Image: sample output]