A dataset of 2,095 plain-text articles in 5 categories, with over 805k words in total.
Updated Jan 30, 2018
Turkish writings dataset that promotes creativity, content, composition, grammar, spelling and punctuation.
Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)
The E2E Dataset, packaged as a PyTorch Dataset subclass
Datasets with text data for use in NLP, text analysis, information extraction, and ML research.
News classification
Library for generating Russian names
Classifies text among 10 languages using the stopword lists included in the NLTK library.
Reads data from OPIEC, an Open Information Extraction corpus
Sentiment analysis on the NLTK movie reviews dataset using a Naive Bayes classifier, achieving more than 93% accuracy
Web crawler for Turkish news.
An NLP pipeline that cleans movie review data and writes the cleaned data to an output file
Biomolecular events mined by Reach from PubMed Central
Extracts transcripts and summaries (abstractive and extractive) from the AMI Meeting Corpus