Experiments with word2vec embeddings for synonyms detection, for the Romanian language.
-
Updated
Sep 10, 2023 - Python
Experiments with word2vec embeddings for synonyms detection, for the Romanian language.
I tried to figure out positive and negative comments on my Youtube videos. So, I used NLP to analyze comments. I set the main language as Korean, but you can try setting English as the main language.
Python program for detecting unintentional bilingual and translation instances in NLP datasets.
Sentiment Analysis on Product Reviews ( Project Associated with Zummit Infolabs ).
My project storage in NLP
Webcrawler for Turkish news.
Official repository for "Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning".
a novel Romanian language dataset for offensive message detection with manually annotated comment from a local Romanian news website (stiri de cluj) into five classes
A dataset of 2095 plain text articles of 5 categories with over 805k words in total.
4,308 short stories (4 million words) scraped from https://reddit.com/r/WritingPrompts
Repo for Turkish sentiment analysis dataset, "Vitamins and Supplements Customer Reviews"
Data preprocessing and training on Drug Review Dataset using Hugging Face library
➰Loop through a TSV file and pass columns of data to an external program. A Bash script.
a Python library for managing and annotating text corpuses in different formats.
Classifying a SMS as spam or non-spam using Natural Language Processing (NLP) and Machine Learning
EACL 2021 paper (SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification)
Creating a NLP Pipeline to 'Clean' Movie Reviews Data and writing cleaned data to output file
Add a description, image, and links to the nlp-datasets topic page so that developers can more easily learn about it.
To associate your repository with the nlp-datasets topic, visit your repo's landing page and select "manage topics."