Social media sees a steadily growing number of cases of trolling, aggression and hate. The amount of data generated each day is far too large to inspect manually.
In this work, we propose a fast and efficient pipeline to detect aggression and misogyny in social media texts. We use data from the Second Workshop on Trolling, Aggression and Cyber Bullying for our task.
We first employ a BERT-based pipeline to augment our data. We then employ a Tf-Idf and XGBoost based pipeline to detect aggression and misogyny.
Our model achieves weighted F1 scores of 0.73 and 0.85 on the two prediction tasks, which is close to the state of the art.
However, training time, model size and resource requirements are drastically lower than those of state-of-the-art models, which makes our proposed pipeline useful for fast inference. We describe the pipeline, examine the results and conduct an error analysis to understand the shortcomings of our model.
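To make the feature step concrete, the following is a minimal sketch of Tf-Idf weighting in plain Python. It is not the code in this repository: the function name `tfidf_vectors`, the whitespace tokenization, the smoothed idf formula and the L2 normalization are all assumptions chosen for illustration. The XGBoost classifier that consumes these features is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (sorted vocabulary, one L2-normalized tf-idf dict per document).

    Hypothetical helper for illustration; tokenization is a simple
    lowercase whitespace split, which is an assumption.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed idf: a term present in every document still gets weight 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: count * idf[t] for t, count in tf.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return sorted(idf), vectors


vocab, vectors = tfidf_vectors(["you are great", "you are awful"])
```

In this toy input, terms shared by both texts ("you", "are") receive a lower weight than the terms that distinguish them ("great", "awful"), which is the property a downstream classifier relies on.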
- Create a virtual environment.
- Clone the repository.
- Navigate to the cloned repository.
- Install the requirements with `pip install -r requirements.txt`.
- Navigate to the `/core` directory and set it as your current working directory.
- Run `bash ./run.sh` for training, validation and inference.
Team Name (cited in paper) | Score Sub Task A | Score Sub Task B |
---|---|---|
Julian | 0.802 | 0.851 |
abaruah | 0.728 | 0.870 |
sdhanshu | 0.759 | 0.857 |
Our Model | 0.735 | 0.852 |
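The scores reported for our model are weighted F1 scores. As a rough illustration of that metric, the sketch below computes per-class F1 and averages it weighted by class support. The function name `weighted_f1` and the toy labels are hypothetical; this is not the evaluation script used for the table.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        # True positives and number of predictions for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)


# Toy example with made-up labels, not data from the shared task.
score = weighted_f1(
    ["NAG", "NAG", "CAG", "OAG"],
    ["NAG", "CAG", "CAG", "OAG"],
)
```

Weighting by support means frequent classes dominate the score, which is worth keeping in mind when the class distribution is imbalanced.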
Task A (Aggression Detection) Confusion Matrix
Classes are (left to right and top to bottom)
- OAG (Overtly Aggressive): Explicitly Aggressive Terms
- CAG (Covertly Aggressive): Covertly aggressive content, such as sarcasm
- NAG (Non Aggressive): Non Aggressive texts
Task B (Misogyny Detection) Confusion Matrix
Classes are (left to right and top to bottom)
- NGEN: Neutral Texts
- GEN: Contains misogynistic connotations
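For readers who want to rebuild such matrices from predictions, here is a minimal sketch that counts label pairs in a fixed class order. The function name `confusion_matrix` and the convention that rows are true classes and columns are predicted classes are assumptions; the figures themselves do not state which axis is which.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows follow `labels`, as do columns.

    Assumed convention: matrix[i][j] is the number of samples whose true
    class is labels[i] and whose predicted class is labels[j].
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy example using the Task A class order listed above (made-up labels).
task_a = confusion_matrix(
    ["OAG", "CAG", "NAG", "NAG"],
    ["OAG", "NAG", "NAG", "CAG"],
    labels=["OAG", "CAG", "NAG"],
)
```

The same function works for Task B by passing `labels=["NGEN", "GEN"]`.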
- `assets` - Images for report
- `core` - Code related to training and testing after augmentation
- `input` - Data input. Contains train, test and gold data
- `models` - Serialized model files
- `notebooks` - Notebooks done in Google Colab. `notebooks/Data_Augmentation_Aggression_Detection.ipynb` contains detailed code regarding the augmentation process
- `reports` - .tex files
- `test_results` - Test CSV file