This repository hosts a notebook featuring an in-depth analysis of several RNN models, together with CNN and Multinomial Naive Bayes along with an app deployment using Streamlit. The following models were meticulously evaluated:
- Basic Keras Model
- LSTM Model
- LSTM GRU Model
- LSTM Bidirectional Model
- TextVectorization + Keras Embedding
- Text_to_word_sequence + Word2Vec Embedding
- Basic CNN Model
The dataset used has been downloaded from Kaggle and contains a set of Fake and Real News.
The app can be tested following this link.
In the first stage, a set of helper functions was created in order to easily visualize the data analysis and modelling results
- Plot WordCLoud: Generate a word cloud for a specific label value and display it in a subplot
- Plot Confusion Matrix: Plot a confusion matrix to visualize classification results
- ## Plot Precision/Recall Results: Calculates model accuracy, precision, recall, and F1-score of a binary classification model and returns the results as a DataFrame
The first approach was to analyze the Dataset and the labels and subjects distribution
The first approach was to train 2 Pytorch EfficientNet models (EffNetB0, EffNetB2) with 5 and 10 epochs using the pretrained model weights of EffNetB0 for the DataLoaders in order to stablish a baseline. The EffNetB2 with 10 epochs showed the best performance above 93% on the test set.
Then the EffNetB2 with 10 epochs was trained again but this time using the pretrained model weights of EffNetB2 for the DataLoaders. This time an accuracy above 95% on the test set and above 93% on the validation set was achieved .
The last step was to deploy and app hosted in Hugging Face using Gradio. This app can be tested with available sample images or with own ones.