PhishRod is a web application that uses machine learning to detect phishing websites. The goal is to grow into a platform with a large community that fights the ever-growing number of phishing attacks 💀
Until then, we'll keep Fishing what's Phishing You 🦈
- Single web application
- Allows the user to:
  - Identify phishing attacks
  - Give their feedback by writing reviews
  - Visualize statistics about phish attacks tested with PhishRod
- A contact form that gets sent to the Admin email
- An Admin space to:
  - Review the usage of the application
  - Visualize users’ reviews
For building and running the application you need:
We highly recommend using these exact versions of Python and Django, because this project has not been tested with other releases.
This section is optional if you just want to use the application, since it already ships with a trained model. If you want to tweak the classification model, follow these steps.
The first step in creating the classification model that PhishRod uses to identify phishing websites is to construct a labeled dataset to train the model on.
For that, we need a list of phishing and non-phishing websites from which to extract a set of features.
Details about the implemented features can be found here: Features.md
The sets of phishing and non-phishing websites used as input to our feature extraction are, respectively:
- verified_online.json: a JSON file containing an array of verified phishing websites from PhishTank.com, a great platform for combating phishing that exposes its database to developers. The file can be downloaded from there.
- top-1m.csv: a CSV file with the top 1 million trusted websites from Alexa, which serves as our list of non-phishing websites.
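As an illustration of how these two inputs could be read, here is a minimal sketch using only the standard library. It assumes that each entry of the JSON array is an object with a "url" key and that the CSV has no header and stores rank,domain pairs. The function names are hypothetical and are not taken from the project's scripts.

```python
import csv
import json


def load_phishing_urls(path):
    """Return the URLs found in a JSON array of phishing entries."""
    with open(path, encoding="utf-8") as handle:
        entries = json.load(handle)
    # Assumption: every usable entry is an object with a "url" key.
    return [entry["url"] for entry in entries if "url" in entry]


def load_trusted_domains(path, limit=None):
    """Return domains from a rank,domain CSV, optionally only the first `limit`."""
    domains = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            domains.append(row[1])
            if limit is not None and len(domains) >= limit:
                break
    return domains
```

The `limit` parameter is there because processing all one million trusted domains is rarely needed to balance the phishing list.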
After providing these two files in the data directory, it's just a matter of running our feature extraction scripts:
- Extract_Features.py and Extract_Features_Non_Phish.py extract the phishing and non-phishing datasets respectively. These are heavily multi-threaded scripts that follow the thread pool design.
- DataProcess_non_Phish.py is an attempt to use processes instead of threads. The results were less efficient than with threads, probably due to the lack of computing power (only tested on a 2-core machine).
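The thread pool design mentioned above can be sketched as follows. This is only an illustration: `extract_features` is a hypothetical stand-in that derives a few values from the URL string, whereas the real scripts implement the features described in Features.md. The worker count is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def extract_features(url):
    """Hypothetical stand-in: derive a few simple features from the URL string."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url": url,
        "url_length": len(url),
        "dot_count": host.count("."),
        "uses_https": parsed.scheme == "https",
    }


def extract_all(urls, max_workers=8):
    """Run extract_features over all URLs using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input URLs.
        return list(pool.map(extract_features, urls))
```

A fixed pool keeps the number of concurrent workers bounded, which matters when each task waits on the network, as web scraping tasks typically do.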
To run the extraction scripts simply use these commands:
$ python web_scraping\feature_extraction\Extract_Features.py
$ python web_scraping\feature_extraction\Extract_Features_Non_Phish.py
After the previous step, we should have two new files under data, called extracted_Non_Phish.csv and extracted_Phish.csv, that will serve as the input to our model training.
classifier.py has the pipeline to train and test the model, then dump it to classifier.pkl, a Python object file that is later used to verify the URLs entered into PhishRod. It also has a section to cross-validate the model and visualise different aspects of it, such as feature importance. Any changes to the PhishRod classification model should therefore be made there.
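As an illustration of the first stage of such a pipeline, the two extracted files can be merged into one labeled dataset and split into training and test sets. This is a minimal standard-library sketch, not the contents of classifier.py: the function names, the "label" column, the presence of a header row in the CSV files, and the 80/20 split are all assumptions.

```python
import csv
import random


def load_labeled(path, label):
    """Read a feature CSV (header row assumed) and attach a class label to each row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [dict(row, label=label) for row in csv.DictReader(handle)]


def build_dataset(phish_csv, non_phish_csv, test_ratio=0.2, seed=42):
    """Merge both files, shuffle reproducibly, and return (train_rows, test_rows)."""
    rows = load_labeled(phish_csv, 1) + load_labeled(non_phish_csv, 0)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split repeatable
    split = int(len(rows) * (1 - test_ratio))
    return rows[:split], rows[split:]
```

Shuffling before the split matters here because the merged rows are otherwise ordered by class, which would leave the test set with only one label.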
To run the classifier script and produce the model, simply run:
$ python classifier/classifier.py
The result should be the model dump at classifier/classifier.pkl, so make sure it exists before moving on to the next step.
For this step, we recommend setting up a virtual environment and activating it, although this is optional. See: Python 3 Virtual Environment Tutorial
Install project dependencies:
$ pip install -r requirements.txt
Then simply apply the migrations:
$ python manage.py migrate
You can now run the development server:
$ python manage.py runserver
After finishing the installation, PhishRod should be accessible at localhost.
The interface is simple, with one input area to enter the URL and a button to identify whether the website is a phish or not. When scrolling, the user will find some statistics about recent tests, and a way to send feedback and contact the Admin.
After sending a URL, the user will get the results shortly afterwards:
- Phish not detected: a lock animation will appear, along with a rating space where the user can rate the results
- Phish detected: a bomb animation will appear, along with a rating space where the user can rate the results
PhishRod is equipped with an Admin space where an administrator can review recent activity. To access it, go to localhost/admin