Phish Rod is a web application that leverages machine learning to detect phishing websites.

MontaLabidi/PHISH-Detection-using-ML

PhishRod

main_page_720

PhishRod is a web application that leverages machine learning to detect phishing websites. The goal is to grow into a platform with a large community that fights the ever-growing threat of phishing attacks 💀

Until then, we'll keep Fishing what's Phishing You 🦈

Features

  • Single web application

  • Allows the user to:

    • Identify phishing attacks

    • Give their feedback by writing reviews

    • Visualize statistics about the phishing attacks tested with PhishRod

  • A contact form whose messages are sent to the Admin's email

  • An Admin space to:

    • Review the usage of the application

    • Visualize users’ reviews

Getting Started

Requirements

For building and running the application you need:

We highly recommend using these exact versions of Python and Django because this project has not been tested with other releases.

Installation

Classification Model Setup

This section is optional if you just want to use the application, since it already ships with a trained model. If you want to tweak the classification model, follow these steps.

1- Feature Extraction

The first step in creating the classification model that PhishRod uses to identify phishing websites is to build a labeled dataset to train it on.

For that, we need a list of phishing and non-phishing websites from which to extract a set of features.

The details about the features implemented can be found here: Features.md
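To give a rough idea of what "extracting features" from a website can look like, here is a minimal sketch that computes a few lexical features from a URL string. The function name and the feature set are hypothetical and chosen only for illustration; the features PhishRod actually uses are the ones documented in Features.md.

```python
import re
from urllib.parse import urlparse

# Hypothetical feature set for illustration only; the project's real
# features are described in Features.md.
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url: str) -> dict:
    """Return a few simple lexical features computed from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": bool(IPV4_PATTERN.match(host)),
        "has_at_symbol": "@" in url,
        "host_dot_count": host.count("."),
    }
```

Each URL becomes one row of numeric or boolean values, which is the shape a classifier expects as training input.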

The phishing and non-phishing websites used as input to our feature extraction come from two files, respectively:

  • verified_online.json: a JSON file containing an array of verified phishing websites from PhishTank.com, a great platform for combating phishing that also exposes its database to developers. You can get it here
  • top-1m.csv: a CSV file listing the top 1 million trusted websites from Alexa, which serves as our list of non-phishing websites.
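As a quick illustration of how these two inputs could be read, the sketch below parses both formats from text. It assumes that each PhishTank entry is a JSON object with a "url" key and that top-1m.csv has no header and uses rank,domain rows; both function names are hypothetical and are not taken from the project's scripts.

```python
import csv
import io
import json


def load_phishing_urls(json_text: str) -> list:
    """Collect the URL of every entry in a PhishTank-style JSON array."""
    entries = json.loads(json_text)
    # Assumption: each entry is an object with a "url" key.
    return [entry["url"] for entry in entries if "url" in entry]


def load_trusted_domains(csv_text: str, limit: int = 1000) -> list:
    """Collect the first `limit` domains from a header-less rank,domain CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    domains = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        domains.append(row[1].strip())
        if len(domains) >= limit:
            break
    return domains
```

The `limit` parameter is there because working through the full list of one million domains is rarely necessary for a first experiment.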

After placing these two files in the data directory, it's just a matter of running our feature extraction scripts:

  • Extract_Features.py and Extract_Features_Non_Phish.py extract the phishing and non-phishing datasets respectively. These are heavily multi-threaded scripts that follow the thread pool design pattern.

  • DataProcess_non_Phish.py is an attempt to use processes instead of threads. It turned out less efficient than the threaded version, probably due to limited computing power (it was only tested on a 2-core machine).
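For readers unfamiliar with the thread pool pattern mentioned above, here is a minimal sketch using Python's standard concurrent.futures module. It is not the code from the extraction scripts: `fetch_features` is a hypothetical stand-in for the real per-URL work, and the `max_workers` value is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_features(url: str) -> dict:
    """Hypothetical stand-in for the real per-URL extraction work."""
    return {"url": url, "url_length": len(url)}


def extract_all(urls, worker=fetch_features, max_workers=16):
    """Run `worker` over every URL on a thread pool, skipping URLs that raise."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per URL; the pool reuses a fixed set of threads.
        futures = {pool.submit(worker, url): url for url in urls}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception:
                # A dead or slow website should not abort the whole run.
                continue
    return results
```

Results come back in completion order rather than input order, so each row carries its own URL. Thread pools generally suit work that spends most of its time waiting on network responses, which is presumably the case when visiting many websites.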

To run the extraction scripts simply use these commands:

$ python web_scraping/feature_extraction/Extract_Features.py
$ python web_scraping/feature_extraction/Extract_Features_Non_Phish.py

2- Training & Exporting the Model

After the previous step we should have two new files under data, extracted_Non_Phish.csv and extracted_Phish.csv, which serve as the input to our model training:

  • classifier.py contains the pipeline that trains and tests the model, then dumps it to classifier.pkl, a pickled Python object that PhishRod later uses to check the URLs entered. It also has a section that cross-validates the model and visualises aspects of it such as feature importance. Any changes to the PhishRod classification model should be made there.
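Before any training can happen, the two extracted files have to be merged into one labeled dataset. The sketch below shows one way this could be done with the standard library alone. It assumes both CSV files start with the same header row, and the function names, split ratio, and seed are hypothetical; classifier.py may well do this differently.

```python
import csv
import io
import random


def read_rows(csv_text: str, label: int) -> list:
    """Parse a CSV with a header row and attach a class label to each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row, label=label) for row in reader]


def build_dataset(phish_csv: str, non_phish_csv: str, test_ratio=0.2, seed=42):
    """Merge both classes, shuffle, and split into train and test lists."""
    # Label 1 marks phishing rows, label 0 marks non-phishing rows.
    rows = read_rows(phish_csv, label=1) + read_rows(non_phish_csv, label=0)
    random.Random(seed).shuffle(rows)
    split = len(rows) - int(len(rows) * test_ratio)
    return rows[:split], rows[split:]
```

Shuffling with a fixed seed keeps the split reproducible between runs, which makes it easier to compare changes to the model.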

To run the classifier script and produce the model, simply run:

$ python classifier/classifier.py

The result should be the model dump at classifier/classifier.pkl, so make sure it exists before moving on to the next step.
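One simple way to confirm that the dump exists and is readable is to load it back. The helper below is a hypothetical sketch, not part of the project; it only assumes that classifier.pkl was written with Python's pickle module, as the .pkl extension suggests.

```python
import pickle
from pathlib import Path


def load_classifier(path="classifier/classifier.pkl"):
    """Load the pickled model, failing early with a clear message if it is missing."""
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(
            f"{model_path} not found; run classifier/classifier.py first"
        )
    with model_path.open("rb") as handle:
        # Only unpickle files you created yourself: pickle can execute code.
        return pickle.load(handle)
```

Calling `load_classifier()` from the project root either returns the model object or raises a FileNotFoundError that points back to the training step.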

Running the application locally

For this step, we recommend (optionally) setting up and activating a virtual environment: Python 3 Virtual Environment Tutorial

Install project dependencies:

$ pip install -r requirements.txt

Then simply apply the migrations:

$ python manage.py migrate

You can now run the development server:

$ python manage.py runserver

Usage

After finishing the installation, PhishRod should be accessible at localhost (http://localhost:8000 with the Django development server's default settings).

The interface is simple, with one input area to enter the URL and a button to check whether the website is a phish or not. Scrolling down, the user will find some statistics about recent tests, as well as a way to send feedback and contact the Admin.

page_scroll

Identifying a phishing website

After sending a URL, the user will get the results shortly:

  • Phish not detected: a lock animation will appear, along with a rating area where the user can rate the result

    phish_not_detected

  • Phish detected: a bomb animation will appear, along with a rating area where the user can rate the result

    phish_detected

Administration

PhishRod is equipped with an Admin space where an administrator can review recent activity. To access it, go to localhost/admin

  • A login page will appear where the admin can enter their credentials:

    admin_login

  • After a successful login, the admin will see a simple landing page:

    admin_main_menu

  • The Admin can access the reviews by clicking on the Reviews button:

    admin_reviews

  • The Admin can access the URLs recently entered by clicking on the URL button:

    admin_url
