Features-Extractor

Work inspired by E. Viegas, A. Santin and V. Abreu's paper:
"Enabling Anomaly-based Intrusion Detection Through Model Generalization".

About the project

The goal is to build an IDS (Intrusion Detection System) by training a machine learning model on traffic generated inside a virtual environment. Raw generated traffic is hard to use for training because it depends on the scenario in which it was captured, so it would yield models fitted to that specific scenario. To solve this problem, the traffic must be processed so that the resulting features are independent of the simulated session (virtual or real environment).

Workflow

The generated traffic (HTTP, SMTP, SNMP, SSH) is captured with tcpdump;
The resulting .dump file is converted with Wireshark into a file named totaltraffic.c containing C arrays;
featuresExtractor.py is run;
The result is 50 scenario-independent features that can be used to train the model.
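
The conversion step produces C source rather than a binary capture, so the arrays have to be turned back into bytes before any feature can be computed. The following is a minimal sketch of that idea, not the code in featuresExtractor.py. It assumes each packet is exported as an array named `pkt<N>` holding `0x..` byte literals; `parse_c_arrays` is a hypothetical helper name.

```python
import re

# Assumed export layout, one block per packet, for example:
#   static const unsigned char pkt1[4] = { 0x45, 0x00, 0x00, 0x1c };
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
PACKET_RE = re.compile(r"pkt\d+\[\d+\]\s*=\s*\{(.*?)\};", re.DOTALL)
BYTE_RE = re.compile(r"0x([0-9a-fA-F]{2})")


def parse_c_arrays(text):
    """Return one bytes object per packet array found in the C source text."""
    text = COMMENT_RE.sub("", text)  # drop C comments before matching
    packets = []
    for body in PACKET_RE.findall(text):
        packets.append(bytes(int(h, 16) for h in BYTE_RE.findall(body)))
    return packets


sample = """
/* Frame (4 bytes) */
static const unsigned char pkt1[4] = {
0x45, 0x00, 0x00, 0x1c
};
"""
packets = parse_c_arrays(sample)
print(len(packets), packets[0].hex())
```

In practice the text would come from reading totaltraffic.c instead of the inline `sample` string.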

50 Features + Target:

IP_TYPE
IP_LEN
FR_LENGHT
IP_ID
IP_RESERVED
IP_DF
IP_MF
IP_OFFSET
IP_PROTO
IP_CHECKSUM
UDP_SPORT
UDP_DPORT
UDP_LEN
UDP_CHK
ICMP_TYPE
ICMP_CODE
ICMP_CHK
TCP_SPORT
TCP_DPORT
TCP_SEQ
TCP_ACK
TCP_FFIN
TCP_FSYN
TCP_FRST
TCP_FPUSH
TCP_FACK
TCP_FURG
COUNT_FR_SRC_DST
COUNT_FR_DST_SRC
NUM_BYTES_SRC_DST
NUM_BYTES_DST_SRC
NUM_PUSHED_SRC_DST
NUM_PUSHED_DST_SRC
NUM_SYN_FIN_SRC_DST
NUM_SYN_FIN_DST_SRC
NUM_FIN_SRC_DST
NUM_FIN_DST_SRC
NUM_ACK_SRC_DST
NUM_ACK_DST_SRC
NUM_SYN_SRC_DST
NUM_SYN_DST_SRC
NUM_RST_SRC_DST
NUM_RST_DST_SRC
COUNT_SERV_SRC_DST
COUNT_SERV_DST_SRC
NUM_BYTES_SERV_SRC_DST
NUM_BYTES_SERV_DST_SRC
FIRST_PACKET
FIRST_SERV_PACKET
CONN_STATUS
TYPE
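
Several of the names above correspond to fields of the standard IPv4 header. As a rough illustration of how such values can be read from raw packet bytes, here is a sketch using Python's `struct` module. The mapping between feature names and header fields is an assumption based on the names alone, and `parse_ipv4_header` is a hypothetical helper, not a function of this project.

```python
import struct


def parse_ipv4_header(header):
    """Decode the fixed 20-byte IPv4 header into feature-style names.

    The name-to-field mapping is assumed, not taken from the project code.
    """
    (_ver_ihl, _tos, total_len, ident, flags_frag,
     _ttl, proto, checksum, _src, _dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "IP_LEN": total_len,
        "IP_ID": ident,
        "IP_RESERVED": (flags_frag >> 15) & 1,  # reserved flag bit
        "IP_DF": (flags_frag >> 14) & 1,        # don't fragment
        "IP_MF": (flags_frag >> 13) & 1,        # more fragments
        "IP_OFFSET": flags_frag & 0x1FFF,       # fragment offset (13 bits)
        "IP_PROTO": proto,
        "IP_CHECKSUM": checksum,
    }


# Sample header: 60-byte TCP datagram with the DF flag set.
header = bytes.fromhex("4500003c1c4640004006b1e6ac100a63ac100a0c")
print(parse_ipv4_header(header))
```

The sketch expects bytes that start at the IP header, so any link-layer header (for example the 14-byte Ethernet header) would have to be skipped first.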

Note: SRC_DST and DST_SRC indicate the direction of transmission: sent and received, respectively.
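
To make the two directions concrete, the sketch below keeps running frame and byte counters per (source, destination) pair and reports, for each packet, the counter for its own direction and for the reverse one. It is only an illustration: whether the project counts over the whole capture or over a time window is not stated here, so counting over the whole capture is an assumption, and `directional_counts` is a hypothetical name.

```python
from collections import defaultdict


def directional_counts(packets):
    """packets: iterable of (src_ip, dst_ip, length) tuples in capture order.

    Returns one dict per packet with counters for its own direction (SRC_DST)
    and the reverse direction (DST_SRC), up to and including that packet.
    """
    frames = defaultdict(int)
    nbytes = defaultdict(int)
    rows = []
    for src, dst, length in packets:
        frames[(src, dst)] += 1
        nbytes[(src, dst)] += length
        rows.append({
            "COUNT_FR_SRC_DST": frames[(src, dst)],
            "COUNT_FR_DST_SRC": frames[(dst, src)],
            "NUM_BYTES_SRC_DST": nbytes[(src, dst)],
            "NUM_BYTES_DST_SRC": nbytes[(dst, src)],
        })
    return rows


capture = [
    ("10.0.1.3", "10.0.1.2", 60),
    ("10.0.1.2", "10.0.1.3", 40),
    ("10.0.1.3", "10.0.1.2", 100),
]
for row in directional_counts(capture):
    print(row)
```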

The target variable can take two values:

Attack: if the packet comes from the client;
Normal: if the packet comes from the server.
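
The labelling rule can be sketched in a few lines. The function name `label_packet` is hypothetical, and returning `None` for a source address that belongs to neither list is an assumption, since that case is not described.

```python
def label_packet(src_ip, server_ips, client_ips):
    """Return the target value for a packet based on its source address."""
    if src_ip in client_ips:
        return "Attack"
    if src_ip in server_ips:
        return "Normal"
    return None  # assumption: unknown sources are left unlabelled


servers = {"10.0.1.2", "10.0.2.2", "10.0.3.2"}
clients = {"10.0.1.3", "10.0.2.3", "10.0.3.3"}
print(label_packet("10.0.1.3", servers, clients))
print(label_packet("10.0.1.2", servers, clients))
```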

Getting started

A virtual environment like the one shown in the figure must be set up:

Any hypervisor can be used, as long as the client machines can communicate only with the server. The server is the only point of access to the internet and provides connectivity to the clients.

The goal is to create an environment that is as isolated as possible.

Client and server implement different types of services:

Debian-based distributions (ParrotOS and Kali Linux) were used to build the scenario described.

Each client can use the following configuration in /etc/network/interfaces:

#Client1 configuration (dhcp or static)

auto eth0
iface eth0 inet dhcp

#Default gateway
post-up route add default gw 10.0.1.2

To automate traffic capture, the clients were synchronized with the server following the scheme illustrated here:

Note: RUN.py is a script that launches one of the 4 scripts illustrated and, after a certain period, starts both the LOIC attack and the SYN flood.

Prerequisites

The dataset is built with the well-known pandas library:
```
pip install pandas
```
Generating the HTTP flood attack requires LOIC (downloadable from: https://sourceforge.net/projects/loic/).

It can be run with mono:

   sudo apt install apt-transport-https dirmngr gnupg ca-certificates
   sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
   echo "deb https://download.mono-project.com/repo/debian stable-buster main" | sudo tee /etc/apt/sources.list.d/mono-official-stable.list
   sudo apt update
   sudo apt install mono-devel

From the LOIC folder:
```
sudo mono ./src/bin/Debug/LOIC.exe
```
For the SYN flood, use the Metasploit suite.

To use featuresExtractor, the packets must be converted into C arrays.

Wireshark can be used for this purpose:

The resulting file must have the structure illustrated; featuresExtractor will extract the information and create the dataset.

Usage

featuresExtractor.py must be given the list of the server interfaces' IP addresses first, followed by the list of client IP addresses.
The order matters.

Clone the repo:

git clone https://github.com/Theviki20110/Features-Extractor.git

Launch the script:

sudo python featuresExtractor.py [IP_ser_int1, IP_ser_int2,...] [IP_client1_int, IP_client2_int,...]
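
As an illustration of how two bracketed lists like these could be read from the command line, here is a small sketch. It is not the parsing code used by featuresExtractor.py; `parse_ip_list` is a hypothetical helper, and validating each entry with the standard `ipaddress` module is an added assumption.

```python
import ipaddress


def parse_ip_list(arg):
    """Turn a string such as "[10.0.1.2,10.0.2.2]" into a list of IP strings."""
    inner = arg.strip().strip("[]")
    ips = [part.strip() for part in inner.split(",") if part.strip()]
    for ip in ips:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    return ips


# Server list first, then client list, mirroring the order required above.
server_ips = parse_ip_list("[10.0.1.2,10.0.2.2,10.0.3.2]")
client_ips = parse_ip_list("[10.0.1.3,10.0.2.3,10.0.3.3]")
print(server_ips, client_ips)
```

Each list has to reach the script as a single argument, so it should contain no spaces unless it is quoted. Square brackets are also glob characters in most shells, so quoting each list is the safer choice.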

Example

Here is an example of how to use featuresExtractor.
The files used (downloadable from example) must be in the same folder from which the script is launched.

First, extract the archive:
```
tar -xvf example.tar.gz
```

Run the script:

sudo python featuresExtractor.py [10.0.1.2,10.0.2.2,10.0.3.2] [10.0.1.3,10.0.2.3,10.0.3.3]

License

Distributed under the AGPL-3.0 License. See LICENSE.txt for more information.
