SecurityWhale

Predicting Security Vulnerabilities using Code Metrics and Machine Learning. This project was created for the University of Central Florida's Computer Science Senior Design class for Spring 2019 to Fall 2019.

Documents detailing the project including our Design Document, Conference Paper, and Final Committee Presentation can be viewed under the "Docs" folder and are available for download.

Project Summary

Code is inherently insecure: security vulnerabilities and other faults in production systems reduce the reliability and performance of the software that runs the modern world. Our project pairs a front-end application with a machine learning backend trained on publicly available metadata about the development cycle, so that developers can predict where potential faults are likely to be found within a software package. This informs development teams, both small and large, about the likely problem areas in their software projects. Resolving issues with this tool before they are released saves time and money.

We use machine learning for our statistical model, expanding on existing work in the field by Dr. Elaine Weyuker, who also advised the project. Comprehensive data acquisition methods and custom utilities are key to ensuring that the model operates efficiently. The product requires a robust front-end to interface with the end user and to communicate, in a meaningful way, how to improve the development cycle. Finally, a flexible backend infrastructure is necessary to reliably deliver the data required at each step and to accommodate shifting requirements at an early stage in the project.

Project Objectives

Predict security vulnerabilities within code using code metrics from public GitHub repos and the project’s file structure
Provide users with informative security analytics regarding their code
Improve upon existing work in the field by utilizing:
- Machine Learning statistical modeling
- Data acquisition using publicly available APIs and file metadata
Determine the feasibility of using metrics from the development cycle to predict vulnerabilities and faults

Technologies Used

Data Acquisition:

Data Source:
- GitHub
- CVE Database
Programming Language:
- Python (v3.7)
Libraries Used:
- PyGithub
- GitPython
- os (Python standard library)
- mysql.connector
Integrated Development Environment (IDE):
- PyCharm
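
As a rough illustration of the kind of file-structure metrics mentioned in the objectives, the sketch below walks a locally cloned repository with the standard `os` module. It is not the project's actual acquisition code: the function name `collect_file_metrics`, the chosen metrics (size, line count, directory depth), and the extension filter are all assumptions made for this example.

```python
import os


def collect_file_metrics(root, extensions=(".py", ".c", ".cs", ".java")):
    """Walk `root` and return one metrics dict per matching source file.

    Hypothetical helper: the metric names and the extension filter are
    illustrative choices, not the ones used by SecurityWhale.
    """
    metrics = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Git's internal directory so it is never traversed.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            # Count lines without loading the whole file into memory.
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                line_count = sum(1 for _ in handle)
            metrics.append({
                "path": rel.replace(os.sep, "/"),
                "size_bytes": os.path.getsize(path),
                "line_count": line_count,
                "depth": rel.count(os.sep),  # 0 for files at the repo root
            })
    return metrics
```

Calling `collect_file_metrics` on the path of a cloned repository returns a list of dictionaries that could then be written to a database or fed to a model.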

Backend:

Ubuntu VM hosted by Digital Ocean:
- MySQL database connecting the data acquisition and modeling roles
  - Local database connections through mysql.connector
- Bootstrap website running on Apache, SSL cert through Let’s Encrypt
- Anaconda for Python sandboxing

Machine Learning:

Programming Languages:
- Python
Libraries:
- NumPy
- Keras
- TensorFlow
- scikit-learn

Application:

Programming Languages:
- C#
Libraries:
- LibGit2Sharp
- Octokit
- LiveCharts
CI/CD:
- Travis CI
Integrated Development Environment (IDE):
- Visual Studio 2019
Testing Framework:
- Microsoft Unit Testing Framework for Managed Code
- Arrange, Act, Assert testing paradigm
- Six unit tests
- Two integration tests
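
The Arrange, Act, Assert paradigm listed above splits each test into three steps: set up the inputs, run the code under test, and check the outcome. The project's own tests are written in C#; the sketch below only illustrates the pattern in Python with the standard `unittest` module. The function `risk_label` and its threshold are hypothetical and exist solely to give the tests something to exercise.

```python
import unittest


def risk_label(probability, threshold=0.5):
    """Hypothetical helper: map a predicted probability to a label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return "vulnerable" if probability >= threshold else "clean"


class RiskLabelTests(unittest.TestCase):
    def test_probability_at_threshold_is_flagged(self):
        # Arrange: choose an input exactly at the default threshold.
        probability = 0.5
        # Act: run the code under test.
        label = risk_label(probability)
        # Assert: the boundary value counts as vulnerable.
        self.assertEqual(label, "vulnerable")

    def test_out_of_range_probability_is_rejected(self):
        # Arrange: an input outside the valid range.
        probability = 1.5
        # Act and Assert: the call itself must raise.
        with self.assertRaises(ValueError):
            risk_label(probability)
```

Keeping the three steps visually separate makes it easy to see what each test sets up and what it actually verifies.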

Existing Work

Predicting the Location and Number of Faults in Large Software Systems:
- Thomas J. Ostrand, Elaine J. Weyuker, and Robert M. Bell
Negative binomial regression model that predicts the number of faults in each file
Predictions based on the code faults and modification history of previous releases
Applied to two large industrial systems at AT&T:
- The top 20% of predicted problematic files contained between 71% and 92% of the faults that were actually detected, with the overall average being 83%
Showed that faults can be predicted, given intensive effort to map faults to metadata
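
The "top 20% of files" figure above is a ranking metric: sort files by predicted fault count, take the top fifth, and measure what share of the actual faults falls in that set. The sketch below shows one way to compute it. It does not implement the negative binomial model itself, and the function name `fault_capture_rate` and the toy data are assumptions for illustration only.

```python
def fault_capture_rate(predicted, actual, top_fraction=0.20):
    """Share of actual faults found in the top-ranked files.

    `predicted` maps file name -> predicted fault count (any score works);
    `actual` maps file name -> faults actually detected.
    """
    if not predicted:
        raise ValueError("predicted must contain at least one file")
    # Rank files from most to least suspicious.
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    # Always inspect at least one file, even in very small projects.
    cutoff = max(1, round(len(ranked) * top_fraction))
    total_faults = sum(actual.values())
    if total_faults == 0:
        return 0.0
    captured = sum(actual.get(name, 0) for name in ranked[:cutoff])
    return captured / total_faults


# Toy data (made up): ten files, ten detected faults in total.
predicted = {"f0": 9.1, "f1": 6.4, "f2": 2.2, "f3": 1.9, "f4": 1.0,
             "f5": 0.8, "f6": 0.5, "f7": 0.3, "f8": 0.2, "f9": 0.1}
actual = {"f0": 5, "f1": 3, "f2": 1, "f3": 1}

print(fault_capture_rate(predicted, actual))  # 0.8
```

In this toy case the two highest-ranked files account for 8 of the 10 faults, so the capture rate is 0.8.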

Authors

Baran Barut - Front-End Application, Testing Plan

Michael Harris - Project Manager, Backend

Curtis Helsel - Machine Learning

Kyle Reid - Data Acquisition

Thomas Serrano - Data Acquisition, Application-Backend Interface

Other Contributors

Dr. Paul Gazzillo - Project Sponsor and Advisor

Dr. Elaine Weyuker - Previous Work and Advising

Dr. Mark Heinrich - Senior Design II Professor and Advising
