ML Model and Live Website to detect incomplete cancer pathology reports in real-time.

PeruDayani/CRGC-NLP

CRGC-NLP

Project Aim

The aim of this project was to create a machine learning model and deploy it via a web application, so that pathologists can submit a bladder cancer report and receive a real-time email alert if the submitted report is classified as incomplete.
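
The submit, classify, and alert flow described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `classify_report`, `REQUIRED_TERMS`, `build_alert`, `handle_submission`, and the email addresses are all hypothetical stand-ins, and the real classifier is a trained model rather than a keyword check. The alert is only constructed here; actually sending it would require an SMTP server.

```python
from email.message import EmailMessage

# Hypothetical required fields; the real model learns its own signals.
REQUIRED_TERMS = ("grade", "stage", "margin")


def classify_report(text: str) -> bool:
    """Stand-in classifier: return True if the report looks incomplete."""
    lowered = text.lower()
    return any(term not in lowered for term in REQUIRED_TERMS)


def build_alert(report_id: str, recipient: str) -> EmailMessage:
    """Build (but do not send) an alert email for an incomplete report."""
    msg = EmailMessage()
    msg["Subject"] = f"Pathology report {report_id} flagged as incomplete"
    msg["From"] = "alerts@example.org"  # placeholder address
    msg["To"] = recipient
    msg.set_content(
        "The submitted report was classified as incomplete. Please review it."
    )
    return msg


def handle_submission(report_id: str, text: str, recipient: str):
    """Return an alert message if the report is incomplete, else None."""
    if classify_report(text):
        return build_alert(report_id, recipient)
    return None
```

In a web application, a function like `handle_submission` would typically be called from the route that receives the submitted report, with the returned message handed to a mail client.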

The key features required were:

  1. An NLP machine learning model to parse and classify a pathology report. This was achieved using a custom vectorizer and a multi-model architecture to make the most of the medical data provided.
  2. A live web application to deploy the model as a proof of concept for a viable and scalable way to give pathologists access to feedback. This was achieved by developing a web application using Flask and deploying it on Heroku.
  3. A display of the keywords used for each submitted path report. During EDA and model development, certain keywords emerged as key identifiers in the reports, which aligned well with the CRGC's mission to encourage better-structured reports.
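
To illustrate how TF-IDF weighting can surface the keywords of a single report, here is a small pure-Python sketch. It is not the project's custom vectorizer, whose details are not described here: the function names, the toy corpus, and the tokenization rule are assumptions made for illustration. The IDF formula uses the smoothed form `ln((1 + n) / (1 + df)) + 1`, which is also the default smoothing in scikit-learn's `TfidfVectorizer`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(corpus):
    """Compute smoothed inverse document frequency for every corpus term."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def top_keywords(report, idf, k=3):
    """Return the k highest-scoring TF-IDF terms of one report."""
    counts = Counter(tokenize(report))
    scores = {t: c * idf[t] for t, c in counts.items() if t in idf}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for illustration only.
corpus = [
    "urothelial carcinoma high grade invasion present",
    "urothelial carcinoma low grade no invasion",
    "bladder biopsy benign tissue",
]
idf = fit_idf(corpus)
keywords = top_keywords(corpus[2], idf, k=2)
```

Terms that appear in fewer reports receive a higher IDF weight, so they rank higher as keywords for the reports that do contain them.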

Project Results

The multi-layer ML model achieved 96% accuracy in classifying bladder cancer path reports as incomplete. The deployment of the model on the website serves as a viable proof of concept for reducing path report error correction time by 5-6 days and saving an estimated 200-250 lives per year.

Live Website: https://crgc-mvp.herokuapp.com/ (Under development)
Final Deliverable Demo: https://www.youtube.com/embed/gZLGlP98EsA
Final Deliverable Presentation: https://drive.google.com/open?id=1srR26ON6Vu-ygoowqm9AqW7AJcM6NW0Y03QRXd-njM4

Team

The following students worked on this project:

  1. Peru Dayani
    • Developed model architecture and models using Python and scikit-learn.
    • Developed custom TF-IDF text vectorizer for medical data.
    • Developed web app using Flask, Python, HTML, Bootstrap and JS.
    • Deployed web app on Heroku and maintains it.
    • Led team meetings with CRGC representatives.
  2. Carlos Calderon
    • Compared ML model architectures.
  3. Saumya Choudhary
    • Conducted EDA on data.
    • Developed data cleaning pipeline.
