Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Machine learning provider interface #631

Open
wants to merge 53 commits into
base: dev
Choose a base branch
from

Conversation

Rui-Jesus
Copy link
Collaborator

@Rui-Jesus Rui-Jesus commented Dec 16, 2022

Summary

This is a work in progress to implement a new interface for image analysis providers in Dicoogle.
The endgoal is to provide a standard interface, compatible with DICOM to manage image analysis providers and algorithms. This is particularly important in Pathology, given the heterogeneity in the field. Another important consideration with this interface is the possibility to integrate services like Google AI, that exposes AutoML interfaces, allowing users with little IT knowledge to develop their own inference models.

Here is a draft of the intended architecture:
image
The basic idea is to provide a standard interface that developers can use to implement image analysis providers. These providers should allow the creation and management of inference models, i.e, creation of labels, labelled datasets, and inference requests.

All the methods from the providers are exposed via servlets located in Dicoogle core. The API definitions took inspiration from MONAI and the DICOM Standards for image segmentation and overlays.

This PR should be seen as the initial step in supporting AI integrations in Dicoogle. It currently implements the following endpoints/functionalities:

  • Inference requests, for DICOM objects or image annotations in WSI
  • Training requests
  • Cancel training
  • List models

Additionally, the following miscellaneous features were implemented:

  • A new flag "wsi" to indicate whether or not this dicoogle installation supports WSI. This is a temporary bandaid while the problems with supporting both WSI and radiology at the same time are not fixed (there is a conflict in the codecs)
    • When set to true, a image reader for DICOM is registered, needed to work with WSI images
  • A new "roi" endpoint to extract ROIs (regions of interest) from WSIs. It is described in the api documentation.

Future Work

  • Implementation of the remaining endpoints, planned in the MLProviderInterface:
    • Create model endpoint
    • Delete model endpoint
  • Support for other types of data such as CSV
  • Support of segmentation and classification models in the dataset object

Tasks to be implemented in this PR

  • Inference requests
  • Model listing
  • Datastore of labels/annotations (for training)
  • Should be sufficiently flexible to support classification and segmentation tasks.
  • Dataset parameters should be somewhat configurable (patch size, mask or contours, etc.)
  • Caching of images (used to preemptively upload data to a provider to run inference on, needed for example in MONAI
  • Implementation of interface method that returns the implemented endpoints of each provider.
  • Integration of the bulk annotation supplement in annotation definition

Resources

Test plan

  • Test this contribution with a plugin that implements this interface
  • Test the several endpoints and validate the responses
    • Documentation of the new endpoints added can be found in the openapi documentation
    • Validate the documentation too.

@Rui-Jesus Rui-Jesus marked this pull request as draft December 16, 2022 12:39
@Rui-Jesus Rui-Jesus self-assigned this Dec 16, 2022
@Rui-Jesus Rui-Jesus added in progress dicoogle-core dependencies Pull requests that update a dependency file java Pull requests that update Java code labels Dec 16, 2022
@Rui-Jesus Rui-Jesus requested review from bastiao and Enet4 and removed request for bastiao December 16, 2022 14:07
@Rui-Jesus Rui-Jesus added this to In progress in Release Process Dec 16, 2022
@Rui-Jesus Rui-Jesus force-pushed the new/machine-learning-provider branch from e61b751 to 4480519 Compare May 6, 2023 16:02
@bastiao
Copy link
Member

bastiao commented Jul 8, 2023

Please resolve the conflicts in this PR. As we previously discussed, we have reached a significant milestone, and it is now appropriate to review the API definition and proceed with merging this pull request.

@joselfrias
Copy link

Note: dim-providers must be configured in server.xml, otherwise the server fails to launch.

Copy link
Collaborator

@Enet4 Enet4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving some notes for now.

@Rui-Jesus Rui-Jesus force-pushed the new/machine-learning-provider branch from 438808b to ea37d05 Compare July 10, 2023 13:46
@Rui-Jesus Rui-Jesus moved this from In progress to Ready to test in Release Process Jul 19, 2023
@Rui-Jesus Rui-Jesus marked this pull request as ready for review July 19, 2023 11:55
Copy link
Collaborator

@Enet4 Enet4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the code as an initial review iteration. Please look into the comments inline.

@Rui-Jesus Rui-Jesus requested a review from Enet4 July 26, 2023 08:24
@bastiao
Copy link
Member

bastiao commented Jul 29, 2023

Looks good! And of course, it works as we have been using with it internally.
@Rui-Jesus question about MLCSVDataset: is this used by any use case , we are currently handling? I'm wondering if we really have any advantage of calling it to MLCSVDataset or having something more generic.

I will keep diving on the API to hopefully merge this one soon.

@Rui-Jesus
Copy link
Collaborator Author

Looks good! And of course, it works as we have been using with it internally. @Rui-Jesus question about MLCSVDataset: is this used by any use case , we are currently handling? I'm wondering if we really have any advantage of calling it to MLCSVDataset or having something more generic.

I will keep diving on the API to hopefully merge this one soon.

I'm open to any name change, on any of the data objects or even the interface. MLCSVDataset is not being used anywhere currently. I entertained the idea of supporting tabular datasets in this API and thought of using a CSV file for it, for a future where we would need to work with tabular data.

I'm not sure how this could be accomplished directly from viewer applications, but maybe for third party services that can compute tabular data it could be useful. But I do not have any plans to implement it soon.

@Rui-Jesus Rui-Jesus moved this from Ready to test to In progress in Release Process Dec 1, 2023
Rui-Jesus and others added 26 commits June 7, 2024 17:16
…Dataset signature to dataStore. Added new server setting to enable WSI operation. Some code adjustments.
…with DICOM codes. Several code improvements.
Co-authored-by: Eduardo Pinho <enet4mikeenet@gmail.com>
@Rui-Jesus Rui-Jesus force-pushed the new/machine-learning-provider branch from c7aab31 to c86143f Compare June 7, 2024 16:17
@Rui-Jesus Rui-Jesus requested a review from Enet4 June 7, 2024 16:18
@Rui-Jesus
Copy link
Collaborator Author

I think this PR is ready for a code-freeze. There are still missing features, but this is an ongoing work, and the API is likely to change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file dicoogle-core enhancement feedback-request in discussion java Pull requests that update Java code sdk
Projects
Release Process
In progress
Development

Successfully merging this pull request may close these issues.

None yet

4 participants