Boundary and feature formats #35

cooperlab · 2016-02-24T20:01:43Z

Define the formats to be produced by object segmentation and feature extraction.

Boundaries - we will need a function to convert a list of 2 x N (x, y) arrays into a format that Girder can ingest.
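As a rough illustration of such a conversion, the sketch below turns a list of 2 x N arrays into JSON-serializable polyline records. The record fields (`id`, `type`, `closed`, `points`) and the function name `boundaries_to_records` are hypothetical and are not taken from any Girder schema; the actual ingestion format would need to be confirmed.

```python
import json

import numpy as np


def boundaries_to_records(boundaries):
    """Convert a list of 2 x N (x, y) arrays into JSON-serializable records.

    The record layout is an assumption made for illustration only.
    """
    records = []
    for obj_id, xy in enumerate(boundaries):
        xy = np.asarray(xy, dtype=float)
        if xy.ndim != 2 or xy.shape[0] != 2:
            raise ValueError("each boundary must be a 2 x N array")
        # Transpose to N x 2 so that each point is an [x, y] pair.
        points = xy.T.tolist()
        records.append(
            {"id": obj_id, "type": "polyline", "closed": True, "points": points}
        )
    return records


# Two toy boundaries: a rectangle and a triangle.
boundaries = [
    np.array([[0, 4, 4, 0], [0, 0, 3, 3]]),
    np.array([[10, 12, 11], [5, 5, 7]]),
]
records = boundaries_to_records(boundaries)
payload = json.dumps(records)
```

The resulting `payload` string could then be committed programmatically or written to disk, depending on how ingestion ends up working.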

Features - we need a way to link the feature names to the rows of a K x N array. Would these be ingested into Girder or kept as arrays on disk?

cdeepakroy · 2016-04-13T16:10:08Z

We can use a pandas DataFrame for the features.
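A minimal sketch of that idea, assuming the K x N array holds one feature per row and one object per column. The feature names and values here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names, one per row of the K x N array.
feature_names = ["Area", "Perimeter", "MeanIntensity"]

# K = 3 features, N = 2 objects (illustrative values only).
features = np.array(
    [
        [120.0, 95.0],
        [42.5, 37.1],
        [0.61, 0.48],
    ]
)

# Transpose so each object becomes a row and each feature a named column.
df = pd.DataFrame(features.T, columns=feature_names)
df.index.name = "object_id"
```

With this layout the names travel with the data, so `df["Area"]` or `df.loc[1, "Perimeter"]` replaces manual row-index bookkeeping.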

cdeepakroy · 2016-04-13T16:10:59Z

For boundaries, could we use a label map instead?

cooperlab · 2016-04-13T16:27:54Z

@cdeepakroy Label maps are fine. They are great for supporting additional processing but are not ingestion-friendly. It looks like we are going to have programmatic commits of the boundary polylines, so an intermediate ingestion-friendly format is probably not needed.

cooperlab · 2016-04-13T16:30:46Z

Compression will be important to keep the label image sizes down. LZW should work great for this.

cdeepakroy · 2016-04-13T16:41:04Z

For label maps, something like run-length encoding will save a lot of space, since I would expect these label maps to be quite sparse in the number of non-zero pixels.
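To make the run-length idea concrete, here is a small sketch that encodes a flattened label map as (value, run length) pairs and decodes it again. The helper names `rle_encode` and `rle_decode` are hypothetical, and this is plain row-major RLE rather than any particular file format's scheme.

```python
import numpy as np


def rle_encode(label_map):
    """Run-length encode a label map in row-major order."""
    flat = np.asarray(label_map).ravel()
    if flat.size == 0:
        return np.empty(0, dtype=flat.dtype), np.empty(0, dtype=np.int64)
    # Positions where the value changes mark the start of a new run.
    starts = np.concatenate(([0], np.flatnonzero(flat[1:] != flat[:-1]) + 1))
    lengths = np.diff(np.concatenate((starts, [flat.size])))
    return flat[starts], lengths


def rle_decode(values, lengths, shape):
    """Rebuild the label map from run values and run lengths."""
    return np.repeat(values, lengths).reshape(shape)


# Sparse toy label map: two small objects on a zero background.
label_map = np.zeros((6, 8), dtype=np.uint32)
label_map[1:3, 2:5] = 1
label_map[4, 6:8] = 2

values, lengths = rle_encode(label_map)
restored = rle_decode(values, lengths, label_map.shape)
```

In this toy case the 48 pixels collapse to 7 runs; how much real label maps shrink would depend on object density and shape.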

cooperlab · 2016-04-13T16:46:56Z

They will compress very nicely. We may still want to go with single or uint32 types here for when they are in memory.

cooperlab · 2016-04-15T15:20:04Z

@slee172 - please test performance for reading / writing features to HDF5. Let's get some insights from @cdeepakroy on how the features will stream out of the pipeline. Tiles are processed in parallel - so will each tile generate a unique HDF5? Should we try to aggregate the features from multiple tiles into 1 file per slide?

HDF5 has internal structure that supports random access, so if we aggregate tiles we should organize the features for efficient random access.
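One possible layout, sketched with h5py under several assumptions: one file per slide, one group per tile keyed by hypothetical tile coordinates, and the feature names stored once as a file attribute. The group naming (`tiles/x{tx}_y{ty}`) and function names are invented for illustration, and this sketch writes from a single process; it does not address how parallel tile workers would hand their results to the writer.

```python
import os
import tempfile

import h5py
import numpy as np


def write_slide_features(path, tile_features, feature_names):
    """Write per-tile feature arrays into one HDF5 file for a slide."""
    with h5py.File(path, "w") as f:
        # Store the column names once for the whole slide.
        f.attrs["feature_names"] = np.array(feature_names, dtype="S")
        tiles = f.require_group("tiles")
        for (tx, ty), feats in tile_features.items():
            grp = tiles.create_group(f"x{tx}_y{ty}")
            grp.create_dataset("features", data=feats, compression="gzip")


def read_tile_features(path, tx, ty):
    """Read the features of a single tile without loading the others."""
    with h5py.File(path, "r") as f:
        return f[f"tiles/x{tx}_y{ty}/features"][...]


names = ["Area", "Perimeter", "MeanIntensity"]  # hypothetical names
tile_features = {
    (0, 0): np.array([[120.0, 42.5, 0.61]]),
    (1, 0): np.array([[95.0, 37.1, 0.48], [80.0, 33.0, 0.52]]),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "slide_0001.h5")
    write_slide_features(path, tile_features, names)
    tile_10 = read_tile_features(path, 1, 0)
    with h5py.File(path, "r") as f:
        stored_names = [n.decode("utf-8") for n in f.attrs["feature_names"]]
```

Keying groups by tile position keeps a field-of-view lookup down to opening a handful of groups, at the cost of many small datasets per slide.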

slee172 · 2016-04-15T16:12:10Z

@cooperlab I ran into the same issue when writing features to HDF5. For 3 slides, it took about 30 min in Python but less than 1 min in C++, so I just used C++ instead of Python for writing. @cdeepakroy could you tell me how the features should be processed when using HDF5, as @cooperlab mentioned?

cooperlab · 2016-04-15T16:32:51Z

@slee172 that is a significant difference.

Is the interface for writing HDF5 clear, or is it possible we are misusing it and writing 1 record/row at a time?
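For reference, a sketch of what batched writing could look like in h5py, assuming the features arrive as blocks of rows: the dataset is created resizable and each block is appended with a single slice assignment instead of one assignment per row. The helper name `append_rows`, the chunk size, and the batch sizes are arbitrary choices for illustration, and no timing claim is made here.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dset, rows):
    """Append a block of rows to a resizable 2-D dataset in one write."""
    rows = np.asarray(rows, dtype=dset.dtype)
    start = dset.shape[0]
    dset.resize(start + rows.shape[0], axis=0)
    dset[start:] = rows  # one slice assignment for the whole block


n_features = 4
rng = np.random.default_rng(0)
# Three blocks of 1000 objects each, standing in for per-tile output.
batches = [rng.random((1000, n_features)) for _ in range(3)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.h5")
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "features",
            shape=(0, n_features),
            maxshape=(None, n_features),  # unlimited along the row axis
            dtype="float32",
            chunks=(1024, n_features),
        )
        for batch in batches:
            append_rows(dset, batch)
        total_rows = dset.shape[0]
        first_row = dset[0]
```

If the existing Python code assigns `dset[i] = row` inside a loop over objects, comparing it against a block append like this would be one way to check whether the write pattern explains the gap.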

slee172 · 2016-04-15T16:51:38Z

@cooperlab The Python version I used writes the features directly into HDF5 files using the h5py package, but the C++ version uses threads when processing slides. This is one difference between them. I would say we could improve the Python version by making it multithreaded, but we could look for another approach if there is one.

cdeepakroy · 2016-04-15T16:52:10Z

@cooperlab What kind of query operations would we mostly be doing?

For classification use cases, we may need all nuclei in the slide, or those in a specific region produced by some segmentation algorithm (of tissue, etc.).
For active learning use cases, we may have to find the nuclei in all tiles seen by the user in the current view.
As the user pans and zooms in and out, we might need some kind of caching.

Any other common query use cases?

cooperlab · 2016-04-15T16:54:43Z

I think that covers the spatial query cases nicely. It's mostly about 1. serving up or analyzing data relevant to the field of view or 2. tile-based parallelization of analyses that are downstream of feature extraction.
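The field-of-view case can be sketched as a bounding-box filter over object centroids. This assumes centroids are held in memory as an N x 2 array of (x, y) slide coordinates; the function name `query_field_of_view` is hypothetical, and a real implementation over whole slides would likely need a spatial index or the tile-keyed storage discussed earlier rather than a linear scan.

```python
import numpy as np


def query_field_of_view(centroids, x_min, y_min, x_max, y_max):
    """Return indices of objects whose centroid lies inside the view box.

    The box is half-open: [x_min, x_max) x [y_min, y_max).
    """
    x, y = centroids[:, 0], centroids[:, 1]
    mask = (x >= x_min) & (x < x_max) & (y >= y_min) & (y < y_max)
    return np.flatnonzero(mask)


# Toy centroids for four nuclei (illustrative values only).
centroids = np.array(
    [
        [10.0, 10.0],
        [250.0, 40.0],
        [300.0, 300.0],
        [90.0, 120.0],
    ]
)

# Objects visible in a 256 x 256 view anchored at the origin.
visible = query_field_of_view(centroids, 0, 0, 256, 256)
```

The returned indices can be used to pull the matching rows out of the feature table for display or for downstream per-tile analysis.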

cooperlab assigned slee172 Feb 24, 2016
