Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generating geojson files on run command #184

Closed
wants to merge 13 commits into from

Conversation

swaradgat19
Copy link
Contributor

@swaradgat19 swaradgat19 commented Aug 14, 2023

fixes #181

@swaradgat19
Copy link
Contributor Author

I'm currently updating the tests. Will push them as and when I resolve them

@swaradgat19
Copy link
Contributor Author

@kaczmarj Since we've updated the result directories (model-outputs-csv/geojson), we will have to update the tests such that we assert that csv and json files be stored in tmp_path/model-outputs-csv and tmp_path/model-outputs-geojson directories right? Just trying to get an intuition so that I can modify the tests accordingly.

@swaradgat19
Copy link
Contributor Author

Updated the tests. All are passing except one. In the test test_issue_97, when we are running the command again using runner.invoke, it fails because the output directory already exists (for geojson). Perhaps we can let it generate the resulting geojson directory again? Or should I handle it in the test itself?

@kaczmarj
Copy link
Member

Since we've updated the result directories (model-outputs-csv/geojson), we will have to update the tests such that we assert that csv and json files be stored in tmp_path/model-outputs-csv and tmp_path/model-outputs-geojson directories right?

Yes that is correct.

it fails because the output directory already exists (for geojson).

i don't see this error in the github actions logs. what is the traceback?

@swaradgat19
Copy link
Contributor Author

swaradgat19 commented Aug 15, 2023

It was getting raised because we are checking whether the output directory exists or not. If it was, we were raising the FileExistsError (instead of the Click.Exceptions I believe).

def parallelize_geojson(csvs: list, results_dir: Path, num_workers: int) -> None:
    output = results_dir / "model-outputs-geojson"

    if not results_dir.exists():
        raise FileExistsError(f"results_dir does not exist: {results_dir}")
    if output.exists():
        # raise FileExistsError("Output directory already exists.")
        shutil.rmtree(f"{output}")
# rest of the code

To handle that, I'm just deleting the directory if it already exists(using shutil) and then it is getting created again below. I'm doing this so that the test passes, although we would want to change this.

@kaczmarj
Copy link
Member

i see. so what we do for model outputs typically is skip any slides that already have model ouptut CSVs that exist. we should implement the same behavior for the geojson conversion.

so in the list of csvs to be converted, we should remove any that already exist as geojson. so existing geojsons will not be touched.

@swaradgat19
Copy link
Contributor Author

Got it. I'll make the changes

wsinfer/write_geojson.py Outdated Show resolved Hide resolved
wsinfer/write_geojson.py Outdated Show resolved Hide resolved
wsinfer/write_geojson.py Outdated Show resolved Hide resolved
wsinfer/write_geojson.py Outdated Show resolved Hide resolved
wsinfer/write_geojson.py Outdated Show resolved Hide resolved
wsinfer/write_geojson.py Outdated Show resolved Hide resolved
@swaradgat19
Copy link
Contributor Author

swaradgat19 commented Aug 16, 2023

@kaczmarj Not entirely sure why the pytorch-nightly test is failing. Might be an issue with slide_path perhaps?

@kaczmarj
Copy link
Member

i think there are two issues.

  1. the style tests are failing. to fix that, run isort and black on the code to format the code.
  2. to fix the pytorch nightly test, i think we need to check that a certain variable is not None.

page0 = series0[0]

add the following between lines 225 and 226

if page0 is None:
    raise CannotReadSpacing()

@swaradgat19
Copy link
Contributor Author

swaradgat19 commented Aug 17, 2023

Tried a try-except too. Didn't work

@kaczmarj
Copy link
Member

i will take a look at this. it could be that something in the tifffile has changed slightly

@kaczmarj
Copy link
Member

i'll review this pr soon. in the meantime, can you please merge the main branch into your branch? i made a few fixes in #188 . you will also have to resolve a merge conflict with wsinfer/wsi.py.

Copy link
Member

@kaczmarj kaczmarj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i left a few change requests. thanks for working on this @swaradgat19

import uuid
from functools import partial
from pathlib import Path

import click
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

click isn't used here so we can remove

Suggested change
import click

def convert(input: str | Path, output: str | Path) -> None:
df = pd.read_csv(input)
def make_geojson(csv: Path, results_dir: Path) -> None:
filename = csv.stem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit pick but could you rename this to slide_id ?

@@ -374,3 +375,6 @@ def run(
json.dump(run_metadata, f, indent=2)

click.secho("Finished.", fg="green")

csvs = list((results_dir / "model-outputs-csv").glob("*.csv"))
write_geojsons(csvs, results_dir, num_workers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the geojson writing should happen in the fille run_inference.py -- see

slide_df.to_csv(slide_csv, index=False)
for where the CSVs are written. The GeoJSON conversion can happen right after that.


output.mkdir(exist_ok=False)
# Makes a list of filenames for both geojsons and csvs
geojson_filenames = [filename.stem for filename in geojsons]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nit.. instead of using filenames in the variable name, could you use slide_ids ? it will make it more obvious that these are slide IDs.

@swaradgat19
Copy link
Contributor Author

i left a few change requests. thanks for working on this @swaradgat19

Sure @kaczmarj ! I was actually trying to merge the main into my main branch, but ran into issues ( Github isn't allowing me to sync my forked repo because I'm 1 commit behind and 13 commits ahead of SBU-BMI/wsinfer). I've created a new branch fix/geojson_command with #188 included. Should I open a new PR with that branch?

@kaczmarj
Copy link
Member

that's fine, let's continue the discussion in #191 . in the future, you can fix this sort of "merge conflict" on the command line. here are some docs that should help https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line

closing because this is replaced by #191

@kaczmarj kaczmarj closed this Sep 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

always create geojson outputs with 'wsinfer run'
2 participants