-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generating geojson files on run command #184
Conversation
I'm currently updating the tests. Will push them as and when I resolve them |
@kaczmarj Since we've updated the result directories (model-outputs-csv/geojson), we will have to update the tests such that we assert that csv and json files be stored in |
Updated the tests. All are passing except one. In the test |
Yes that is correct.
i don't see this error in the github actions logs. what is the traceback? |
It was getting raised because we are checking whether the output directory exists or not. If it was, we were raising the def parallelize_geojson(csvs: list, results_dir: Path, num_workers: int) -> None:
output = results_dir / "model-outputs-geojson"
if not results_dir.exists():
raise FileExistsError(f"results_dir does not exist: {results_dir}")
if output.exists():
# raise FileExistsError("Output directory already exists.")
shutil.rmtree(f"{output}")
# rest of the code To handle that, I'm just deleting the directory if it already exists(using |
i see. so what we do for model outputs typically is skip any slides that already have model ouptut CSVs that exist. we should implement the same behavior for the geojson conversion. so in the list of |
Got it. I'll make the changes |
@kaczmarj Not entirely sure why the pytorch-nightly test is failing. Might be an issue with |
i think there are two issues.
Line 225 in b05b7ee
add the following between lines 225 and 226 if page0 is None:
raise CannotReadSpacing() |
Tried a |
i will take a look at this. it could be that something in the tifffile has changed slightly |
i'll review this pr soon. in the meantime, can you please merge the main branch into your branch? i made a few fixes in #188 . you will also have to resolve a merge conflict with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i left a few change requests. thanks for working on this @swaradgat19
import uuid | ||
from functools import partial | ||
from pathlib import Path | ||
|
||
import click |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
click isn't used here so we can remove
import click |
def convert(input: str | Path, output: str | Path) -> None: | ||
df = pd.read_csv(input) | ||
def make_geojson(csv: Path, results_dir: Path) -> None: | ||
filename = csv.stem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit pick but could you rename this to slide_id
?
@@ -374,3 +375,6 @@ def run( | |||
json.dump(run_metadata, f, indent=2) | |||
|
|||
click.secho("Finished.", fg="green") | |||
|
|||
csvs = list((results_dir / "model-outputs-csv").glob("*.csv")) | |||
write_geojsons(csvs, results_dir, num_workers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the geojson writing should happen in the fille run_inference.py
-- see
wsinfer/wsinfer/modellib/run_inference.py
Line 194 in 123d7c1
slide_df.to_csv(slide_csv, index=False) |
|
||
output.mkdir(exist_ok=False) | ||
# Makes a list of filenames for both geojsons and csvs | ||
geojson_filenames = [filename.stem for filename in geojsons] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a nit.. instead of using filenames
in the variable name, could you use slide_ids
? it will make it more obvious that these are slide IDs.
Sure @kaczmarj ! I was actually trying to merge the main into my main branch, but ran into issues ( Github isn't allowing me to sync my forked repo because I'm 1 commit behind and 13 commits ahead of SBU-BMI/wsinfer). I've created a new branch |
that's fine, let's continue the discussion in #191 . in the future, you can fix this sort of "merge conflict" on the command line. here are some docs that should help https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line closing because this is replaced by #191 |
fixes #181