Introducing an option for the user to decide on simplifying GADM shapes #1138

Draft: wants to merge 3 commits into base: main

SermishaNarayana

Closes # (if applicable).

Changes proposed in this Pull Request

Checklist

  • I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
  • Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

@SermishaNarayana
Author

[Image: Screenshot 2024-10-10 at 6 03 39 PM]
@ekatef Here I have plotted the difference between the shape files produced with and without simplifying the GADM shapes for the US. The differences lie mostly in how some small islands are treated and in the borders of the US states.

Member

@davide-f davide-f left a comment

Great contribution :D I added a comment; please also add a release note.
We are very close, I believe :D

@@ -106,6 +106,7 @@ cluster_options:

build_shape_options:
  gadm_layer_id: 1 # GADM level area used for the gadm_shapes. Codes are country-dependent but roughly: 0: country, 1: region/county-like, 2: municipality-like
  simplify_gadm: false # When true, shape polygons are simplified else no
Member

Great @SermishaNarayana :D
How about making this option numeric? We could rename it to simplify_tolerance, with a default of 0.01 (the current default value), and skip the simplification when the value is False or <= 0.
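A minimal sketch of how the numeric option proposed above could be interpreted. The helper names `resolve_simplify_tolerance` and `maybe_simplify` are hypothetical and not part of this PR; treating a bare `true` as "use the 0.01 default" is an assumption made here for backward compatibility with the boolean flag. The only library call assumed is shapely's `simplify(tolerance, preserve_topology=...)` method on geometry objects.

```python
DEFAULT_TOLERANCE = 0.01  # the current default value mentioned in this thread


def resolve_simplify_tolerance(value, default=DEFAULT_TOLERANCE):
    """Return a positive float tolerance, or None when simplification is off.

    Hypothetical helper: the name and the handling of True are assumptions.
    """
    if value is None or value is False:
        return None
    if value is True:
        # Assumption: a bare `true` keeps the old behaviour (default tolerance).
        return float(default)
    tolerance = float(value)
    # Zero or negative values disable the simplification.
    return tolerance if tolerance > 0 else None


def maybe_simplify(geometry, value):
    """Simplify `geometry` only when the configured value enables it."""
    tolerance = resolve_simplify_tolerance(value)
    if tolerance is None:
        return geometry  # simplification skipped entirely
    # shapely geometries provide simplify(tolerance, preserve_topology=True)
    return geometry.simplify(tolerance, preserve_topology=True)
```

With this shape, `simplify_tolerance: false`, `0` and `-1` would all leave the polygons untouched, while `0.05` would be passed through as the tolerance.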

Author

@davide-f Yes, I am planning to add it and it is already in progress. In the meantime, I was trying to understand why the size of the regions_onshore.geojson file explodes without simplification.

Member

@SermishaNarayana @davide-f thanks a lot for taking care of that!

I'm afraid a numerical simplification option may not work because of issues in the _simplify_polys() function itself...

@jome1 did a great investigation of the behaviour of _simplify_polys(), which demonstrated that all the polygons are simplified independently of each other. That can produce a number of "enclaves" along the borders between regions.

I suspect the explosion of the polygons you observed may be related to that: calling _simplify_polys() produces a large number of such enclaves because the geometry is quite complex. The good news is that the issue is likely to be resolved with the next release of shapely, which should contain an improved simplification algorithm.

So I'd probably keep a boolean flag for now and return to the idea of adding a numeric parameter once an advanced simplification option is available in shapely. What do you think?

Author

@ekatef

If I understand it right, the problem you mention occurs when the polygons are simplified with the _simplify_polys() function in build_shapes. But the explosion of the regions_onshore.geojson file size that I am currently observing occurs when the GADM simplification is turned off. In that case the code skips _simplify_polys(). Is the issue somehow still related then?

Please correct me if I understood it wrong.
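One way to narrow down where the extra file size comes from is to count vertices per feature in the two GeoJSON outputs (with and without simplification) and compare them. The sketch below uses only the standard library; the function names and the `name_key="name"` property are assumptions for illustration, and features of type GeometryCollection (which have no `coordinates` member) are simply counted as zero.

```python
import json


def count_vertices(coords):
    """Count coordinate positions in a (possibly nested) GeoJSON coordinates array."""
    if not coords:
        return 0
    if isinstance(coords[0], (int, float)):
        return 1  # reached a single [x, y] position
    return sum(count_vertices(part) for part in coords)


def vertices_per_feature(path, name_key="name"):
    """Map each feature's label (property `name_key`, else its index) to its vertex count."""
    with open(path, encoding="utf-8") as fh:
        collection = json.load(fh)
    counts = {}
    for i, feature in enumerate(collection["features"]):
        label = (feature.get("properties") or {}).get(name_key, i)
        geometry = feature.get("geometry") or {}
        counts[label] = count_vertices(geometry.get("coordinates", []))
    return counts
```

Running `vertices_per_feature` on both versions of regions_onshore.geojson and sorting the result by count would show whether a few regions dominate the size or whether every region grows roughly uniformly.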

Member

@SermishaNarayana you are absolutely right, it is _simplify_polys() that creates the issues in #1051.

I had missed that you are observing the file expansion when _simplify_polys() is bypassed. I'm not sure that's really related to #1051, then. Thank you so much for the explanation!

Just to be sure that I get the problem: you are also observing some strange geometry effects, right? If that is the case, could you please post a picture there? It would be great to understand what is going on with the geometries.


I am not sure, but it might indeed be related to #1051. My explanation is that holes and overlaps in the underlying base shapes cause errors in the clustering. The clustering in this case is based on the shapes of the US states, so the shapes of the US states themselves need to be flawless to yield flawless clusters.

Member

Thanks everyone.
Indeed, this is weird and is a combination of multiple issues:

  1. the tolerance used for the GADM shapes (when 0, negative, or False, no simplification is applied)
  2. the algorithm in simplify polys, which may sometimes lead to "normal" but unwanted overlaps
  3. the same algorithm, which sometimes leads to weird clustered shapes

In this PR we should probably focus on 1 and track the other issues in other PRs; otherwise it feels a bit overwhelming, and hopefully shapely improves the algorithm along the way too.

I'd propose using a numeric (or false) value instead of a pure boolean, as it is an inexpensive feature that only improves on the existing behaviour while giving quite some flexibility.
With the boolean we only allow (a) enabling the feature with a predefined tolerance or (b) disabling the feature entirely, which is a bit extreme.
With the number, the tolerance itself can be calibrated.

The tolerance also has implications for the geospatial comparisons in subsequent rules: the stronger the simplification, the better the performance, though at the cost of accuracy.
That's why customizing the number may be handy.
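To illustrate the trade-off between tolerance and detail, here is a plain-Python sketch of Douglas-Peucker line simplification, the family of algorithm that shapely's `simplify` is documented to build on. This is an illustration only, not the code path used by `_simplify_polys()`; the function names and the sine-wave "border" are made up for the example.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify an open polyline, dropping points within `tolerance` of the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Keep the farthest point and recurse on both halves.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


# A wiggly hypothetical "border": a sine wave sampled at 201 points.
border = [(x / 100, 0.05 * math.sin(20 * x / 100)) for x in range(201)]
for tol in (0.0, 0.001, 0.01, 0.1):
    print(tol, len(douglas_peucker(border, tol)))
```

The printed vertex counts never increase as the tolerance grows, which is the performance gain mentioned above; the cost is that wiggles smaller than the tolerance disappear from the border.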

Member

In the picture, the geometry of the region "US.26_1_AC" which refers to the state of Florida as tagged in the regions_onshore_elec_s_50.geojson file has been plotted. But the geometry of the state is projected to cover many more states of the US

@SermishaNarayana could you share the geojson?
Do I understand correctly that the GADM file for Florida contains the broader US? What is the large black shape you are showing? I understand the borders, but not the large states.

In theory the original build_shapes output should not overlap, but in practice it may.

Having a slight overlap across boundaries may not lead to issues; alternatively, we can ensure no overlap in build_shapes with a heuristic, but I am not sure that is needed.

Author

@davide-f

I have attached the regions_onshore_elec_s_50.geojson file produced when simplifying the GADM shapes with a tolerance of 0.01 (default):
regions_onshore_elec_s_50_simplify_gadm.geojson.zip

The big black shape in the previous comment shows the geometric shape of the state of Florida as classified in this particular geojson file

Member

@davide-f davide-f Oct 16, 2024

@SermishaNarayana can you also share the gadm_shapes and the whole shapes folder? simplify_polys is applied to gadm_shapes first, and with alternative_clustering I expected those to be used.
The output of bus_regions is regions_onshore; other rules are then applied and edit the regions further.
You may have found another bug further down the chain, but it is very likely not linked to this PR.
