Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

h3shape_to_cells_experimental takes string containment modes #436

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

ajfriend
Copy link
Contributor

@ajfriend ajfriend commented Dec 31, 2024

  • Just use string containment modes for h3shape_to_cells_experimental
  • Drop polygon_to_cells_experimental alias
  • Add polygon_to_cells and h3shape_to_cells_experimental to docs
  • Avoid repeating tests when checking coverage

@ajfriend ajfriend changed the title polygon to cells just takes string arguments h3shape_to_cells_experimental takes string containment modes Dec 31, 2024
Comment on lines 573 to 578
containment_modes = {
'center': 0,
'full': 1,
'overlap': 2,
'bbox_overlap': 3,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on names? I took a stab here to shorten things, but we should also align with upstream. I could go either way with bbox_overlap vs overlap_bbox, but this way each containment mode has a distinct first letter and should be less likely to accidentally select overlap_bbox when going for overlap.

@ajfriend
Copy link
Contributor Author

Aside: The test configuration is getting a little wonky with things like:

test:
	./env/bin/pytest tests/test_lib --cov=h3 --cov=tests/test_lib --cov-fail-under=100

	./env/bin/pip install cython
	./env/bin/cythonize tests/test_cython/cython_example.pyx
	./env/bin/pytest tests/test_cython --cov=tests/test_cython

It will probably be easy for the makefile and the github actions files to get out of sync. Hopefully we can get back to a single test command when we figure out the Cython compilation issue and can have everything at 100% coverage.

@jmealo
Copy link

jmealo commented Dec 31, 2024

Looks good so far.

I'm reviewing this on mobile, so I likely missed it...

Are we checking that the input geometry is valid? Invalid geometry could lead to invalid memory access (unless h3 fixed this recently).

@ajfriend
Copy link
Contributor Author

Are we checking that the input geometry is valid? Invalid geometry could lead to invalid memory access (unless h3 fixed this recently).

I don't think we currently do a ton of validation, but it it something we could consider adding more of in the future.

The H3 C library does some, and can halt and return an error code, but I'm not sure of the extent of validation it does. Maybe @nrabinowitz or @isaacbrodsky would be more familiar?

@jmealo, are you aware of any good test cases? Or maybe examples of other libraries that do the kind of validation you're thinking of?

@jmealo
Copy link

jmealo commented Dec 31, 2024

It was my understanding that this specific function was marked experimental in the h3 library because it failed fuzzing/valgrind-type tests.

In PostGIS we're going to be calling either ST_IsValid or ST_MakeValid to make it safe to call. In my testing, the overhead was extremely negligible.

For the memory safety and avoiding undefined behavior benefits alone it seems well worth it.

IIRC that function comes from GDAL. If the validation happens in Python, I would consider adding a boolean flag skipValidation that defaults to false.

This would be safe for folks who don't RTFM and for folks who do, provided they validate geometry elsewhere, no additional performance overhead.

I would add an unsafe_ prefix or explicitly name the function and input parameters to make it abundantly clear to only pass previously validated geometry or use a boolean flag to skip validation that defaults to false.

Edited: Clarified thoughts on safety versus performance.

@isaacbrodsky
Copy link
Collaborator

It was my understanding that this specific function was marked experimental in the h3 library because it failed fuzzing/valgrind-type tests.

The fuzzer issue related to underestimating the size of the output buffer with maxPolygonToCellsExperimentalSize. The function was made safe by passing in the size of the output buffer. This allows the function to detect that it would write past the end of buffer and return an error if so. There should not be any outstanding fuzzer or valgrind failures on this function. (There may be a few timeouts.)

@jmealo
Copy link

jmealo commented Dec 31, 2024 via email

@@ -596,13 +603,6 @@ def h3shape_to_cells_experimental(h3shape, res, flags=0):
return _out_collection(mv)


def polygon_to_cells_experimental(h3shape, res, flags=0):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the name of the function in the C library and on the docs site. Why is the alias being removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discussed this previously, and the preferred function name in h3-py is h3shape_to_cells since polygon doesn't account for multipolygons. We added the alias polygon_to_cells() in h3-py just as a pointer for folks to discover the preferred function.

As this function is experimental, and we already have the pointer, I think adding an additional alias is redundant. Also, we'll get back to the status quo when we merge the experimental function into the primary function.

@@ -221,8 +221,8 @@ def polygon_to_cells_experimental(outer, int res, int flags, holes=None):
A ring given by a sequence of lat/lng pairs.
res : int
The resolution of the output hexagons
flags : int
Polygon to cells flags, such as containment mode.
flag : int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not seem like an idiomatic change to me. While containment modes cannot be combined, I think the intention was to expand flags to cover other cases such as spherical vs Cartesian.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cython function are internal and not part of the externally supported API, so we're free to change this in the future without officially breaking anything.

flags = ContainmentMode[flags]
except KeyError as e:
raise ValueError('Unrecognized flags: ' + flags) from e
if isinstance(flags, ContainmentMode):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would prefer to keep the enum option in addition to the stringly-typed form.

Would typing pick up the available string options and offer them as suggestions? I don't see them in type annotations right now.

Copy link
Contributor Author

@ajfriend ajfriend Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think typing in the future is the more natural way to handle this.

We don't currently have types, but they could be added. We don't use them in the area/length functions, for example, to specify the output units. Thus, the current form is a smaller change that's more consistent with the existing library design, and doesn't preclude adding typing or enums in the future.

Would typing pick up the available string options and offer them as suggestions? I don't see them in type annotations right now.

Yes, that's my understanding, if we were to add the type annotations.

Copy link
Contributor Author

@ajfriend ajfriend Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually, if you're interested in adding support for IDE tooling like autocomplete/suggestions, I think we should do that in a separate issue or PR. That's outside the scope of this PR (or #432), as it affects the library as a whole.

And I'd definitely welcome that discussion, as I think it is needed!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know what? The function is experimental, so let's just try it and see how it works. I do still think we should separately take a look at typing in the library holistically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants