-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
array_t: overload operator()
for multi-dim indexing
#4723
base: master
Are you sure you want to change the base?
Conversation
(FYI: I just clicked the Approve & Run button for GitHub Actions without actually looking at the code. Will look asap.) |
@rwgk i added a test too, which I am building locally now - will push once i have it |
Sounds good, definitely add tests first. After the new tests are place: The code duplication is now 4x, which is not good. It's essentially only one line, but it's non-trivial. I'd try hard to have only 1x, even if the total number of lines of code stays the same (or even if it increases); the quality of implementation is more important.
Question: Is it guaranteed from the context that Why I'm asking: If that's not guaranteed, I'd look into adding something like
Note for completeness: if Not sure: maybe just that calculation in a new helper function will do. Duplicating I'm also wondering about naming. It's unfortunate that the safe alternative ( Finally, the bounds-checking functions could/should call the non-bounds-checking counterparts. Similar idea as before: clean abstractions & the quality of implementation matter most (in the long run). |
Agree on fixing things like having mutable_at call this etc. But wanted to first discuss the choice of overloading
Changing the name defies the purpose of this, because |
Oh, got it. In that case, what do you think about
I think that could stave off a lot of debugging misery, considering that the out-of-bounds access may be triggered by complicated generic code. |
so you are suggesting doing this? template <typename... Ix>
T &operator()(Ix... index)
{
#ifndef NDEBUG
bounds check here
#endif
return *(static_cast<T *>(array::mutable_data())
+ byte_offset(ssize_t(index)...) / itemsize());
} if so, I agree on this |
Yes. |
agreed, let me modify the PR |
I think I misunderstood the existing code (silly in hindsight). This is just checking the number of dimensions:
But that's not what I would think of as a bounds check. That one happens here:
I.e. if we also want to eliminate the bounds check, we cannot use This makes it a slightly more tricky refactoring, but the work you did already is going a good way towards that. I'd use the |
@rwgk my bad for writing a description that was imprecise. My origianl main intent was to add I added checks for the rank (ndim) and bounds and I believe i also added proper tests. |
I could only glance through quickly right now. What I saw looks great. I'll look carefully in the morning. |
CI failure is unrelated (failed to fetch boost) |
@fnrizzi This looks great, thanks for the addition! One important note is that we technically already had this feature, but it was a poorly (IMHO) designed thing involving proxies. See https://github.com/pybind/pybind11/blob/master/docs/advanced/pycpp/numpy.rst#direct-access What I think we should do is "deprecate" those proxies, remove them from the documentation and explicitly mark them as deprecated and encourage people to use this newer, much simpler and easier to understand API. What do you think @rwgk? At the very least, let's remember to add this new feature to the documentation https://github.com/pybind/pybind11/blob/master/docs/advanced/pycpp/numpy.rst once you and @rwgk think it's ready. |
Oh ... I wasn't aware of that TBH. (I was just looking at this PR from a general best-practice point of view. I never actually used numpy.h myself in a serious way.) What I like about the proxy approach is that it makes the potentially dangerous access explicit: I can go and look for Now, it's evidently (this PR) not obvious that the feature exists for someone coming in new. Could we help with that, but keep the established proxy? For example, could this work?
Similarly for I think that would be far better than introducing a second way to get direct access. Because, if there are two ways, both will get used, the "old" way will take a decade+ (no kidding) to get flushed out 90%, and a virtually infinite amount of time to disappear completely. All the while we have to maintain two solutions instead of one, people coming in new will go huh, then find themselves ~forced to use sometimes one sometimes the other depending on the environment, etc. |
I actually used those before but I remember having some issues. Obviously now I don't remember off the top of my head the issues... Btw, the proxies IMMO adds an additional layer that one has to consider. |
Btw I actually would have proposed a multi index subscript operator [] that would be intuitive and also equivalent to python syntax. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2128r6.pdf I think that would probably be my preferred solution but that is also not super common. |
To explain where my thinking is coming from:
With that background, coming back to this PR, I think we need to balance:
vs
If we had a green field, I'd decide against the proxies, for sure. But we do not have a green field. The proxies are established, they have something akin to the first mover advantage. IMO opinion we need a very strong, very convincing argument to deprecate the proxies and accept the churn.
Could you please try to reproduce and explain here? |
Re https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2128r6.pdf Thanks for pointing out, I wasn't aware of this before. I want to share a long-term experience, FWIW. A very long time ago I implemented an "array family" library, with all bells and whistles, what was possible with C++98. What's called "mdspan" in the referenced paper I covered under array "accessors": https://github.com/cctbx/cctbx_project/tree/master/scitbx/array_family/accessors Later it turned out, the sophisticated accessors (flex_grid.h is the most abstract, c_grid.h more special/common) where hardly ever used in C++ (c_grid.h more than others). I think that's because ultimately the only good reason to put up with C++ is to maximize performance. Consequently, almost every single time, I'd find myself unrolling the loops to cut out the abstraction penalties. |
@rwgk I think you are right here. Seems best to avoid churn by keeping the proxies, but adding those helper asserts to guide people towards the proxies if they need that feature. |
I understand all these viewpoints , but I like to point out that the proxies are (1) not intuitive (2) a deviation from any "linear algebra/container like" thing, and (3) add an additional layer making generic code (see my example at top) even more verbose. So it is not just a matter of divergence, but usability too. Speaking as a user of pybind11, I personally prefer having my fork with operator () than using the proxies. |
I'm a believer in numbers and reason. Personal preferences are are a rare luxury for greenfield situations.
Sorry I'm failing to make the connection, maybe because I never wrote original Eigen- or pybind11/numpy.h-based code. The code currently in the PR description doesn't seem to involve pybind11/numpy.h at all. Could you please provide an example (A) with pybind11 as is, and (B) the equivalent with This is to estimate the percentage of code size reduction enabled by |
What about The proxies allow you to get checked or unchecked access, and allow you to select mutable or immutable access, which is nice. Given the official way to interact with ND arrays is now (PS: personally, I'd love some way to iterate over every element of arbitrary ND arrays - basically to allow making custom ufuncts in C++ easy. I recently found myself needing that to implement float -> int conversion that didn't round |
Description
Currently,
array_t
supports direct subscripting via:both of these always check preconditions on the rank (ndim) and bounds, and are methods named with a specific name.
This PR proposes to overload
operator()
:and checking preconditions only in debug mode.
The main advantage of this PR is that it would make it easier to write generic functions like this:
This is especially useful considering the interoperability of pybind11 and Eigen.
Suggested changelog entry: