Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mechansim for determining whether build-dir is a temporary directory? #979

Open
mabruzzo opened this issue Jan 30, 2025 · 5 comments
Open

Comments

@mabruzzo
Copy link

Does scikit-build-core currently provide a way to determine at build-time (either with the if-conditions or SKLEARN_... CMake Variable) whether or not the build-dir is an automatically defined temporary directory? If not, are there plans to support something like this in the future?

If this isn't something you currently (or have any interest in supporting), it's not a major problem; I can work around it.

I was motivated to ask because I want CMake to infer a sensible default behavior for whether it should both compile some targets used for running tests (but aren't be installed) and record the path to the build-directory within the built package (so the test-suite can find the targets). For concreteness (and in case there's an obvious alternative solution/convention), I have elaborated more on my particular situation in the following collapsible section

Further elaboration on my situation

To elaborate, I co-manage a small project that maintains a C/C++/fortran library and a set of python bindings (the bindings are only built by a subset of user). I recently migrated the respective build-systems of our library & bindings to CMake & scikit-build-core (from bespoke Makefiles & setuptools).

Historically, we've used pytest to run all our tests.

  • A particular set of test-cases didn't migrate gracefully. Each case compiles an example toy-applications (it exist to illustrate the library's API), (ii) run the toy-application, and (iii) confirm that the produced results are consistent with results produced with earlier code-versions.
  • If the build-directory isn't temporary (& a few other conditions are met), I plan to automatically compile the toy-applications during the build-process and encode the path to the build-directory within the resulting python package (the tests would use the path to find the pre-compiled test-applications).

An obvious alternative approach is be to simply maintain separate test-suites for the bindings and the core-library. While we have started to introduce some core-library C++ unit-tests, it is somewhat challenging to transition these particular tests. Moreover, there is a desire among the project developers to easily run all tests at once (with something in the spirit of pytest-cpp). I could only imagine accomplishing something like that with the machinery described here.

If this isn't a feature to be supported, then I'll probably adopt a solution where we always create a persistent build-directory, with a default path configured via the formatting syntax, and have users explicitly opt into building the testing-targets (and recording their location) by setting an env-variable. I just wanted to ask first before pursuing that alternative, (since the lack of a default persistent build-directory is probably the default behavior for a reason and it seems more consistent with CMake)

@mabruzzo mabruzzo changed the title Mechansim for determining whether build-dir is a temporary variable? Mechansim for determining whether build-dir is a temporary directory? Jan 30, 2025
@LecrisUT
Copy link
Collaborator

There are a few topics cramped in this issue.

  • detecting if you are in a temporary build directory. None that I'm aware of. There are also 2 layers of temporary directories: CMake build's and pip build isolation. The former is controlled by scikit-build-core and could be exposed, but the latter is a tricky one I got stuck on also.
  • running test-suites. This depends on how you build it, from scikit-build-core or directly from CMake. If you can design such that the latter is still available then you would have maximum control, otherwise your best option would be to create predictable build directory and manually clean it up.

Other designs you can consider:

  • make the CMake test-suite an independent project that can consume the installed artifacts after pip install
  • add a build target that runs ctest
  • separate the library and python project and tests and run them independently
  • make the python project buildable outside of scikit-build-core environment and run all tests there

@mabruzzo
Copy link
Author

Thanks for the quick and helpful response! I just had a few quick questions:

First, a piece of relevant additional context about my project's build-system:

  • The primary CMake build-system knows nothing about our python bindings (or scikit-build-core). The system only has some build-recipes the core library, our toy-application examples, and a couple of googletest unit-tests. Crucially, our system support consumption of the project in embedded sub-build ( via cmake's add_subdirectory/FetchContent)
  • Our pyproject.toml file redirects to a single a isolated CMakeLists.txt file (stored with our cython & python code) that embeds a subbuild of the core library via add_subdirectory1
    Essentially: the system was designed so that the python bindings & core-library could be trivially split into separate repositories2

  • detecting if you are in a temporary build directory. None that I'm aware of. There are also 2 layers of temporary directories: CMake build's and pip build isolation. The former is controlled by scikit-build-core and could be exposed, but the latter is a tricky one I got stuck on also.

I looked at pip's docs and I just to make sure I understand. pip's build isolation only uses the temporary directory for build-time python-package dependencies, right?

The original plan I outlined doesn't actually need cmake/ctest/ninja after the build-step (the plan would be to precompile all of the necessary binaries and ctest isn't required). Consequently I think my original plan would indeed work if scikit-build-core add the ability to expose whether the CMake build-directory is a temporary directory.

Would the addition of this feature to scikit-build-core be something that you would be receptive to in the future?

I would be happy to take a stab at implementing it. But, I totally understand if this is not a feature that you want to support -- I realize "the stars may align" in a particular way for my project and you might not want to deal with issues that could crop up for other projects. Plus there was a comment in the changelog (related to implementing specifiable build-dirs), in which the tone suggested that the scikit-build-core might use persistent build-directories in the future (in that case, the feature probably doesn't make sense)

  • running test-suites. This depends on how you build it, from scikit-build-core or directly from CMake. If you can design such that the latter is still available then you would have maximum control, otherwise your best option would be to create predictable build directory and manually clean it up.

I'm not sure I totally understand, but I what I described above that is consistent with building it from CMake.

  • make the CMake test-suite an independent project that can consume the installed artifacts after pip install

I think this probably doesn't work for us. I think this would involve us needing to package headers and the CMake package-config information with the package. And I don't really want to go down this path.3

  • make the python project buildable outside of scikit-build-core environment and run all tests there

Out of curiosity, do you mean building the python-package within the CMake project (and using PYTHONPATH to use the module when it isn't as a wheel by scikit-build-core?)

Footnotes

  1. It also supports linking against already-installed copies of the core-library (via find_package), but that's beyond the scope of what I'm interested in making easily testable

  2. We don't actually intend to actually do that since the logic for testing the library's API is all written in python

  3. I feel like this might open a can of worms if we ever start distributing binary wheels on PyPI. If we distribute this information, people might try linking against the packaged copy of our library with, which would be problematic since we'd also have to vendor a copy of hdf5.

@LecrisUT
Copy link
Collaborator

  • The primary CMake build-system knows nothing about our python bindings (or scikit-build-core). The system only has some build-recipes the core library, our toy-application examples, and a couple of googletest unit-tests. Crucially, our system support consumption of the project in embedded sub-build ( via cmake's add_subdirectory/FetchContent)

  • Our pyproject.toml file redirects to a single a isolated CMakeLists.txt file (stored with our cython & python code) that embeds a subbuild of the core library via add_subdirectory1
    Essentially: the system was designed so that the python bindings & core-library could be trivially split into separate repositories2

I would question then what are the unit-tests for. If you can already build the primary project independently and running the unit tests there, these are already covered by the main project's CI. Any background on why you want to run the tests twice?

Consequently I think my original plan would indeed work if scikit-build-core add the ability to expose whether the CMake build-directory is a temporary directory.

Would the addition of this feature to scikit-build-core be something that you would be receptive to in the future?

Yeah, it could be an explicit check if it used scikit-build-core default build directory or a custom one.

I would be happy to take a stab at implementing it. But, I totally understand if this is not a feature that you want to support

It's more that often times if you need a feature, you should also ask why you need it for. It could uncover more fundamental design issues of the build process.

Plus there was a comment in the changelog (related to implementing specifiable build-dirs), in which the tone suggested that the scikit-build-core might use persistent build-directories in the future (in that case, the feature probably doesn't make sense)

That sounds weird, which section is it? If it's about the one in 0.10.5, it's about supporting persistent build directories when also run with build-isolation, which is indeed something we want to support, but it would definitely not be a required workflow. pip build isolation and scikit-build-core's build directory are (mostly) independent of each other.

Out of curiosity, do you mean building the python-package within the CMake project (and using PYTHONPATH to use the module when it isn't as a wheel by scikit-build-core?)

Yes, I think that if you really need to run the test suite in a consistent environment, this would be your best option. But ultimately I believe it is better to keep the two test suites separate and independent 1. Instead maybe what you want is a solution for gluing together multiple test-suites?

Footnotes

  1. Your downstream packagers will be thankful for that.

@mabruzzo
Copy link
Author

I would question then what are the unit-tests for. If you can already build the primary project independently and running the unit tests there, these are already covered by the main project's CI. Any background on why you want to run the tests twice?

This is an excellent question!

In fact most of the primary project's functionality is not tested by the unit-tests. The current testing situation mostly arises because we are trying to modernize the project's legacy codebase. A brief summary is provided in the collapsible section.

Brief Testing History - The project originated as a series of Fortran routines from the mid 90s and for most of its life it existed as a module various kinds of atomic physics within a larger program that ran multi-physics astrophysical simulations. - There initially wasn't much regard for automated testing this on a module-level (I doubt that well known tooling existed for doing this). All testing involved manual printing & spot-checks or running the whole program at once. As time progressed, more features have been added making the internals more messy. They chosen Fortran dialect hasn't helped the situation (e.g. no struct-equivalent was ever used). - A little over 10 years ago, the module was extracted into a library that we now maintain as a separate project. The goal was to reuse the logic as a common component of other similar programs and a C interface was written. Python bindings were also written: primarily to write tests and so that you could use the logic in other contexts (a major impetus was writing a paper about the new project). All tests are written in terms of the python bindings - The library's primary API functions perform calculations that include the effects of a variety of physical processes and consequently, it's hard to come up with simple test-cases that have well-established "correct" answers. Consequently, the python tests just check whether calculation-results drifted from prior versions of the code. (The python package also includes a bunch of logic to make it easy to try to come up with sensible inputs for the calculations). - There is also the class of tests highlighted up above

To summarize: Until recently, all tests of the core library's functionality were written in python. The library's internals were not written in a way that were easily testable.

Fast forward to the present: there have been recent interest in trying to transcribe the code into C/C++. While doing this, some effort is being taken to refactor the internals into a form that we can write unit-tests to check. But these unit-tests currently only cover an extremely tiny area of functionality. As of now, our most robust tests of the core library are still part of the pytest suite.

In an ideal world, we would translate most of the existing tests of the underlying functionality from pytest into our googletest framework and only leave basic tests of the bindings themselves in the pytest framework. Unfortunately, that would take a substantial amount of work and we don't currently have the man power to do that (there are much more pressing priorities).1 It's something I aspire to accomplish in the future. For now, we can only endeavor to make better decisions about where we add new tests.

So, in the short to medium term, locally running2 all tests during development involves an awkward dance where we currently recompile the project with the python bindings to run most of the python-tests (there is some manual intervention for the toy-example tests I originally mentioned in my first issue), and then separately build the code without the bindings to run other the googletest tests. (Essentially, my proposed alternative was designed to simplify the process of testing the core libary whenever we make a change -- particular for other developers who are less familiar with the process than me)


It's more that often times if you need a feature, you should also ask why you need it for. It could uncover more fundamental design issues of the build process.

Oh, I totally agree! The current design of the build process is clearly suboptimal. But, there wasn't an obvious quick-fix to my project's.


That sounds weird, which section is it? If it's about the one in 0.10.5, it's about supporting persistent build directories when also run with build-isolation, which is indeed something we want to support, but it would definitely not be a required workflow. pip build isolation and scikit-build-core's build directory are (mostly) independent of each other.

I went and looked again. It turns out its from a while back (so the sentiment is probably not still relevant) in version 0.2.0. It states: "This version adds local build directory support - you can now set build-dir and reuse build targets. This does not yet default to on, so please test it out."


As for your point, about making things easier for downstream packagers -- we don't currently have those, haha. In my subfield, there is an expectation that all users will download source code and compile it themselves. (It's an extremely imperfect system and I've been slowly modernizing aspects of the project to try to make the code more easily packagable).


After responding to all this, I'm a little less sure what the most pragmatic way to move forward actually is. I'll give it a little more thought. We could probably close this issue.

Footnotes

  1. I also think that the value of having two distinct test suites has not yet been proven to all stakeholders in the project.

  2. The way that the answer-comparison tests are currently written makes the tests take quite a bit longer to run in CI than it does locally. There are some solid reasons this.

@LecrisUT
Copy link
Collaborator

One point that I think would be valuable for you, you can manually set build-dir at the CLI level

$ pip install -e . -Cbuild-dir=build

Then you should be able to re-configure and rebuild the project, keeping the relevant build artifacts if possible.

Ultimately a workflow that I would do is to create a simple CI override:

[[tool.scikit-build.overrides]]
if.env.CI = true
build-dir = "build"
inherit.cmake.define = "append"
cmake.define.MY_LIB_BUILD_TESTS = true

Then you just run each individual test frameworks

$ pytest
$ ctest --test-dir build

In an ideal world, we would translate most of the existing tests of the underlying functionality from pytest into our googletest framework and only leave basic tests of the bindings themselves in the pytest framework. Unfortunately, that would take a substantial amount of work and we don't currently have the man power to do that (there are much more pressing priorities).

I have seen way worse setups, but since it is using only industry standard with pytest and googletest, you are actually in a good situation. I would not pursue migrating everything, but having and maintaining multiple test frameworks is a good approach. Particularly because the testsuites is not only used by the developers, but also the packagers and users. Having a language-native testing suite like pytest or googletest helps downstream packagers a lot.


I went and looked again. It turns out its from a while back (so the sentiment is probably not still relevant) in version 0.2.0. It states: "This version adds local build directory support - you can now set build-dir and reuse build targets. This does not yet default to on, so please test it out."

Ah, probably it was gated by an option or something.


As for your point, about making things easier for downstream packagers -- we don't currently have those, haha. In my subfield, there is an expectation that all users will download source code and compile it themselves.

Initially that is always the case, particularly in academic code. But if you make the project packageable, other projects would be able to use yours and eventually distro packagers would pick it up. I have seen many projects die out primarily because of build-system and packaging neglect, and of course scikit-build-core is a great tool to avoid that ;).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants