Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/faster python build #91

Open
wants to merge 344 commits into
base: master
Choose a base branch
from
Open

Conversation

staleyLANL
Copy link
Contributor

This change may raise some eyebrows. It arguably thumbs its nose at the normal build-system concept of fine-scale recompilation: compile only the individual translation units that have changed since the last build. However, I believe the new system is much better, given the circumstances. (More below.)

The old system is still available with: make -j <threads> GNDStk.python.old.

For the new system, just use this: make GNDStk.python.

No -j is necessary for the new system. It doesn't help.

Timings on the author's office machine, with make -j 24:

 * old: 735.93 seconds.
 * new: 221.25 seconds.

Additional benefits:

 * Less memory, overall, appears to be used.
 * Only one thread is needed. (This also frees up memory.)

Moreover, the resulting .so file is less than half of its original size! On the author's office machine:

 * old size: 221,451,008 bytes.
 * new size: 106,036,816 bytes.

Why does the size improve so much? My guess is that this relates, in part, to lambdas. Consider that two classes, X and Y, may both contain an object of class Foo. Foo's functionality may use lambdas. Internally, compilers handle lambdas by generating anonymous classes with call operators. Compile files individually,and multiple anonymous classes (and thus operators) for Foo may be created. As they're anonymous, we won't get multiply-defined symbol errors when linking. But, the same thing may be repeated in multiple files, whereas more efficiency can be obtained with the new system. (This theory about lambdas being the cause may or may not be correct. Regardless, the new .so file's size is much smaller, at least on my machine.)

So, what's the new system? Well, we made a file,

 python/src/all.python.cpp

that includes all of the *.python.cpp files - a total of 280 files - that were previously compiled individually. Stuffing these together is fine, from a technical standpoint, because of course we have macro guards on all the header files that those source files include, and we don't have file-scope statics or other things that might cause issues.

This process - stuffing C++ files together - flies in the face of the normal build concept, but (if the above stats aren't convincing enough), we believe it may be the best approach here.

 * Users are likely to download GNDStk, compile once, and then not bother
   doing so again until a substantial new release comes along. Enough
   things are likely to change, between substantial releases, that most
   individual files - not just a few - may have changes, and thus need
   recompilation, anyway.

 * Users may now see, for instance, 6 minutes instead of 20. Yay!

 * Most `*.python.cpp` files are autogenerated. If one such files changes,
   it's likely that most others have changed similarly, due to some change
   in the code generator's behavior. In this case, there's no big benefit
   to a fine-scale recompilation.

The big issue is this: GNDStk is a large header-only library, and its header is included in all of the *.python.cpp files. The author's long experience with such situations has been that the processing of the header (including what it additionally includes, such as the JSON library) - translating from code to an AST (abstract syntax tree) - eats up most of the compilation time. Basically, in our old system here, the same processing was being done 280 times, each time followed by typically just a page or two of code in the *.python.cpp file itself! So, a fine-scale-compilation system is dominated by overhead. The same overhead, 280 times!! :-( Stuff the files together, and the header processing happens only once.

I'd suggest that in all but rare (and probably contrived) circumstances, the new system will be faster than the old system, and also use less memory. And, it'll always need only one thread, and give a smaller-sized .so file.

Note: the code generator can be modified so that it automatically puts together the individual source files that currently appear in python/src/all.python.cpp, except for the non-autogenerated ones at the top. In this case, we could then have python/src/all.python.cpp include the non-autogenerated ones, as they do now, plus a single extra file. At the moment, I haven't modified the code generator to do this.

An earlier branch introduced the ability to write HDF5 files in different ways, according to two boolean flags: *reduced* and *typed*.

This branch enhances our JSON capabilities in the same manner, reflecting, with JSON, an analog of the options we have for HDF5.

The above was done in respect to GNDStk's general philosophy of making its support of different file types as consistent with one another as reasonably possible.

Similarly to HDF5, the JSON class now has two static booleans: "reduced" and "typed".

The "reduced" flag tells the JSON writer whether or not to reduce -- basically, to fold into a shorter and simpler representation -- certain special constructs that GNDStk's Node uses internally in order to handle what are called cdata, pcdata, and comment nodes in XML. Basically, if reduced == true, then we make this simplification, which shortens the JSON output. If reduced == false, then the JSON will closely reflect what Node uses internally.

The "typed" flag tells the JSON writer whether or not it should use our type-guessing code to guess whether "strings" in certain contexts (in particular, metadata values and pcdata) actually contain what look like numbers. Examples: "12" looks like an int, "321 476" looks like a vector of ints, "3.14" looks like a double, and "3.14 2.72 1.41" looks like a vector of doubles. If typed == false, we use the original strings, with no type guessing. If typed == true, we apply the type guesser, and, where appropriate, get JSON numbers in place of strings in the JSON output.

This is a work in progress, with a couple of things that still need doing...

Still to do: modify the JSON *reading* code so that it recognizes the various different ways that the JSON *writing* code writes things, and can reliably reverse the writing process and recover a GNDStk Node.

Also still to do: at present, the nlohmann JSON library, which we use under the hood, doesn't provide a way to write JSON numbers (unquoted, as opposed to quoted JSON strings) beginning from an existing string representation that we might provide for the number. Consider this discussion that I started:

     nlohmann/json#3460

This is relevant to us because GNDStk provides very fine control over exactly how numbers, in particular floating-point numbers, are formatted, in terms of the number of significant digits, fixed vs. scientific form, etc. At the time of this writing, if we use typed == true, so that certain strings that look like numbers are written as numbers, the numbers will, unfortunately, be formatted by the JSON library itself. We'd like to have the capability of writing an original string (one that we've already determined looks like a number!) - but writing it as a JSON number (so, without quotes) rather than a JSON string.

In this commit, we also Modified an old Tree/ test that depended on the previous default (but still available, via appropriate flags, even though no longer the default) JSON writing behavior. And, we changed a variable name in json2class.cpp.

We refactored detail-node2hdf5.hpp (for HDF5) considerably, and then reflected this in the modified detail-node2json.hpp (for JSON) that we're primarily working on.

And, we've enhanced the JSON test code so that it tries out the new flags. This is a work in progress as well. In order to finish it, we need to finish the ability to read JSON files that were written not just in our original manner, but with any variations of the above-described flags.
Also some small tweaks and comment changes here and there.
Some tweaks to HDF5-related code.
Extended and improved some comments, to clarify things.
Additional tests for those, too.
There are still some instances of the strings in question, typically appearing amongst other text inside of string literals. At some point I'll look into fixing up those cases too, but that's not a priority right now.
Updated the HDF5 test suite to ensure that we test the modifications.
We'll update the code generator later, and ensure that it does the right thing.
More feature-packed classes Field and FieldPart.
I'm not sure why I made them be converting constructors to begin with. Whatever the reason, it seems (according to the test suite!) that they don't need to be converting constructors. And, I don't think that they should be. Indeed, some new code modifications uncovered this issue, as converting constructors for Node triggered certain ambiguities.
Code generator itself still needs updating.
…in name isn't known.

This was important for the new generated-class Field system to work well, without carrying along a std::string (expensive!) in each class field. On the other hand, such a thing wouldn't necessarily be a memory hog in a well-designed library that mostly contained vectors and such - not a huge number of individual scalar fields.
Improved some SFINAE.
Added more comments, and did a bit of rearrangement.
It now generates classes that use the new system.
Note that it doesn't yet add shortcuts to generated classes; that'll come soon.
Changed the way a map was done. Faster, and leads to better printing of shortcut information.
Some refactoring related to the above.
Miscellaneous cosmetic and clarity-related changes.
Some small printing-related changes and fixes as well.
… offers.

There's lots more to be done in the above respect; this is just a start.

Also, provided some preliminary support for another approach to customization.

The original (and still in place) "customization" system involves inserting code from custom.hpp files directly into generated classes. Inserting additional bits and pieces of code into already-existing classes is actually a somewhat goofy concept. (Imagine someone customizing, say, std::complex by adding a new piece of information. They really should derive from it, or contain it in another class. Not try to mess with the existing class.)

I'm fleshing out *one* new way, at least (with possibly others to come) for doing customizations. There's more to be done, but I'll work with a particular user right now, to see how this works.

JSON specs now allow metadata to have converters. It turns out that this will help with the new customization business. Briefly: we want an ability to insert our own fields into classes (say, to contain data that we'll compute as functions of other fields), but without those new fields playing a role in I/O from/to files. A new (and simple) "Noop" (no-op[eration]) class fills the necessary role of representing the "no conversion is taking place" for I/O on the new fields. Some new if-constexprs, elsewhere in the code, allow convert() functions to return a bool (but they can still return void, as before). If a convert() returns bool, and is used in the context of Node add() metadatum or child node, then a return value of false means, "don't actually add the new metadatum or children." This is all in support of our new customization approach, because we don't necessarily want auxiliary computed fields to be written to files. (But still want them in the multi-query, so they work with print() and such.)

Tweaked some SFINAE. This detail:: stuff really ought to be made more internally consistent, even though it's in detail::. Note: the _v and _t business is modeled after what's in std::. (Examples: enable_if_t, is_convertible_v.)

Adjusted some comments to reflect recent changes.
@staleyLANL staleyLANL requested a review from whaeck March 20, 2023 21:11
@staleyLANL staleyLANL self-assigned this Mar 20, 2023
…terface.

Some code cleanup and minor rearrangements.

Lots more to come in the Python interface, as time allows.

The next thing will probably be access to our extensive capabilities for controlling formatting, in particular as it relates to floating-point precision.
@staleyLANL
Copy link
Contributor Author

Remark. The CI is still failing. With the new system, we could see if a plain make, rather than a make -j 2, succeeds. Only one thread is necessary for GNDStk.python. It may or may not use less memory than the -j 2 case, though.

Another remark. Other NJOY codes that are similar in certain respects (have python bindings, binding files unlikely to change on a fine scale, many such files, each including a relatively long and complex header for which processing overhead is large) may show similar build-time improvements if we do something similar to what I did here. We'd just need to try, in order to know. Aside from build-time improvements, other improvements may include needing only one thread, using less memory, and producing substantially shorter archive files.

staleyLANL and others added 18 commits March 30, 2023 14:31
The code generator builds C++ code, a C-language interface, and Python bindings. Prior to this change, the code that generated the C interface handled only the old "BlockData" manner of handling data nodes. That was a construct that handled the "valueType"-based "run-time defined data type" business that exists in some GNDS nodes. But our format for format specs allows for a fixed data type as well, and it's handled in a different way. For a user who has been using GNDStk's code generator to generate an entirely different format, I needed to update the generation of the C interface in order to handle this properly.
These are largely in preparation for some upcoming, more-substantive changes; ones that should somewhat improve and clarify the interface.
Some work on Component::has() and Component::operator().
Small change: class to struct in a few places.
Additional work regarding Lookup objects, Field, Component, etc.
Cleaned up some SFINAE, and removed some that was no longer needed.
A bit of formatting work in some CMakeLists.txt files, just to make them arguably more readable.
 - Clarified "auto" in some places.
 - Ditto for constness.
Minor changes in some SFINAE to more-closely resemble certain std:: constructs.

Sfome SFINAE sfimplifications.

Removed unnecessary "inline"s. The functions in question were in fact function *templates*, so that "inline" was unnecessary in order to still achieve header-only-ness.

Used "struct" instead of "class" in a few places.

Used "size_t" instead of "int" in a few places. The difference wasn't really important in these places, but I decided that size_t was correct in principle.

C Interface: fixed some previously unnoticed discrepancies that arose after an earlier change. Some function names were changed a while back: color to colors, note to notes, and warning to warnings. However, the *declarations* (for C) hadn't been updated accordingly.

Function name change: detail::getter() to detail::compGetter(). (comp, for class Component.) Too many things were called "getter." This change should make code auditing/analysis a bit easier, going forward.

Miscellaneous other changes. There are still some relevant files to be uploaded, but I need to review them more carefully first.
Split up some longer files.
The new overload and SFINAE were added after I switched from clang 9 to clang 17, and the latter reported overload ambiguities in some old code that hadn't had problems with earlier compilers. It's not clear to me that ambiguities really exist, but in any event, the new overload and the SFINAE disappears the new errors.

Casted to (void) in certain places in some test files, after noticing that a new g++ emitted warnings about the results of some function calls being unused. (We did those calls to test that they're valid and to check out-parameters, not because we intended to use the return values of the functions.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants