Skip to content

[MRG] Austin workflow refactor part 2#134

Open
asoplata wants to merge 48 commits intojonescompneurolab:mainfrom
asoplata:austin-workflow-refactor-part-2
Open

[MRG] Austin workflow refactor part 2#134
asoplata wants to merge 48 commits intojonescompneurolab:mainfrom
asoplata:austin-workflow-refactor-part-2

Conversation

@asoplata
Copy link
Collaborator

This is based on, and continuing from, this PR here dylansdaniels#2 . This includes many large refactors, including changing ordered_page_links to a greatly expanded flat_index, along with many other changes.

If merged, this resolves #125 and #126 .

@asoplata
Copy link
Collaborator Author

I've marked it as WIP, but almost all the remaining changes I plan on making for this PR are documentation changes. There may be some further code changes, but they will not be very large.

@asoplata asoplata force-pushed the austin-workflow-refactor-part-2 branch from 6b9f88e to 6ba91ac Compare October 31, 2025 14:40
@asoplata asoplata changed the title [WIP] Austin workflow refactor part 2 [MRG] Austin workflow refactor part 2 Oct 31, 2025
@asoplata
Copy link
Collaborator Author

This is now at a good stopping point and is ready for review. I did a bunch of the things we talked about in addition to the Issues mentioned previously. Much of this was adding and updating documentation, including fixing a newly found (but preexisting) bug or two

This is the first part of a major refactoring overhaul of how we handle
indexing, input-markdown and output-HTML pathing, and sidebar
generation. This is still a work in progress, and most functionality has
been commented out for debugging/testing.

Major changes:

- `update_page_index.py` has been renamed to `create_indices.py`. This
new file now creates two indices:

1. "hier_index", which is the same as the old "index.json". Its content
has not changed. Instead of being saved to file every time and then
loaded later during sidebar creation, it creates and returns the
dictionary instead, and that is passed to sidebar for usage. I'm not
sure if we need to save it as a file at all, since its value can always
be easily debugged live, and we were not doing any kind of checking to
skip its loading before (i.e. if we have to reconstruct it every time,
then there's no gain from saving a copy to disk).

2. "flat_index". This will eventually replace "ordered_links" and is
greatly expanded from what "ordered_links" does. This is treated like
"hier_index" as an explicit "generate_page_html" function, but will replace
much functionality/data of the rest of the code. This index includes the
ordering of legitimate "pages" (i.e. not "sections"), and for every page
contains a dictionary containing:

A. title of the page
B. absolute path of the "input" markdown file
C. absolute path of the "output" HTML file (including processing
dev_build, and including creation of any parent directories to this
not-yet-created file)
D. "website-root"-relative path of the "output" HTML file (including processing
dev_build)

The next major step is using "flat_index" to replace ordered_links and
output path/filename lookups throughout the rest of generate_page_html.

Minor changes:
- Location of the saved index files has been moved to inside "scripts"
from the root
- Many functions and some script files have been renamed
This continues the use of "flat_index", including in every case of the
main loop, except for the "ordered_links" which are next.

This includes many minor changes as well for formatting (including a
ruff run), documentation, and other small changes.
Among other minor changes, including adding a new optional
"save_indices" CLI arg that does what it sounds like, this finishes
applying the usage of the new "flat_index" to the remaining code, which
was mostly the "ordered" links used in the per-page footers.

Everything appears to be working correctly, including that the sidebar
and footer navigation uses "dev" or "content" properly.

Next up is general cleaning and documenting.
This begins a multi-step transition from the CLI "--build-on-dev"
argument to the combination of "--code-version" (formerly the unused
"--build-type" argument) and "--custom-owner-commit".

The CLI args have already been created, and this commit focuses on
applying their use inside `get_commit_hash.py`.

This also includes a great deal of minor changes `get_commit_hash`,
including better error messages, more explicit variable names, etc.

There is also incomplete support for a new "--code-version=no-check"
option that skips the version checking in case one is debugging or if
one wants to run 'build.py' a lot, but doesn't want to spam HTTP
requests. This use-case is useful in itself, but also related to a weak
hypothesis I had: that the fact that version mismatches in the previous
"default" case of 'build.py' usage provided only *warnings*, instead of
RuntimeErrors, like in all other mismatch cases (see
https://github.com/dylansdaniels/textbook/blob/workflow_refactor/build.py#L610
)
`dev_build` has finished being replaced by a combination of
`code_version` and (either `custom_owner_commit` or `commit_hash`)
depending on the function. Documentation has not yet been updated, but
it was incomplete to begin with.
This comments out what I think is something that may have been added by
accident. Compare this line to
https://github.com/dylansdaniels/textbook/blob/31dbaf09d6d854a4e8527024fa99f2b415820192/scripts/convert_notebooks.py#L923
.

What was originally going on at the time of the above commit is the
following:
`notebook_was_run` was used as whether or not the notebook was attempted
to be run at all, regardless of if it succeeded (I have changed this
variable to `execution_initiated`). `notebook_executed` was used as
whether or not the notebook successfully finished its execution (I have
changed this variable to `execution_successful`).

In the control flow, which has not changed at all, in this particular
case, we are in the case where `notebook_was_run`/`execution_initiated`
is False, meaning no execution was attempted at all. Also in the
original version, `dev_build` would have been activated, meaning we're
in a dev state. However, even if we were in a dev state, but if the
notebook execution was never initiated/attempted, then in this case, I
do not think it makes sense to record the commit hash of the current HNN
version into the notebook. If we did, it would imply that the last dev
execution was using this commit, but that is not the case.
This adds a large description of the building program as a whole to the
CLI for 'build.py' such that it is output when calling

python build.py --help

This includes description of the assumed file structure (which has many,
many assumptions), the overall code path, some examples, and the
important options along the way.

Minor: this also adds a docstring for 'get_hnn_commit_hash', renames the
'write_standalone_html' variable to 'save_standalone_nb_html', and makes
a new CLI argument of a similar name. This documentation was first
generated with Claude AI before undergoing *heavy* modification of every
word.
This partially reverts our post-hnn_commit_hash-validation code's checks
of the "code-version" to use a new var like in the original, called
"is_dev_build". The distinction between a "dev" and non dev build is now
fully documented, apparent, and set in 'build.py'. "is_dev_build" is
also a proper boolean flag (true/false only) and the actual commit hash
to use (when necessary) is handled separately.

More refactoring: this also renames the hnn-core commit hash module and
splits its functions, so that the 'get_hnn_commit_hash' function is ONLY
getting the installed version, and the rest of the code is validation of
the installed version against various other versions.
There are also some minor variable name changes
Previously, we were loading the same notebook from disk twice, once to
calculate the hash, then once to actually use it. It's more efficient to
load from disk once.
This includes a special comment about how
`_read_nb_json_output_metadata` looks in different places for the JSON
output files depending on if you're doing a stable or dev build
You can probably guess where this is going...

Note: this does not affect the Workshop page, since it's raw HTML.
My Claude's filesystem scan was out-of-date, and lightly "hallulcinated"
a `templates/navbar.html` that no longer exists. This removes that.
Appears to work just as well.
@asoplata asoplata force-pushed the austin-workflow-refactor-part-2 branch from 9400138 to 86eb4da Compare November 4, 2025 20:40
@asoplata
Copy link
Collaborator Author

asoplata commented Nov 4, 2025

@dylansdaniels this has now been rebased off of main, and additionally the only merge conflict has been resolved. It's definitely ready for review now, and should be much easier to review.

@asoplata
Copy link
Collaborator Author

asoplata commented Jan 5, 2026

This is complicated enough that we may want to NOT squash these into one big mega commit before merge. Instead, may want to way until we're happy with the current state, then I do a rebase to resolve the merge conflicts, then we merge

@asoplata
Copy link
Collaborator Author

For some reason, even if I re-add then re-delete the (no longer necessary) files for dev/05_erps/erps_in_gui.html and scripts/create_navbar.py, it still complains that these are blocking conflict. Therefore, I'm going to re-add them, but we should manually delete them AFTER this PR is merged.

Everything else should be up to date and work with main.

@dylansdaniels
Copy link
Collaborator

@asoplata reviewed everything and looks good in theory. doing some local testing that the builds are working as expected. I pushed a commit to add "no-check" to the "--code-version" argument, as it was missing .

@asoplata
Copy link
Collaborator Author

Thanks for catching that, when going over this again I forgot that --code-version=no-check and --no-version-validation are not the same thing. In the name of reducing complexity, we may want to discuss merging those two options somehow, but that discussion can wait.

I'll add a new commit that forces no-check to force a dev build (instead of content) and to update the documentation.

@asoplata
Copy link
Collaborator Author

Another refactoring idea is, even though we're already at pretty high CLI complexity, we may want to separate how whether you're doing a content or dev build into its own CLI arg. Especially when it comes to a CLI, we want to minimize how much mental "control flow" the user needs to keep in their head when it comes to secondary consequences of their choices (as opposed to giving them a higher number of choices but which are all independent of each other)

@asoplata
Copy link
Collaborator Author

Done with the addition, now --code-version=no-check causes a dev build (instead of content) and is properly documented (...probably)

build.py Outdated
Comment on lines 221 to 223
be skipped) using the current master (development) branch that you have
installed (WITHOUT checking against the latest master branch on Github), and
build the website in your local `dev` directory (creating it if necessary):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is confusing, because you techically don't need master installed since you're passing --no-version-validation, so it's actually not guaranteed that you're using the current master (dev) branch. I think we should remove example 4, reindex, and add explanation to the current example 8 stating that the --no-version-validation flag supercedes any use of --code-version. That, or we make it such that you can't actually pass --code-version with --no-version-validation

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is related to your comment on control flow. But I agree that we don't necessarily have to rethink that entirely in this PR, but I do think we should change the examples at the very least

Copy link
Collaborator Author

@asoplata asoplata Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I def agree it's super confusing. It's possible that I meant to strip out --code-version=no-check in its ENTIRETY, and only left it in (e.g. at the end of process_hnn_commit_hashes.py) by accident. It's just been so long that I don't remember.

What about removing no-check entirely? no-check and --no-version-validation are essentially doing the same thing in process_hnn_commit_hashes.py already (except for --code-version=stable where only --no-version-validation is doing the correct behavior and returning None for the commit instead of the current installed commit).

If we remove it entirely, it's much simpler, and the docs will be so too. --code-version determines a content vs dev build based on either stable or {master, custom}. Then, --no-version-validation overrides the version validation by using the appropriate commit in the metadata (the installed one if dev, or None if content) and executing only using the installed commit.

I'll add that even if we drop no-check, it will be an improvement, but still has a lot of room left for further improvement after the PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm my gut is that it should be the opposite actually - we keep --code-version=no-check and remove --no-version-validation.

Then we entirely avoid the confusion of commands like: python build.py --execution-type=execute-updated-unskipped-notebooks --code-version=master --no-version-validation -> this implies it should be master, but we don't validate, so you can actually run it with stable and it will build anyway. users may thing their build is running on master but it's not

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There would never be a case where you need a code version specified when using --no-version-validation in its current form anyway, it's really an either or situation. so it might as well be an option for --code-version. and if there's no version validation, it should probably always be a dev build

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with removing --no-version-validation and replacing it exclusively with --code-version=no-check, since I agree that it way easier to understand, EXCEPT for one caveat: I think we should simultaneously split dev build logic into its own CLI argument. Reducing the cognitive load of "no check" into a single item is definitely the way to go.

However, we need to be able to control whether we're doing a build into content or dev, independent of that. For example, when I was coding this, I was actively using that situation -- I was trying to test and code the situation where --code-version=stable (which is currently the only way to use the code paths that build into content instead of dev), but I didn't want to be pinging Github everytime I tried to build (since I didn't want my IP to get on the bad list). Therefore, we (moreso you and I, not necessarily the expected user) need to be able to use --code-version=no-check but control whether we're outputting to and building from content or dev.

This has the added benefit of making it very explicit at the CLI level where you're building to. We could still retain the current control flow (i.e. --code-version=stable defaults to building to content, but {master, custom, no-check} default to building to dev) BUT this gives us (and users) a way to force override that preference.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I think that's a good plan and I'm on board 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the change in the latest commit

This removes the CLI arg `--no-version-validation` (so that only
`--code-version=no-check` remains), and adds a new CLI arg named
`--build-directory`. This new arg accepts values of `auto` (the default,
which follows our `is_dev_build` "algorithm"), `content` (user forces
website to be built into content dir), and `dev` (user forces website to
be built into `dev`). This also upgrades all the documentation to
reflect these changes, along with some minor other docstring
improvements.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG/BUILD: Sidebar links to "content" HTML files instead of "dev" during dev builds

2 participants