Skip to content

Commit

Permalink
add but-hunk-dependency crate for learning about the origin of hunks
Browse files Browse the repository at this point in the history
  • Loading branch information
Byron committed Jan 24, 2025
1 parent 8da088e commit b01c25b
Show file tree
Hide file tree
Showing 5 changed files with 191 additions and 0 deletions.
11 changes: 11 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

18 changes: 18 additions & 0 deletions crates/but-hunk-dependency/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
[package]
name = "but-hunk-dependency"
version = "0.0.0"
edition = "2021"
authors = ["GitButler <[email protected]>"]
publish = false

[lib]
doctest = false

[dependencies]
anyhow = "1.0.95"
gix = { workspace = true, features = [] }
but-core.workspace = true

[dev-dependencies]
gix-testtools.workspace = true
insta.workspace = true
111 changes: 111 additions & 0 deletions crates/but-hunk-dependency/src/lib.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
#![deny(missing_docs, rust_2018_idioms)]
//! ### Terminology
//!
//! * **WorktreeHunk**
//! - A patch if applied to `HEAD^{tree}` would turn that resource into the `WorktreeState`.
//! * **CommitHunk**
//! - A patch generated from a commit and its parent, indicating the change that the commit represents.
//! - If there are multiple parents, only the first one is used for obtaining CommitHunks.
//! * **WorktreeState**
//! - A file at a `Path` as it would be found in the *worktree*.
//! - If that file is compared to the `HEAD^{tree}` we get `WorktreeHunks`.
//! * **CommitState**
//! - A *Blob* in the *Git tree* at a `Path`.
//! * **Blob**
//! - The bytes of a file, ready for storage in Git.
//! * **BranchTip**
//! - The top-most commit in a Git branch.
//! * **BranchBase**
//! - The floor of a Git branch, which itself isn't considered part of the branch anymore.
//! - The *base* is used to compute a `CommitHunk` with its direct descendant commit, but its own `CommitHunk` is never used.
//! * **Branch**
//! - A branch is all commits from a single `BranchTip` that is bounded by one or more `BranchBases`.
//! - Just a Git branch.
//! * **Stack**
//! - A list of `Branches` whose *commits* are naturally connected to each other, so the top-most `Branch` is connected with the bottom-most `Branch`.
//! - These aren't represented directly here, as a `Stack` can be represented as `BranchTip` of the top-most branch to the `BranchBases` of the bottom-most branch,
//! and we use the term `Branch` here for simplification instead of `Stack`.
//! * **BranchCommits**
//! - The *commits* between the `BranchTip` and the `BranchBases`.
//! * **Workspace**
//! - A set of `Stacks` which are all merged together into a single worktree, represented by a `WorkspaceCommit` that is an octopus merge between the `BranchTips` of all `Stacks`.
//! * **WorkspaceCommit**
//! - The commit as the result of the octopus between the `BranchTips` of all `Stacks`.
//! - Its tree is a merge of all `Stacks` and contains all their changes.
//! * **AmendableCommit**
//! - A list of commits to which a `WorktreeHunk` cleanly applies without intersecting with any `CommitHunk`.
//! * **IntroducingCommit**
//! - The first *commit* whose `CommitHunk` intersects with a `WorktreeHunk`. This means the hunk can override the overlapping portion of the `CommitHunk`
//! and now knows the *last commit* (closest to `BranchBase`) that it can apply to without causing conflicts in future commits.
//!
//! ### Purpose
//!
//! This crate helps to associate one or more `WorktreeHunks` to one or more *commits* .
//! There are the following cases to consider, with varying levels of accuracy.
//! This algorithm is *state*-based and produces the `CommitState` for each `AmendableCommit` and `IntroducingCommit` so it contains all applicable `WorktreeHunks`.
//! It starts with the `WorktreeState` available, and the `CommitState` at `BranchTip` as well.
//! It's notable that even if commits would be amended with `WorkreeHunks`, the worktree itself does not change state at all.
//! (*Note that ContextLines aren't relevant here.*)
//!
//! ### Associate all `WorktreeHunks` to their `IntroducingCommits` in a `Workspace` TODO/Still unclear
//!
//! A `Workspace` is the result of a merge of two or more `Branches`. This means its *worktree* is also the combination of two or more branches. If it is only one `Branch`,
//!
//! It seems easiest extract the `WorktreeHunks` (as `UnifiedDiff`) and then apply them one by one onto each candidate `Branch` in the `Workspace` with fuzzy matching
//! to find one that they apply to. When found, proceed with these patches similarly to how it's done with normal `Branches`.
//! This is probably helps with 80% of the `WorktreeHunks` that cleanly apply.
//! And then there are those that need to be split as they are partially in multiple `Branches`.
//!
//! Maybe another way to do this is to…
//!
//! - go through each `Branch`
//! - go through each `Commit` of a `Branch` from `BranchTip` to `BranchBases`
//! - merge in the `BranchTips` of the other `Branches` and cherry-pick the `WorktreeHunks` on top
//!
//! Essentially, perform the same algorithm as with simple `Branches`, but operate on a merge commit instead, simulating the effect of the `Workspace` at all times.
//! The problem here would be that it's very possible that the `Branches` don't merge cleanly in all cases.
//!
//! ### Associate all `WorktreeHunks` to their `IntroducingCommits` in `Branches`
//!
//! In standard Git `Branches`, the worktree matches the `BranchTip` and `WorktreeHunk` represent changes on top of that.
//! Here is an algorithm to associated `WorktreeHunks` with their `IntroducingCommits`.
//!
//! - prior to the walk, filter out all `WorktreeHunks` that aren't in any file that is touched by the `BranchCommits`.
//! - walk down from `BranchTip` to the `BranchBase`, and for each commit do a *three-way merge* such that we revert each commit, but pretend to have added `WorktreeHunk`
//! at the same time. Alternatively, it's like cherry-picking the `WorktreeState` onto the parent of `BranchTip` as first iteration. Then it's like pushing `WorktreeHunk` down the
//! commit-ancestry, starting at the `BranchTip` whose `State` we know with `WorktreeHunk` applied.
//! - If there is a conflict, we know the clashing `CommitHunks` are to be superseded by the overlapping portions of the respective `WorktreeHunk`, which can be similar to choosing
//! the *Ours* strategy. This removes the whole `WorktreeHunk` and if there are no more `WorktreeHunks` to track, we can stop iterating. This is the `IntroducingCommit` to record.
//! - If the merge is without conflicts, we have the `State` of our side for use in the next iteration. Record this commit as `AmendableCommit`.
//! - Keep iterating until all `WorktreeHunks` are associated with an `IntroducingCommit`.
//! - Once the `BranchBase` is reached, stop the iteration
//! - All `WorktreeHunks` that were associated should be applied to the `BranchTip`, adding it as `AmendableCommit`
//! - `WorktreeHunks` that were *not* associated are returned and can be committed separately, for instance on top of the `BranchTip` whose `State` we have returned as well.
//!
//! The algorithm should be run for all hunks at all `Paths` at once to be able to get the most out of diffs between two trees.
//!
//! ### Associating selected `WorktreeHunks`
//!
//! Selected `WorktreeHunks`, as a subset of all available `WorktreeHunks`, are applied onto `HEAD^{tree}` if applying to `Branches` or to the `WorkspaceCommit` if applying to a `Workspace`.
//! This sets the initial `State` to contain only the selected `WorktreeHunks` and their association to `IntroducingCommits` can be performed as normal.
//!
//! ### Committing `WorktreeHunks`
//!
//! The outcome of associating `WorktreeHunks` with *commits* is the `State` of each `Path` with `WorktreeHunks` for each *commit*. Thus, each *commit* knows how it would look like with
//! all applicable `WorktreeHunks` applied.
//!
//! Commits are effectively amended with the new `State` that contains `WorktreeHunks`, from the commit closest to the `BranchBase` moving upwards to the `BranchTip`, inclusive, which
//! means there is no chance for conflict or unexpected behaviour.
//!
//! Unassociated `WorktreeHunks` either belong to another `Branch` of a `Workspace`, or they would be a candidate to be committed with `BranchTip` as parent.
//!
//! ### Watch out!
//!
//! - Worktree State needs to be converted to what would be Git stage, i.e. has to go through filters first!
//!
//! ### Questions
//!
//! #### What to do with multi-parent commits?
//!
//! In theory, would have to merge the parents, and diff it against the commit. That bears the risk of a conflict (that has been resolved in the commit),
//! so in that case it should be fine to fallback to using the first parent.
26 changes: 26 additions & 0 deletions crates/but-hunk-dependency/tests/fixtures/branch-states.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
#!/bin/bash

set -eu -o pipefail

git init 1-2-3-10
(cd 1-2-3-10
seq 1 >file && git add file && git commit -m "one"
seq 2 >file && git commit -am "two"
seq 3 >file && git commit -am "three"
seq 10 >file && git commit -am "to ten"
)

git clone 1-2-3-10 1-2-3-10_two
(cd 1-2-3-10_two
sed -i '' 's/2/two/g' file
)

git clone 1-2-3-10 1-2-3-10-shift_two
(cd 1-2-3-10-shift_two
{
echo "shift-one\nshift-two";
seq 10
} >file && git commit -am "shift all by 2"
sed -i '' 's/2/two/g' file
)

25 changes: 25 additions & 0 deletions crates/but-hunk-dependency/tests/hunk_dependency/main.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#[test]
fn change_2_to_two_in_second_commit() -> anyhow::Result<()> {
let repo = repo("1-2-3-10_two")?;
let tree_changes = but_core::diff::worktree_changes(&repo)?;
dbg!(&tree_changes);
Ok(())
}

#[test]
fn change_2_to_two_in_second_commit_after_shift_by_two() -> anyhow::Result<()> {
let repo = repo("1-2-3-10-shift_two")?;
let tree_changes = but_core::diff::worktree_changes(&repo)?;
dbg!(&tree_changes);
Ok(())
}

fn repo(name: &str) -> anyhow::Result<gix::Repository> {
let worktree_dir = gix_testtools::scripted_fixture_read_only("branch-states.sh")
.map_err(anyhow::Error::from_boxed)?
.join(name);
Ok(gix::open_opts(
worktree_dir,
gix::open::Options::isolated(),
)?)
}

0 comments on commit b01c25b

Please sign in to comment.