-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: docs/design: Topics #4496
base: main
Are you sure you want to change the base?
WIP: docs/design: Topics #4496
Conversation
ae77041
to
d8c86eb
Compare
8a6ae11
to
6f1f249
Compare
Continues the conversation started in Discord and hopefully concludes it with a final design.
6f1f249
to
56ae00c
Compare
|
||
##### Obsoletion of `[experimental.auto-advance-bookmarks]` | ||
|
||
Since Topics are infectious by nature, they can perfectly map to a Git branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would a topic with multiple heads map to a single Git branch (or maybe multiple branches)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is unclear to me as well. I'm guessing what it's supposed to mean is that topics "perfectly map to" what people expect from Git branches, rather than that they'd actually be implemented as a 1:1 equivalent for Git branches. But since Git branches don't actually work the way Topics are described (i.e. they are not merely "infectious" but actually implicitly apply to all ancestor commits, and they cannot have multiple heads), they don't actually "map perfectly" to Git branches as understood by people who deeply understand the Git model (which is probably the minority of Git users, but certainly a nontrivial contingent).
Certainly this sentence cannot mean that topics are always represented in the git storage backend as branches, since that would be incompatible with both of the differences cited above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would a topic with multiple heads map to a single Git branch (or maybe multiple branches)?
Each separate head will get its own Git branch. There's also the variant of connecting them, but I do not think that is desirable, as the different heads should represent different kind of logical work.
This sentence is unclear to me as well. I'm guessing what it's supposed to mean is that topics "perfectly map to" what people expect from Git branches, rather than that they'd actually be implemented as a 1:1 equivalent for Git branches. But since Git branches don't actually work the way Topics are described (i.e. they are not merely "infectious" but actually implicitly apply to all ancestor commits, and they cannot have multiple heads), they don't actually "map perfectly" to Git branches as understood by people who deeply understand the Git model (which is probably the minority of Git users, but certainly a nontrivial contingent).
That is correct. But sharing the conceptual behavior never meant that we as Jujutsu VCS need to represent them the same internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushing the multiple heads of a topic as foo-1
, foo-2
, etc., is surprising to me. I was expecting topics to work roughly like this:
- A commit can manually be labeled a part of topic "foo".
- A new commit created on top of that commit will automatically be labeled "foo".
- If
heads(topic(foo))
has a single revision, the topic may be pushed as a git branch named "foo" with a command likejj git push -t foo
- If
heads(topic(foo))
has more than one revision, pushing the topic fails with an error (similar to hg refusing to create multiple remote heads)
I'm not sure why I would ever want to push a topic and get branches foo-1
, foo-2
, etc. The only reason might be to create a remote backup of my work, but I don't expect topics to fix this problem (it's something to solve for any anonymous head).
I think forges like Gerrit would support pushing multiple heads in the same topic, but I would expect jj gerrit
to handle this.
I also don't know how creating multiple branches for a topic works as the graph changes over time. Suppose I had commits B and C in a single topic, both descending from A, and I pushed, creating branches foo-1
and foo-2
. Then I create a merge commit D also in the topic. Does pushing delete foo-1
and foo-2
, and replace them with just foo
?
D
|\
B | foo-1@origin
| C foo-2@origin
|/
A
|
||
#### Git Interop | ||
|
||
For all continuous Topics we can just simply export them as Git Branches. For |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guessing this is a typo, especially since it's contrasted with "non-contiguous":
For all continuous Topics we can just simply export them as Git Branches. For | |
For all contiguous Topics we can just simply export them as Git Branches. For |
|
||
For all continuous Topics we can just simply export them as Git Branches. For | ||
Topics which contain non-contiguous parts, we should allow exporting them | ||
one-by-one or with a generated name, such as `<topic-name>-1` for each part. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would "one-by-one" mean, and why is it distinct from having a generated name? I'm guessing it means that the CLI would let the user decide to export just one contiguous "section" of a topic without exporting the others, but [a] what would the actual benefit to the user be, and [b] wouldn't there still need to be an auto-generated name in order to prevent future conflict when exporting another "section" of the same topic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would "one-by-one" mean, and why is it distinct from having a generated name?
The distinction is mostly manual.
I'm guessing it means that the CLI would let the user decide to export just one contiguous "section" of a topic without exporting the others, but [a] what would the actual benefit to the user be, and [b] wouldn't there still need to be an auto-generated name in order to prevent future conflict when exporting another "section" of the same topic?
On point A, we give the user the choice of when they want to export another slice of a topic and on B the previous slice's branch may have already been deleted, so we'd be free to choose the name again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This is something to handle whenever the topic has multiple heads, not just if the topic is disconnected. For example, the following topography is connected but can't be exported as one git branch:
B topic=foo
|
| C topic=foo
|/
A topic=foo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be export restrictions on Topics, as we cannot represent this as a Git Branch. This will hit the auto-generated name I mentioned previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, we currently skip export of bookmarks when they're conflicted, so it wouldn't be that different to skip exporting topics when they have multiple heads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, we currently skip export of bookmarks when they're conflicted, so it wouldn't be that different to skip exporting topics when they have multiple heads.
Yes, that would also would be a variant of dealing with the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this section now disagrees with the backend-implementation section below:
For the Git backend, we could either embed them in the message, like Arcanist or Gerrit do or store them as Git Notes, if necessary.
Or am I misunderstanding, and topics will be exported to Git as notes and branches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda like the idea of exporting them as both. Notes will avoid modifying the commit hashes while supporting faithfully exporting discontiguous topics. We could perhaps use notes as the source of truth for syncing and treat branches as purely for compatibiltiy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the Git backend, we could either embed them in the message, like Arcanist or Gerrit do or store them as Git Notes, if necessary.
Or am I misunderstanding, and topics will be exported to Git as notes and branches?
These are options beside exporting them as branches, which is the desired implementation for Git interop. We should enumerate all options to fully consider the whole design space.
We could perhaps use notes as the source of truth for syncing and treat branches as purely for compatibiltiy?
That's a good idea. I just don't know what that encompasses.
|
||
### Detailed Design | ||
|
||
#### Git Interop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a part of this that seems to be missing: where does a topic "end"? I'm guessing that you would export one git branch for each head commit in the topic, rather than one git branch for every commit in the topic (since that wouldn't really have much advantage compared to jj git push -c
). But git doesn't have a concept of where a branch "starts" except relative to other branches, and then only in the context of commands (such as rebase) that start by finding common ancestors, which implicitly "prune" the branch. So for the "change-archival" use-case, what commits would be included in the archive
topic?
I'm guessing the answer is something like: topic(x) := ::heads(topic) ~ ::trunk()
, i.e. topic x
contains the heads plus all ancestors minus everything on the trunk
"branch". I think this is consistent with the property of hg
topics that they "disappear" as soon as they are merged (though I haven't used hg
and am not totally sure I followed the linked topic
documentation). But whatever the rule is, it should be explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a part of this that seems to be missing: where does a topic "end"? I'm guessing that you would export one git branch for each head commit in the topic, rather than one git branch for every commit in the topic (since that wouldn't really have much advantage compared to
jj git push -c
).
Yes, this is also correct.
So for the "change-archival" use-case, what commits would be included in the
archive
topic?
This is @avamsi's use-case from #4180. This means that you locally add a description like archive:
to your commits and then let jj
infer it in someway or occasionally do it manually with a command.
I'm guessing the answer is something like:
topic(x) := ::heads(topic) ~ ::trunk()
, i.e. topicx
contains the heads plus all ancestors minus everything on thetrunk
"branch".
Yes, that is correct and thanks for the formulation.
But whatever the rule is, it should be explicit.
Yes, I'll write it down.
// | ||
// This avoids rewriting the Change-ID, but must be implemented. | ||
#[ContentHash(ignore)] | ||
topics: HashMap<String, String> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do key
and value
represent here? key = topic_name, value = ??
? Isn't it just a set of topic names?
FWIW, I assume topics are commit metadata, hence the commit id changes when topics are updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do
key
andvalue
represent here?key = topic_name, value = ??
? Isn't it just a set of topic names?
How would a hashset implement a Topic on two different remotes then? It may also be worth it to just rename it to metadata.
WIW, I assume topics are commit metadata, hence the commit id changes when topics are updated.
I choose this option because of #3613 (comment), as you should be able to rewrite those lightweight tags easily. If we can change the underlying commit-id w/o rewriting the Change-ID it should be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, are you saying that topics will not be stored in the commit metadata? I think that needs to be clarified then. It wasn't clear at all to me. (The doc says that topics will be stored in a metadata map, but it doesn't say where that map is stored.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, are you saying that topics will not be stored in the commit metadata?
Where did I say that?
The doc says that topics will be stored in a metadata map, but it doesn't say where that map is stored.)
On the Commit which makes this dance I think necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did I say that?
I was just asking if you did because it wasn't clear what you meant. You said "I choose this option" without saying which option you meant. It was in reply to Yuya's "I assume topics are commit metadata", which made me think it would be stored in the commit, but then you linked to a comment that said "After some discussion with @PhilipMetzger we came to the conclusion that we can't actually store topics on commits", which made me think that you meant that it should not be stored on commits.
On the Commit which makes this dance I think necessary.
I see. I think that information needs to go in the doc too. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would a hashset implement a Topic on two different remotes then? It may also be worth it to just rename it to metadata.
Oh, so topics
is a map of remote_name: topic_name
? I was just wondering why it was a map at all, because I just thought topic is a local thing (and mirrored to Git branch at remote.)
Uh, the part where this was discussed completely escaped me?. I thought we were settled on them being locally available (like the oplog, for example) and for easy editing of topics without actually rewriting commits. And for sharing them through a git remote, roundtripping through git branches - while only really works nice for single-head topics that branch off the trunk - is plenty imo. Interoping with some gerrit features would be then the job of the gerrit integration etc. |
## Overview | ||
|
||
Until now, Jujutsu had no native set of topological branches, just | ||
[Bookmarks][bm] which interact poorly with Git's expectation of branches. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it makes sense to refer to "Git's expectation" separately from the expectations of Git users, but if we do want to call out poor interaction between bookmarks and branches, then I think we should be more specific about where the tooling actually diverges in behavior. Git has a concept of a "current branch" that jj
completely lacks, and it auto-advances the current branch, whereas jj
doesn't auto-advance bookmarks except using the experimental feature that this design hopes to deprecate.
[Bookmarks][bm] which interact poorly with Git's expectation of branches. | |
[Bookmarks][bm] which interact poorly with Git's expectation of a "current branch" that is advanced automatically as commits are added. |
I have a few questions which don't seem to be answered yet in the design doc:
|
I'm going to create threads for top-level questions, since GitHub's UI for them isn't great and I expect that this will receive more than a handful of comments. Since this is isn't nearly done yet. |
While this works it falls short of having the metadata synced by multiple | ||
clients, which is something desirable. The prototype thus also avoids rewriting | ||
the Change-ID which is a good thing, but makes them only locally available. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, the part where this was discussed completely escaped me?.
Why is "having the metadata synced by multiple clients" something desirable? I mean it surely would be nice, with an actuall jj remote that's yet to exist, but without that we have to resort to annoying and intrusive things like rewriting commit messages or similar.
This Design Doc contains two features which I desire for jj
:
- Topics (Git Interop, a tagging mechanism)
- Commit Metadata (Topics are a part of them)
And we always should design for the native future and something which is compatible with the Google backend.
I only remembered the drawback of keeping them out of band from an earlier Design Doc, but I am sadly unable to find that conversation with Torquestomp.
I thought we were settled on them being locally available (like the oplog, for example) and for easy editing of topics without actually rewriting commits.
Yes, we agreed on that for Topics. But for pure Commit Metadata we never decided on it and it is feasible without to much cost, in my opinion.
And for sharing them through a git remote, roundtripping through git branches - while only really works nice for single-head topics that branch off the trunk - is plenty imo.
Agreed.
Interoping with some gerrit features would be then the job of the gerrit integration etc.
Where did you get that from, I never mentioned something like it? AFAIK, Gerrit Topics are once again some special refs on the Gerrit/Git Server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Design Doc contains two features
Well the confusing part is then that the title says "design for topics", while as you just clarified it's actually for this generic maybe-shareable metadata that topics could be an instance of.
I think combining the two into one doc was a mistake - there were a lot of discussions and some sort of vision for the out-of-band topics, but this is first time I'm hearing about the other thing and the doc doesn't seem like everything was discussed and decided on because of it, at least at the moment.
Where did you get that from
Honestly don't remember 😂
Someone somewhere at some point mentioned somethings like that idk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skimmed through the doc again - you mostly talk about topics specifically in it and a high-level results of the discussions that happened - but then starting on the storage paragraph you suddently derail into this generic commit metadata protobuf thing and it gets confusing - right after saying they could be exported to git as branches, no less
Where did you get that from
And it was probably from the "Prior work" section that mentioned gerrit topics :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions which don't seem to be answered yet in the design doc:
Will jj bookmarks still be able to track remote git branches (meaning they'll be updated on fetch)?
- If so, can a bookmark and a topic track the same remote git branch?
The intent of this work is to remove this behavior and take over the necessary parts from Tracking Branches. Maybe some parts will be left intact to have something equivalent to a Git Tag, but I haven't thought deeply about it.
- Will jj bookmarks still be pushed as git branches?
No, see the first answer.
- Will git branches still be fetched as jj remote bookmarks?
Ditto.
- Can git branches be fetched as topics even if they were created by someone else?
Yes, otherwise this feature would not help with Git interop.
- If so, how does jj know what the base branch is?
This will continue to use Git tooling as a implementation, so it will be the merge-base.
Is there a command to set the base branch for fetching a topic?
I don't understand your question, please elaborate.
- If it defaults to
trunk()
, is it impossible to fetchtrunk()
(or a branch that's an ancestor oftrunk()
) as a topic since there's zero commits in that range?
Thanks for this question, this will need some thought. Since the intended design is to allow free rewriting of commits, they can be preserved by the respective backend/server implementation.
- How does this work for projects with multiple long-lived branches?
I don't understand your question here, since I have no base of your expectations here. Can you elaborate?
- Is there such a thing as a "remote topic"?
See my first answer.
- If so, what determines whether a git branch is fetched as a remote bookmark or a remote topic?
You'll hopefully only be able to fetch a Git Branch as a Topic, so this shouldn't occur.
Or are there only remote bookmarks still, and there's an explicit command to start tracking a remote bookmark as a local topic?
See my first answer for my current plan.
- Can a topic and a bookmark have the same name?
Maybe? This is to be determined on how they reflect on a backend.
- If so, which takes precedence? Do they have different syntax?
This behavior is to be determined here.
- Can a topic be conflicted?
I don't think so, since introducing conflicting metadata seems to be a problem I don't understand.
- If so, how are topic conflicts resolved?
As above.
- If there's a series of commits
trunk(), A, B, C
and I set a topic asB, C
and push it, I assume the remote git branch will be pointing toC
. What happens if a coworker rebases the branch, creating the new series of commitstrunk(), X, A', B', C'
with the remote git branch pointing toC'
now? When I fetch, does my topic get expanded toX, A', B', C'
, or does it becomeB', C'
somehow? Or does it become conflicted?
I expect the first behavior I think, but that depends on how we want to integrate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a command to set the base branch for fetching a topic?
I don't understand your question, please elaborate.
Since a Git branch is just a pointer to a head commit, I believe it's unclear which commits are logically part of a branch unless you're comparing against some base branch/commit, so for fetching new remote branches from Git, I was just wondering how jj
will determine which commits to include in the corresponding topic and whether it's user-configurable. It seems like there's a lot of options, such as:
- Topic contains
..branch_head
- Topic contains
trunk()..branch_head
- Add a new configurable revset alias like
git_topic_base()
, then topic containsgit_topic_base()..branch_head
- Add a new configurable revset alias like
git_topic_revisions(branch_head)
, then topic just containsgit_topic_revisions(branch_head)
- Store a base commit as metadata with every remote topic, then topic contains
base_commit..branch_head
My concern for options 2 and 3 was that it's possible for the imported topic to be resolved to an empty set (if the branch head is an ancestor of the base branch/commit), which would need to be resolved somehow.
How does this work for projects with multiple long-lived branches?
I don't understand your question here, since I have no base of your expectations here. Can you elaborate?
My concern for option 2 is that some repos have multiple long-lived branches (e.g. branches A and B), and then there are branches off of those branches (e.g. C branches off from A, and D branches off from B). Ideally, it would be nice to be able to import C as containing A..C
and D as containing B..D
, but for option 2 they would be imported as trunk()..C
and trunk()..D
respectively. If trunk()
is allowed to contain multiple revisions, maybe this wouldn't be an issue though since you could set "trunk()" = "main@origin | A@origin | B@origin"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share the concern about branch pointers not being sufficient to indicate which commits are part of a topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I've been thinking about it more, and it seems like there could be another problem with the current approach. Currently, trunk()
is generally defined in terms of remote bookmarks. If we get rid of remote bookmarks and replace them with remote topics, then trunk()
would have to be defined based on a remote topic instead. However, from what I've heard, the plan is to exclude commits from a topic if they are present in trunk()
. This leads to a circular definition, where trunk()
depends on remote topics, but remote topics depend on trunk()
.
Maybe it would be better to keep remote bookmarks as-is, but just allow tracking a remote bookmark as a topic with a command like jj topic track <bookmark@remote>
. Then trunk()
could continue to be defined in terms of remote bookmarks, and there would be no circular dependency. This would also allow a command like jj topic track <bookmark@remote> --base <revset>
to specify that the topic should include only commits in <revset>..<bookmark@remote>
instead of the default trunk()..<bookmark@remote>
. I don't think it would even be necessary to store <revset>
anywhere, because for future fetches, the local topic could just be updated to roots(local_topic)-..updated_remote_bookmark
.
Remote topics still could be supported, but they just wouldn't be used for the Git backend. It's conceivable that a native backend might want to support both remote bookmarks and remote topics, just like Git has both tags and branches.
Continues the conversation started in Discord and hopefully concludes it with a final design.
Rendered