Skip to content

Conversation

06393993
Copy link
Contributor

@06393993 06393993 commented Aug 4, 2025

Checklist

N/A:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added/updated tests to cover my changes

@06393993 06393993 requested a review from a team as a code owner August 4, 2025 06:00
@06393993 06393993 force-pushed the gitattr-design branch 4 times, most recently from e71fd26 to c79dd7f Compare August 4, 2025 14:52
@06393993
Copy link
Contributor Author

06393993 commented Aug 4, 2025

The example implementation is at 06393993/jj@gitattr-design...gitattr-example-impl.

@@ -0,0 +1,590 @@
# Git attributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only skimmed the document but what I'm missing is an overview of how we're going to support gitattributes. Specifically, will it only affect the working copy just like EOL conversion does (I think that's my preference)? I.e., I hope all the code can be in local_working_copy.rs (helper elsewhere is fine as long as they're only used from local_working_copy.rs). I may very well have missed reasons that that's not possible. If it's not possible to do it in the working copy backend, then where in the architecture will we insert the understanding of gitattributes?

Copy link
Contributor Author

@06393993 06393993 Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it only affect the working copy just like EOL conversion does (I think that's my preference)?

For the only 3 features I know that we want:

  • EOL conversion and smudge/clean filter will only interact with the local_working_copy module
  • The diff feature needs modification outside the local_working_copy. The changes should happen in the diff_util module in the jj-cli crate for built-in diff tools, and in the merge_tools::diff_working_copies crate for external diff tools. The reason for the impossibility is obvious.

There are other git attributes features that may introduce modification outside local_working_copy.rs, e.g., merge, but we don't consider them in the current design.

I have already added that information to the "Context and Scope" section. Thanks for review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the "Context and Scope" section but I don't see how it's explained there.

  • The diff feature needs modification outside the local_working_copy.

I was hoping that we would simply not support that. It seems too specific to Git for my taste. Or are you suggesting that we support gitattributes when using non-Git backends too? Also when running on a server? How will we find the smudge/clean tools? I guess you explain that somewhere. Sorry, I've only skimmed the doc so far.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No they are not included in the docs.

The diff features are all about the built-in diff tool and the external diff tool. I am Ok that we don't support it. ilyagr@ suggests that "For a minor example, jj diff could respect the linguist-generated attribute (or a differently named attribute) and not show generated files by default, like GitHub." so I include it.

are you suggesting that we support gitattributes when using non-Git backends too?

I am Ok with supporting gitattributes only for git backends, but on the code side, it should probably not behind the git Cargo feature flag, which may complicate the code. But I am also Ok with it.

Also when running on a server?

I am not sure how diff is relevant to a server. Am I missing anything?

How will we find the smudge/clean tools?

Same as git, defined in the user settings. If you don't define the filter, files should be checked out and checked in as is(Haven't checked the git behavior, but I assume so).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, the conversion would be done just before displaying to the user. That at least reduces the scope. I'm still quite worried about the scope of this feature. I'm worried i will impact code in lots of different places. Hopefully I'm wrong.

I am not sure how diff is relevant to a server. Am I missing anything?

No. As long as we don't care about it when merging, I think it won't affect diffing code. I think we will just decide that we don't handle all cases exactly correctly. Git doesn't either, AFAIK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried i will impact code in lots of different places.

No, it won't. At least this design doesn't include such changes and doesn't suggest we need such changes for future features. If there is such impact, it will be in a different design and you will be notified then.

Copy link
Contributor

@PhilipMetzger PhilipMetzger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a bunch of discussion points.

Beside that I like how hard this tries to map Git's .gitattribute design onto Jujutsu with almost no friction. While this a good thing in some ways, it also misses what a rethought interface for jj could or should be and to me this is important. I'm going to think hard about that now.

@@ -0,0 +1,590 @@
# Git attributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other titles which could work: Git Attributes for Jujutsu, Supporting Git Attributes in Jujutsu


**Authors**: [Kaiyi Li](mailto:[email protected])

**Summary**: This design introduces an API internal to `jj-lib` that future git attributes related features could use to query the git attributes information associated with a file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: While this is a nice summary, I'd expect a briefer overview following it (it'd also would've answered Martin's question).


This design provides the cornerstone of all the previously mentioned features.

In this design, we use the same terminology as the [git attributes document](https://git-scm.com/docs/gitattributes#_description). Some regularly used definitions include pattern, state, and `gitattributes` file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: While a link is fine, it forces you to keep a tab open to it. And since there are a bunch of newcomers to source control here it'd be also worthwhile to copy the relevant terms into this section (it'd also help if the link were unavailable).

* How the `gitattributes` file should be loaded from the `Store` or from the file system.
* When query the attributes associated with a file, how to look up `gitattributes` in all ancestor folders.
* [Macro](`https://git-scm.com/docs/gitattributes#_using_macro_attributes`) resolution.
* The query interface should be thread-safe, because `TreeState::snapshot` currently snapshots files in multiple threads concurrently.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd reword this should as must, since I don't think this is negotiable longterm.

* The query interface should be thread-safe, because `TreeState::snapshot` currently snapshots files in multiple threads concurrently.
* The design should make it easy to optimize the performance of the query interface without having to change the caller code too much, and the initial implementation should be reasonably fast, because any feature that makes use of `gitattributes` is likely to query the git attributes of all the changed files whenever `jj` snapshots files from the file system to the `Store` or updates files from the `Store` to the file system.
* When `gitattributes` is not used, the design should introduce negligible runtime and memory overhead, if not 0, i.e. if none of git attributes related features is enabled through the user settings.
* Include git built-in `gitattribute` macros. Currently there is only [`binary`](https://github.com/git/git/blob/master/attr.c#L639).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor wording nit: s/Include/Support all (there's only one and that's a detail)


`GitAttributes::search` may try to read the `gitattributes` file, and if the read fails for a reason other than file not found, or `Store` fails to retrieve a tree, `GitAttributes::search` fails, i.e., returns an `Err`.

Note that we are using the `gix_attributes::State` type from the external `gix-attributes` crate in the return type, so we restrict the accessibility of the `GitAttributes::search` interface to `pub(crate)`. If we want to promote it to `pub`, we either need to define our own `gix_attributes::Assignment` type or re-export the `gix_attributes::Assignment` type. Promotion to `pub` is likely, because we will need `DiffRenderer` in the `jj-cli` crate to query the `text`, `linguist-generated`, and `diff` git attributes to tell whether a file is binary, generated or how to generate the diff texts respectively. But for this design, we still just keep it as `pub(crate)` for simplicity.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd prefer to export our own type since we're able to change the underlying implementation depending on the projects needs. Also I assume its highly likely that it will be pub at some point anyway, so we can guard against misuses.

Comment on lines +209 to +214
```rust
#[async_trait]
trait FileLoader: Send + Sync {
async fn load(&self, path: &RepoPath) -> Result<Option<Box<dyn AsyncRead + Send + Unpin>>>;
}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Have you considered a design which allows you to nest FileLoaders then fetching from Disk or a external cache could be a implementation detail?

Comment on lines +312 to +327
```rust
struct GitAttributesNode {
disk_first: OnceCell<
Arc<(
gix_attributes::Search,
gix_attributes::search::MetadataCollection,
)>,
>,
store_first: OnceCell<
Arc<(
gix_attributes::Search,
gix_attributes::search::MetadataCollection,
)>,
>,
disk_file_loader: Arc<dyn FileLoader>,
store_file_loader: Arc<dyn FileLoader>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be in a linked list and know where they're assigned from (i.e Disk or Store)? Isn't it simpler to just have a GitAttributesNode which is lazily loaded and make the cache responsible for path nesting and management with something like a Vec or LruCache?


### Integration with EOL conversion

We add 2 fields `TreeState::git_attributes: Arc<GitAttributes>` and `TreeState::git_attributes: Arc<GitAttributes>`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think one of these should be TreeState::disk_attributes: Arc<GitAttributes>

@@ -169,6 +169,7 @@ nav: # This lists all the files that become part of the documentation
- Sparse patterns v2: 'design/sparse-v2.md'
- Tracking branches: 'design/tracking-branches.md'
- Copy tracking and tracing: 'design/copy-tracking.md'
- gitattributes cornerstone: 'design/gitattributes.md'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also could use a better title.

@06393993 06393993 mentioned this pull request Aug 12, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants