Register a dbt package from a sub directory of a repository #95
Comments
We'd really love this feature as well! We're an open-source project maintaining two repositories: one is the dbt package itself, and the other is a Python package that relies on it. Managing the two is challenging because we have to sync features and logic between the repositories instead of having a single feature branch that affects both. We'd love to merge those repos if that were possible.
@turbo1912 and @elongl thanks for opening this, and sorry that it took a while to get back to you! Conceptually, I'm very supportive of this, especially if it doesn't impact how dbt-core downloads things. What gives me pause is that we don't have any testing around the project at the moment (other than a couple of semver tests that @dbeatty10 added in #133), so we can't really guarantee that changes won't cause downstream issues. I would say go for it, with the disclaimer that code review might be slow as we try to rustle up someone who knows how hubcap works. We'll also probably wind up dragging in someone from @dbt-labs/core to double-check that there aren't any flow-on effects.
Awesome! Glad to hear you and the team are on board.
I definitely see the value here! I think this may be tricky for us to do in the current implementation of Hubcap + Hub site (hub.getdbt.com), though not impossible. The Hub site does not actually store/mirror specific files—it's just a pointer to a GitHub tarball URL, containing a zipped version of all files from the repo. E.g. for If we wanted to proceed with this as an extension of the current implementation, I think it would need to include:
Note that will still require downloading the entire contents of the repo, and then quickly deleting the contents we don't care about. That could still pose a risk on containerized / disk-limited file systems, if the overall repo is truly massive. A longer-term answer probably looks like the Hub graduating to support its own file-hosting capabilities, rather than using GitHub as a backend. In that future, something like this should be much simpler to implement, and better: the filtering can happen during package registration / upload, rather than in every single package download. Related issue: dbt-labs/dbt-core#4868 |
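To make the "download everything, then keep only what we care about" idea concrete, here is a minimal Python sketch. It assumes the tarball has already been fetched into memory and that, like GitHub tarballs, it wraps all files in a single top-level folder. The function name `extract_subdirectory` is hypothetical and is not part of hubcap or dbt-core. The full archive is still downloaded; only the files under the chosen subdirectory are written to disk.

```python
import io
import tarfile
from pathlib import Path, PurePosixPath


def extract_subdirectory(tarball: bytes, subdirectory: str, dest: Path) -> list:
    """Write only the files under `subdirectory` of a gzipped tarball into `dest`.

    Assumption: the archive has one top-level folder (as GitHub tarballs do),
    so the first path component is dropped before matching.
    """
    wanted = PurePosixPath(subdirectory).parts
    extracted = []
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, and other special entries
            parts = PurePosixPath(member.name).parts[1:]  # drop top-level folder
            if len(parts) <= len(wanted) or parts[: len(wanted)] != wanted:
                continue  # not inside the requested subdirectory
            relative = PurePosixPath(*parts[len(wanted):])
            if ".." in relative.parts:
                continue  # refuse paths that could escape `dest`
            target = dest.joinpath(*relative.parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.extractfile(member).read())
            extracted.append(str(relative))
    return sorted(extracted)
```

Because nothing outside the subdirectory is ever written, this avoids the "extract, then delete" step, though it does not reduce the size of the download itself.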
Hi @jtcohen6
Too bad GitHub doesn't provide a way to download only a specific file path within a repository 😞
I imagined it as something like this:
"<organization>": [
    "<repo>/<subdirectory>"
]
For instance,
"elementary-data": [
    "dbt-data-reliability/dbt_project"
]
so basically if there's a subdirectory, specify a conditional path within the repository with leading
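A minimal sketch of how an entry in the proposed "<repo>/<subdirectory>" form could be split, assuming the first slash separates the repository name from the path inside it. `PackageRef` and `parse_package_entry` are hypothetical names used only for illustration; they do not come from hubcap.

```python
from typing import NamedTuple, Optional


class PackageRef(NamedTuple):
    repo: str
    subdirectory: Optional[str]  # None when the package lives at the repo root


def parse_package_entry(entry: str) -> PackageRef:
    """Split "<repo>/<subdirectory>" on the first slash only.

    Anything after the first slash is treated as the path inside the repo,
    so nested subdirectories such as "repo/a/b" keep their inner slashes.
    """
    repo, _, subdirectory = entry.partition("/")
    return PackageRef(repo, subdirectory.strip("/") or None)
```

With this sketch, `parse_package_entry("dbt-data-reliability/dbt_project")` yields the repo `dbt-data-reliability` and the subdirectory `dbt_project`, while a plain `"dbt-utils"` yields no subdirectory.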
would using |
Interesting suggestion! I don't think it will entirely solve the problem, since to use it you need to have the repository already cloned, at which point you've already downloaded all the files. But it might be a cleaner way to keep only the necessary files rather than deleting everything else.
We use "sparse checkout" for package subdirectories installed via the The installation mechanism for the |
A pleasant side effect: if this ticket were done, experimental packages (such as insert_by_period) could stay on the hub instead of needing to be specified as git subdirectories.
@jtcohen6 @joellabes |
@elongl Thanks for the bump! That feels doable, but we'd end up splitting on the
[
    {
        "org_name": "<org_name>",
        "packages": [
            {
                "repo_name": "<repo_name>",
                "subdirectory": "<subdirectory>"
            }
        ]
    }
]
So for instance:
[
    {
        "org_name": "elementary-data",
        "packages": [
            {
                "repo_name": "dbt-data-reliability",
                "subdirectory": "dbt_project"
            }
        ]
    }
]
@jtcohen6 Yes, of course! It also makes much more sense to me with a slight change.
{
    "organizations": {
        "elementary-data": {
            "packages": [
                {
                    "repo_name": "dbt-data-reliability",
                    "subdirectory": "dbt_project"
                }
            ]
        }
    }
}
@elongl Fair point! I can't think of another org-level property that we'd need to specify. We could opt for as much conciseness as possible, while still allowing for structure where we need it. The org name could be the key, and its value would be a list of packages, with type
{
    "elementary-data": [
        "dbt-repo-non-subdirectory",
        {
            "repo_name": "dbt-data-reliability",
            "subdirectory": "dbt_project"
        }
    ]
}
I'm realizing this would also offer a (roundabout) way of resolving dbt-labs/dbt-core#4868 (a way to ignore/exclude large unnecessary files). Package maintainers could move the "essential" components of the package to a subdirectory, and then specify that subdirectory in
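As a rough illustration of the mixed-type list proposed above, the following Python sketch normalizes each entry into an `(org, repo, subdirectory)` tuple, treating a bare string as a package at the repository root. The function name `normalize_packages` is hypothetical, and this is only one way hubcap might read such a file, not its actual implementation.

```python
import json
from typing import List, Optional, Tuple


def normalize_packages(index: dict) -> List[Tuple[str, str, Optional[str]]]:
    """Flatten {org: [entry, ...]} into (org, repo_name, subdirectory) tuples.

    An entry is either a plain repo name (string) or an object with
    "repo_name" and an optional "subdirectory" key.
    """
    normalized = []
    for org, entries in index.items():
        for entry in entries:
            if isinstance(entry, str):
                normalized.append((org, entry, None))
            elif isinstance(entry, dict):
                normalized.append((org, entry["repo_name"], entry.get("subdirectory")))
            else:
                raise TypeError(f"unsupported package entry for {org}: {entry!r}")
    return normalized


example = json.loads("""
{
    "elementary-data": [
        "dbt-repo-non-subdirectory",
        {"repo_name": "dbt-data-reliability", "subdirectory": "dbt_project"}
    ]
}
""")
packages = normalize_packages(example)
```

Existing string-only entries would keep working unchanged under this shape, since only packages that live in a subdirectory need the object form.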
We're also looking to set up this tool in our monorepo. Is this now possible? If so, what steps should we follow?
Thanks for letting us know your interest @domenic-donato. This is still a feature request that hasn't been implemented or released. |
Hi friends,
We are about to release a new dbt package and we want to keep everything as a monorepo in our project. That would mean that instead of pointing to a GitHub repository, we would have to point to a subdirectory in our existing repository, for example https://github.com/fal-ai/fal/feature-store.
I don't think this is possible right now, but we are happy to contribute if this is something you would like to see implemented.
I just had a quick glance at the code, and it looks like a change like this is self-contained in the hubcap repository, so I wouldn't have to touch the logic of how dbt-core downloads dependencies.