Add `private` to `trainer.push_to_hub`. Add `_update_repo_visibility` to trainer. #33511
Force-pushed from 135ea1e to cbe5b69.
Clean PR, thanks for working on this. Could you rebase on the main branch so all the tests turn green?
private (`bool`, *optional*, defaults to `args.hub_private_repo`):
    Controls repo visibility at creation and changes the visibility of an existing repo.
quite a nice addition, thanks for adding this
Force-pushed from e379608 to 519f4f9.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from 519f4f9 to f99889d.
Friendly tagging @SunMarc for review.
Thanks for adding this ! Left a few comments. Can you have a second look @muellerzr ?
src/transformers/trainer.py
Outdated
    def _update_repo_visibility(self, token: Optional[str] = None):
        if self.hub_model_id is None:
            raise ValueError("`_update_repo_visibility` should be used after `init_hf_repo` to ensure repo exists.")
        update_repo_visibility(
            repo_id=self.hub_model_id,
            private=self.args.hub_private_repo,
            token=token,
        )
I would use the `model_info` API to inform the user that the repo's visibility is about to change.
Still, I don't think this is a good default behavior. `hub_private_repo` defaults to `False`, so if someone launches a script without setting it, their repository will become public...
Maybe only allow changing the visibility of the repo with the `private` attribute, or simply raise an error/warning asking the user to change it on the Hub themselves? cc @muellerzr
I've added an info message to inform the user about the change of visibility.
Let's use `logger.warning` and also check that the repo visibility really changed with the `model_info` API. Also, the use of `update_repo_visibility` with `self.args.hub_private_repo` is still not decided, so let's not resolve this conversation.
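For reference, the behavior discussed in this thread could look roughly like the sketch below: only touch visibility when `private` is passed explicitly, warn before changing it, and re-read the state afterwards to confirm the change. This is not the PR's code. The function name and the `get_private` / `set_private` callables are hypothetical; in the trainer they would presumably wrap `model_info(...).private` and `update_repo_visibility(...)` from `huggingface_hub`.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def update_visibility_if_requested(
    repo_id: str,
    private: Optional[bool],
    get_private: Callable[[str], bool],        # hypothetical: reads current visibility
    set_private: Callable[[str, bool], None],  # hypothetical: applies new visibility
) -> bool:
    """Change repo visibility only when `private` is given explicitly.

    Returns True if the visibility was changed, False otherwise.
    """
    if private is None:
        # Nothing requested: never flip an existing repo implicitly.
        return False

    current = get_private(repo_id)
    if current == private:
        return False

    logger.warning(
        "Changing visibility of %s from %s to %s.",
        repo_id,
        "private" if current else "public",
        "private" if private else "public",
    )
    set_private(repo_id, private)

    # Re-read the state to confirm that the change really happened.
    if get_private(repo_id) != private:
        raise RuntimeError(f"Visibility of {repo_id} did not change as requested.")
    return True


# In-memory stand-in for the Hub, for illustration only.
fake_hub = {"user/test-trainer-private": True}
changed_implicit = update_visibility_if_requested(
    "user/test-trainer-private", None, fake_hub.__getitem__, fake_hub.__setitem__
)
changed_explicit = update_visibility_if_requested(
    "user/test-trainer-private", False, fake_hub.__getitem__, fake_hub.__setitem__
)
```

With this shape, a script that never sets `private` leaves an existing private repo alone, which addresses the concern about the `False` default silently making repos public.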
@@ -4518,6 +4527,7 @@ def push_to_hub(
        blocking: bool = True,
        token: Optional[str] = None,
        revision: Optional[str] = None,
        private: Optional[bool] = None,
Would it really be helpful to have this here instead of using `hub_private_repo` in `TrainingArguments`?
Seemed to be one of the expected use cases in the linked issues.
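To make the relationship between the two settings concrete, here is a minimal sketch of how a per-call `private` could fall back to `hub_private_repo`, as the docstring ("defaults to `args.hub_private_repo`") suggests. `ArgsStub` and `resolve_private` are hypothetical names used only for illustration, not part of `transformers`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArgsStub:
    # Hypothetical stand-in for TrainingArguments; only the field discussed here.
    hub_private_repo: bool = False


def resolve_private(private: Optional[bool], args: ArgsStub) -> bool:
    """An explicit per-call value wins; otherwise use the training argument."""
    return args.hub_private_repo if private is None else private


# The per-call argument overrides the training argument in both directions.
default_case = resolve_private(None, ArgsStub())
override_to_private = resolve_private(True, ArgsStub())
override_to_public = resolve_private(False, ArgsStub(hub_private_repo=True))
```

Comparing against `None` rather than relying on truthiness matters here, because `private=False` is a meaningful request and must not be treated as "not set".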
    def test_push_to_hub_private(self):
        with tempfile.TemporaryDirectory() as tmp_dir:
            trainer = get_regression_trainer(
                output_dir=os.path.join(tmp_dir, "test-trainer-private"),
                push_to_hub=True,
                hub_token=self._token,
            )

            trainer.push_to_hub(private=True)
            info = model_info(f"{USER}/test-trainer-private", token=self._token)
            self.assertTrue(info.private)

            trainer.push_to_hub(private=False)
            info = model_info(f"{USER}/test-trainer-private", token=self._token)
            self.assertFalse(info.private)
nice
Force-pushed from f99889d to 42e244d.
Force-pushed from 42e244d to 899d982.
I'm not sure about changing the visibility when the repo already exists; I'd like @muellerzr's feedback.
What does this PR do?
Currently, setting the `hub_private_repo` training argument only allows control over repo visibility at creation. This PR adds a `private` parameter to `trainer.push_to_hub`. It allows control over repo visibility at creation in case `hub_private_repo` was not set in the training arguments, and allows updating the visibility of existing repos.
Fixes #33492
Fixes #32909
Who can review?
cc @muellerzr