Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add private to trainer.push_to_hub. Add _update_repo_visibility to trainer. #33511

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hlky
Copy link
Contributor

@hlky hlky commented Sep 16, 2024

What does this PR do?

Currently setting hub_private_repo training argument only allows control over repo visibility at creation.

This PR adds private parameter to trainer.push_to_hub, this allows control over repo visiblity at creation in case hub_private_repo was not set in training arguments, and allows updating visibility of existing repos.

Fixes #33492
Fixes #32909

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

cc @muellerzr

@hlky hlky force-pushed the private-trainer-push-to-hub branch 2 times, most recently from 135ea1e to cbe5b69 Compare September 16, 2024 15:39
Copy link
Contributor

@not-lain not-lain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean PR, thanks for working on this, could you rebase on the main branch so all the tests could turn to green

Comment on lines +4487 to +4559
private (`bool`, *optional*, defaults to `args.hub_private_repo`):
Controls repo visibility at creation and changes visiblity of existing repo.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quite a nice addition, thanks for adding this

@hlky hlky force-pushed the private-trainer-push-to-hub branch 3 times, most recently from e379608 to 519f4f9 Compare September 18, 2024 12:16
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@hlky hlky force-pushed the private-trainer-push-to-hub branch from 519f4f9 to f99889d Compare October 3, 2024 11:29
@not-lain
Copy link
Contributor

not-lain commented Oct 3, 2024

friendly tagging @SunMarc for review

SunMarc
SunMarc previously approved these changes Oct 3, 2024
Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this ! Left a few comments. Can you have a second look @muellerzr ?

Comment on lines 4368 to 4389
def _update_repo_visibility(self, token: Optional[str] = None):
if self.hub_model_id is None:
raise ValueError("`_update_repo_visiblity` should be used after `init_hf_repo` to ensure repo exists.")
update_repo_visibility(
repo_id=self.hub_model_id,
private=self.args.hub_private_repo,
token=token,
)

Copy link
Member

@SunMarc SunMarc Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would inform the user that a change of visibility of the repo will occur using the model_info api.
Still, I don't think this is a good default behavior. hub_private_repo default is False, so if someone launches a script without setting this, his repository will change to public...
Maybe only allow to change the visibility for the repo with the private attribute or simply raise an error/warning asking the user to change it on the hub by himself ? cc @muellerzr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an info message to inform the user about the change of visibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use logger.warning and also check that the repo visibility really changed with model_info API. Also, the use of update_repo_visibility with self.args.hub_private_repo is still not decided, so let's not resolve this conversation.

@@ -4518,6 +4527,7 @@ def push_to_hub(
blocking: bool = True,
token: Optional[str] = None,
revision: Optional[str] = None,
private: Optional[bool] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that really be helpful to have this here instead of using hub_private_repo in TrainingArguments ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seemed to be one of the expected use cases in the linked issues.

Comment on lines +4097 to +4116
def test_push_to_hub_private(self):
with tempfile.TemporaryDirectory() as tmp_dir:
trainer = get_regression_trainer(
output_dir=os.path.join(tmp_dir, "test-trainer-private"),
push_to_hub=True,
hub_token=self._token,
)

trainer.push_to_hub(private=True)

info = model_info(f"{USER}/test-trainer-private", token=self._token)

self.assertTrue(info.private)

trainer.push_to_hub(private=False)

info = model_info(f"{USER}/test-trainer-private", token=self._token)

self.assertFalse(info.private)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@hlky hlky force-pushed the private-trainer-push-to-hub branch from f99889d to 42e244d Compare October 3, 2024 16:11
@hlky hlky force-pushed the private-trainer-push-to-hub branch from 42e244d to 899d982 Compare October 3, 2024 16:29
@SunMarc SunMarc dismissed their stale review October 3, 2024 21:30

Not sure about changing the visibility if the repo is created, I want mueller feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants