KEP-2170: Add unit and Integration tests for model and dataset initializers #2323

Merged 2 commits on Jan 18, 2025

Conversation

@seanlaii seanlaii commented Nov 9, 2024

What this PR does / why we need it:
I added unit tests and integration tests for model and dataset initializers.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2305

Checklist:

  • Docs included if any changes are user facing

Comment on lines 59 to 70
# Private HuggingFace dataset test
# (
#     "HuggingFace - Private dataset",
#     "huggingface",
#     {
#         "storage_uri": "hf://username/private-dataset",
#         "use_real_token": True,
#         "expected_files": ["config.json", "dataset.safetensors"],
#         "expected_error": None
#     }
# ),
# Invalid HuggingFace dataset test
Contributor Author

Do we have an access token for testing login and for downloading resources from a private repo?

Member

Not yet. Maybe we can track this in a separate issue: we should create a Kubeflow-owned Hugging Face account to provide the token.
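
Until such a token exists, one option is to keep the private-dataset case in the parametrized list and skip it conditionally instead of commenting it out. The following is only a sketch under stated assumptions: the environment variable name HF_TOKEN, the public storage URI, the trimmed config dictionaries, and the test function are all illustrative and are not taken from this PR.

```python
import os

import pytest

# Assumption: a token would be supplied through an environment variable.
# The name HF_TOKEN is illustrative and is not confirmed by this PR.
HAS_TOKEN = bool(os.environ.get("HF_TOKEN"))

# Hypothetical test cases; the URIs are placeholders, not real repositories.
TEST_CASES = [
    pytest.param(
        "huggingface",
        {"storage_uri": "hf://username/public-dataset", "expected_error": None},
        id="HuggingFace - Public dataset",
    ),
    pytest.param(
        "huggingface",
        {"storage_uri": "hf://username/private-dataset", "expected_error": None},
        id="HuggingFace - Private dataset",
        # Skipped (not failed) whenever no token is configured.
        marks=pytest.mark.skipif(
            not HAS_TOKEN, reason="no Hugging Face token available"
        ),
    ),
]


@pytest.mark.parametrize("provider, config", TEST_CASES)
def test_dataset_download(provider, config):
    # Placeholder body: a real test would invoke the dataset initializer here.
    assert provider == "huggingface"
    assert config["storage_uri"].startswith("hf://")
```

With this shape the private case shows up as skipped in the test report, so it stays visible as a reminder until a token is provisioned.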

coveralls commented Nov 9, 2024

Pull Request Test Coverage Report for Build 12840565631

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall first build on initializer-test at 100.0%

Totals Coverage Status
Change from base Build 12834026562: 100.0%
Covered Lines: 85
Relevant Lines: 85

💛 - Coveralls

@seanlaii seanlaii force-pushed the initializer-test branch 4 times, most recently from 8930b80 to c6e0a83 Compare November 9, 2024 18:17
seanlaii commented Nov 26, 2024

Hi @andreyvelich,

Could you help review this PR? I have some questions. Once the SDK's PR gets approved, I will modify it accordingly.

Thank you!

@andreyvelich
Member

@seanlaii Sorry for the delay. Sure, I will review it today.

@andreyvelich andreyvelich left a comment

Thank you for this effort @seanlaii!
I left my initial thoughts.
Please take a look @Electronic-Waste @deepanker13 @kubeflow/wg-training-leads @varshaprasad96 @akshaychitneni @saileshd1402

.github/workflows/integration-tests.yaml (outdated, resolved)
.github/workflows/test-python.yaml (outdated, resolved)
pkg/initializer_v2/test/unit/dataset/test_dataset.py (outdated, resolved)
pkg/initializer_v2/test/unit/model/test_model_config.py (outdated, resolved)
pkg/initializer_v2/test/unit/model/test_model.py (outdated, resolved)
pkg/initializer_v2/test/unit/test_utils.py (outdated, resolved)
@seanlaii seanlaii force-pushed the initializer-test branch 6 times, most recently from 08fbd57 to d867237 Compare December 21, 2024 19:34
@seanlaii
Contributor Author

Hi @andreyvelich, could you help review the PR? I have addressed the comments. Thank you!

@andreyvelich
Member

Sorry for the delay @seanlaii!
I will review it this week.

@Electronic-Waste Electronic-Waste left a comment

@seanlaii Thanks for your contributions! I left some comments for you.

As for the e2e test pattern, we can discuss it later with @andreyvelich :)

pkg/initializer_v2/dataset/huggingface_test.py (outdated, resolved)
pkg/initializer_v2/model/huggingface_test.py (outdated, resolved)
@seanlaii seanlaii changed the title from "KEP-2170: Add unit and E2E tests for model and dataset initializers" to "KEP-2170: Add unit and Integration tests for model and dataset initializers" Jan 14, 2025
@seanlaii
Contributor Author

Hi @andreyvelich @Electronic-Waste, I have addressed the comments. Please help review the PR when you are available. Thank you!

pkg/initializer_v2/model/main_test.py (outdated, resolved)
pkg/initializer_v2/dataset/main_test.py (outdated, resolved)
test/integration/initializer_v2/__init__.py (outdated, resolved)
@seanlaii seanlaii force-pushed the initializer-test branch 2 times, most recently from 7f06cf1 to f9d9ef2 Compare January 16, 2025 07:26
test/__init__.py (outdated, resolved)
@@ -0,0 +1,75 @@
import os
import runpy
from test.integration.initializer_v2.utils import setup_temp_path # noqa: F401
Contributor Author

I added # noqa: F401 to bypass this flake8 lint error:
F401 'pkg.initializer_v2.utils.utils_test.mock_env_vars' imported but unused

Member

Is it really unused?
I thought we use it here:

self.temp_dir = setup_temp_path("DATASET_PATH")

Contributor Author

@seanlaii seanlaii Jan 18, 2025

This is a known flake8 limitation with imported pytest fixtures: https://stackoverflow.com/questions/75647682/how-can-i-resolve-flake8-unused-import-error-for-pytest-fixture-imported-from.
If we would like to remove the noqa comment, I can follow the suggestion there and create a conftest.py file that holds all the fixtures.

Member

@andreyvelich andreyvelich Jan 18, 2025

Oh, I see.
I think it would be better if you could move them to conftest.py, as suggested by pytest.
I guess this file should live under test/integration/initializer_v2, right?

Contributor Author

Yes. I also created one under pkg/initializer_v2 to hold the shared fixture used by the tests under that directory.
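
For readers following the thread, here is a minimal sketch of what such a conftest.py could look like. It assumes setup_temp_path is a factory fixture that creates a temporary directory and exports its path through the named environment variable; the call setup_temp_path("DATASET_PATH") quoted earlier suggests this shape, but the fixture body is not shown in this conversation, so everything below (including the helper make_temp_path) is hypothetical.

```python
# conftest.py (hypothetical contents; the real fixture is not shown in this thread)
import os
import shutil
import tempfile

import pytest


def make_temp_path(env_var: str) -> str:
    """Create a temporary directory and export its path through env_var."""
    temp_dir = tempfile.mkdtemp()
    os.environ[env_var] = temp_dir
    return temp_dir


@pytest.fixture
def setup_temp_path():
    """Factory fixture: a test calls setup_temp_path("DATASET_PATH")."""
    created = []  # (env_var, directory, previous value) for teardown

    def _setup(env_var: str) -> str:
        previous = os.environ.get(env_var)
        temp_dir = make_temp_path(env_var)
        created.append((env_var, temp_dir, previous))
        return temp_dir

    yield _setup

    # Teardown: delete the directories and restore the environment.
    for env_var, temp_dir, previous in reversed(created):
        shutil.rmtree(temp_dir, ignore_errors=True)
        if previous is None:
            os.environ.pop(env_var, None)
        else:
            os.environ[env_var] = previous
```

Because pytest discovers fixtures defined in conftest.py automatically, a test module can request setup_temp_path as an argument without importing it, so there is no import for flake8 to flag and no need for # noqa: F401.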

@andreyvelich andreyvelich left a comment

Thank you for the updates @seanlaii!
Just a few small comments from me.
/assign @kubeflow/wg-training-leads @Electronic-Waste @astefanutti

pkg/initializer_v2/dataset/main_test.py (outdated, resolved)
pkg/initializer_v2/model/main_test.py (outdated, resolved)
.github/workflows/integration-tests.yaml (outdated, resolved)
@seanlaii seanlaii force-pushed the initializer-test branch 2 times, most recently from 70dfc5b to 02d64b8 Compare January 18, 2025 03:18
Signed-off-by: wei-chenglai <[email protected]>
@andreyvelich andreyvelich left a comment

Thank you for this great contribution @seanlaii 🎉
/lgtm
/approve

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

@google-oss-prow google-oss-prow bot merged commit e47d8f7 into kubeflow:master Jan 18, 2025
56 checks passed
@seanlaii seanlaii deleted the initializer-test branch January 18, 2025 20:43
Successfully merging this pull request may close these issues.

KEP-2170: Add unit and integration tests for model and dataset initializers
6 participants