Releases: AI-Hypercomputer/xpk
v1.3.0
What's Changed
Improvements
- Record used command flags in telemetry by @scaliby in #1019
- CommandsTester autopatching setup by @jamOne- in #1025
- refactor: Introduce ReservationLink dataclasses and update function signatures by @jamOne- in #1029
- make verify: remove installation by @jamOne- in #1033
- Fix pylint by @jamOne- in #1035
Bug fixes
Full Changelog: v1.2.0...v1.3.0
v1.2.0
What's Changed
Improvements
Bug fixes
- Fix Super-slicing nume aware conflict by @jamOne- in #1001
- Fix Super-slicing on 1 cube by @jamOne- in #1002
- Upgrade any existing nodepools after Lustre driver installation by @SikaGrr in #1005
- Set big controller resources for Super-slicing by @jamOne- in #1007
- Fix workload resources in pathways and super-slicing with v7x by @jamOne- in #1009
- Limit the coreDNS replica count to the desired number of default pool… by @SikaGrr in #1010
Full Changelog: v1.1.0...v1.2.0
v1.1.2
Full Changelog: v1.1.1...v1.1.2
v1.1.1
Full Changelog: v1.1.0...v1.1.1
v1.1.0
What's Changed
New Features
- Use gcloud beta for resource policies by @jamOne- in #962
- Add multi-container support by @zxhe-sean in #880
- feat: Introduce super-slicing inspection to the xpk inspector by @jamOne- in #969
- Remove 1:1 workload-nodepool annotation for sub/super-slicing by @jamOne- in #970
- Golden recipes script by @scaliby in #971
- feat: Support multiple reservations (super-slicing sub-block targeting) by @jamOne- in #980
- SUPER_SLICING_ENABLED=True by default by @jamOne- in #983
- Bump Kueue to 0.15.2 and Superslicing: Remove cube state nodeLabel by @jamOne- in #990
Improvements
- Support passing project number as an argument by @Aixile in #934
- re-add single-host golden by @jamOne- in #944
- Remove obsolete warning about single-host single-slice from documentation. by @SikaGrr in #945
- remove obsolete note by @SikaGrr in #947
- Remove unused makefile variables by @jamOne- in #950
- Fix GPU provisioning by @FIoannides in #940
- GPU e2e test by @FIoannides in #948
- Docs: add "Dependencies" and "Get involved" sections by @jamOne- in #951
- Fix spellcheck errors by @scaliby in #952
- fix DNS credentials retry by @FIoannides in #953
- Remove unused user_input.py (code was moved to console.py) by @jamOne- in #954
- Add unit test for get credentials by @FIoannides in #955
- Add CoreDNS to dependencies and fix the return code by @jamOne- in #958
- Telemetry log is_tester by @scaliby in #957
- Add note that workload create works only on XPK clusters by @scaliby in #959
- Stamp telemetry trash execution by @scaliby in #963
- Silent credentials check by @scaliby in #964
- Add a shared memory mount for rdma_decorator by @kenmcheng in #965
- Remove legacy integration tests by @scaliby in #973
- Add copyright to recipes by @scaliby in #976
- Allow for run only blocks in recipes by @scaliby in #979
- Remove dws flex cluster integration test as it is constantly failing by @scaliby in #972
- github: Never mark issues as stale by @jamOne- in #993
- Add RayCluster golden by @jamOne- in #996
- Migrate to recipes by @scaliby in #994
- Remove goldens script by @scaliby in #998
Bug fixes
- Fix cluster adapt by @scaliby in #960
- Fix RayCluster parser after enabling the SuperSlicing feature flag by @jamOne- in #988
- kueue_manager: Use configure_super_slicing instead of checking system capabilities by @jamOne- in #989
- Recipes exit status code by @scaliby in #995
- Remove Super-slicing block validation by @jamOne- in #997
- Github: remove "-nodepools" suffix from e2e cluster names by @jamOne- in #999
- Fix kueuectl calls in xpk info by @jamOne- in #1000
New Contributors
- @Aixile made their first contribution in #934
- @kenmcheng made their first contribution in #965
- @zxhe-sean made their first contribution in #880
Full Changelog: v1.0.0...v1.1.0
v1.0.0
What's Changed
Breaking Changes
New Features
- feat: Add super-slicing workload topology validation by @jamOne- in #933
- Update super-slicing annotation by @jamOne- in #936
Improvements
- Enable IPAM and Dranet by @FIoannides in #916
- Remove slurm integration tests by @scaliby in #918
- Allow release-breaking as a label by @scaliby in #921
- Remove kjob from cluster commands by @scaliby in #922
- Remove kjob from system characteristics by @scaliby in #923
- Remove kjob from storage commands by @scaliby in #924
- Remove kjob from github actions by @scaliby in #925
- Remove kjob from makefile by @scaliby in #926
- Remove kjob from docs by @scaliby in #929
- Remove redundant file by @scaliby in #932
- Fix dry run by @scaliby in #938
- Docs: add ml diagnostics usage guide by @Shuang-cnt in #937
- Telemetry launch by @scaliby in #935
Bug fixes
- Fix onPodConditions serialization issue by @FIoannides in #928
- Fix integration tests by @scaliby in #927
New Contributors
- @Shuang-cnt made their first contribution in #937
Full Changelog: v0.17.0...v1.0.0
v0.17.3
Full Changelog: v0.17.2...v0.17.3
v0.17.2
Full Changelog: v0.17.1...v0.17.2
v0.16.1
Full Changelog: v0.16.0...v0.16.1
v0.17.1
Full Changelog: v0.17.0...v0.17.1