DAOS-18487 object: control EC rebuild resource consumption #17441
gnailzenh wants to merge 14 commits into release/2.6
A degraded EC read allocates and registers an extra buffer to recover data, which may cause ENOMEM in some cases. Although this workaround does not prevent dynamic buffer allocation and registration, it does provide relatively precise control over the resources consumed by degraded EC reads.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17441/2/testReport/
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
For data migration, after being woken up, the ULT should try to wake up another ULT if resources are still available.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
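The wake-up chaining described in this commit message can be sketched as follows. This is a minimal single-threaded simulation, not the code in this PR: `struct migr_res`, `waiter_on_wakeup` and `release_and_signal` are hypothetical names, real ULT scheduling is not modeled, and the loop in `release_and_signal` merely stands in for each woken ULT signaling the next one.

```c
#include <assert.h>
#include <stddef.h>

#define NWAITERS 4

/* Hypothetical state: free units plus a FIFO of blocked waiters. */
struct migr_res {
	unsigned avail;          /* units currently free */
	unsigned need[NWAITERS]; /* units each blocked waiter wants, FIFO order */
	size_t   next;           /* index of the first waiter still blocked */
	size_t   nr;             /* number of waiters in need[] */
};

/*
 * Body run by a waiter once it has been woken: take its own share, then
 * report whether the next waiter can also run with what is left.
 * Returns 1 if the wake-up should be passed on, 0 otherwise.
 */
static int waiter_on_wakeup(struct migr_res *r)
{
	assert(r->next < r->nr && r->need[r->next] <= r->avail);
	r->avail -= r->need[r->next];
	r->next++;
	return r->next < r->nr && r->need[r->next] <= r->avail;
}

/*
 * The releaser signals at most one waiter; every further wake-up comes
 * from the previously woken waiter. Returns how many waiters ran.
 */
static size_t release_and_signal(struct migr_res *r, unsigned units)
{
	size_t woken = 0;

	r->avail += units;
	if (r->next >= r->nr || r->need[r->next] > r->avail)
		return 0; /* nobody waiting, or the head waiter still cannot run */

	for (;;) {
		woken++;
		if (!waiter_on_wakeup(r))
			break; /* chain stops: no waiter left or not enough units */
	}
	return woken;
}
```

Without the chaining step, a single release that frees enough units for several waiters would wake only the first one, leaving the others blocked until the next release.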
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17441/4/testReport/
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
9664eb4
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17441/5/testReport/
Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17441/5/execution/node/1541/log
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
- Add a resource bucket so that overall resource consumption does not grow on systems configured with more targets
- Track demanded resources and a waitq for blocked ULTs, and wake up as many waiters as the released resources allow
- Code cleanup
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
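The bucket and wait-queue behavior listed in this commit can be illustrated with a small sketch. It is an assumption-laden simulation rather than the PR's implementation: `struct res_bucket`, `bucket_acquire` and `bucket_release` are hypothetical names, the wait queue is a fixed-size FIFO ring of pending demands, and blocking is represented by a return code instead of a suspended ULT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WAITERS 16

/* Hypothetical bucket with a fixed limit, independent of the target count. */
struct res_bucket {
	uint64_t limit;               /* total units the bucket may hand out */
	uint64_t used;                /* units currently held */
	uint64_t demanded;            /* units requested by blocked waiters */
	uint64_t waitq[MAX_WAITERS];  /* FIFO ring of pending demands */
	size_t   head;                /* index of the oldest waiter */
	size_t   count;               /* number of queued waiters */
};

static void bucket_init(struct res_bucket *b, uint64_t limit)
{
	b->limit = limit;
	b->used = 0;
	b->demanded = 0;
	b->head = 0;
	b->count = 0;
}

/* Returns 1 if granted at once, 0 if queued, -1 if it can never be queued. */
static int bucket_acquire(struct res_bucket *b, uint64_t units)
{
	if (units > b->limit)
		return -1; /* larger than the whole bucket */

	/* Do not bypass earlier waiters, even when the request would fit. */
	if (b->count == 0 && b->used + units <= b->limit) {
		b->used += units;
		return 1;
	}
	if (b->count == MAX_WAITERS)
		return -1; /* wait queue is full */

	b->waitq[(b->head + b->count) % MAX_WAITERS] = units;
	b->count++;
	b->demanded += units;
	return 0;
}

/* Release units, then wake as many queued waiters as now fit, in order. */
static int bucket_release(struct res_bucket *b, uint64_t units)
{
	int woken = 0;

	assert(units <= b->used);
	b->used -= units;
	while (b->count > 0) {
		uint64_t want = b->waitq[b->head];

		if (b->used + want > b->limit)
			break; /* head waiter does not fit yet; keep FIFO order */
		b->used += want;
		b->demanded -= want;
		b->head = (b->head + 1) % MAX_WAITERS;
		b->count--;
		woken++;
	}
	return woken;
}
```

Stopping at the first waiter that does not fit is one possible design choice: it keeps strict FIFO order at the cost of leaving some units idle while a large request sits at the head of the queue.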
Increase the default resource limit.
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17441/12/execution/node/1535/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17441/12/execution/node/1515/log
Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17441/12/execution/node/1525/log
Test stage Functional Hardware Large completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17441/13/execution/node/531/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17441/13/execution/node/449/log
Test stage Functional Hardware Medium completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17441/13/execution/node/521/log