
DAOS-18487 object: control EC rebuild resource consumption#17441

Open
gnailzenh wants to merge 14 commits into release/2.6 from liang/b2_6_ec_res

@gnailzenh
Contributor

A degraded EC read will allocate and register an extra buffer to recover data, which may cause ENOMEM in some cases.

Although this workaround does not prevent dynamic buffer allocation and registration, it does provide relatively precise control over the resources consumed by degraded EC reads.
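A minimal sketch of the accounting idea described above, assuming a simple byte budget that a degraded EC read reserves before it allocates and registers its recovery buffer, and returns after the buffer is freed. The names `ec_res_budget`, `ec_res_try_acquire()` and `ec_res_release()` are hypothetical illustrations, not DAOS functions, and the blocking path (waiting when a reservation fails) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical budget for the extra recovery buffers of degraded EC reads.
 * Names and fields are illustrative only, not the DAOS API. */
struct ec_res_budget {
	size_t limit; /* maximum bytes that recovery buffers may consume */
	size_t used;  /* bytes currently reserved; invariant: used <= limit */
};

/* Reserve @bytes before allocating/registering a recovery buffer.
 * Returns false if the reservation would exceed the limit; the caller is
 * then expected to wait (not modeled here) instead of allocating. */
static bool
ec_res_try_acquire(struct ec_res_budget *b, size_t bytes)
{
	/* limit - used cannot underflow because used <= limit */
	if (bytes > b->limit - b->used)
		return false;
	b->used += bytes;
	return true;
}

/* Return @bytes once the recovery buffer has been deregistered and freed. */
static void
ec_res_release(struct ec_res_budget *b, size_t bytes)
{
	assert(bytes <= b->used);
	b->used -= bytes;
}
```

Allocation stays dynamic in this sketch; only the total size of outstanding recovery buffers is capped at `limit`.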

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
gnailzenh requested review from a team as code owners January 24, 2026 03:00
@github-actions

Errors are: Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-18487

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
For data migration, after being woken up, a ULT should try to
wake up another ULT if resources are still available.

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
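The wakeup rule in this commit message can be illustrated with a single-threaded simulation, assuming that blocked ULTs are queued in FIFO order and that "waking" a ULT is modeled as a direct call to `waiter_run()`. All names (`res_pool`, `waiter_run()`, `wake_one()`, `res_release()`) are hypothetical; real code would use Argobots ULTs and condition variables rather than recursion.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 16

/* Hypothetical model of a resource pool with a FIFO of blocked waiters. */
struct res_pool {
	size_t avail;               /* resource units still available */
	size_t demand[MAX_WAITERS]; /* demands of blocked waiters, FIFO order */
	size_t head, tail;          /* queue indexes */
	size_t granted;             /* number of waiters that got resources */
};

static void
waiter_enqueue(struct res_pool *p, size_t units)
{
	assert(p->tail < MAX_WAITERS);
	p->demand[p->tail++] = units;
}

static void wake_one(struct res_pool *p);

/* Body of a woken waiter: take its resources, then pass the wakeup on so
 * another blocked waiter runs if enough resources remain. */
static void
waiter_run(struct res_pool *p)
{
	size_t units = p->demand[p->head++];

	p->avail -= units;
	p->granted++;
	wake_one(p); /* chain wakeup: try to wake the next waiter */
}

/* Wake the head waiter only if its demand can be satisfied now. */
static void
wake_one(struct res_pool *p)
{
	if (p->head < p->tail && p->demand[p->head] <= p->avail)
		waiter_run(p);
}

/* A releaser returns units and wakes a single waiter; any further
 * waiters are woken by the chain in waiter_run(). */
static void
res_release(struct res_pool *p, size_t units)
{
	p->avail += units;
	wake_one(p);
}
```

With waiters demanding 2, 3 and 4 units, releasing 6 units lets the first two run through the chain, while the third stays blocked until more units are released. Without the chain step, only the first waiter would run even though 4 units were still free.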
wangshilong previously approved these changes Jan 29, 2026
NiuYawei previously approved these changes Feb 2, 2026
liuxuezhao previously approved these changes Feb 2, 2026
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
gnailzenh dismissed stale reviews from liuxuezhao, NiuYawei, and wangshilong via 9664eb4 February 3, 2026 12:50
gnailzenh and others added 8 commits February 6, 2026 16:54
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
- Add resource buckets so that overall resource consumption does not
  grow on systems configured with more targets
- Track demanded resources and a wait queue (waitq) for blocked ULTs,
  and wake up as many waiters as the released resources allow
- Code cleanup

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
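A sketch of the two ideas in this commit, under stated assumptions: a fixed number of buckets (`NR_BUCKETS`, chosen arbitrarily here) with targets mapped by `tgt_id % NR_BUCKETS`, and a per-bucket FIFO holding the demands of blocked ULTs. These names and the modulo mapping are hypothetical; the actual DAOS structures may differ.

```c
#include <assert.h>
#include <stddef.h>

#define NR_BUCKETS  4  /* hypothetical fixed bucket count */
#define MAX_WAITERS 16

/* Hypothetical resource bucket shared by several targets, with a FIFO
 * wait queue of the demands of blocked ULTs. Not the DAOS structure. */
struct res_bucket {
	size_t avail;               /* units currently available */
	size_t demand[MAX_WAITERS]; /* queued demands, FIFO order */
	size_t head, tail;          /* queue indexes */
	size_t woken;               /* total waiters woken so far */
};

/* Targets share a fixed number of buckets, so the total budget
 * (NR_BUCKETS * per-bucket limit) does not grow with the target count. */
static struct res_bucket *
target_bucket(struct res_bucket *buckets, unsigned int tgt_id)
{
	return &buckets[tgt_id % NR_BUCKETS];
}

/* Record the demand of a ULT that has to block on this bucket. */
static void
bucket_wait(struct res_bucket *b, size_t units)
{
	assert(b->tail < MAX_WAITERS);
	b->demand[b->tail++] = units;
}

/* Release @units and wake as many queued waiters, in FIFO order, as the
 * available units allow. Each woken waiter's demand is charged here, so
 * later waiters cannot be over-granted. Returns the number woken. */
static size_t
bucket_release(struct res_bucket *b, size_t units)
{
	size_t n = 0;

	b->avail += units;
	while (b->head < b->tail && b->demand[b->head] <= b->avail) {
		b->avail -= b->demand[b->head++];
		n++;
	}
	b->woken += n;
	return n;
}
```

Stopping at the first waiter whose demand does not fit keeps wakeups in FIFO order, so a large request is not starved by a stream of smaller ones.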
Increase default resource limit

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
@daosbuild3
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17441/12/execution/node/1515/log

@daosbuild3
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17441/13/execution/node/449/log

Labels

None yet

6 participants