Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

issue: 4203242 dump wqe to log in case of ecqe #273

Open
wants to merge 1 commit into
base: vNext
Choose a base branch
from

Conversation

tomerdbz
Copy link
Collaborator

@tomerdbz tomerdbz commented Dec 9, 2024

dump wqe's content for further debug in case of ecqe.

Description

Following https://redmine.mellanox.com/issues/4183221
We created a patch to dump the wqe that had the invalid state.

No reason to not have it in our vNext as well.

What

Subject: what this PR is doing in one line.

Why ?

Further debug ability.

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

@tomerdbz
Copy link
Collaborator Author

tomerdbz commented Dec 9, 2024

based on the following patch:
debug_mkey.patch

const auto wqe = static_cast<const uint8_t *>(m_hqtx_ptr->m_mlx5_qp.sq.buf) + 64 * index;

std::ostringstream oss;
for (uint8_t i = 0; i < m_hqtx_ptr->m_mlx5_qp.cap.max_inline_data; ++i) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using max_inline_data is not good here. Wqe may vary in size.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have the credits field inside sq_wqe_prop which should specify the number of weqbbs this wqe consumed. Comment there says 'usually' so need to verify.

@tomerdbz
Copy link
Collaborator Author

TODO - squash

@tomerdbz tomerdbz force-pushed the dump_wqe_4203242 branch 2 times, most recently from c3407af to 3b59cb9 Compare December 17, 2024 11:47
std::ostringstream oss;
// see `calculate_credits` - credits is give or take the amount of WQEBBs per WQE
for (uint32_t wqebb_i = 0; wqebb_i < credits; ++wqebb_i) {
const uint32_t wqebb_wrapped_index = wqebb_i & (m_hqtx_ptr->m_tx_num_wr - 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im not sure this will work

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we need to wrap current_wqebb_begin

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would say wrap(wqe_begin + wqebb_i * WQEBB_SIZE)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the wrapping less readable imho.

The PRM says how to wrap WQEBB index - not actual byte index.

I think we should conform to the PRM reader and not do a complex logic.
Though if you think this doesn't look right, let me know and I'll change.

@tomerdbz
Copy link
Collaborator Author

tomerdbz commented Dec 17, 2024 via email

@galnoam
Copy link
Collaborator

galnoam commented Dec 23, 2024

@AlexanderGrissik, can you approve?
Wrap looks OK, is there anything else pending approval?

@tomerdbz
Copy link
Collaborator Author

bot:retest

dump wqe's content for further debug in case of ecqe.

Signed-off-by: Tomer Cabouly <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants