Conversation

@Nancheng-11 (Collaborator):

No description provided.

py_forward_method_(inputs);
auto attn_pyobj = graph_instances_[key].mem_hold_.attn_pyobj_;
attn_pyobj.attr("prepare")(inputs);
py_forward_method_(inputs, attn_pyobj);
Collaborator:

Why is this run twice here?

Collaborator (Author):

@JackTan25 Warm-up is run twice so the results are more stable.

Collaborator:

A more stable result.
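For context, a minimal sketch of the double warm-up pattern under discussion, assuming a torch-style capture API; forward_fn, attn, and inputs are stand-ins for py_forward_method_, attn_pyobj_, and the real inputs, not the actual implementation:

import torch

def capture_with_warmup(forward_fn, attn, inputs):
    # First warm-up pass: triggers lazy allocations, autotuning, and
    # kernel compilation outside of graph capture.
    forward_fn(inputs)
    # Second warm-up pass: exercises the exact call that will be
    # captured, so allocator and kernel state match the recorded run.
    attn.prepare(inputs)
    forward_fn(inputs, attn)
    torch.cuda.synchronize()
    # Capture; replaying the graph later re-executes the recorded kernels.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        forward_fn(inputs, attn)
    return graph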

data=data,
start_row=0,
start_col=0,
m=1,
Collaborator:

Should this be m here?

.def(pybind11::init<>())
.def_readwrite("kv_cache_offset", &TRTAttn::kv_cache_offset)
.def(
"__cpp_ptr__",
Collaborator:

Why do we need to get the cpp ptr?

Collaborator (Author):

It is used to compare addresses when debugging cudagraph.
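A hedged sketch of that debugging use from the Python side; it assumes __cpp_ptr__ returns the underlying C++ object's address as an integer (the binding above is truncated, so the return type is an assumption), and the helper name is illustrative:

def assert_same_trt_attn(attn_before, attn_after):
    # __cpp_ptr__ is assumed to return the TRTAttn object's address as
    # an integer; equal addresses mean capture and replay are touching
    # the same C++ instance.
    ptr_before = attn_before.__cpp_ptr__()
    ptr_after = attn_after.__cpp_ptr__()
    assert ptr_before == ptr_after, (
        f"TRTAttn object changed: {ptr_before:#x} != {ptr_after:#x}"
    )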

from rtp_llm.ops.compute_ops import PyModelInputs


class AttnPyObj:
Collaborator:

Feels like we don't need it; just put the methods directly into fmhabase?
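For reference, the interface implied by the hunks above; a minimal sketch, where the method body is a stub and not the actual implementation:

from rtp_llm.ops.compute_ops import PyModelInputs

class AttnPyObj:
    # Python-side holder for per-step attention state; the C++ graph
    # runner calls prepare() right before the captured forward (see the
    # py_forward_method_ hunk above).
    def prepare(self, inputs: PyModelInputs) -> None:
        # Refresh attention buffers (e.g. kv cache offsets) from the
        # current inputs before the graph replays. Stub body.
        raise NotImplementedError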

Nancheng-11 force-pushed the feat/refactor_cudagraph branch from 2ccba55 to 9590b0e (December 17, 2025 11:02)
def support_cuda_graph(self) -> bool:
return False

def prepare_cuda_graph(self, attn_inputs: PyAttentionInputs):
Collaborator:

Is this ever called?

Collaborator:

This is a reserved interface, right?

Collaborator (Author):

I call it explicitly inside cudagraph.
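A sketch of that explicit call path; CudaGraphRunner and its attributes are hypothetical names for illustration, and only support_cuda_graph and prepare_cuda_graph come from the diff:

class CudaGraphRunner:
    # Hypothetical runner; real class and attribute names may differ.
    def __init__(self, attn_impl):
        self.attn_impl = attn_impl

    def capture(self, attn_inputs):
        # attn_inputs is a PyAttentionInputs. Backends opt in via
        # support_cuda_graph(); the default above returns False, so only
        # graph-aware backends reach prepare_cuda_graph.
        if not self.attn_impl.support_cuda_graph():
            return None
        # Explicit call to the reserved hook before capture, as described.
        self.attn_impl.prepare_cuda_graph(attn_inputs)
        # ... capture the forward pass here ...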

Nancheng-11 force-pushed the feat/refactor_cudagraph branch 8 times, most recently from 4b44c8c to c44bda9 (December 23, 2025 09:27)
Nancheng-11 force-pushed the feat/refactor_cudagraph branch from c44bda9 to bc1713c (December 25, 2025 06:48)