
initial commit for aot support #79

Open · wants to merge 4 commits into master
Conversation

chengmengli06 (Collaborator):

No description provided.

@CLAassistant:

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

杨熙 does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Have you already signed the CLA but the status is still pending? Let us recheck it.

@@ -0,0 +1,7 @@
rm -rf experiments/multi_tower_din_taobao_local/export

Collaborator:

Remove this script from git, and add ENABLE_AOT documentation in usage/export.md.
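
A minimal sketch of how the gate behind that flag might look, assuming ENABLE_AOT is read from the environment (the variable name comes from this review; the actual is_aot() helper lives in the tzrec codebase):

import os

def is_aot() -> bool:
    # AOT export is opted into via the ENABLE_AOT environment variable
    return os.environ.get("ENABLE_AOT", "0") == "1"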


gm = gm.cuda()

print(gm)

Collaborator:

Is printing gm the same as writing out gm.code?
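
For reference, a minimal sketch of the two inspection paths on a torch.fx GraphModule (the Tiny module is illustrative, not the tzrec model):

import torch
import torch.fx

class Tiny(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x + 1)

gm = torch.fx.symbolic_trace(Tiny())
print(gm.code)   # the generated forward() source as a string
print(gm.graph)  # the underlying graph IR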

dynamic_shapes[key] = {0: batch}
elif key == "batch_size":
dynamic_shapes[key] = {}
elif data[key].dtype == torch.float32 and "__" not in key:

Collaborator:

"__" might be present in regular feature names, not in sequence features only.

exported_gm = torch.export.export(
gm, args=(data,), dynamic_shapes=(dynamic_shapes,)
)
print(exported_gm)

Collaborator:

Do not print; this is already written out to exported_gm.code.

)
dynamic_shapes[key] = {0: tmp_val_dim}

exported_gm = torch.export.export(

Collaborator:

Renaming exported_gm to exported_program would be better; torch.export.export returns an ExportedProgram, not a GraphModule.
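
A minimal sketch of the export call with a dynamic batch dimension, mirroring the dynamic_shapes dict built above (the Tower module and the "clk_seq" feature name are illustrative, not the actual tzrec model):

import torch
from torch.export import Dim, export

class Tower(torch.nn.Module):
    def forward(self, data):
        return data["clk_seq"].float().mean(dim=1)

batch = Dim("batch")
data = {"clk_seq": torch.zeros(8, 50, dtype=torch.int64)}
dynamic_shapes = {"clk_seq": {0: batch}}  # dim 0 of this input is dynamic

exported_program = export(Tower(), args=(data,), dynamic_shapes=(dynamic_shapes,))
print(exported_program.graph_module.code)  # inspect the code, not the whole program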

@@ -920,6 +927,16 @@ def export(
)
for asset in assets:
shutil.copy(asset, os.path.join(export_dir, "model"))
elif is_aot():

Collaborator:

Move this to line 891:

InferWrapper = ExportWrapper if is_aot() else ScriptWrapper

and use InferWrapper later.

@@ -1,4 +1,4 @@
# Copyright (c) 2024, Alibaba Group;
# Copyright (c) 2024-2025, Alibaba Group;

Collaborator:

Revert the copyright change.

@@ -20,6 +20,7 @@
from torch_tensorrt.dynamo.conversion.converter_utils import get_positive_dim
from torch_tensorrt.dynamo.utils import to_torch_device



Collaborator:

Revert this change.

@@ -236,3 +236,24 @@ def forward(
"""
batch = self.get_batch(data, device)
return self.model.predict(batch)


class ScriptWrapperAOT(ScriptWrapper):

Collaborator:

Maybe ExportWrapper would be a better name.



class ScriptWrapperAOT(ScriptWrapper):
"""Model inference wrapper for aot export."""

Collaborator:

Make the docstring "Model inference wrapper for torch.export."
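
A hedged sketch of what the renamed wrapper might look like, combining the pieces quoted in this review (ScriptWrapper, get_batch, and model.predict come from the tzrec codebase; the device handling shown is an assumption):

from typing import Dict

import torch

class ExportWrapper(ScriptWrapper):
    """Model inference wrapper for torch.export."""

    # pyre-ignore [14]
    def forward(self, data: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # unlike ScriptWrapper.forward, no device argument is accepted;
        # the exported program is bound to the device it was exported on
        device = torch.device("cuda")
        batch = self.get_batch(data, device)
        return self.model.predict(batch)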

@@ -0,0 +1,139 @@
# Copyright (c) 2024, Alibaba Group;

Collaborator:

Add an AOT test in tzrec/tests/rank_integration_test.py.
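
A hedged sketch of such a test (the test-case name and body only loosely mirror the existing integration tests; ENABLE_AOT as the opt-in env var comes from this review):

import os
import unittest

class RankAOTTest(unittest.TestCase):
    def test_multi_tower_din_aot_export(self):
        os.environ["ENABLE_AOT"] = "1"
        try:
            # train a small model and run the export entry point here,
            # then assert the AOT artifacts exist under the export dir
            pass
        finally:
            os.environ.pop("ENABLE_AOT", None)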


exported_gm_path = os.path.join(save_dir, "debug_exported_gm.py")
with open(exported_gm_path, "w") as fout:
fout.write(str(exported_gm))

Collaborator:

exported_gm.code and debug_exported_gm.py are duplicates; both save str(exported_gm).

@@ -746,6 +750,9 @@ def _script_model(
logger.info(f"Model Outputs: {result_info}")

export_model_trt(model, data_cuda, save_dir)
elif is_aot():
data_cuda = batch.to_dict(sparse_dtype=torch.int64)

Collaborator:

The data types are the same on CPU and GPU, so the same data_cuda can be used across export/trt_export/aot_export.
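
A hedged sketch of that consolidation (export_model_trt appears in this diff; is_trt and export_model_aot are assumed names for the parallel branch and helper):

data_cuda = batch.to_dict(sparse_dtype=torch.int64)
if is_trt():
    export_model_trt(model, data_cuda, save_dir)
elif is_aot():
    export_model_aot(model, data_cuda, save_dir)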

# pyre-ignore [14]
def forward(
self,
data: Dict[str, torch.Tensor],

Collaborator:

The AOT model's predict may not support a device argument; it needs a workaround in predict, such as https://github.com/alibaba/TorchEasyRec/blob/master/tzrec/main.py#L1076
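
A hedged sketch of that kind of workaround (function and parameter names are illustrative): since the exported forward() only accepts the data dict, move the inputs to the target device before calling the model.

from typing import Dict

import torch

def predict_aot(
    model: torch.nn.Module,
    inputs: Dict[str, torch.Tensor],
    device: torch.device,
) -> Dict[str, torch.Tensor]:
    # move inputs first; the exported model itself takes no device argument
    inputs_on_device = {k: v.to(device) for k, v in inputs.items()}
    return model(inputs_on_device)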


chengmengli06 (Author):

It is not fully done yet; this commit covers only the export part. The compile and prediction parts are still to be done.

@@ -737,6 +739,8 @@ def _script_model(
logger.info("quantize embeddings...")
quantize_embeddings(model, dtype=torch.qint8, inplace=True)

if is_aot():
model = model.cuda()

Collaborator:

Is just model.cuda() correct? When I use gloo to load the checkpoint with

device_state_dict = state_dict_to_device(
    model.state_dict(), pg=checkpoint_pg, device=torch.device("cpu")
)
model = model.to_empty(device="cpu")
model = model.to("cuda:0")

it may be incorrect when I run forward with model(data_cuda).


chengmengli06 (Author):

I have tested it; there is no problem.
