Commit bc9f2f5 (parent e0a8a1d): 16 changed files with 501 additions and 41 deletions.
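# Using Custom OPs

Consider writing a custom TensorFlow OP in two cases: the built-in TensorFlow operators cannot meet your business requirements, or composing existing operators performs poorly.

1. Implement the custom operator and compile it into a shared library
   - See the official example: [TensorFlow Custom Op](https://github.com/tensorflow/custom-op/)
   - Note: the TF version used to compile the custom Op must match the TF version it runs with
   - You may need to build two shared libraries against different dependency environments, one for offline training and one for the online inference service
   - On the PAI platform, build against TF 1.12
   - To use a custom Op in the [EasyRec Processor](https://help.aliyun.com/zh/pai/user-guide/easyrec) on EAS, build against TF 2.10.1
2. Steps for using the custom Op in `EasyRec`
   1. Download the latest EasyRec [source code](https://github.com/alibaba/EasyRec)
   2. Put the shared library built in the previous step into the `easy_rec/python/ops/${tf_version}` directory. The subdirectory name must match the TF version.
   3. Develop a component that uses the custom Op
      - Add the new component's code to `easy_rec/python/layers/keras/custom_ops.py`
      - `custom_ops.py` contains an example of a custom-Op component
      - Declare the new component by adding an export statement to `easy_rec/python/layers/keras/__init__.py`
   4. Write the model config file. Build the model from components and include the new component (see below).
   5. Run the `pai_jobs/deploy_ext.sh` script to package EasyRec, then upload the resulting resource package (`easy_rec_ext_${version}_res.tar.gz`) to the MaxCompute project
   6. Train, evaluate, and export the model (in DataWorks or with the odpscmd client)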
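
Steps 2.2 and 2.3 can be sketched as follows. This is a hypothetical illustration, not the code in `custom_ops.py`. The helper names are invented. The sketch also assumes that the version subdirectory is named after the `major.minor` part of `tf.__version__` (for example `1.12` or `2.10`). Check the existing subdirectories in your checkout for the actual naming. The only real TensorFlow API used here is `tf.load_op_library`.

```python
import os


def tf_minor_version(tf_version: str) -> str:
    """Reduce a full TF version such as '2.10.1' to '2.10' (assumed directory naming)."""
    parts = tf_version.split(".")
    if len(parts) < 2:
        raise ValueError("unexpected TF version string: %r" % tf_version)
    return ".".join(parts[:2])


def custom_op_lib_path(ops_root: str, tf_version: str, lib_name: str) -> str:
    """Build <ops_root>/<major.minor>/<lib_name>, i.e. easy_rec/python/ops/${tf_version}/..."""
    return os.path.join(ops_root, tf_minor_version(tf_version), lib_name)


def load_custom_op(ops_root: str, lib_name: str):
    """Load the library matching the running TF version (hypothetical helper)."""
    import tensorflow as tf  # imported lazily so the path helpers work without TF

    path = custom_op_lib_path(ops_root, tf.__version__, lib_name)
    if not os.path.exists(path):
        raise FileNotFoundError("custom op library not found: %s" % path)
    # A library built against a different TF version usually fails to load here.
    return tf.load_op_library(path)
```

`tf.load_op_library` returns a Python module. Its functions wrap the registered ops, with names converted to snake_case. For example, the `ZeroOut` op from the official custom-op example is exposed as `zero_out`. The actual function name depends on the `REGISTER_OP` call in your C++ code.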
## Exporting the custom Op's shared library to the saved_model assets directory

```bash
pai -name easy_rec_ext
-Dcmd='export'
-Dconfig='oss://cold-start/EasyRec/custom_op/pipeline.config'
-Dexport_dir='oss://cold-start/EasyRec/custom_op/export/final_with_lib'
-Dextra_params='--asset_files oss://cold-start/EasyRec/config/libedit_distance.so'
-Dres_project='pai_rec_test_dev'
-Dversion='0.7.5'
-Dbuckets='oss://cold-start/'
-Darn='acs:ram::XXXXXXXXXX:role/aliyunodpspaidefaultrole'
-DossHost='oss-cn-beijing-internal.aliyuncs.com'
;
```

**Notes**:
1. In the train, evaluate, and export commands, use `-Dres_project` to specify the MaxCompute project where the EasyRec resource package was uploaded
2. In the train, evaluate, and export commands, use `-Dversion` to specify the version of the resource package
3. The online inference service loads the shared library given in `asset_files`. The library must therefore be compiled against the same TF version as the online service. Currently, the EasyRec Processor on EAS depends on TF 2.10.1.
   - If `asset_files` must also include other files (for example fg.json), separate the paths with ASCII commas (`,`).
4. To repeat: **the TF version the exported shared library depends on must match the TF version of the inference service**
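
When `asset_files` lists several files, `-Dextra_params` receives them as a single comma-separated string. The following sketch shows a small hypothetical helper that assembles this string. The function name is invented, and the `fg.json` path in the test is made up for illustration. The helper joins the paths with bare commas and no spaces, which is the most conservative reading of note 3 above.

```python
from typing import List


def build_extra_params(asset_files: List[str]) -> str:
    """Join asset paths into a '--asset_files a,b,...' value for -Dextra_params."""
    if not asset_files:
        raise ValueError("at least one asset file is required")
    for path in asset_files:
        # A comma inside a path would be mistaken for a separator.
        if "," in path:
            raise ValueError("asset path must not contain a comma: %s" % path)
    return "--asset_files " + ",".join(asset_files)
```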
## Custom Op example

```protobuf
feature_config: {
  ...
  features: {
    feature_name: 'raw_genres'
    input_names: 'genres'
    feature_type: PassThroughFeature
  }
  features: {
    feature_name: 'raw_title'
    input_names: 'title'
    feature_type: PassThroughFeature
  }
}
model_config: {
  model_class: 'RankModel'
  model_name: 'MLP'
  feature_groups: {
    group_name: 'text'
    feature_names: 'raw_genres'
    feature_names: 'raw_title'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'features'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'gender'
    feature_names: 'age'
    feature_names: 'occupation'
    feature_names: 'zip_id'
    feature_names: 'movie_year_bin'
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: 'text'
      inputs {
        feature_group_name: 'text'
      }
      raw_input {
      }
    }
    blocks {
      name: 'edit_distance'
      inputs {
        block_name: 'text'
      }
      keras_layer {
        class_name: 'EditDistance'
      }
    }
    blocks {
      name: 'mlp'
      inputs {
        feature_group_name: 'features'
      }
      inputs {
        block_name: 'edit_distance'
      }
      keras_layer {
        class_name: 'MLP'
        mlp {
          hidden_units: [256, 128]
        }
      }
    }
  }
  model_params {
    l2_regularization: 1e-5
  }
  embedding_regularization: 1e-6
}
```

1. If the custom Op needs to process raw input features, specify `feature_type: PassThroughFeature` when defining them
   - Features of any type other than `PassThroughFeature` are transformed during preprocessing, so the component code cannot access their raw values
2. Put the raw input features that the custom Op processes into one `feature group`, in the order the Op expects them
3. Configure an input block of type `raw_input` to obtain the raw input features
   - This is currently the only way EasyRec supports reading raw input features
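
This doc does not describe what the `EditDistance` component and its `libedit_distance.so` compute exactly. The sketch below assumes they compute the standard Levenshtein distance between the two raw strings of each row. Under that assumption, you can use a pure-Python reference to spot-check the compiled op on a few rows. `levenshtein` and `edit_distance_batch` are hypothetical names. The argument order follows the `text` feature group (`raw_genres`, then `raw_title`).

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit cost for insert, delete, substitute."""
    # At the start of row i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            )
        prev = curr
    return prev[-1]


def edit_distance_batch(genres: List[str], titles: List[str]) -> List[int]:
    """Row-wise distances, pairing inputs in feature-group order: raw_genres, raw_title."""
    return [levenshtein(g, t) for g, t in zip(genres, titles)]
```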
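# Configuring sequence components

When you configure a sequence model (DIN, BST) with components, its input features must be placed in the same `feature_group`.

A sequence model typically has two parts, the `history behavior sequence` and the `target item`. Each part may contain several attributes (sub-features).

In the `feature_group` that feeds the sequence component, define the sub-features of the `history behavior sequence` and the `target item` **in order**.

The framework uses each feature's `feature_type` field to decide whether it belongs to the `history behavior sequence` or the `target item`. Sub-features of type `SequenceFeature` are treated as part of the `history behavior sequence`. Sub-features of any other type are treated as part of the `target item`.

**The sub-features of the two parts must be listed in the same order.** In the example below:
- `concat([cate_id, brand], axis=-1)` is the final embedding of the `target item` (2D);
- `concat([tag_category_list, tag_brand_list], axis=-1)` is the final embedding of the `history behavior sequence` (3D).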
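
The following Python sketch expresses the grouping rule above. The function names are hypothetical. The feature types of `cate_id` and `brand` (assumed here to be `IdFeature`) and the embedding dimensions are illustrative assumptions. The count and per-pair width checks are also a conservative assumption, not a documented EasyRec requirement: the doc only states that both parts must list their sub-features in the same order.

```python
from typing import Dict, List, Tuple


def split_sequence_group(
    feature_names: List[str], feature_types: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Split a feature group into (target item, history sequence), keeping group order."""
    target, sequence = [], []
    for name in feature_names:
        if feature_types[name] == "SequenceFeature":
            sequence.append(name)
        else:
            target.append(name)
    return target, sequence


def pair_sub_features(
    feature_names: List[str],
    feature_types: Dict[str, str],
    emb_dims: Dict[str, int],
) -> List[Tuple[str, str]]:
    """Pair the i-th target sub-feature with the i-th sequence sub-feature (assumed check)."""
    target, sequence = split_sequence_group(feature_names, feature_types)
    if len(target) != len(sequence):
        raise ValueError(
            "target has %d sub-features, sequence has %d" % (len(target), len(sequence))
        )
    pairs = list(zip(target, sequence))
    for t, s in pairs:
        if emb_dims[t] != emb_dims[s]:
            raise ValueError(
                "embedding dims differ: %s=%d, %s=%d" % (t, emb_dims[t], s, emb_dims[s])
            )
    return pairs
```

Assume 16-dimensional embeddings for all four features. Then `concat([cate_id, brand], axis=-1)` has shape `[batch_size, 32]` and `concat([tag_category_list, tag_brand_list], axis=-1)` has shape `[batch_size, seq_len, 32]`. The i-th slice of the target embedding then lines up with the i-th slice of every sequence step.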
```protobuf
model_config: {
  model_name: 'DIN'
  model_class: 'RankModel'
  ...
  feature_groups: {
    group_name: 'sequence'
    feature_names: "cate_id"
    feature_names: "brand"
    feature_names: "tag_category_list"
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: 'seq_input'
      inputs {
        feature_group_name: 'sequence'
      }
      input_layer {
        output_seq_and_normal_feature: true
      }
    }
    blocks {
      name: 'DIN'
      inputs {
        block_name: 'seq_input'
      }
      keras_layer {
        class_name: 'DIN'
        din {
          attention_dnn {
            hidden_units: 32
            hidden_units: 1
            activation: "dice"
          }
          need_target_feature: true
        }
      }
    }
    ...
  }
}
```

A sequence component requires a `block` of type `input_layer` with `output_seq_and_normal_feature: true` set, as follows.

```protobuf
blocks {
  name: 'seq_input'
  inputs {
    feature_group_name: 'sequence'
  }
  input_layer {
    output_seq_and_normal_feature: true
  }
}
```

## Complete examples

- [DIN](../models/din.md)
- [BST](../models/bst.md)