## Introduction
The `Neural Engine` in `Intel® Extension for Transformers` supports adding customized model patterns, which means you can compile your own pretrained model to `Neural Engine` IR (Intermediate Representation) just by adding the specific patterns that [`compile`](/intel_extension_for_transformers/llm/runtime/compile) does not contain.

The intermediate graph in `Neural Engine` can be treated as a `list` that stores all nodes of the model under control flow. Certain nodes may compose a pattern that needs to be fused to speed up inference. To simplify the network structure, we also design different attributes attached to the fused nodes. Adding a customized pattern takes three steps: **1. register the nodes' op_types; 2. set the pattern mapping config and register the pattern; 3. fuse the pattern and set the attributes of the new pattern after fusion.**

*(figure: the `LayerNorm` subgraph in the `Distilbert_Base` ONNX model)*

Above is a `LayerNorm` pattern in the `Distilbert_Base` onnx model. Assume it is a customized pattern in your model that needs to be added in [`compile`](/intel_extension_for_transformers/llm/runtime/compile). Follow the steps below to make `Neural Engine` support this pattern and fuse these 9 nodes into one node called `LayerNorm`.
## Register the Nodes' Op Types
First, you should check whether the nodes' op_types in the pattern are registered in `Engine`. If not, you need to add the op_type class so that [`compile`](/intel_extension_for_transformers/llm/runtime/compile) can load and extract the origin model. All the ops can be found in [`compile.ops`](/intel_extension_for_transformers/llm/runtime/compile/ops). For a quick check, use the commands below.
```python
# make sure you have cloned the intel_extension_for_transformers repo and installed intel_extension_for_transformers
# the import path below is an assumption based on this repo's layout
from intel_extension_for_transformers.llm.runtime.compile.ops.op import OPERATORS
print(OPERATORS)
```

The print result will show all registered ops.

These ops can be roughly divided into two categories: one without attributes, like `Mul`, and one with attributes; for example, `Reshape` has the attribute `dst_shape`. You can look through the [`executor`](/intel_extension_for_transformers/llm/runtime/executor) for more info about the `Neural Engine` ops' attribute settings.

Assume the `Sqrt` and `ReduceMean` in the `LayerNorm` pattern are new op_types for [`compile`](/intel_extension_for_transformers/llm/runtime/compile). The examples below show how to register them.

`Sqrt` has no attributes. You can add this op class in [`compile.ops.empty_ops`](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/compile/ops/empty_ops.py).
```python
# register the 'Sqrt' class in OPERATORS
# a minimal sketch; Operator and operator_registry come from compile.ops.op,
# which empty_ops already imports
@operator_registry(operator_type='Sqrt')
class Sqrt(Operator):
    def __init__(self):
        super().__init__()
```
`ReduceMean` has two attributes, `keep_dims` and `axis`; you need to set them by extracting the node from the origin model.

Create a python file (for example, `reduce_mean.py`) in [`compile.ops`](/intel_extension_for_transformers/llm/runtime/compile/ops) and add the `ReduceMean` op class.

In this `LayerNorm` pattern, the `ReduceMean` node in the origin onnx model only has an `axes` value, which is a list; that is where the value of the `axis` attribute comes from. The `keep_dims` attribute is `False` by default in [`executor`](/intel_extension_for_transformers/llm/runtime/executor), so if the `ReduceMean` node has the `keep_dims` attribute, you should extract and set it; otherwise, you can just ignore it.
```python
from .op import Operator, operator_registry

# a sketch of the ReduceMean op class; the exact onnx attribute parsing below is illustrative
@operator_registry(operator_type='ReduceMean')
class ReduceMean(Operator):
    def __init__(self):
        super().__init__()

    def set_attr(self, framework, node):
        self._attr = {}
        if framework == 'onnxruntime':
            for attribute in node.attribute:
                # the onnx node carries an 'axes' list, which supplies the 'axis' attr
                if attribute.name == 'axes':
                    axes = list(attribute.ints)
                    self._attr['axis'] = axes[0] if len(axes) == 1 else axes
                # keep_dims is False by default in the executor; set it only if present
                if attribute.name == 'keepdims':
                    self._attr['keep_dims'] = bool(attribute.i)
```
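After registering, you can quickly verify it (a sketch; this assumes `OPERATORS` behaves like a dict keyed by op_type):

```python
from intel_extension_for_transformers.llm.runtime.compile.ops.op import OPERATORS

# both customized op classes should now be registered
print('Sqrt' in OPERATORS and 'ReduceMean' in OPERATORS)
```

If nothing wrong, the output result should be `True`.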
## Set the Pattern Mapping Config and Register the Pattern
In `Neural Engine`, we treat pattern fusion as a process of pattern mapping: from one group of nodes to another. In this step, you need to provide a config for the `pattern_mapping` function and register your pattern, in order to make sure [`compile`](/intel_extension_for_transformers/llm/runtime/compile) implements the pattern fusion correctly.
- Create a python file (for example, `layer_norm.py`) in [`compile.sub_graph`](/intel_extension_for_transformers/llm/runtime/compile/sub_graph) and add the `LayerNorm` pattern mapping config.

For the above `LayerNorm` pattern, the config example can be like this (a sketch; the exact tensor index mappings depend on your model):

```python
pattern_mapping_config = {
    'LayerNorm': [
        {
            # the 9-node input pattern and the single fused output node
            'patterns': {
                'in': [[(0, 'ReduceMean'), (1, 'Sub'), (2, 'Pow'), (3, 'ReduceMean'),
                        (4, 'Add'), (5, 'Sqrt'), (6, 'Div'), (7, 'Mul'), (8, 'Add')]],
                'out': [[(0, 'LayerNorm')]]
            },
            'search_mode': 'op_type',
            # the fused node reuses the name of the last node in the input pattern
            'node_names': {
                0: 8
            },
            # inputs of the fused node: the pattern input plus the gamma and beta weights
            'input_tensors': {
                0: [[{0: [0]}, {7: [1]}, {8: [1]}], [[0, 1, 2], 3]]
            },
            # the output tensor comes from the last Add node
            'output_tensors': {
                0: [[{8: [0]}], [[0], 1]]
            },
            # keep the Add node that carries the epsilon value, for attribute setting later
            'returns': [4]
        },
    ]
}
```
The dict in the config guides the `pattern_mapping` function on how to find all the groups of nodes that belong to the `LayerNorm` pattern in the intermediate graph and how to replace them with the new pattern. The config stores a list of such dicts because different models (even the same model) could have different representations of a certain pattern. If you want to delve into it, please see the [pattern_recognize](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/docs/pattern_recognize.md) and [graph_fusion](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/docs/graph_fusion.md) docs for more details.
- Register the `LayerNorm` pattern. A sketch of the registration is below, assuming the `Pattern` base class and `pattern_registry` decorator from [`compile.sub_graph.pattern`](/intel_extension_for_transformers/llm/runtime/compile/sub_graph/pattern.py) and the `pattern_mapping` helper in `graph_utils`:
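```python
from .pattern import Pattern, pattern_registry
from .. import graph_utils as util

# register the customized pattern under its pattern_type
@pattern_registry(pattern_type='LayerNorm')
class LayerNorm(Pattern):
    def __call__(self, model):
        pattern_mapping_config = {'LayerNorm': [...]}  # the config shown above
        # map each matched group of nodes to the fused LayerNorm node
        for pattern_dict in pattern_mapping_config['LayerNorm']:
            model, new_node_names, ret_old_nodes = util.pattern_mapping(
                'LayerNorm', pattern_dict, model)
        return model
```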
- Define the pattern fusion order

Fusing patterns should follow a specific order if a model has multiple patterns. For example, if the model has pattern A (nodes: a-->b) and pattern B (nodes: a-->b-->c), pattern B is actually equivalent to pattern A + node c, so you should fuse pattern A first, then pattern B (for more details, please see the [graph_fusion](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/docs/graph_fusion.md) doc).

There is a list called `supported_patterns` in [`compile.sub_graph.pattern`](/intel_extension_for_transformers/llm/runtime/compile/sub_graph/pattern.py). It controls the order of pattern fusion. You need to add your customized pattern name (the `pattern_type` you registered in step 2) into `supported_patterns` at an appropriate location (if a pattern does not influence other patterns, you can put it at an arbitrary location).

For example, change `supported_patterns` like below (the entries around `'LayerNorm'` are illustrative):
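```python
supported_patterns = [
    'InputData',
    # ... other fused patterns ...
    'LayerNorm',   # add your customized pattern_type here
    # ... remaining patterns ...
    'OutputData',
]
```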
- Set the attributes of the new pattern

Every new pattern generated after fusion could have its own attributes (when we talk about pattern attributes, we mean the operators' attributes in the pattern, which are defined by the [`executor`](/intel_extension_for_transformers/llm/runtime/executor)). As for the `LayerNorm` pattern, the above 9 nodes are fused into one node with op_type `LayerNorm`. This operator has an attribute `epsilon` in [`executor`](/intel_extension_for_transformers/llm/runtime/executor), which is a value added to the denominator for numerical stability.

We recommend writing a `_set_attr` function and calling it after pattern mapping to set the nodes' attributes. Here is a sketch for the `LayerNorm` pattern (the tensor that carries `epsilon` is illustrative):
```python
from collections import OrderedDict

class LayerNorm(Pattern):
    def __call__(self, model):
        pattern_mapping_config = {'LayerNorm': [...]}  # the config shown above

        def _set_attr(epsilon, node_names, model):
            attr = OrderedDict()
            # epsilon is added to the denominator for numerical stability
            attr['epsilon'] = float(epsilon)
            ln_node_idx = model.get_node_id(node_names[0])
            model.nodes[ln_node_idx].attr = attr

        for pattern_dict in pattern_mapping_config['LayerNorm']:
            model, new_node_names, ret_old_nodes = util.pattern_mapping(
                'LayerNorm', pattern_dict, model)
            for i in range(len(new_node_names)):
                # the kept Add node ('returns': [4]) carries the epsilon value
                epsilon = ret_old_nodes[i][0].input_tensors[1].data
                _set_attr(epsilon, new_node_names[i], model)

        return model
```
After finishing these three steps in [`compile`](/intel_extension_for_transformers/llm/runtime/compile), reinstall `intel_extension_for_transformers`; the [`compile`](/intel_extension_for_transformers/llm/runtime/compile) function will then compile your model with the customized pattern.
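For example (a sketch; the model path is hypothetical, and this assumes the returned graph object provides a `save` method):

```python
from intel_extension_for_transformers.llm.runtime.compile import compile

# compile the onnx model to Neural Engine IR; the LayerNorm pattern is now fused
graph = compile('./model.onnx')
graph.save('./ir')
```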