Add adapter #1545
base: master
  backbone = Model.from_cfg(cfg)
  # Load local backbone parameters if backbone_path provided.
  # Otherwise, download backbone parameters from gluon zoo.
  backbone_params_path = backbone_path if backbone_path else download_params_path
  if checkpoint_path is None:
-     backbone.load_parameters(backbone_params_path, ignore_extra=True,
+     backbone.load_parameters(backbone_params_path, ignore_extra=True, allow_missing=True,
Would the following be safer?
-     backbone.load_parameters(backbone_params_path, ignore_extra=True, allow_missing=True,
+     backbone.load_parameters(backbone_params_path, ignore_extra=True, allow_missing=(method == 'adapter'),
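A minimal sketch of the idea behind this suggestion, assuming that only the 'adapter' method introduces layers that are absent from the pretrained backbone checkpoint. The helper `backbone_load_kwargs` is hypothetical and not part of the PR; `ignore_extra` and `allow_missing` are the keyword arguments already used in the call above.

```python
def backbone_load_kwargs(method):
    """Build keyword arguments for backbone.load_parameters (hypothetical helper)."""
    return {
        'ignore_extra': True,
        # Assumption: only adapter finetuning adds freshly initialized layers
        # that cannot be found in the pretrained checkpoint.
        'allow_missing': method == 'adapter',
    }


for method in ['full', 'bias', 'subbias', 'adapter']:
    print(method, backbone_load_kwargs(method))
```

Keeping `allow_missing` off for the other methods should make a checkpoint that lacks expected backbone weights fail loudly instead of leaving those weights at their initial values.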
src/gluonnlp/layers.py
Outdated
@@ -28,6 +28,8 @@
 import numpy as _np
 from typing import Union, Optional, List, Dict
 from .op import relative_position_bucket
+#from .attention_cell import MultiHeadAttentionCell
This would be a circular import, as attention_cell also imports layers.
gluon-nlp/src/gluonnlp/attention_cell.py
Lines 25 to 27 in 65c3047
from .layers import SinusoidalPositionalEmbedding,\
    BucketPositionalEmbedding,\
    LearnedPositionalEmbedding
There are two straightforward ways to solve this: either move SinusoidalPositionalEmbedding, BucketPositionalEmbedding, and LearnedPositionalEmbedding out of layers.py into a new file and update the import in attention_cell.py, or move AdapterModule into a new file. Other solutions are welcome too.
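To illustrate why the cycle fails and why splitting out a module helps, here is a small self-contained sketch. It writes throwaway top-level modules to a temporary directory; the module names (`demo_layers`, `fixed_pos_embed`, and so on) are made up for this demonstration, and the real package uses relative imports inside `gluonnlp` rather than top-level modules.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, name, source):
    """Write a tiny throwaway module used only for this demonstration."""
    with open(os.path.join(directory, name + '.py'), 'w') as f:
        f.write(source)


tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)

# Cyclic layout: each module needs a name from the other at import time.
write_module(tmp_dir, 'demo_layers',
             'from demo_attention_cell import MultiHeadAttentionCell\n'
             'class SinusoidalPositionalEmbedding: pass\n')
write_module(tmp_dir, 'demo_attention_cell',
             'from demo_layers import SinusoidalPositionalEmbedding\n'
             'class MultiHeadAttentionCell: pass\n')
importlib.invalidate_caches()
try:
    importlib.import_module('demo_layers')
    cyclic_import_failed = False
except ImportError:
    # demo_layers is only partially initialized when demo_attention_cell
    # asks it for SinusoidalPositionalEmbedding.
    cyclic_import_failed = True

# Acyclic layout: the embedding lives in its own module, so the import
# graph is fixed_layers -> fixed_attention_cell -> fixed_pos_embed.
write_module(tmp_dir, 'fixed_pos_embed',
             'class SinusoidalPositionalEmbedding: pass\n')
write_module(tmp_dir, 'fixed_attention_cell',
             'from fixed_pos_embed import SinusoidalPositionalEmbedding\n'
             'class MultiHeadAttentionCell: pass\n')
write_module(tmp_dir, 'fixed_layers',
             'from fixed_attention_cell import MultiHeadAttentionCell\n'
             'class AdapterModule: pass\n')
importlib.invalidate_caches()
fixed_layers = importlib.import_module('fixed_layers')
print('cyclic import failed:', cyclic_import_failed)
```

The second layout corresponds to the first option above; moving AdapterModule into its own file would break the cycle in a similar way.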
parser.add_argument('--method', type=str, default='full', choices=['full', 'bias', 'subbias', 'adapter'],
                    help='different finetune method')
Would you like to edit the README file to include results for (at least some of) the different choices (and references to the papers)?
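For readers of a future README entry, the following sketch shows one way the `--method` choice could map to the set of parameters that get updated. The `add_argument` call mirrors the hunk above; everything else is an assumption made for illustration: `trainable_names` is a hypothetical helper, the parameter names are invented, and the selection rules are a guess at the usual meaning of bias-only and adapter finetuning, not the PR's actual logic. The rule for 'subbias' is not described in this thread, so it is left unimplemented.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--method', type=str, default='full',
                        choices=['full', 'bias', 'subbias', 'adapter'],
                        help='different finetune method')
    return parser


def trainable_names(param_names, method):
    """Pick the parameters to update (hypothetical selection rule)."""
    if method == 'full':
        return list(param_names)
    if method == 'bias':
        # Assumption: bias finetuning updates only bias terms.
        return [n for n in param_names if n.endswith('bias')]
    if method == 'adapter':
        # Assumption: adapter finetuning updates only adapter layers.
        return [n for n in param_names if 'adapter' in n]
    raise NotImplementedError('selection rule for %r is not specified here' % method)


# Invented parameter names, for illustration only.
names = ['ffn_1.weight', 'ffn_1.bias', 'adapter_layer_ffn.down_proj.weight']
args = build_parser().parse_args(['--method', 'adapter'])
print(trainable_names(names, args.method))
```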
Codecov Report
@@ Coverage Diff @@
## master #1545 +/- ##
==========================================
- Coverage 82.20% 81.48% -0.72%
==========================================
Files 68 68
Lines 8540 8432 -108
==========================================
- Hits 7020 6871 -149
- Misses 1520 1561 +41
@@ -626,39 +623,33 @@ def layout(self): | |||
def forward(self, inputs, token_types, valid_length, | |||
masked_positions): | |||
"""Getting the scores of the masked positions. | |||
|
This blank line is required.
Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description. The summary line may be used by automatic indexing tools; it is important that it fits on one line and is separated from the rest of the docstring by a blank line. The summary line may be on the same line as the opening quotes or on the next line. The entire docstring is indented the same as the quotes at its first line (see example below).
https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings
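A short sketch of the layout PEP 257 describes: a one-line summary, a blank line, then the longer description. The function `masked_scores` and its behaviour are hypothetical and only serve to carry the docstring.

```python
import inspect


def masked_scores(inputs, masked_positions):
    """Get the scores of the masked positions.

    The summary line above fits on one line and is separated from this
    longer description by a blank line, so indexing tools can pick it up.
    """
    return [inputs[i] for i in masked_positions]


doc_lines = inspect.cleandoc(masked_scores.__doc__).splitlines()
print(doc_lines[0])        # the summary line
print(repr(doc_lines[1]))  # the blank separator line
```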
out = self.down_proj(data)
out = self.activate(out)
out = self.up_proj(out)
return out + residual
You may not need a separate argument "residual" here. The residual connection described in the paper refers to doing return out + data, where data is the original input before down projection, activation function and up projection.
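A minimal NumPy sketch of the single-argument form suggested here, where the residual is the adapter's own input. The class name `BottleneckAdapter`, the ReLU activation, and the initialization scale are assumptions chosen for illustration; the PR's layer is an MXNet block and may differ in all of these.

```python
import numpy as np


class BottleneckAdapter:
    """Down-project, activate, up-project, then add the input back."""

    def __init__(self, units, bottleneck_units, seed=0):
        rng = np.random.default_rng(seed)
        self.down_w = rng.normal(scale=0.01, size=(units, bottleneck_units))
        self.down_b = np.zeros(bottleneck_units)
        self.up_w = rng.normal(scale=0.01, size=(bottleneck_units, units))
        self.up_b = np.zeros(units)

    def __call__(self, data):
        out = data @ self.down_w + self.down_b   # down projection
        out = np.maximum(out, 0.0)               # ReLU as a stand-in activation
        out = out @ self.up_w + self.up_b        # up projection
        return out + data                        # residual uses the original input


adapter = BottleneckAdapter(units=8, bottleneck_units=2)
x = np.ones((4, 3, 8))
y = adapter(x)
print(y.shape)
```

With this signature the caller passes only the tensor to adapt, and an adapter whose up projection is zero behaves as the identity.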
out = self.ffn_2(out)
out = self.dropout_layer(out)
if self._use_adapter and 'location_1' in self._adapter_config:
    out = self.adapter_layer_ffn(out, residual)
Based on your implementation of BasicAdapter, you'd need to call this layer as self.adapter_layer_ffn(out, out).
def forward(self, query, key, value):
    # query: (bs, length, unit)
    # key: (bs, length, num_adapters, unit)
Hi Xingjian @sxjscience, could you please check the use of einsum in these lines? I kept the original implementation in a comment. For reference, the purpose of these lines is similar to lines 211 to 223 of https://github.com/Adapter-Hub/adapter-transformers/blob/0fe1c19f601b7785273e173d30a9392e407823d1/src/transformers/adapters/modeling.py#L211
It looks good to me. One improvement is that there is no need to transpose anymore. You can rely on einsum to fuse these operations in a single op.
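As a rough illustration of fusing the transpose into einsum, here is a NumPy sketch using the shapes from the comments in the hunk above. It assumes `value` has the same shape as `key` and that the scores are normalized with a softmax over the adapter axis; it uses NumPy's `einsum` rather than MXNet's and does not reproduce the PR's or the referenced implementation's exact code.

```python
import numpy as np

rng = np.random.default_rng(0)
bs, length, num_adapters, unit = 2, 5, 3, 4
query = rng.normal(size=(bs, length, unit))
key = rng.normal(size=(bs, length, num_adapters, unit))
value = rng.normal(size=(bs, length, num_adapters, unit))  # assumed shape

# Explicit version: add an axis, transpose the key, matmul, squeeze.
scores_ref = np.squeeze(
    np.matmul(query[:, :, None, :], np.transpose(key, (0, 1, 3, 2))), axis=2)
# Fused version: the contraction over `unit` is stated directly.
scores = np.einsum('blu,blnu->bln', query, key)

# Softmax over the adapter axis.
probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# Weighted sum of the per-adapter values, again with and without einsum.
context_ref = np.squeeze(np.matmul(probs[:, :, None, :], value), axis=2)
context = np.einsum('bln,blnu->blu', probs, value)
print(scores.shape, context.shape)
```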
Ok. Thanks!
Description
Add adapter and bias-finetune methods to the finetune script.
Checklist
Essentials
Changes
Comments
cc @dmlc/gluon-nlp-team