0 MHA found #2093

Open
tatiana-iazykova opened this issue Dec 16, 2024 · 1 comment

@tatiana-iazykova
Hi!
I tried to prune my model (mistralai/Mistral-7B-v0.1) with the following config:

    # Assumption: WeightPruningConfig imported per the neural-compressor docs
    # (it is also available as neural_compressor.config.WeightPruningConfig).
    from neural_compressor.training import WeightPruningConfig

    pruning_config = WeightPruningConfig(
        pruning_type="snip_momentum_progressive",
        start_step=0,
        end_step=15,
        sparsity_decay_type="exp",
        pruning_op_types=["Linear"],                    # only prune Linear modules
        op_names=[".*.self_attn"],                      # regex: attention blocks only
        excluded_op_names=["lm_head", "embed_tokens"],  # keep head/embeddings dense
        max_sparsity_ratio_per_op=0.98,
        pruning_scope="global",
    )

However, when I then try to slim the model, the logger reports that 0 MHA modules were found.
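
For completeness, a minimal sketch of the kind of driver flow that produces the log below. The callback pattern follows the neural-compressor pruning docs; `train_dataloader` is a placeholder for your own data, and the `model_slim` import path is an assumption taken from the auto-slim examples, so adjust it if your version lays things out differently:

    import torch
    from transformers import AutoModelForCausalLM
    from neural_compressor.training import prepare_compression
    # assumption: model_slim lives in the pruner package in recent releases
    from neural_compressor.compression.pruner import model_slim

    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # wire the pruning config into the standard callback-driven training loop
    compression_manager = prepare_compression(model, pruning_config)
    compression_manager.callbacks.on_train_begin()
    for step, batch in enumerate(train_dataloader):  # train_dataloader: your own data
        compression_manager.callbacks.on_step_begin(step)
        loss = model(**batch).loss
        loss.backward()
        compression_manager.callbacks.on_before_optimizer_step()
        optimizer.step()
        compression_manager.callbacks.on_after_optimizer_step()
        optimizer.zero_grad()
        compression_manager.callbacks.on_step_end()
    compression_manager.callbacks.on_train_end()

    model = model_slim(model)  # this is the step that logs "0 MHA found"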

@tatiana-iazykova (Author)

If I try to slim the model in bfloat16, the log is the following:

2024-12-16 10:49:21 [WARNING][logger.py:132] You are using model slim methods, some weight channels will be removed permanently.
2024-12-16 10:49:21 [INFO][logger.py:114] Generating static graph from original model using auto dummy input: start.
/usr/local/lib/python3.11/dist-packages/transformers/modeling_utils.py:5005: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
/usr/local/lib/python3.11/dist-packages/transformers/models/mistral/modeling_mistral.py:256: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
2024-12-16 10:55:03 [INFO][logger.py:114] Generating static graph from original model using auto dummy input: success.
2024-12-16 10:55:07 [INFO][logger.py:114] {'root_linear': 'model.layers.0.mlp.down_proj', 'target_frontier_linears': ['model.layers.0.mlp.gate_proj', 'model.layers.0.mlp.up_proj']}
[... identical entries follow for model.layers.1 through model.layers.31 ...]
2024-12-16 10:55:07 [INFO][logger.py:114] Found 32 linear2linear structures
2024-12-16 10:55:07 [INFO][logger.py:114] linear compression: [4096, 14336] -> [4096, 14336]
2024-12-16 10:55:07 [INFO][logger.py:114] linear compression: [14336, 4096] -> [14336, 4096]
2024-12-16 10:55:07 [INFO][logger.py:114] linear compression: [14336, 4096] -> [14336, 4096]
[... the same three lines repeat for each of the remaining 31 layers; all shapes are unchanged ...]
2024-12-16 10:55:28 [INFO][logger.py:114] Post pruning model slim finished.
2024-12-16 10:55:29 [WARNING][logger.py:132] You are using model slim methods, some attention heads will be removed permanently.
2024-12-16 10:55:29 [INFO][logger.py:114] Generating static graph from original model using auto dummy input: start.
2024-12-16 11:01:10 [INFO][logger.py:114] Generating static graph from original model using auto dummy input: success.
2024-12-16 11:01:10 [INFO][logger.py:114] {'qkv': ['model.layers.0.self_attn.q_proj', 'model.layers.0.self_attn.k_proj', 'model.layers.0.self_attn.v_proj'], 'ffn': ['model.layers.0.self_attn.o_proj']}
[... identical entries follow for model.layers.1 through model.layers.31 ...]
2024-12-16 11:01:10 [INFO][logger.py:114] Found 32 MHA modules
2024-12-16 11:01:10 [INFO][logger.py:114] Following attributes are hooked and might be modified: {'head_nums': 'num_heads', 'head_size': 'head_dim', 'hidden_size': 'hidden_size'}
2024-12-16 11:01:10 [INFO][logger.py:114] head indice to be slim: []
2024-12-16 11:01:10 [WARNING][logger.py:132] model MHA slim failed.
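
The empty `head indice to be slim: []` line suggests that no attention head came out fully zero, so the MHA slim pass has nothing it can remove. For reference, a quick plain-PyTorch sanity check (no neural-compressor API involved; `num_heads`/`head_dim` are the Mistral-7B values) is to measure per-head sparsity of the pruned `q_proj` weights:

    import torch

    num_heads, head_dim = 32, 128  # Mistral-7B attention geometry
    for name, module in model.named_modules():
        if name.endswith("self_attn.q_proj"):
            w = module.weight.detach()                  # [num_heads * head_dim, hidden]
            per_head = w.view(num_heads, head_dim, -1)  # group weight rows by head
            sparsity = (per_head == 0).float().mean(dim=(1, 2))
            dead = (sparsity == 1.0).nonzero().flatten().tolist()
            print(f"{name}: fully-zero heads = {dead}")

If every layer prints an empty list, no whole head reached 100% sparsity, which would explain why the slim step finds no head indices to remove.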
