[version] support transformers 4.48 & Byebye python 3.8 #6628

Merged 1 commit on Jan 30, 2025
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -22,10 +22,10 @@ jobs:
       fail-fast: false
       matrix:
         python-version:
-          - "3.8" # TODO: remove py38 in next transformers release
           - "3.9"
           - "3.10"
           - "3.11"
           - "3.12"
         os:
           - "ubuntu-latest"
           - "windows-latest"
12 changes: 6 additions & 6 deletions README.md
@@ -377,11 +377,11 @@ huggingface-cli login

 | Mandatory    | Minimum | Recommend |
 | ------------ | ------- | --------- |
-| python       | 3.8     | 3.11      |
+| python       | 3.9     | 3.10      |
 | torch        | 1.13.1  | 2.4.0     |
-| transformers | 4.41.2  | 4.43.4    |
-| datasets     | 2.16.0  | 2.20.0    |
-| accelerate   | 0.30.1  | 0.32.0    |
+| transformers | 4.41.2  | 4.45.2    |
+| datasets     | 2.16.0  | 3.2.0     |
+| accelerate   | 0.34.0  | 1.2.1     |
 | peft         | 0.11.1  | 0.12.0    |
 | trl          | 0.8.6   | 0.9.6     |

@@ -390,8 +390,8 @@ huggingface-cli login
 | CUDA         | 11.6   | 12.2   |
 | deepspeed    | 0.10.0 | 0.14.0 |
 | bitsandbytes | 0.39.0 | 0.43.1 |
-| vllm         | 0.4.3  | 0.5.0  |
-| flash-attn   | 2.3.0  | 2.6.3  |
+| vllm         | 0.4.3  | 0.6.6  |
+| flash-attn   | 2.3.0  | 2.7.2  |

 ### Hardware Requirement
12 changes: 6 additions & 6 deletions README_zh.md
@@ -379,11 +379,11 @@ huggingface-cli login

 | 必需项       | 至少    | 推荐      |
 | ------------ | ------- | --------- |
-| python       | 3.8     | 3.11      |
+| python       | 3.9     | 3.10      |
 | torch        | 1.13.1  | 2.4.0     |
-| transformers | 4.41.2  | 4.43.4    |
-| datasets     | 2.16.0  | 2.20.0    |
-| accelerate   | 0.30.1  | 0.32.0    |
+| transformers | 4.41.2  | 4.45.2    |
+| datasets     | 2.16.0  | 3.2.0     |
+| accelerate   | 0.34.0  | 1.2.1     |
 | peft         | 0.11.1  | 0.12.0    |
 | trl          | 0.8.6   | 0.9.6     |

@@ -392,8 +392,8 @@ huggingface-cli login
 | CUDA         | 11.6   | 12.2   |
 | deepspeed    | 0.10.0 | 0.14.0 |
 | bitsandbytes | 0.39.0 | 0.43.1 |
-| vllm         | 0.4.3  | 0.5.0  |
-| flash-attn   | 2.3.0  | 2.6.3  |
+| vllm         | 0.4.3  | 0.6.6  |
+| flash-attn   | 2.3.0  | 2.7.2  |

 ### 硬件依赖
9 changes: 5 additions & 4 deletions requirements.txt
@@ -1,9 +1,10 @@
-transformers>=4.41.2,<=4.46.1
-datasets>=2.16.0,<=3.1.0
-accelerate>=0.34.0,<=1.0.1
+transformers>=4.41.2,<=4.45.2;python_version<'3.10'
+transformers>=4.41.2,<=4.48.1,!=4.46.*,!=4.47.*,!=4.48.0;python_version>='3.10'
+datasets>=2.16.0,<=3.2.0
+accelerate>=0.34.0,<=1.2.1
 peft>=0.11.1,<=0.12.0
 trl>=0.8.6,<=0.9.6
-tokenizers>=0.19.0,<0.20.4
+tokenizers>=0.19.0,<=0.21.0
 gradio>=4.38.0,<=5.12.0
 pandas>=2.0.0
 scipy
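A note for reviewers: the split transformers pin relies on PEP 508 environment markers, so pip resolves one requirement per interpreter. Python 3.9 is capped at 4.45.2, while Python 3.10+ may take up to 4.48.1 but skips the excluded 4.46.x and 4.47.x series and 4.48.0. Below is a minimal sketch of how such marker-gated specifiers evaluate, using only the `packaging` library; the version strings probed at the bottom are illustrative.

```python
# Sketch: evaluating the conditional transformers requirement with `packaging`.
import sys

from packaging.markers import Marker
from packaging.specifiers import SpecifierSet

# The same constraint split as requirements.txt.
REQUIREMENTS = [
    (SpecifierSet(">=4.41.2,<=4.45.2"), Marker("python_version<'3.10'")),
    (SpecifierSet(">=4.41.2,<=4.48.1,!=4.46.*,!=4.47.*,!=4.48.0"), Marker("python_version>='3.10'")),
]


def allowed(candidate: str) -> bool:
    """True if `candidate` satisfies the requirement active for this interpreter."""
    for spec, marker in REQUIREMENTS:
        if marker.evaluate():  # picks the line matching the running Python
            return candidate in spec
    return False


print(sys.version_info[:2], allowed("4.46.1"))  # False on 3.10+: 4.46.* is excluded
print(sys.version_info[:2], allowed("4.48.1"))  # True on 3.10+, False on 3.9
```

Listing exclusions directly in the requirement keeps known-bad releases out of fresh installs even after newer compatible releases ship.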
6 changes: 3 additions & 3 deletions setup.py
@@ -46,7 +46,7 @@ def get_console_scripts() -> List[str]:
     "torch": ["torch>=1.13.1"],
     "torch-npu": ["torch==2.1.0", "torch-npu==2.1.0.post3", "decorator"],
     "metrics": ["nltk", "jieba", "rouge-chinese"],
-    "deepspeed": ["deepspeed>=0.10.0,<=0.14.4"],
+    "deepspeed": ["deepspeed>=0.10.0,<=0.16.2"],
     "liger-kernel": ["liger-kernel"],
     "bitsandbytes": ["bitsandbytes>=0.39.0"],
     "hqq": ["hqq"],
@@ -92,7 +92,7 @@ def main():
     url="https://github.com/hiyouga/LLaMA-Factory",
     package_dir={"": "src"},
     packages=find_packages("src"),
-    python_requires=">=3.8.0",
+    python_requires=">=3.9.0",
     install_requires=get_requires(),
     extras_require=extra_require,
     entry_points={"console_scripts": get_console_scripts()},
@@ -104,10 +104,10 @@ def main():
         "License :: OSI Approved :: Apache Software License",
         "Operating System :: OS Independent",
         "Programming Language :: Python :: 3",
-        "Programming Language :: Python :: 3.8",
         "Programming Language :: Python :: 3.9",
         "Programming Language :: Python :: 3.10",
         "Programming Language :: Python :: 3.11",
         "Programming Language :: Python :: 3.12",
         "Topic :: Scientific/Engineering :: Artificial Intelligence",
     ],
 )
10 changes: 5 additions & 5 deletions src/llamafactory/__init__.py
@@ -20,17 +20,17 @@
 Dependency graph:
   main:
-    transformers>=4.41.2,<=4.46.1
-    datasets>=2.16.0,<=3.1.0
-    accelerate>=0.34.0,<=1.0.1
+    transformers>=4.41.2,<=4.48.1,!=4.46.*,!=4.47.*,!=4.48.0
+    datasets>=2.16.0,<=3.2.0
+    accelerate>=0.34.0,<=1.2.1
     peft>=0.11.1,<=0.12.0
     trl>=0.8.6,<=0.9.6
   attention:
     transformers>=4.42.4 (gemma+fa2)
   longlora:
-    transformers>=4.41.2,<=4.46.1
+    transformers>=4.41.2,<4.48.0
   packing:
-    transformers>=4.43.0,<=4.46.1
+    transformers>=4.43.0,<=4.48.1
 Disable version checking: DISABLE_VERSION_CHECK=1
 Enable VRAM recording: RECORD_VRAM=1
9 changes: 6 additions & 3 deletions src/llamafactory/extras/misc.py
@@ -34,6 +34,7 @@
 from transformers.utils.versions import require_version

 from . import logging
+from .packages import is_transformers_version_greater_than


 _is_fp16_available = is_torch_npu_available() or is_torch_cuda_available()
@@ -93,11 +94,13 @@ def check_dependencies() -> None:
     r"""
     Checks the version of the required packages.
     """
-    check_version("transformers>=4.41.2,<=4.46.1")
-    check_version("datasets>=2.16.0,<=3.1.0")
-    check_version("accelerate>=0.34.0,<=1.0.1")
+    check_version("transformers>=4.41.2,<=4.48.1,!=4.46.0,!=4.46.1,!=4.46.2,!=4.46.3,!=4.47.0,!=4.47.1,!=4.48.0")
+    check_version("datasets>=2.16.0,<=3.2.0")
+    check_version("accelerate>=0.34.0,<=1.2.1")
     check_version("peft>=0.11.1,<=0.12.0")
     check_version("trl>=0.8.6,<=0.9.6")
+    if is_transformers_version_greater_than("4.46.0") and not is_transformers_version_greater_than("4.48.1"):
+        logger.warning_rank0_once("There are known bugs in transformers v4.46.0-v4.48.0, please use other versions.")


 def calculate_tps(dataset: Sequence[Dict[str, Any]], metrics: Dict[str, float], stage: Literal["sft", "rm"]) -> float:
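One detail worth flagging: this check enumerates `!=4.46.0` through `!=4.47.1` instead of reusing the `!=4.46.*` wildcards from requirements.txt, presumably because the underlying checker parses each exclusion as an exact version. A rough sketch of a `check_version`-style helper built on `require_version` (imported at the top of this file); the `DISABLE_VERSION_CHECK` handling follows the flag documented in `__init__.py` but is an assumption about the real helper:

```python
# Sketch of a check_version-style helper built on transformers' require_version.
# The DISABLE_VERSION_CHECK gate is an assumption modeled on the flag in __init__.py.
import os

from transformers.utils.versions import require_version


def check_version_sketch(requirement: str, mandatory: bool = False) -> None:
    """Raise if the installed package violates `requirement`, unless checks are disabled."""
    if os.getenv("DISABLE_VERSION_CHECK", "0").lower() in ("true", "1") and not mandatory:
        return  # user opted out of non-mandatory version checks
    require_version(requirement, f"To fix: run `pip install {requirement}`.")


check_version_sketch("datasets>=2.16.0,<=3.2.0")
```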
5 changes: 0 additions & 5 deletions src/llamafactory/extras/packages.py
@@ -87,11 +87,6 @@ def is_transformers_version_greater_than(content: str):
     return _get_package_version("transformers") >= version.parse(content)


-@lru_cache
-def is_transformers_version_equal_to_4_46():
-    return version.parse("4.46.0") <= _get_package_version("transformers") <= version.parse("4.46.1")
-
-
 def is_uvicorn_available():
     return _is_package_available("uvicorn")
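With the 4.46-specific helper deleted, version windows are now expressed by composing `is_transformers_version_greater_than`, as the new warning in `check_dependencies` does. A self-contained sketch of the same cached-comparison pattern, with `importlib.metadata` standing in for the module's private `_get_package_version`:

```python
# Sketch: cached version-window checks in the style of packages.py.
from functools import lru_cache
from importlib.metadata import PackageNotFoundError
from importlib.metadata import version as get_version

from packaging import version


@lru_cache
def transformers_version() -> "version.Version":
    try:
        return version.parse(get_version("transformers"))
    except PackageNotFoundError:
        return version.parse("0.0.0")  # a missing package sorts below everything


def is_greater_than(content: str) -> bool:
    """Note: like the module's helper, this is actually >= despite the name."""
    return transformers_version() >= version.parse(content)


# The window check_dependencies warns about: 4.46.0 <= v <= 4.48.0.
print(is_greater_than("4.46.0") and not is_greater_than("4.48.1"))
```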
2 changes: 1 addition & 1 deletion src/llamafactory/model/model_utils/longlora.py
@@ -350,7 +350,7 @@ def shift(state: "torch.Tensor") -> "torch.Tensor":


 def _apply_llama_patch() -> None:
-    check_version("transformers>=4.41.2,<=4.46.1")
+    check_version("transformers>=4.41.2,<4.48.0")
     LlamaAttention.forward = llama_attention_forward
     LlamaFlashAttention2.forward = llama_flash_attention_2_forward
     LlamaSdpaAttention.forward = llama_sdpa_attention_forward
2 changes: 1 addition & 1 deletion src/llamafactory/model/model_utils/packing.py
@@ -118,6 +118,6 @@ def configure_packing(model_args: "ModelArguments", is_trainable: bool) -> None:
     if not is_trainable or not model_args.block_diag_attn:
         return

-    check_version("transformers>=4.43.0,<=4.46.1")
+    check_version("transformers>=4.43.0,<=4.48.1")
     transformers.modeling_flash_attention_utils._get_unpad_data = get_unpad_data
     logger.info_rank0("Using block diagonal attention for sequence packing without cross-attention.")
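The raised ceiling here is only safe because the patch rebinds a private transformers attribute, `_get_unpad_data`, whose location and signature are not stable API; that is also why it carries its own `check_version` window while the longlora patch above is capped below 4.48.0. A toy sketch of the guarded monkey-patch pattern, with `json.dumps` standing in for the patched internal:

```python
# Sketch: the module-attribute rebinding pattern used by configure_packing.
# `json` stands in for transformers; the real code rebinds the private
# _get_unpad_data inside transformers.modeling_flash_attention_utils.
import json

_original_dumps = json.dumps


def compact_dumps(obj, **kwargs):
    """Replacement that defaults to compact separators, delegating to the original."""
    kwargs.setdefault("separators", (",", ":"))
    return _original_dumps(obj, **kwargs)


json.dumps = compact_dumps  # rebinding the module attribute patches every caller

print(json.dumps({"a": 1, "b": [1, 2]}))  # {"a":1,"b":[1,2]}
```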
17 changes: 5 additions & 12 deletions src/llamafactory/train/dpo/trainer.py
@@ -29,7 +29,7 @@
 from typing_extensions import override

 from ...extras.constants import IGNORE_INDEX
-from ...extras.packages import is_transformers_version_equal_to_4_46, is_transformers_version_greater_than
+from ...extras.packages import is_transformers_version_greater_than
 from ..callbacks import SaveProcessorCallback
 from ..trainer_utils import create_custom_optimizer, create_custom_scheduler, get_batch_logps, nested_detach
@@ -282,19 +282,12 @@ def compute_loss(
         self, model: "PreTrainedModel", inputs: Dict[str, "torch.Tensor"], return_outputs: bool = False, **kwargs
     ) -> Union["torch.Tensor", Tuple["torch.Tensor", List["torch.Tensor"]]]:
         r"""
-        Fixes the loss value. See https://github.com/huggingface/transformers/pull/35438 for details.
+        Subclass and override to accept extra kwargs.
         """
-        loss = super().compute_loss(model, inputs, return_outputs)
-        if is_transformers_version_equal_to_4_46() and kwargs.get("num_items_in_batch"):
-            if return_outputs:
-                loss = (loss[0] / self.args.gradient_accumulation_steps, *loss[1:])
-            else:
-                loss = loss / self.args.gradient_accumulation_steps
-
-        return loss
+        return super().compute_loss(model, inputs, return_outputs)

     @override
-    def log(self, logs: Dict[str, float]) -> None:
+    def log(self, logs: Dict[str, float], *args, **kwargs) -> None:
         r"""
         Log `logs` on the various objects watching training, including stored metrics.
         """
@@ -318,4 +311,4 @@ def log(self, logs: Dict[str, float]) -> None:
             if not key.startswith("dummy_"):
                 logs[key] = metric

-        return Trainer.log(self, logs)
+        return Trainer.log(self, logs, *args, **kwargs)
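Both overrides shrink to pass-throughs because recent transformers releases widened the parent signatures: `compute_loss` gained a `num_items_in_batch` keyword (with the trainer handling gradient-accumulation scaling itself, making the local division redundant), and `log` gained an extra argument in releases around 4.47 (treat that version as approximate). Accepting and forwarding `*args`/`**kwargs` keeps one subclass compatible across the whole range; the KTO trainer below applies the identical change. A minimal sketch of the forwarding pattern, with a plain class standing in for `Trainer`:

```python
# Sketch: forwarding-friendly overrides that survive upstream signature growth.
# `Base` stands in for transformers' Trainer; the start_time parameter mirrors
# (approximately) what newer releases pass to log().
from typing import Any, Dict, Optional


class Base:
    def log(self, logs: Dict[str, float], start_time: Optional[float] = None) -> None:
        print("logging", logs, "started at", start_time)


class Subclass(Base):
    def log(self, logs: Dict[str, float], *args: Any, **kwargs: Any) -> None:
        logs = {**logs, "extra_metric": 1.0}  # inject stored metrics, as the trainer does
        return super().log(logs, *args, **kwargs)  # forward whatever the caller added


Subclass().log({"loss": 0.5}, start_time=12.0)  # old and new call sites both work
```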
17 changes: 5 additions & 12 deletions src/llamafactory/train/kto/trainer.py
@@ -28,7 +28,7 @@
 from typing_extensions import override

 from ...extras.constants import IGNORE_INDEX
-from ...extras.packages import is_transformers_version_equal_to_4_46, is_transformers_version_greater_than
+from ...extras.packages import is_transformers_version_greater_than
 from ..callbacks import SaveProcessorCallback
 from ..trainer_utils import create_custom_optimizer, create_custom_scheduler, get_batch_logps, nested_detach
@@ -256,19 +256,12 @@ def compute_loss(
         self, model: "PreTrainedModel", inputs: Dict[str, "torch.Tensor"], return_outputs: bool = False, **kwargs
     ) -> Union["torch.Tensor", Tuple["torch.Tensor", List["torch.Tensor"]]]:
         r"""
-        Fixes the loss value. See https://github.com/huggingface/transformers/pull/35438 for details.
+        Subclass and override to accept extra kwargs.
         """
-        loss = super().compute_loss(model, inputs, return_outputs)
-        if is_transformers_version_equal_to_4_46() and kwargs.get("num_items_in_batch"):
-            if return_outputs:
-                loss = (loss[0] / self.args.gradient_accumulation_steps, *loss[1:])
-            else:
-                loss = loss / self.args.gradient_accumulation_steps
-
-        return loss
+        return super().compute_loss(model, inputs, return_outputs)

     @override
-    def log(self, logs: Dict[str, float]) -> None:
+    def log(self, logs: Dict[str, float], *args, **kwargs) -> None:
         r"""
         Log `logs` on the various objects watching training, including stored metrics.
         """
@@ -304,4 +297,4 @@ def log(self, logs: Dict[str, float]) -> None:
             if not key.startswith("dummy_"):
                 logs[key] = metric

-        return Trainer.log(self, logs)
+        return Trainer.log(self, logs, *args, **kwargs)
22 changes: 2 additions & 20 deletions src/llamafactory/train/pt/trainer.py
@@ -13,7 +13,7 @@
 # limitations under the License.

 from types import MethodType
-from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union
+from typing import TYPE_CHECKING, Optional

 import torch
 from transformers import Trainer
@@ -25,7 +25,7 @@


 if TYPE_CHECKING:
-    from transformers import PreTrainedModel, ProcessorMixin
+    from transformers import ProcessorMixin

     from ...hparams import FinetuningArguments

@@ -72,21 +72,3 @@ def _get_train_sampler(self) -> Optional["torch.utils.data.Sampler"]:
             return torch.utils.data.SequentialSampler(self.train_dataset)

         return super()._get_train_sampler()
-
-    @override
-    def compute_loss(
-        self, model: "PreTrainedModel", inputs: Dict[str, "torch.Tensor"], return_outputs: bool = False, **kwargs
-    ) -> Union["torch.Tensor", Tuple["torch.Tensor", List["torch.Tensor"]]]:
-        r"""
-        Fixes the loss value. See https://github.com/huggingface/transformers/pull/35438 for details.
-
-        It should be removed after https://github.com/huggingface/transformers/pull/35651 is merged.
-        """
-        loss = super().compute_loss(model, inputs, return_outputs, **kwargs)
-        if kwargs.get("num_items_in_batch") and not getattr(self, "model_accepts_loss_kwargs", False):
-            if return_outputs:
-                loss = (loss[0] / self.args.gradient_accumulation_steps, *loss[1:])
-            else:
-                loss = loss / self.args.gradient_accumulation_steps
-
-        return loss
6 changes: 1 addition & 5 deletions src/llamafactory/train/rm/trainer.py
@@ -25,7 +25,7 @@
 from typing_extensions import override

 from ...extras import logging
-from ...extras.packages import is_transformers_version_equal_to_4_46, is_transformers_version_greater_than
+from ...extras.packages import is_transformers_version_greater_than
 from ..callbacks import FixValueHeadModelCallback, SaveProcessorCallback
 from ..trainer_utils import create_custom_optimizer, create_custom_scheduler

@@ -107,10 +107,6 @@ def compute_loss(
         chosen_scores, rejected_scores = chosen_scores.squeeze(), rejected_scores.squeeze()

         loss = -torch.nn.functional.logsigmoid(chosen_scores.float() - rejected_scores.float()).mean()
-
-        if is_transformers_version_equal_to_4_46() and kwargs.get("num_items_in_batch"):
-            loss /= self.args.gradient_accumulation_steps  # fixes the loss value for transformers 4.46.0-4.46.1
-
         if return_outputs:
             return loss, (loss, chosen_scores, rejected_scores)
         else:
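What survives in `compute_loss` is the standard Bradley-Terry pairwise objective: train the reward model to score the chosen response above the rejected one by minimizing `-logsigmoid(chosen - rejected)`. A standalone sketch on toy scores (the values are illustrative):

```python
# Sketch: the pairwise reward loss kept in compute_loss, on toy scores.
import torch
import torch.nn.functional as F

chosen_scores = torch.tensor([2.0, 0.5, 1.0])    # rewards for preferred responses
rejected_scores = torch.tensor([1.0, 0.7, -0.5])  # rewards for dispreferred responses

# -log(sigmoid(margin)): near 0 when chosen >> rejected, grows as the order flips.
loss = -F.logsigmoid(chosen_scores - rejected_scores).mean()
print(loss)  # approximately tensor(0.4376)
```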
20 changes: 1 addition & 19 deletions src/llamafactory/train/sft/trainer.py
@@ -34,7 +34,7 @@

 if TYPE_CHECKING:
     from torch.utils.data import Dataset
-    from transformers import PreTrainedModel, PreTrainedTokenizer, ProcessorMixin
+    from transformers import PreTrainedTokenizer, ProcessorMixin
     from transformers.trainer import PredictionOutput

     from ...hparams import FinetuningArguments

@@ -88,24 +88,6 @@ def _get_train_sampler(self) -> Optional["torch.utils.data.Sampler"]:

         return super()._get_train_sampler()

-    @override
-    def compute_loss(
-        self, model: "PreTrainedModel", inputs: Dict[str, "torch.Tensor"], return_outputs: bool = False, **kwargs
-    ) -> Union["torch.Tensor", Tuple["torch.Tensor", List["torch.Tensor"]]]:
-        r"""
-        Fixes the loss value. See https://github.com/huggingface/transformers/pull/35438 for details.
-
-        It should be removed after https://github.com/huggingface/transformers/pull/35651 is merged.
-        """
-        loss = super().compute_loss(model, inputs, return_outputs, **kwargs)
-        if kwargs.get("num_items_in_batch") and not getattr(self, "model_accepts_loss_kwargs", False):
-            if return_outputs:
-                loss = (loss[0] / self.args.gradient_accumulation_steps, *loss[1:])
-            else:
-                loss = loss / self.args.gradient_accumulation_steps
-
-        return loss
-
     @override
     def prediction_step(
         self,
4 changes: 2 additions & 2 deletions src/llamafactory/webui/runner.py
@@ -23,7 +23,7 @@

 from ..extras.constants import LLAMABOARD_CONFIG, PEFT_METHODS, TRAINING_STAGES
 from ..extras.misc import is_gpu_or_npu_available, torch_gc, use_ray
-from ..extras.packages import is_gradio_available, is_transformers_version_equal_to_4_46
+from ..extras.packages import is_gradio_available
 from .common import (
     DEFAULT_CACHE_DIR,
     DEFAULT_CONFIG_DIR,
@@ -180,7 +180,7 @@ def _parse_train_args(self, data: Dict["Component", Any]) -> Dict[str, Any]:
             plot_loss=True,
             trust_remote_code=True,
             ddp_timeout=180000000,
-            include_num_input_tokens_seen=False if is_transformers_version_equal_to_4_46() else True,  # FIXME
+            include_num_input_tokens_seen=True,
         )
         args.update(json.loads(get("train.extra_args")))
3 changes: 3 additions & 0 deletions tests/model/model_utils/test_attention.py
@@ -14,8 +14,10 @@

 import os

+import pytest
 from transformers.utils import is_flash_attn_2_available, is_torch_sdpa_available

+from llamafactory.extras.packages import is_transformers_version_greater_than
 from llamafactory.train.test_utils import load_infer_model


@@ -27,6 +29,7 @@
 }


+@pytest.mark.xfail(is_transformers_version_greater_than("4.48"), reason="Attention refactor.")
 def test_attention():
     attention_available = ["disabled"]
     if is_torch_sdpa_available():
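The conditional `xfail` lets CI keep running this suite on transformers ≥ 4.48 without going red: when the condition holds, a failing test is reported as XFAIL (and an unexpectedly passing one as XPASS) rather than an error. A quick illustration of those semantics, runnable with `pytest -rxX`:

```python
# Sketch: conditional xfail semantics.
import sys

import pytest


@pytest.mark.xfail(sys.version_info >= (3,), reason="Condition holds, so failure is expected.")
def test_expected_failure():
    assert False  # reported as XFAIL, not a failure


@pytest.mark.xfail(sys.version_info < (3,), reason="Condition is False, so the marker is inert.")
def test_runs_normally():
    assert True  # passes as an ordinary test
```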