
Inappropriate dataset structure for the TextClassification model #7

@AbSsEnT

I tested the OpenAssistant/reward-model-deberta-v3-large-v2 model. Although the model is of the TextClassification type, the related datasets do not have the structure of a classification dataset. As a result, the feature-mapping stage (the `_get_feature_mapping` method) fails with the following errors, depending on the dataset:

openai/summarize_from_feedback, Dahoas/instruct-synthetic-prompt-responses

/Users/mykytaalekseiev/Work/GiskardPipVersion/venv/bin/python /Users/mykytaalekseiev/Work/cicd/cli.py --loader huggingface --model OpenAssistant/reward-model-deberta-v3-large-v2 --dataset openai/summarize_from_feedback --dataset_split train --dataset_config comparisons --output ${model_name}__default_scan_with__${dataset_name}.html 
Traceback (most recent call last):
  File "/Users/mykytaalekseiev/Work/cicd/cli.py", line 43, in <module>
    report = runner.run(**runner_kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/pipeline/runner.py", line 35, in run
    gsk_model, gsk_dataset = loader.load_giskard_model_dataset(**kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 53, in load_giskard_model_dataset
    feature_mapping = self._get_feature_mapping(hf_model, hf_dataset)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 128, in _get_feature_mapping
    raise RuntimeError(msg)
RuntimeError: Could not find a suitable mapping for feature for `label`.
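The error above names a single unmatched feature. A minimal sketch of a pre-check that collects every model input with no dtype-compatible column before raising, so that one run reports all mismatches at once. It assumes a simplified schema where each column maps to a dtype string and matches on dtype only, like the list comprehension at line 123 of `huggingface_loader.py`; `check_feature_mapping` and the dtype strings are hypothetical, not part of giskard_cicd:

```python
def check_feature_mapping(expected: dict, column_dtypes: dict) -> None:
    """Raise one RuntimeError listing every expected feature with no match.

    expected: model input name -> required dtype, e.g. {"text": "string"}.
    column_dtypes: dataset column name -> dtype string (flat columns only).
    This is a hypothetical helper, not the loader's actual logic.
    """
    unmatched = [
        name
        for name, expected_type in expected.items()
        # A feature is unmatched if no column has the required dtype.
        if not any(dtype == expected_type for dtype in column_dtypes.values())
    ]
    if unmatched:
        raise RuntimeError(
            "No dtype-compatible column for: "
            + ", ".join(f"`{name}`" for name in unmatched)
        )
```

For example, with `expected={"text": "string", "label": "int64"}` and an illustrative schema `{"prompt": "string", "choice": "int32"}`, the check raises an error naming only `label`, since `int32` does not equal `int64`.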

openai/webgpt_comparisons

Traceback (most recent call last):
  File "/Users/mykytaalekseiev/Work/cicd/cli.py", line 43, in <module>
    report = runner.run(**runner_kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/pipeline/runner.py", line 35, in run
    gsk_model, gsk_dataset = loader.load_giskard_model_dataset(**kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 53, in load_giskard_model_dataset
    feature_mapping = self._get_feature_mapping(hf_model, hf_dataset)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 123, in _get_feature_mapping
    candidates = [f for f in available_features if dataset_features[f].dtype == expected_type]
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 123, in <listcomp>
    candidates = [f for f in available_features if dataset_features[f].dtype == expected_type]
AttributeError: 'dict' object has no attribute 'dtype'
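As the traceback shows, at least one feature of this dataset is a plain `dict` (a nested structure) rather than a flat feature object with a `dtype`, so the list comprehension crashes instead of simply finding no candidate. A minimal sketch of a filter that skips nested features; the `Value` class below is a hypothetical stand-in for a flat feature object, and `dtype_candidates` is an illustrative name, not the loader's API:

```python
class Value:
    """Hypothetical stand-in for a flat feature object exposing `.dtype`."""

    def __init__(self, dtype: str):
        self.dtype = dtype


def dtype_candidates(dataset_features: dict, expected_type: str) -> list:
    """Return the columns whose feature has the expected dtype.

    Nested features (plain dicts) have no `.dtype`, so they are skipped
    instead of raising AttributeError.
    """
    return [
        name
        for name, feature in dataset_features.items()
        if not isinstance(feature, dict) and feature.dtype == expected_type
    ]


# Illustrative schema mixing a nested feature with flat ones.
features = {
    "question": {"full_text": Value("string")},
    "answer_0": Value("string"),
    "score_0": Value("float64"),
}
print(dtype_candidates(features, "string"))
```

Skipping nested features avoids the crash, but it does not make such a dataset usable for classification. The loader would then fail with the same kind of "no suitable mapping" error as the other datasets.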

Anthropic/hh-rlhf

Traceback (most recent call last):
  File "/Users/mykytaalekseiev/Work/cicd/cli.py", line 43, in <module>
    report = runner.run(**runner_kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/pipeline/runner.py", line 35, in run
    gsk_model, gsk_dataset = loader.load_giskard_model_dataset(**kwargs)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 53, in load_giskard_model_dataset
    feature_mapping = self._get_feature_mapping(hf_model, hf_dataset)
  File "/Users/mykytaalekseiev/Work/cicd/giskard_cicd/loaders/huggingface_loader.py", line 128, in _get_feature_mapping
    raise RuntimeError(msg)
RuntimeError: Could not find a suitable mapping for feature for `text`.
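Anthropic/hh-rlhf stores each example as a pair of string columns, `chosen` and `rejected`, so it has no single `text` column to map. One possible workaround, sketched here under the assumption that a preferred response is labeled 1 and a rejected one 0, is to flatten each pairwise row into two classification-style rows. `comparisons_to_rows` is a hypothetical helper, and this labeling is one possible framing, not something the loader does:

```python
def comparisons_to_rows(rows, chosen_key="chosen", rejected_key="rejected"):
    """Flatten pairwise preference rows into flat (text, label) rows.

    Assumption: the chosen response gets label 1 and the rejected response
    gets label 0. Each input row therefore yields two output rows.
    """
    flat = []
    for row in rows:
        flat.append({"text": row[chosen_key], "label": 1})
        flat.append({"text": row[rejected_key], "label": 0})
    return flat


pairs = [{"chosen": "Helpful reply", "rejected": "Unhelpful reply"}]
print(comparisons_to_rows(pairs))
```

Note that a reward model outputs a scalar score rather than class probabilities, so even after reshaping, a classification-style scan may not be a meaningful evaluation of it.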
