Support for multi-class classification #431

Mahima-ai · 2021-05-12T13:40:15Z

Hi,

I have been working on a multi-class classification problem. I computed the following metrics with the pos_class parameter set to each class in turn, but the values are the same for every class.

AbsCV
Accuracy
Average Odds Difference
Balanced Classification Rate
CV
F1-score
NMI
RenyiCorrelation
SklearnMetric (for few metrics)
Theil Index
Yanovich

On further investigation, I found that the Theil index is calculated for the class with the most instances, while the other metrics are calculated for Y=1.

Please let me know whether the above-mentioned metrics support multi-class classification. If they do, how can I use them? If not, are you planning to add support in the next release?

Mahima-ai · 2021-05-13T14:46:51Z

Also, computing the basic metrics (FPR, TPR, FNR, TNR, etc.) results in an error because the labels are inferred implicitly from y_true (the actual labels). The labels should instead be requested explicitly from the user, since a subset can leave out some class label, which results in an error from scikit-learn.

def confusion_matrix(
    prediction: Prediction, actual: DataTuple, pos_cls: int
) -> Tuple[int, int, int, int]:
    """Apply sci-kit learn's confusion matrix."""
    actual_y: np.ndarray = actual.y.to_numpy(dtype=np.int32)
    labels: np.ndarray = np.unique(actual_y)
    if labels.size == 1:
        labels = np.array([0, 1], dtype=np.int32)
    conf_matr: np.ndarray = conf_mtx(y_true=actual_y, y_pred=prediction.hard, labels=labels)
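
To illustrate the point about implicit labels, here is a minimal NumPy-only sketch. The helper `confusion_counts` is hypothetical (it is neither scikit-learn's nor EthicML's code); it only mimics the idea of counting (true, predicted) pairs over a given label list, under the assumption that pairs with a label outside that list are skipped.

```python
import numpy as np


def confusion_counts(y_true, y_pred, labels):
    """Count (true, predicted) pairs for the given label list.

    Hypothetical helper: pairs whose true or predicted label is not in
    `labels` are skipped rather than counted.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=np.int64)
    for t, p in zip(np.asarray(y_true).tolist(), np.asarray(y_pred).tolist()):
        if t in index and p in index:
            matrix[index[t], index[p]] += 1
    return matrix


# A subset of a 3-class problem in which class 2 never occurs in y_true.
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 2, 1, 1])

inferred = np.unique(y_true).tolist()  # class 2 is lost when inferring from the subset
explicit = [0, 1, 2]                   # label set known from the full dataset

small = confusion_counts(y_true, y_pred, inferred)
full = confusion_counts(y_true, y_pred, explicit)

print(small.shape, full.shape)
```

With the inferred labels the matrix has no row or column for class 2, so any later lookup for pos_cls=2 has nothing to index into; with the explicit label list the matrix keeps a (possibly empty) row and column for every class.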

olliethomas · 2021-05-16T15:38:10Z

Hi @Mahima-ai - thanks for raising an issue. These models should definitely get fixed if they're not right! Sorry if this has caused you problems, we just haven't got any datasets in EthicML where we do multi-class classification so we haven't really thought about it very much. That said, it is something I want us to do right - FairML work in general is far too focussed on binary classification problems. Do you have a dataset that we should add? (I understand that this is a long shot - datasets are not so open)

You make a good point about the confusion-matrix-based metrics. One option is for the user to specify the label values, but as this is a property of the dataset, it would be nice if this information could be taken from there directly.

If you'd like to open a PR, you're more than welcome. Alternatively I will try to address this. Looking through the tests, we don't have enough that use the non_binary_toy dataset.

Sorry again for causing you problems, and thanks, by the way, for using EthicML.

Mahima-ai · 2022-01-12T07:04:35Z

Hi @olliethomas,

I looked into this further and found that the metric_per_sensitive_attribute method creates a subset of the dataset. This subset has fewer rows than the full dataset, so it can also contain fewer unique labels. This results in a LabelOutOfBounds error or a ValueError. The method definition is at https://github.com/predictive-analytics-lab/EthicML/blob/main/ethicml/metrics/per_sensitive_attribute.py#L23

def metric_per_sensitive_attribute(
    prediction: Prediction, actual: DataTuple, metric: Metric, use_sens_name: bool = True
) -> Dict[str, float]:

I am attaching a notebook (in the zip folder) that uses the iris dataset to reproduce this scenario for your perusal.
Multiclass Classification Support.zip

olliethomas · 2022-01-12T12:39:09Z

Thanks for the notebook - I've tweaked it slightly and moved it to colab. It's available here

The error that you are seeing was due to the possible target values being inferred from the dataset.

There are actually two solutions to this.

The first, which I've added in #489, is the option to define the labels for a metric, so em.TPR(pos_class=1) becomes em.TPR(pos_class=1, labels=[0,1,2]). This new parameter defaults to None, and when None is passed, the existing behaviour of inferring the labels remains.

The second solution is probably closer to what most users want to do. The problem is the definition of s in this block:
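
As a rough sketch of how an optional labels argument with a None default can behave, consider the standalone function below. It is hypothetical and is not the code from #489; the name `tpr`, the one-vs-rest definition, and the choice to return NaN when the positive class has no samples are all assumptions made for illustration.

```python
from typing import Optional, Sequence

import numpy as np


def tpr(y_true, y_pred, pos_class: int, labels: Optional[Sequence[int]] = None) -> float:
    """One-vs-rest true positive rate for `pos_class` (illustrative only)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # labels=None keeps the old behaviour: infer the label set from y_true.
    known = np.unique(y_true).tolist() if labels is None else list(labels)
    if pos_class not in known:
        raise ValueError(f"pos_class={pos_class} is not among labels {known}")
    positives = y_true == pos_class
    if not positives.any():
        # Assumption: an undefined TPR (no positive samples) is reported as NaN.
        return float("nan")
    return float((y_pred[positives] == pos_class).mean())


# A subset in which class 2 does not occur.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

print(tpr(y_true, y_pred, pos_class=1))
try:
    tpr(y_true, y_pred, pos_class=2)  # class 2 cannot be inferred from y_true
except ValueError as err:
    print("with inferred labels:", err)
print(tpr(y_true, y_pred, pos_class=2, labels=[0, 1, 2]))
```

Passing the full label list lets the call succeed on a subset that happens to lack a class, at the cost of the caller having to know the label set up front.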

true_data=em.DataTuple(x=pd.DataFrame(data['sepal length (cm)']),
                       s=pd.DataFrame(data['sepal length (cm)']),
                       y=pd.DataFrame(data['target']))

What is happening here is that the protected attribute, s, is being set to a float-valued column that has, in this case, 35 unique values, while y has only 3 unique values.
Given that the dataset has 150 samples, the chance of every target value being present in every s-group is low.
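
The effect of group granularity can be checked with a small simulation. This is only a sketch on synthetic data: the helper `groups_missing_a_class` is hypothetical, and the s-values are drawn at random rather than taken from the iris sepal lengths, so the exact counts will differ from the notebook.

```python
import numpy as np


def groups_missing_a_class(s, y) -> int:
    """Return how many s-groups lack at least one of the classes found in y."""
    s = np.asarray(s)
    y = np.asarray(y)
    n_classes = np.unique(y).size
    return int(sum(np.unique(y[s == group]).size < n_classes for group in np.unique(s)))


rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)             # 150 samples, 3 balanced classes
s_fine = rng.integers(0, 35, size=150)   # up to 35 distinct s-values
s_binary = rng.integers(0, 2, size=150)  # binary sensitive attribute

print("fine-grained s:", groups_missing_a_class(s_fine, y), "incomplete groups")
print("binary s:", groups_missing_a_class(s_binary, y), "incomplete groups")
```

With roughly four samples per group in the fine-grained case, many groups cannot contain all three classes, whereas each half of a binary split has about 75 samples and almost certainly contains every class.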

In the case of the notebook - if you set the sensitive attribute to some binary value, then everything works without having to define the labels.

true_data=em.DataTuple(x=pd.DataFrame(data['sepal length (cm)']),
                       s=pd.DataFrame(np.random.randint(0,2,size=(len(data), 1)), columns=["s"]),
                       y=pd.DataFrame(data['target']))

Hopefully this solves the issue for now - we'll make a new release soon. If I've misunderstood, or this doesn't solve the problem, please let me know. Thanks again.

tmke8 added this to the EthicML 2.0 milestone Mar 15, 2022
