Adding intersectional bias mitigation to AIF360 #538

Open
ckalousi wants to merge 3 commits into main

Conversation

@ckalousi ckalousi commented Aug 26, 2024

@mnagired @hoffmansc

We have implemented an intersectional bias mitigation algorithm based on https://doi.org/10.1007/978-3-030-87687-6_5 (see also https://doi.org/10.48550/arXiv.2010.13494 for the arXiv version), as discussed further in issue #537. Additional details are available in the demo notebook.

@ckalousi ckalousi (Author) commented

According to pytest, our code does not reach the desired coverage of 80%. This happens because our code is multi-threaded, and it was not obvious to us how to make pytest measure coverage for this kind of code. Nonetheless, we have checked that all functions of the main algorithm file are called during our tests.
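
One thing we may still try is coverage.py's support for concurrent code. A rough sketch of a .coveragerc (the option names are coverage.py's own; the values are only an example, assuming the workers are spawned via multiprocessing):

[run]
# threads are traced by default; this additionally traces worker subprocesses
concurrency = multiprocessing
parallel = True

[report]
fail_under = 80

pytest-cov can then be pointed at this file with --cov-config=.coveragerc.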

@RahulVadisetty91 RahulVadisetty91 left a comment

[screenshot of the __init__ method under discussion]

In the __init__ method, self.model is set to None, but there is no mechanism for initializing or assigning a model. While the abstract fit method may be responsible for setting the model, it's not clear how this works.

…bar support of the main algorithm added

Signed-off-by: Kalousios <[email protected]>
@ckalousi ckalousi (Author) commented Nov 2, 2024

@RahulVadisetty91

Dear Rahul,

We are very grateful that you took the time to review our code and offer some very valuable comments.

We are also extremely sorry that we could not address your comments earlier; we had to meet some critical deadlines at work and coordinate our actions regarding this pull request.

About your comment on the __init__ method: this is a great catch. Indeed, self.model is not needed in the current setting. We had only put it there in case we wanted to expand support of our Intersectional Fairness to more algorithms in the future. Under the current circumstances it makes sense to comment out this line (line 27 of your screenshot).

You also made some very valuable points in the first version of your comment (before the edit). We considered all of them, and although they are all very important, with our current resources we could only address some of them.

More specifically, although it would be nice to switch to TensorFlow 2 for future compatibility, our code is based on the Adversarial Debiasing implementation in AIF360, which in turn is based on TensorFlow 1. It is very difficult for us to support TensorFlow 2 while Adversarial Debiasing still uses TensorFlow 1. If the original algorithm is updated in the future, we would be happy to update our code as well.

Following one of your comments, we have now implemented evaluation progress bars in our algorithm.

Once again, thank you for your time; we would be happy to further discuss any of your suggestions or concerns.

Best regards,
Chrysostomos

@hoffmansc hoffmansc (Collaborator) left a comment

Thanks for your patience here! My comments are mostly concerned with cleaning up the code and reducing redundancies but overall it's in pretty good shape. Good work!

Comment on lines +31 to +71
def calc_di(self, df, protected_attr_info, label_info):
    """
    Calculate Disparate Impact score

    Parameters
    ----------
    df : DataFrame
        DataFrame containing sensitive attributes and label
    protected_attr_info : dictionary
        Privileged group (sensitive attribute name : attribute value)
        e.g. {'Gender': 1.0, 'Race': 'black'}
    label_info : dictionary
        Label definition (label attribute name : attribute value)
        e.g. {'denied': 1.0}

    Returns
    -------
    return value : float
        Disparate Impact score
    """
    df_bunshi, df_bunbo = self.calc_privilege_group(df, protected_attr_info)

    if len(df_bunshi) == 0:
        return np.nan

    if len(df_bunbo) == 0:
        return np.nan

    label = list(label_info.keys())[0]
    privileged_value = list(label_info.values())[0]

    a = len(df_bunshi[df_bunshi[label] == privileged_value])
    b = len(df_bunbo[df_bunbo[label] == privileged_value])

    bunshi_rate = a / len(df_bunshi)
    bunbo_rate = b / len(df_bunbo)

    if bunbo_rate == 0:
        return np.nan

    return bunshi_rate / bunbo_rate

Collaborator

is it possible to use the built-in disparate_impact_ratio() here?
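
Roughly something like this (just a sketch; 'Gender', the group value 1.0, and the positive label are placeholders for whatever the dataset defines, with the labels put in a Series indexed by the protected attribute, as the aif360.sklearn metrics expect):

from aif360.sklearn.metrics import disparate_impact_ratio

# labels indexed by the protected attribute; names/values are placeholders
y = df.set_index('Gender')[label]
di = disparate_impact_ratio(y, prot_attr='Gender', priv_group=1.0, pos_label=1.0)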

Collaborator

we usually calculate disparate impact as unprivileged rate / privileged rate so this will be the inverse of what aif360 will return
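
To illustrate with made-up numbers:

# made-up rates, just to illustrate the inversion described above
unpriv_rate, priv_rate = 0.3, 0.6
aif360_di = unpriv_rate / priv_rate   # 0.5 -- what disparate_impact_ratio reports
this_di = priv_rate / unpriv_rate     # 2.0 -- the reciprocal returned here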

Comment on lines +27 to +65
def calc_intersectionalbias(dataset, metric="DisparateImpact"):
    """
    Calculate intersectional bias (Disparate Impact) across more than one sensitive attribute

    Parameters
    ----------
    dataset : StructuredDataset
        A dataset containing more than one sensitive attribute
    metric : str
        Fairness metric name
        ["DisparateImpact"]

    Returns
    -------
    df_result : DataFrame
        Intersectional bias (Disparate Impact)
    """
    df = dataset.convert_to_dataframe()[0]
    label_info = {dataset.label_names[0]: dataset.favorable_label}

    if metric == "DisparateImpact":
        fs = DisparateImpact()
    else:
        raise ValueError("metric name not in the list of allowed metrics")

    df_result = pd.DataFrame(columns=[metric])
    for multi_group_label in create_multi_group_label(dataset)[0]:
        protected_attr_info = multi_group_label[0]
        di = fs.bias_predict(df,
                             protected_attr_info=protected_attr_info,
                             label_info=label_info)
        name = ''
        for k, v in protected_attr_info.items():
            name += k + " = " + str(v) + ","
        df_result.loc[name[:-1]] = di

    return df_result

Collaborator

is it possible to use the built-in one_vs_rest() here?

y = df.set_index(dataset.protected_attribute_names)[dataset.label_names]
one_vs_rest(disparate_impact_ratio, y)

import matplotlib.cm as cm


class DisparateImpact():

Collaborator

does this need to be a class as opposed to individual functions?

Comment on lines +112 to +161
def calc_intersectionalbias_matrix(dataset, metric="DisparateImpact"):
    """
    Intersectional bias (Disparate Impact) matrix for heat-map comparison

    Parameters
    ----------
    dataset : StructuredDataset
        Dataset containing two sensitive attributes
    metric : str
        Fairness metric name
        ["DisparateImpact"]

    Returns
    -------
    df_result : DataFrame
        Intersectional bias (Disparate Impact)
    """
    protect_attr = dataset.protected_attribute_names

    if len(protect_attr) != 2:
        raise ValueError("specify 2 sensitive attributes.")

    if metric == "DisparateImpact":
        fs = DisparateImpact()
    else:
        raise ValueError("metric name not in the list of allowed metrics")

    df = dataset.convert_to_dataframe()[0]
    label_info = {dataset.label_names[0]: dataset.favorable_label}

    protect_attr0_values = list(set(df[protect_attr[0]]))
    protect_attr1_values = list(set(df[protect_attr[1]]))

    df_result = pd.DataFrame(columns=protect_attr1_values)

    for val0 in protect_attr0_values:
        tmp_li = []
        col_list = []
        for val1 in protect_attr1_values:
            di = fs.bias_predict(df,
                                 protected_attr_info={protect_attr[0]: val0,
                                                      protect_attr[1]: val1},
                                 label_info=label_info)
            tmp_li += [di]
            col_list += [protect_attr[1] + "=" + str(val1)]

        df_result.loc[protect_attr[0] + "=" + str(val0)] = tmp_li
        df_result = df_result.set_axis(col_list, axis=1)

    return df_result

Collaborator

this seems largely redundant with calc_intersectionalbias() above but pivoted. can't we just accomplish this in pandas?
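
e.g. (a sketch, assuming df_long is the output of calc_intersectionalbias() for a dataset with two sensitive attributes):

# df_long has index labels like "Gender = 1.0,Race = 0.0" and one metric column
idx = df_long.index.str.split(",", expand=True)   # -> MultiIndex (attr0, attr1)
df_long.index = idx
df_matrix = df_long["DisparateImpact"].unstack(level=1)  # rows: attr0, cols: attr1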

Collaborator

are these necessary for the algorithm? they seem non-specific and I don't really see them used anywhere.

Comment on lines +52 to +53
scale_orig = StandardScaler()
X = scale_orig.fit_transform(ds_train.features)

Collaborator

is this necessary inside fit()? can't the user just apply scaling before passing the dataset?
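
i.e. something like this on the caller's side (sketch; model here stands for the mitigation object being fit):

from sklearn.preprocessing import StandardScaler

# scale once, outside the algorithm, and pass the already-scaled dataset in
ds_scaled = ds_train.copy(deepcopy=True)
ds_scaled.features = StandardScaler().fit_transform(ds_scaled.features)
model.fit(ds_scaled)   # fit() then doesn't need to know about scaling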

Collaborator

is there any way we can just use the AdversarialDebiasing class directly instead of this wrapper? this doesn't seem to be doing much.
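
For comparison, calling the existing class directly looks roughly like this (the group definitions and datasets below are placeholders):

import tensorflow.compat.v1 as tf
from aif360.algorithms.inprocessing import AdversarialDebiasing

tf.disable_eager_execution()  # required for the TF1-style graph code
sess = tf.Session()
ad = AdversarialDebiasing(unprivileged_groups=[{'Gender': 0.0}],
                          privileged_groups=[{'Gender': 1.0}],
                          scope_name='adv_debias', sess=sess, debias=True)
ad.fit(ds_train)
ds_pred = ad.predict(ds_test)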

Collaborator

same as before -- can we get rid of these wrapper classes?

from aif360.algorithms.isf_helpers.postprocessing.reject_option_based_classification import RejectOptionClassification
from aif360.algorithms.isf_helpers.postprocessing.equalized_odds_postprocessing import EqualizedOddsPostProcessing

from logging import getLogger, StreamHandler, ERROR, Formatter

Collaborator

can we get rid of the debugging lines or if they're still useful just add a verbose flag?
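
e.g. a sketch of gating the output on a flag (the flag name is just a suggestion):

from logging import getLogger, StreamHandler, ERROR, INFO, Formatter

def _get_logger(verbose=False):
    # keep the messages around, but only emit them when explicitly requested
    logger = getLogger(__name__)
    handler = StreamHandler()
    handler.setFormatter(Formatter('%(asctime)s %(levelname)s: %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(INFO if verbose else ERROR)
    return logger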

Comment on lines +81 to +92
def _read_modelanswer(self, s_result_singleattr, s_result_combattr):
    # load the expected ("model answer") results from CSV
    ma_singleattr_bias = pd.read_csv(MODEL_ANSWER_PATH + s_result_singleattr, index_col=0)
    ma_combattr_bias = pd.read_csv(MODEL_ANSWER_PATH + s_result_combattr, index_col=0)
    return ma_singleattr_bias, ma_combattr_bias

def _comp_dataframe(self, df1, df2):
    try:
        assert_frame_equal(df1, df2)
    except AssertionError:
        return False
    return True

Collaborator

unused
