Counterfactual Fairness

  • A fairness metric that checks whether a classifier produces the same result for one individual as it does for another individual who is identical to the first, except with respect to one or more sensitive attributes. Evaluating a classifier for counterfactual fairness is one method for surfacing potential sources of bias in a model.
  • See “When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness” for a more detailed discussion of counterfactual fairness.
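
The check described above can be sketched as follows. This is a minimal illustration, not a full implementation: the `predict` functions, feature names, and group labels are all hypothetical, and a rigorous counterfactual-fairness evaluation would also propagate the attribute change through a causal model so that downstream features update accordingly, rather than flipping the attribute in isolation.

```python
def is_counterfactually_fair(predict, individual, sensitive_key, counterfactual_value):
    """Return True if predict() gives the same result when only the
    sensitive attribute is changed (a simplified flip test)."""
    original = predict(individual)
    # Build the counterfactual twin: identical except for the sensitive attribute.
    twin = dict(individual)
    twin[sensitive_key] = counterfactual_value
    return predict(twin) == original

# Toy classifier (hypothetical): approves when income >= 50, ignoring group.
fair_predict = lambda x: x["income"] >= 50

# Toy classifier (hypothetical) that uses the sensitive attribute directly.
biased_predict = lambda x: x["income"] >= 50 and x["group"] == "A"

person = {"income": 60, "group": "B"}
print(is_counterfactually_fair(fair_predict, person, "group", "A"))    # True
print(is_counterfactually_fair(biased_predict, person, "group", "A"))  # False
```

The biased classifier fails the check because changing only the group label changes its decision for an otherwise identical individual.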