MADD: a fairness metric for predictive student models

fairness

learning analytics

pattern: live + frozen

Pattern (a): live computation from an external repository, frozen for fast, reproducible site builds.

Published: 1 June 2024

Modified: 20 June 2026

Integration pattern (a): live & frozen

This page runs code from an external project. The project lives in its own repository, pulled in here as a Git submodule under _external/, and is executed in that project's own environment, never the website's. Once the page has been rendered locally, the results are stored in _freeze/ and committed, so rebuilding the whole site (locally or in CI) needs neither the submodule's code nor its dependencies.

Set up the submodule (one time):

git submodule add https://github.com/melinaverger/MADD.git _external/madd

Render this page in the project’s environment, then commit the freeze:

scripts/render-project.sh projects/madd-fairness.qmd /path/to/madd/.venv
git add _freeze projects/madd-fairness.qmd && git commit -m "build(projects): refresh MADD fairness figure"

What it is

MADD (Model Absolute Density Distance) is a metric for evaluating the algorithmic fairness of predictive student models independently of their predictive performance. It was introduced with Mélina Verger, Sébastien Lallé and Vanda Luengo. This page reproduces a small illustrative figure directly from the project code.
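To make the idea concrete, here is a minimal sketch. It assumes the histogram-based formulation: each group's predicted probabilities are binned on [0, 1], the counts are normalised to proportions, and the absolute differences between the two groups' bins are summed. The names `madd_histogram` and `n_bins` are hypothetical and chosen for illustration; they are not the API of the `madd` package.

```python
import numpy as np

def madd_histogram(p_a, p_b, n_bins=40):
    """Illustrative only: sum of absolute differences between the
    normalised histograms of two groups' predicted probabilities."""
    # Shared bin edges on [0, 1] so both groups are compared bin by bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts_a, _ = np.histogram(p_a, bins=edges)
    counts_b, _ = np.histogram(p_b, bins=edges)
    # Normalise counts to proportions so group sizes do not matter.
    prop_a = counts_a / counts_a.sum()
    prop_b = counts_b / counts_b.sum()
    return float(np.abs(prop_a - prop_b).sum())

rng = np.random.default_rng(0)
same = rng.beta(2.0, 5.0, 4000)        # one group compared with itself
low = rng.uniform(0.0, 0.4, 4000)      # group predicted only low probabilities
high = rng.uniform(0.6, 1.0, 4000)     # group predicted only high probabilities

print(madd_histogram(same, same))      # identical distributions
print(madd_histogram(low, high))       # distributions with no overlap
```

Under this formulation the value is 0 when both groups receive identically distributed predictions and reaches its maximum of 2 when the two distributions do not overlap at all, up to floating-point rounding. The bin count is a free parameter of the sketch and affects the result on finite samples.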

Reproducible figure

# -------------------------------------------------------------------------
# REAL USAGE (uncomment once the submodule is in place):
#
#   from madd import madd, plot_group_densities          # from _external/madd
#   import pandas as pd
#   df = pd.read_csv("../_external/madd/data/mooc_sample.csv")
#   score = madd(df, group_col="gender", proba_col="p")
#   plot_group_densities(df, group_col="gender", proba_col="p")
#
# Below is a SELF-CONTAINED stand-in so the skeleton renders out of the box.
# Replace it with the import above. It uses only numpy/matplotlib, which belong
# to the *project* environment, not the website environment.
# -------------------------------------------------------------------------
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
group_a = rng.beta(2.5, 4.0, 4000)   # predicted P(success) for group A
group_b = rng.beta(2.0, 5.0, 4000)   # predicted P(success) for group B
grid = np.linspace(0, 1, 200)

def density(x):
    # Histogram-based density estimate on [0, 1]; returns bin centers and heights.
    h, edges = np.histogram(x, bins=40, range=(0, 1), density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, h

ca, da = density(group_a)
cb, db = density(group_b)
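# np.trapz is deprecated in NumPy 2.0; use np.trapezoid, falling back on older NumPy.
trapezoid = getattr(np, "trapezoid", None) or np.trapz
madd_score = trapezoid(np.abs(np.interp(grid, ca, da) - np.interp(grid, cb, db)), grid)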

fig, ax = plt.subplots(figsize=(6.2, 3.6))
ax.fill_between(ca, da, alpha=.35, label="Group A")
ax.fill_between(cb, db, alpha=.35, label="Group B")
ax.set_xlabel("Predicted probability of success")
ax.set_ylabel("Density")
ax.set_title(f"Illustrative MADD ≈ {madd_score:.3f}")
ax.legend(frameon=False)
fig.tight_layout()
plt.show()

Figure 1: Predicted-probability densities for two groups; MADD measures the area between them.

Links