Download data and study materials from OSF
Principal investigators:
Rebecca Johnson
Georgetown University
Simone Zhang
New York University
Sample size: 5643
Field period: 06/04/2021-01/24/2022
The main hypotheses were as follows, with the full set available in the pre-analysis plan:
H1: Across school contexts, respondents will rate algorithms as significantly more fair than each of the status quo methods.
H2: Across school contexts, presenting parent requests as the status quo method will lead respondents to rank algorithms as less fair than presenting any of the other three status quo methods.
H3: Across school contexts, minoritized respondents will have different views about the fairness of algorithms than non-minoritized respondents.
H4: Across school contexts, current and former parents of school-age children will have a smaller gap in favorability between parent requests and algorithms.
The main outcome variables were:
1. Binary DV: "Which method for deciding which students get tutors is fairer?"
2. Continuous DV (secondary): "When comparing [inserts other method] to the predictive model, how would you rate how certain you are about which is fairer?" Answer choices: 1 = [inserts other method] is definitely more fair; 2 = [inserts other method] is probably more fair; 3 = I’m not sure which is more fair; 4 = The predictive model is probably more fair; 5 = The predictive model is definitely more fair
3. Open-ended response: "Explain why you think [inserts method they chose as more fair] is fairer than [inserts method they said was less fair]. Please provide a thoughtful response."
4. Change in views after update about algorithmic bias: "With that update in mind, which method should the school district use to select which students get tutors?"
H1: evidence supported this hypothesis. In the full sample, averaging over school contexts, the algorithm was rated as significantly more fair than each of the other methods, based on two-sided chi-square tests of differences in proportions: (1) parent requests (test statistic = 44.24, p < 0.0001); (2) simple rule (test statistic = 134.59, p < 0.0001); (3) counselor discretion (test statistic = 37.66, p < 0.0001); and (4) weighted lottery (test statistic = 1086.4, p < 0.0001).
H2: we failed to find evidence supporting this hypothesis. Compared to randomization to parent requests, randomization to the simple rule or the weighted lottery caused the algorithm to be rated as MORE fair (p = 0.007 and p < 0.0001, respectively). There was no difference in ratings of algorithmic fairness between randomization to parent requests and randomization to counselor discretion.
H3: evidence supported this hypothesis. Compared to non-Hispanic White respondents, Black and Hispanic respondents rated algorithms as less fair. When we subset to parents and to the contrast between parent requests and algorithms, Asian parents were somewhat more likely to rate the algorithm as fair than non-Hispanic White parents (p = 0.05), Black parents rated the algorithm as significantly less fair (p = 0.004), and Hispanic parents rated it as somewhat less fair (p = 0.08).
H4: evidence supported this hypothesis. Subsetting to those randomized to parent requests as the status quo condition, we see that current parents are significantly more likely to rate parent requests as fairer than non-parents (beta = 0.465, p < 0.01) and that ever parents are also significantly more likely to rate parent requests as fairer than non-parents (beta = 0.334, p < 0.05).
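For readers who want to see how a test statistic of the kind reported under H1 can arise, the following is a minimal sketch, not the authors' analysis code. It assumes the comparison for a single status quo method can be treated as a one-degree-of-freedom chi-square test of whether the share of respondents choosing the algorithm differs from an even split, with no continuity correction; the exact specification used in the study is in the pre-analysis plan and OSF materials. The function name and the counts are hypothetical and are not taken from the study data.

```python
import math


def chi_square_equal_split(n_algorithm, n_other):
    """Chi-square test (1 df) of a binary choice against a 50/50 split.

    ASSUMPTION: this is an illustrative stand-in for the test of
    differences in proportions named in the H1 results, not the
    study's own code. No continuity correction is applied.
    """
    total = n_algorithm + n_other
    expected = total / 2  # expected count per option under an even split
    stat = ((n_algorithm - expected) ** 2 + (n_other - expected) ** 2) / expected
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)),
    # which is the two-sided tail of a standard normal at sqrt(stat).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# HYPOTHETICAL counts: 600 respondents pick the algorithm as fairer,
# 400 pick the status quo method.
stat, p_value = chi_square_equal_split(600, 400)
print(f"test statistic = {stat:.2f}, p = {p_value:.3g}")
```

A larger imbalance between the two counts, or the same imbalance in a larger subsample, produces a larger statistic, which is the pattern to keep in mind when comparing the four statistics reported under H1.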