[MUSIC PLAYING]

AMIT GANDHI: Hi, my name is Amit Gandhi, and I'm a graduate researcher at MIT. Welcome to this course on exploring fairness in machine learning for international development. In this video, we will examine bias in machine learning models through a pulmonary health diagnostic case study. In particular, we will explore the influence of representative data on accuracy when building a model.

Pulmonary diseases, including asthma, COPD, allergic rhinitis, and others, can have significant detrimental health impacts if undetected. In remote areas with limited access to health care, they can often go undiagnosed and untreated. The motivation for this work was to develop a screening tool for community health workers to determine if patients who were presenting symptoms of pulmonary disease actually have pulmonary disease.

To develop the tool, data was collected from 303 patients who sought medical care at health clinics between 2015 and 2018 in Pune, India. Patient data was collected at health clinics from two exams administered by researchers: a mobile health diagnostic kit developed by Dr. Fletcher's group and a set of measurements from a pulmonary function test lab. Health diagnoses were performed by medical staff with a focus on asthma, allergic rhinitis, and COPD.

The overall disease distribution among the patients is shown in the plot. The data included 175 patients with pulmonary diseases and 87 healthy patients. Patients may also have multiple pulmonary diseases, for example, asthma and COPD.

The exploration of the effect of representative sampling on accuracy was conducted across two protected variables, gender and income. The population distributions for the two variables can be seen in the slides. For income considerations, patients were categorized as either low income or high income.

The overall approach to the bias study was to divide the data set into a larger training data superset and a test data set. A logistic regression model with L2 regularization was used to make predictions on disease. To train the model, training data subsets were randomly sampled from the superset in ways that intentionally introduced imbalances along protected variables.
For example, with regard to income, training data subsets ranged from 50% high income and 50% low income to 87.5% high income and 12.5% low income. To account for stochastic error, this process was run 1,000 times for each test. The area under the receiver operating characteristic curve was used as the metric for accuracy.

Starting with the gender bias analysis, our training data sets and test data set were divided as shown. Male-female representativeness was varied from 50/50 to 87.5/12.5. The results for predictive accuracy for allergic rhinitis, asthma, and COPD are shown on the slide. The data shows no significant decrease in algorithm accuracy as gender imbalances are introduced in the data. This may be surprising considering how we have highlighted the principle of representativeness in data throughout this course. However, it is important to note that protected variables do not necessarily affect outcome variables, and a lack of representativeness may not always introduce bias or unfairness into models.
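The experimental protocol just described (sample training subsets with a controlled subgroup mix, fit an L2-regularized logistic regression, score with ROC AUC, and repeat to average out sampling noise) can be sketched in code. This is a minimal illustration only: the data is synthetic, the helper names are invented for this sketch, and it is not the study's actual pipeline.

```python
# Sketch of the representativeness experiment: vary the fraction of one
# subgroup in the training data and measure test AUC. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)

def fit_logreg_l2(X, y, lam=1.0, lr=0.1, steps=500):
    """Gradient-descent logistic regression with an L2 penalty."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + lam * w / len(y))
        b -= lr * np.mean(p - y)
    return w, b

def roc_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def sample_with_mix(X, y, group, frac_a, n):
    """Draw a training subset with a target fraction of group A."""
    idx_a, idx_b = np.flatnonzero(group == 1), np.flatnonzero(group == 0)
    n_a = int(round(frac_a * n))
    pick = np.concatenate([rng.choice(idx_a, n_a, replace=True),
                           rng.choice(idx_b, n - n_a, replace=True)])
    return X[pick], y[pick]

# Synthetic stand-in for the patient data: features, disease label, group flag.
n_total = 400
X = rng.normal(size=(n_total, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_total) > 0).astype(int)
group = rng.integers(0, 2, size=n_total)   # e.g. 1 = high income, 0 = low income
X_test, y_test = X[300:], y[300:]

# Sweep the mix from 50/50 to 87.5/12.5, averaging AUC over repeats.
for frac in [0.5, 0.625, 0.75, 0.875]:
    aucs = []
    for _ in range(50):   # the study used 1,000 repeats; fewer here for speed
        Xtr, ytr = sample_with_mix(X[:300], y[:300], group[:300], frac, 200)
        w, b = fit_logreg_l2(Xtr, ytr)
        aucs.append(roc_auc(y_test, X_test @ w + b))
    print(f"{frac:.3f} group-A fraction: mean AUC = {np.mean(aucs):.3f}")
```

Because the synthetic group flag is independent of the label here, the AUC stays roughly flat across mixes, mirroring the study's finding that imbalance along a protected variable need not hurt accuracy.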
Looking at our results, we also notice that our algorithm is more accurate at predicting COPD in women than in men. Exploring the results further, we look at other variables and their correlation with gender. In our data set, we found that smoking heavily correlated with gender: 55% of men reported that they were nonsmokers, whereas 100% of women reported that they were nonsmokers. As a result, the population of women was more homogeneous, allowing for higher predictive accuracy.

Moving on to the income bias analysis, the training data sets and test data sets were divided as shown. Similar to the gender study, representativeness based on income was varied for the training data set. The results for predictive accuracy for allergic rhinitis, asthma, and COPD are shown on the slide. Again, we see very little difference in accuracy as we change representativeness within the sample. COPD is the most sensitive to socioeconomic status, with a 4% difference in model accuracy between high-income and low-income populations. Asthma and allergic rhinitis show no difference in performance.
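Gaps like the COPD difference between women and men are surfaced by evaluating the same model separately on each subgroup. A minimal sketch of that disaggregated evaluation, assuming synthetic scores and an illustrative gender flag (none of this comes from the study's actual code):

```python
# Disaggregated evaluation: compute ROC AUC separately per subgroup to check
# whether model accuracy differs across a protected variable. Synthetic data.
import numpy as np

rng = np.random.default_rng(1)

def roc_auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic test set: model scores, true labels, and a subgroup flag.
n = 500
gender = rng.integers(0, 2, size=n)          # 0 = male, 1 = female (illustrative)
y = rng.integers(0, 2, size=n)
scores = y + rng.normal(scale=0.8, size=n)   # noisy scores correlated with label

aucs_by_group = {}
for g, name in [(0, "male"), (1, "female")]:
    mask = gender == g
    aucs_by_group[name] = roc_auc(y[mask], scores[mask])
    print(f"{name}: AUC = {aucs_by_group[name]:.3f}")
```

A large gap between the per-group numbers is the cue to look for correlated variables, as the smoking example above illustrates.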
In summary, we found that representativeness across the protected variables of gender and income does not play a large role in model accuracy for this example on pulmonary diseases in India. As part of building a machine learning model, it is always important to check what effects, if any, protected attributes may have on the model. In the real world, it will be impossible to find perfectly balanced data sets, and tests such as the one described here can be used to check for the effect of representativeness across protected variables on data and model accuracy. It is important to understand these tradeoffs so that you can make informed decisions when building models.

Thank you for taking the time to watch this case study. And we hope that you'll watch the other content in the series.

[MUSIC PLAYING]