AMIT GANDHI: Hi. My name is Amit Gandhi, and I'm a graduate researcher at MIT. Welcome to the series on exploring fairness in machine learning for international development. In this module, we will cover the appropriate use framework developed by the US Agency for International Development.

The Center for Digital Development at USAID has been studying the appropriate use of machine learning in developing country contexts. Among other activities, this research has involved engaging stakeholders, conducting case studies, and developing and publishing an appropriate use framework. The work done by the MIT CITE team builds on certain aspects of this report, which can be found in the linked materials.

In this section, we will present some characteristics for the appropriate application of machine learning. Please keep these characteristics in mind as you think about the projects you are working on. They are intended to help practitioners think through the ethical and appropriate use of machine learning in international development.

The first criterion is relevance. Is the use of machine learning in this context solving an appropriate problem? As machine learning becomes more of a trend, we are seeing more and more organizations seeking to apply it to their work in an effort to distinguish themselves from their competitors or to increase their appeal to funders. Many of these organizations may try to implement prepackaged or off-the-shelf machine learning solutions without understanding whether it is the right tool for the problem they are trying to solve.

For an example of relevance, let us consider the tracking of humanitarian assistance delivery. Organizations may want to create systems to monitor people in refugee camps and make sure they're only getting the food or other supplies that they're entitled to. However, the major losses often happen further upstream, such as people diverting whole trucks full of supplies, not individuals taking two bags of rice instead of one.
An AI solution to keep a closer eye on individual aid recipients would fail the relevancy test because it is not addressing a major issue in the system.

The second criterion is representativeness. Is the data used to train the machine learning models appropriately selected? In order to evaluate representativeness, the organization should consider whether the machine learning model uses data representative of the context in which it will be deployed, and which strategies are important for ensuring models can be trained with appropriate data.

As an example, consider a startup medical diagnostics company that is trying to build a remote diagnostic tool for the West African population. High quality coded data sets from West Africa may not be available, so the startup uses a European data set to train its models. Some diagnoses may be accurate, but disease differences between Europe and West Africa may cause misdiagnoses for individuals, putting them at health risk.

Now consider if the startup used a data set based on East African patient data instead of European patient data. While this would probably provide better results, diagnoses from this model would still overlook diseases such as malaria and yellow fever, which tend to be more common in West Africa, and also result in improper diagnosis.

Finally, let's consider if the startup uses a data set from patients from the largest hospitals in West African countries. While this may seem like a good choice at first, this data set would probably be more representative of urban populations as compared to rural populations, also resulting in improper diagnosis.

This example shows that there can be different scales of data representation, and that coders need to be careful to ask the right questions about the context for each problem in order to design models appropriately.
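To make this concrete, here is a minimal, hypothetical sketch of the diagnostics scenario. Everything in it is invented for illustration: a single "body temperature" feature, the prevalence numbers, and the use of a plain logistic regression. The point it demonstrates is only that a model trained on a low-prevalence population can look accurate on data like its training set while degrading badly in a high-prevalence deployment context.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_patients(n, malaria_rate):
    # One invented feature: body temperature; malaria cases run higher fevers.
    has_malaria = rng.random(n) < malaria_rate
    temp = np.where(has_malaria,
                    rng.normal(39.5, 0.6, n),
                    rng.normal(37.5, 0.8, n))
    return temp.reshape(-1, 1), has_malaria.astype(int)

# Train on a low-prevalence population (standing in for the European data set).
X_train, y_train = make_patients(5000, malaria_rate=0.01)
model = LogisticRegression().fit(X_train, y_train)

# Accuracy looks excellent on data drawn like the training set...
X_same, y_same = make_patients(5000, malaria_rate=0.01)
print("accuracy in training context:  ",
      accuracy_score(y_same, model.predict(X_same)))

# ...but drops in the high-prevalence deployment context, because the
# model's habit of under-calling malaria now affects many more patients.
X_deploy, y_deploy = make_patients(5000, malaria_rate=0.30)
print("accuracy in deployment context:",
      accuracy_score(y_deploy, model.predict(X_deploy)))
```

The same classifier makes the same per-patient mistakes in both contexts; what changes is how often those mistakes matter, which is exactly the kind of question a representativeness check should ask before deployment.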
The third criterion is value. Does the machine learning model produce predictions that are more accurate than alternative methods? Does it explain variation more completely than alternative models? Do the predicted values inform human decisions in a meaningful way?

For example, are they actionable? Telling farmers that they could improve their yields by moving to a different elevation is not useful to them. Are they timely? Having information but not enough time to act on it provides little to no value. Are they delivered to the right people? You shouldn't build a system that provides information to frontline workers when decisions are made by their supervisors.

While machine learning is a powerful tool, it is not always the best approach to every problem. Organizations should have sufficient reason to think that machine learning would add value, and they should evaluate that assumption before scaling solutions.

The fourth criterion is explainability. How effectively is the use of machine learning communicated? It is important to ensure that the application is explained to end users in a way that effectively communicates how outcomes were determined. Organizations seeking to apply machine learning outcomes without understanding the nuances of how models make decisions may use algorithm outputs inappropriately.

Let's look back at the example of gender-differentiated credit scoring from the previous module. An explainable solution could include information on why a specific individual was denied a loan and which factors they could change in order to increase their creditworthiness.
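As a hedged sketch of what such an explanation could look like, the snippet below uses a plain logistic regression, where each coefficient times a feature value is that feature's additive contribution to the log-odds of approval. The feature names and data are hypothetical, and this assumes roughly standardized inputs so the contributions are comparable; a real credit model would likely need a more careful explanation method than this.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, roughly standardized features; names are illustrative only.
features = ["income", "savings_balance", "on_time_repayments", "loan_applications_90d"]
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
# Synthetic ground truth: repayment history and savings help, recent applications hurt.
y = (X @ np.array([0.5, 1.0, 1.5, -1.0]) + rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def explain_denial(x, top_k=2):
    """List the features that pushed this applicant's score down the most.

    For a linear model, coef * value is each feature's additive contribution
    to the log-odds, so the most negative contributions explain a denial.
    """
    contributions = model.coef_[0] * x
    order = np.argsort(contributions)  # most negative first
    return [(features[i], contributions[i]) for i in order[:top_k]]

applicant = np.array([-0.2, -1.5, -2.0, 1.0])
if model.predict(applicant.reshape(1, -1))[0] == 0:
    print("Loan denied. Main factors, and what to change:")
    for name, contrib in explain_denial(applicant):
        print(f"  {name}: {contrib:+.2f} to log-odds")
```

An output like this is actionable in the sense the module describes: it tells the applicant which factors, such as repayment history, they could improve to increase their creditworthiness.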
The fifth criterion is auditability. Can the model's decision-making processes be queried or monitored by external actors? Increasingly, organizations are turning to black box machine learning solutions, whose inner workings can range from unintuitive to incomprehensible. It is important that the outputs can be monitored externally to show that the model is fair, unbiased, and does not harm some users. This may require additional infrastructure, such as institutional or legal frameworks that require audits, provide auditors with secure access to data and algorithms, and require people to act on those findings.

The sixth criterion is equity. If used to guide decision making, has the machine learning model been tested to determine whether it is disproportionately harmful or beneficial to some individuals or groups more than others? Testing the results of algorithms against protected variables, such as gender, age, race, or skin color, is key to preventing the adoption of biased algorithms. Does a specific algorithm fail for specific groups more often than it does for other groups of people? Does it misclassify different groups in different directions? Do certain groups have different rates of false positives or false negatives?

It is also important to address the issue that accuracy and fairness are not necessarily correlated. Algorithms can be technically accurate but still inconsistent with the values that an organization may want to promote when making decisions such as who should be hired, who should receive medical care, or other similar decisions. Gaining an understanding of how these outcomes are derived, and taking steps to mitigate them, is an important piece of ensuring that unfair algorithms are not widely adopted and used.
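One way to run the kind of test just described, sketched here with hypothetical labels, predictions, and a made-up protected attribute, is to compare false positive and false negative rates per group:

```python
import numpy as np

def group_error_rates(y_true, y_pred, groups):
    """Compare false positive and false negative rates across groups.

    A large gap on either metric signals that the model fails differently
    for different groups, even if overall accuracy looks acceptable.
    """
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        fpr = ((yp == 1) & (yt == 0)).sum() / max((yt == 0).sum(), 1)
        fnr = ((yp == 0) & (yt == 1)).sum() / max((yt == 1).sum(), 1)
        print(f"group={g}: FPR={fpr:.3f}  FNR={fnr:.3f}")

# Illustrative audit on synthetic data: a model whose mistakes are skewed,
# under-predicting positives for group B only.
rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 1000)
groups = rng.choice(["A", "B"], 1000)
y_pred = y_true.copy()
flip = (groups == "B") & (y_true == 1) & (rng.random(1000) < 0.3)
y_pred[flip] = 0

group_error_rates(y_true, y_pred, groups)
```

Here group B shows a much higher false negative rate than group A, which is exactly the sort of misclassification-in-different-directions that an equity test is meant to surface before an algorithm is adopted.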
The seventh criterion is accountability, or responsibility. If used to guide decision making, are there mechanisms in place to ensure that someone will be responsible for responding to feedback and redressing harms, if necessary? For example, an algorithm might be used to assist in diagnosing medical conditions, but the final diagnosis should still be provided by a trained medical professional. When the algorithm is used by itself, the risk of false identifications can actually cause harm to individuals. However, consider if there is a shortage of trained medical professionals. Does the risk of misdiagnosis outweigh the risk of not treating people? These decisions are complicated, and there is not always a right answer.

It is important to keep accountability and responsibility in mind when designing systems. Remember, an accountable setup both makes sure that there are systems in place to prevent harmful errors and makes sure someone is responsible for correcting errors.

As a review, in this module we talked about seven characteristics for the appropriate application of machine learning. The first is relevance. The second is representativeness. The third is value. The fourth is explainability. The fifth is auditability. The sixth is equity. And the seventh is accountability and responsibility. Be sure to take these into consideration while you're implementing your own solutions.

Thank you for taking the time to take this course. We hope that you'll continue to watch the rest of the modules in this series.

[MUSIC PLAYING]