AMIT GANDHI: Hi. My name is Amit Gandhi, and I'm a graduate researcher at MIT. Welcome to the series on exploring fairness in machine learning for international development. In this module, we will cover the appropriate use framework developed by the US Agency for International Development.

The Center for Digital Development at USAID has been studying the appropriate use of machine learning in developing country contexts. Among other activities, this research has involved engaging stakeholders, conducting case studies, and developing and publishing an appropriate use framework. The work done by the MIT CITE team builds on certain aspects of this report, which can be found in the linked materials.

In this section, we will present some characteristics for the appropriate application of machine learning. Please keep these characteristics in mind as you think about the projects you are working on. They are intended to help practitioners think through the ethical and appropriate use of machine learning in international development.

The first criterion is relevance. Is the use of machine learning in this context solving an appropriate problem? As machine learning becomes more of a trend, we are seeing more and more organizations seeking to apply it to their work in an effort to distinguish themselves from their competitors or to increase their appeal to funders. Many of these organizations may try to implement prepackaged or off-the-shelf machine learning solutions without understanding whether it is the right tool for the problem they are trying to solve.

For an example of relevance, let us consider the tracking of humanitarian assistance delivery. Organizations may want to create systems to monitor people in refugee camps and make sure they're only getting the food or other supplies that they're entitled to. However, the major losses often happen further upstream, such as people diverting whole trucks full of supplies, not individuals taking two bags of rice instead of one.
An AI solution to keep a closer eye on individual aid recipients would fail the relevancy test because it is not addressing a major issue in the system.

The second criterion is representativeness. Is the data used to train the machine learning models appropriately selected? In order to evaluate representativeness, the organization should consider whether the machine learning model uses data representative of the context in which it will be deployed, and which strategies are important for ensuring models can be trained with appropriate data.

As an example, consider a startup medical diagnostics company that is trying to build a remote diagnostic tool for the West African population. High quality coded data sets from West Africa may not be available, so the startup uses a European data set to train its models. Some diagnoses may be accurate, but disease differences between Europe and West Africa may cause misdiagnoses for individuals, putting them at health risk.

Now consider if the startup used a data set based on East African patient data instead of European patient data. While this would probably provide better results, diagnoses from this model would still overlook diseases such as malaria and yellow fever, which tend to be more common in West Africa, and also result in improper diagnosis.

Finally, let's consider if the startup uses a data set from patients from the largest hospitals in West African countries. While this may seem like a good choice at first, this data set would probably be more representative of urban populations as compared to rural populations, also resulting in improper diagnosis.

This example shows that there can be different scales of data representation, and that coders need to be careful to ask the right questions about the context for each problem in order to design models appropriately.
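To make this concrete, here is a minimal, hypothetical sketch of the diagnostics scenario. Everything in it is invented for illustration: a single "body temperature" feature, the prevalence numbers, and the use of a plain logistic regression. The point it demonstrates is only that a model trained on a low-prevalence population can look accurate on data like its training set while degrading badly in a high-prevalence deployment context.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_patients(n, malaria_rate):
    # One invented feature: body temperature; malaria cases run higher fevers.
    has_malaria = rng.random(n) < malaria_rate
    temp = np.where(has_malaria,
                    rng.normal(39.5, 0.6, n),
                    rng.normal(37.5, 0.8, n))
    return temp.reshape(-1, 1), has_malaria.astype(int)

# Train on a low-prevalence population (standing in for the European data set).
X_train, y_train = make_patients(5000, malaria_rate=0.01)
model = LogisticRegression().fit(X_train, y_train)

# Accuracy looks excellent on data drawn like the training set...
X_same, y_same = make_patients(5000, malaria_rate=0.01)
print("accuracy in training context:  ",
      accuracy_score(y_same, model.predict(X_same)))

# ...but drops in the high-prevalence deployment context, because the
# model's habit of under-calling malaria now affects many more patients.
X_deploy, y_deploy = make_patients(5000, malaria_rate=0.30)
print("accuracy in deployment context:",
      accuracy_score(y_deploy, model.predict(X_deploy)))
```

The same classifier makes the same per-patient mistakes in both contexts; what changes is how often those mistakes matter, which is exactly the kind of question a representativeness check should ask before deployment.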
The third criterion is value. Does the machine learning model produce predictions that are more accurate than alternative methods? Does it explain variation more completely than alternative models? Do the predicted values inform human decisions in a meaningful way?

For example, are they actionable? Telling farmers that they could improve their yields by moving to a different elevation is not useful to them. Are they timely? Having information but not enough time to act on it provides little to no value. Are they delivered to the right people? You shouldn't build a system that provides information to frontline workers when decisions are made by their supervisors.

While machine learning is a powerful tool, it is not always the best approach to every problem. Organizations should have sufficient reason to think that machine learning would add value, and they should evaluate that assumption before scaling solutions.

The fourth criterion is explainability. How effectively is the use of machine learning communicated? It is important to ensure that the application is explained to end users in a way that effectively communicates how outcomes were determined. Organizations seeking to apply machine learning outcomes without understanding the nuances of how models make decisions may use algorithm outputs inappropriately.

Let's look back at the example of gender-differentiated credit scoring from the previous module. An explainable solution could include information on why a specific individual was denied a loan and which factors they could change in order to increase their creditworthiness.
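As a hedged sketch of what such an explanation could look like, the snippet below uses a plain logistic regression, where each coefficient times a feature value is that feature's additive contribution to the log-odds of approval. The feature names and data are hypothetical, and this assumes roughly standardized inputs so the contributions are comparable; a real credit model would likely need a more careful explanation method than this.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, roughly standardized features; names are illustrative only.
features = ["income", "savings_balance", "on_time_repayments", "loan_applications_90d"]
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
# Synthetic ground truth: repayment history and savings help, recent applications hurt.
y = (X @ np.array([0.5, 1.0, 1.5, -1.0]) + rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def explain_denial(x, top_k=2):
    """List the features that pushed this applicant's score down the most.

    For a linear model, coef * value is each feature's additive contribution
    to the log-odds, so the most negative contributions explain a denial.
    """
    contributions = model.coef_[0] * x
    order = np.argsort(contributions)  # most negative first
    return [(features[i], contributions[i]) for i in order[:top_k]]

applicant = np.array([-0.2, -1.5, -2.0, 1.0])
if model.predict(applicant.reshape(1, -1))[0] == 0:
    print("Loan denied. Main factors, and what to change:")
    for name, contrib in explain_denial(applicant):
        print(f"  {name}: {contrib:+.2f} to log-odds")
```

An output like this is actionable in the sense the module describes: it tells the applicant which factors, such as repayment history, they could improve to increase their creditworthiness.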
The fifth criterion is auditability. Can the model's decision-making processes be queried or monitored by external actors? Increasingly, organizations are turning to black box machine learning solutions, whose inner workings can range from unintuitive to incomprehensible. It is important that the outputs can be monitored externally to show that the model is fair, unbiased, and does not harm some users. This may require additional infrastructure, such as institutional or legal frameworks that require audits, provide auditors with secure access to data and algorithms, and require people to act on those findings.

The sixth criterion is equity. If used to guide decision making, has the machine learning model been tested to determine whether it is disproportionately harmful or beneficial to some individuals or groups more than others? Testing the results of algorithms against protected variables, such as gender, age, race, or skin color, is key to preventing the adoption of biased algorithms. Does a specific algorithm fail for specific groups more often than it does for other groups of people? Does it misclassify different groups in different directions? Do certain groups have different rates of false positives or false negatives?

It is also important to address the issue that accuracy and fairness are not necessarily correlated. Algorithms can be technically accurate but still inconsistent with the values that an organization may want to promote when making decisions such as who should be hired, who should receive medical care, or other similar decisions. Gaining an understanding of how these outcomes are derived, and taking steps to mitigate them, is an important piece of ensuring that unfair algorithms are not widely adopted and used.
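One way to run the kind of test just described, sketched here with hypothetical labels, predictions, and a made-up protected attribute, is to compare false positive and false negative rates per group:

```python
import numpy as np

def group_error_rates(y_true, y_pred, groups):
    """Compare false positive and false negative rates across groups.

    A large gap on either metric signals that the model fails differently
    for different groups, even if overall accuracy looks acceptable.
    """
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        fpr = ((yp == 1) & (yt == 0)).sum() / max((yt == 0).sum(), 1)
        fnr = ((yp == 0) & (yt == 1)).sum() / max((yt == 1).sum(), 1)
        print(f"group={g}: FPR={fpr:.3f}  FNR={fnr:.3f}")

# Illustrative audit on synthetic data: a model whose mistakes are skewed,
# under-predicting positives for group B only.
rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 1000)
groups = rng.choice(["A", "B"], 1000)
y_pred = y_true.copy()
flip = (groups == "B") & (y_true == 1) & (rng.random(1000) < 0.3)
y_pred[flip] = 0

group_error_rates(y_true, y_pred, groups)
```

Here group B shows a much higher false negative rate than group A, which is exactly the sort of misclassification-in-different-directions that an equity test is meant to surface before an algorithm is adopted.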
The seventh criterion is accountability, or responsibility. If used to guide decision making, are there mechanisms in place to ensure that someone will be responsible for responding to feedback and redressing harms, if necessary? For example, an algorithm might be used to assist in diagnosing medical conditions, but the final diagnosis should still be provided by a trained medical professional. When the algorithm is used by itself, the risk of false identifications can actually cause harm to individuals. However, consider if there is a shortage of trained medical professionals. Does the risk of misdiagnosis outweigh the risk of not treating people? These decisions are complicated, and there is not always a right answer.

It is important to keep accountability and responsibility in mind when designing systems. Remember, an accountable setup both makes sure that there are systems in place to prevent harmful errors and makes sure someone is responsible for correcting errors.

As a review, in this module we talked about seven characteristics for the appropriate application of machine learning. The first is relevance. The second is representativeness. The third is value. The fourth is explainability. The fifth is auditability. The sixth is equity. And the seventh is accountability and responsibility. Be sure to take these into consideration while you're implementing your own solutions.

Thank you for taking the time to take this course. We hope that you'll continue to watch the rest of the modules in this series.

[MUSIC PLAYING]