[MUSIC PLAYING]

AUDACE NAKESHIMANA: In our work on fairness and AI, we present a case study on natural language processing titled "Identifying and Mitigating Unintended Demographic Bias in Machine Learning." We will break down what each part of the title means. This is work that was done jointly by Chris Sweeney and Maryam Najafian. My name is Audace Nakeshimana. I am a researcher at MIT, and I'll be presenting their work. The content of the slides presents a high-level overview of a thesis project that was carried out over the course of a year. It will be released soon on MIT DSpace.

AI has the power to impact society in a vast number of ways. For example, in the banking industry, many companies are trying to use machine learning to figure out whether someone will default on a loan given the data about them. Because machine learning is used in such high-stakes applications, errors that make it unfair could cause discrimination, preventing certain demographic groups from gaining access to fair loans. This problem is especially important to address in developing nations, where sophisticated credit systems may not yet exist. Those nations will have to rely on machine learning models to make these high-stakes decisions, such as alternative credit scoring mechanisms that are likely to involve AI more and more.

This work focuses on applications of machine learning in natural language processing. NLP is important to studying fairness in AI because it is used in many different domains, from education to marketing. Furthermore, there are many sources of unintended demographic bias in the standard natural language processing pipeline. Here we define the NLP pipeline as the combination of steps involved, from collecting natural language data to making decisions based on NLP models trained on the resulting data. Lastly, data for natural language processing systems is easier to get.
Unlike tabular data from banking or health care, where companies may be reluctant to release data due to privacy concerns, NLP data, especially in widely spoken languages like French or English, is available from many different sources, including social media and various forms of formal and informal publications, making it more practical to use in research on how to make NLP systems more fair.

We now break down what unintended demographic bias means. The unintended part means that this bias comes as an adverse side effect, not something deliberately learned by a machine learning model. The demographic part means that the bias translates into some sort of inequality between demographic groups that could cause discrimination in a downstream machine learning model. And finally, the bias is an artifact of the natural language processing pipeline that causes this unfairness. Bias is a broad term. Therefore, it is important that we center on a specific form of bias that causes unfairness in typical machine learning applications. With gender-based demographic bias, for example, a machine learning model might associate specific types of jobs with a specific gender just because that is how they appear in the data used to train the model.

Within unintended demographic bias, there are two different types of bias that we will focus on in natural language processing applications. These are bias in sentiment analysis systems, which analyze the positive or negative feelings associated with words or phrases, and bias in toxicity analysis systems, which are designed to detect derogatory or offensive terms in words or phrases.

Sentiment bias refers to an artifact of the machine learning pipeline that causes unfairness in sentiment analysis systems. And toxicity bias is an artifact of the pipeline that causes unfairness in systems that try to predict toxicity from text.
In either sentiment analysis or toxicity prediction, it is important that our machine learning model does not use sensitive attributes describing someone's demographic to inform whether a sentence carries positive or negative sentiment, or whether it is more or less toxic.

Toxicity classification is used in a wide variety of applications. For example, it can be used to censor online comments that are too toxic or offensive. Unfortunately, these algorithms can be very unfair. For example, the decision of whether a sentence is toxic or non-toxic can depend solely on a demographic identity term, such as American or Mexican, that appears in the sentence (a toy sketch of this failure mode follows below). This unfairness can be caused by many different artifacts of the natural language processing pipeline. For instance, certain nationalities and ethnic groups are more frequently marginalized, and this is reflected in the language usually associated with them. Therefore, training NLP algorithms on the resulting data sets could produce a certain form of unintended demographic bias.

We want to drive home the distinction between unintended demographic bias and unfairness. Unintended demographic bias can enter a typical machine learning pipeline from a wide variety of sources: the word corpus, the word embeddings, the data set, the algorithm, and finally the thresholds used to make decisions. The possible unfairness, or discrimination, comes at the point where this machine learning model meets society and actually causes harm. This work addresses identifying and mitigating unintended demographic bias at each stage of the natural language processing pipeline, from the word corpus to the decision level.

Our big goal here is to find ways to mitigate the bias that we might inherently find in the text corpora or other types of data representation that are used to build NLP applications.
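To make that failure mode concrete, here is a small toy illustration that is not taken from the thesis: a bag-of-words classifier is trained on a deliberately skewed corpus in which one made-up identity term happens to co-occur with toxic comments, and the model then scores two otherwise identical sentences differently. The example data, the group_a/group_b placeholders, and the scikit-learn pipeline are all illustrative assumptions.

    # Toy illustration (not from the thesis): a tiny bag-of-words toxicity
    # classifier trained on a deliberately skewed corpus, showing how the
    # predicted toxicity of two otherwise identical sentences can differ
    # only because of the identity term they contain.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Skewed training data: comments mentioning group_b happen to be labeled
    # toxic more often. This is a property of the corpus, not of the group.
    texts = [
        "group_a people are wonderful neighbors",
        "I had lunch with a group_a friend today",
        "group_b people ruin everything",          # toxic
        "I cannot stand group_b drivers",          # toxic
        "the weather is lovely today",
        "this movie was terrible and boring",      # toxic
    ]
    labels = [0, 0, 1, 1, 0, 1]

    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)

    # Identical sentences except for the identity term.
    probe = ["my group_a neighbor waved at me", "my group_b neighbor waved at me"]
    for sentence, p in zip(probe, model.predict_proba(probe)[:, 1]):
        print(f"P(toxic) = {p:.2f}  |  {sentence}")

The only difference between the two probe sentences is the placeholder identity term, yet the learned model assigns them different toxicity probabilities, which is exactly the kind of unintended demographic bias the thesis aims to identify and mitigate.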
For this module, we cover measuring unintended demographic bias in word embeddings and using adversarial learning to mitigate word embedding bias. The corresponding thesis goes further, covering techniques for identifying and mitigating unintended demographic bias at other stages of the NLP pipeline.

We now cover the work on measuring word embedding bias.

Word embeddings encode text into vector spaces where distances between words capture semantic meaning. This allows one to complete the analogy: man is to woman as king is to queen. Unfortunately, researchers Tolga Bolukbasi and others found that even for word embeddings trained on Google News articles, there exists bias in the word embedding space, where the analogy becomes man is to woman as computer programmer is to homemaker, another word for a housewife. This is concerning given that word embeddings could be used in natural language processing applications devoted to predicting whether someone should get a certain job. However, it is difficult to quantify the bias based on vector space analogies alone.

In this work, researchers Sweeney and Najafian develop a system to measure sentiment bias in word embeddings as a single number. The way they do this is they take the biased word embeddings and use them to vectorize an unbiased, labeled word sentiment data set. They train a logistic regression classifier on this data set, and they predict negative sentiment for a set of identity terms. For example, in this case, it is a set of identity terms describing demographics from different national origins. They analyze the negative sentiment for each identity term and compute a score that describes the bias in the word embeddings. This score is the divergence between the predicted probabilities of negative sentiment for the national origin identity terms and the uniform distribution. The uniform distribution describes the perfectly fair case, wherein each demographic receives an equal amount of sentiment in the word embedding model.
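Below is a minimal sketch of this style of measurement, assuming we already have the embeddings as a word-to-vector dictionary and a labeled word-sentiment lexicon. The scikit-learn classifier and the KL divergence from SciPy are illustrative choices, not necessarily the authors' exact implementation.

    # Minimal sketch: measure sentiment bias in word embeddings by comparing
    # the classifier's negative-sentiment probabilities on identity terms
    # against a uniform (perfectly fair) distribution.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from scipy.stats import entropy  # KL divergence when given two distributions

    def embedding_bias_score(embeddings, sentiment_lexicon, identity_terms):
        """embeddings: dict word -> vector; sentiment_lexicon: dict word -> 0/1
        label (1 = negative sentiment); identity_terms: list of demographic terms."""
        # 1. Featurize the labeled sentiment lexicon with the (possibly biased) embeddings.
        words = [w for w in sentiment_lexicon if w in embeddings]
        X = np.array([embeddings[w] for w in words])
        y = np.array([sentiment_lexicon[w] for w in words])

        # 2. Train a simple sentiment classifier on top of the embeddings.
        clf = LogisticRegression(max_iter=1000).fit(X, y)

        # 3. Predict the probability of negative sentiment for each identity term.
        ident_vecs = np.array([embeddings[t] for t in identity_terms])
        p_negative = clf.predict_proba(ident_vecs)[:, 1]

        # 4. Normalize into a distribution over identity terms and compare it to
        #    the uniform distribution with KL divergence: a score of 0 means no
        #    term attracts disproportionately more negative sentiment than another.
        p = p_negative / p_negative.sum()
        uniform = np.full(len(identity_terms), 1.0 / len(identity_terms))
        return entropy(p, uniform)

A larger score indicates that the embedding space pushes some identity terms toward negative sentiment more than others, before any downstream model is even trained.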
Now that we have a grasp on word embedding bias, we can start to figure out how to mitigate some of this bias.

In the thesis, Sweeney and Najafian describe how they use adversarial learning to debias word embeddings. Different identity terms can be more or less correlated with positive or negative sentiment. For example, words like American, Mexican, and German can have stronger correlations with negative sentiment subspaces than with positive sentiment subspaces, because in the data sets used they may appear more frequently in association with negative or positive sentiment. This is concerning if a downstream machine learning model picks up on these correlations. Ideally, you want to move each of those identity terms to a neutral point between the negative and positive sentiment subspaces without distorting their meaning within the vector space, so that the word embedding model remains useful. They use an adversarial learning algorithm to achieve this. More details of this algorithm are described in the corresponding thesis.

I now present some of their work in evaluating how adversarial learning algorithms can debias word embeddings and make the resulting natural language processing system more fair. We focus on realistic systems in both sentiment analysis and toxicity prediction. For each application, Sweeney and Najafian define fairness metrics to let us know whether the debiased word embeddings are actually helping.

These fairness metrics often come in the form of a template data set. Researchers have created these data sets to tease out different biases with respect to different demographic groups. For example, this set is meant to tease out biases between African-American names and European-American names when substituting each name into the same sentence.
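Here is a minimal sketch of the general template-substitution idea: fill each template with each identity term or name and compare the model's scores per group. The templates, the terms, and the score_fn callable are illustrative assumptions, not the actual evaluation sets used in the thesis.

    from statistics import mean

    # Illustrative templates and identity terms (placeholders, not the thesis's sets).
    templates = [
        "My friend is a {} person.",
        "{} people live in my neighborhood.",
        "I was helped by a {} doctor yesterday.",
    ]
    identity_terms = ["american", "mexican", "german", "canadian"]

    def per_term_scores(score_fn, templates, identity_terms):
        """Fill each template with each identity term and average the model's
        scores (negative sentiment or toxicity) per term. Large gaps between
        terms, on sentences that are otherwise identical, signal unintended
        demographic bias."""
        results = {}
        for term in identity_terms:
            sentences = [t.format(term) for t in templates]
            results[term] = mean(score_fn(sentences))
        return results

    # Example with a stand-in scorer; in practice score_fn would be the trained
    # sentiment or toxicity classifier under evaluation.
    dummy_score_fn = lambda sentences: [0.1 for _ in sentences]
    print(per_term_scores(dummy_score_fn, templates, identity_terms))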
Similar template data sets have been created for toxicity classification algorithms, where you sub out different demographic identity terms within a sentence and compare differences in the overall toxicity predictions.

Sweeney and Najafian used these template data sets to compute fairness metrics for a real-world toxicity classifier. This graph shows per-term AUC distributions for a CNN, a convolutional neural network, that was trained on a toxicity classification data set. The x-axis represents each demographic group, where the template data set has that identity term subbed in for each sentence. Each dot describes a particular training run of the CNN. The y-axis describes the area-under-the-curve accuracy on the template data set. One can see that there is a lot of disparity between the accuracies for different demographic groups. Ideally, you would want the variance across training runs to be compressed, and the differences in AUC scores between demographic groups to be smaller. Sweeney and Najafian show that a toxicity classification algorithm that uses the debiased word embeddings produces better results.

This slide shows results for per-term AUC distributions for the CNN with different debiasing treatments. Sweeney and Najafian measure how their word embedding debiasing compares to other state-of-the-art techniques. Further discussion and evaluation of these graphs are presented in the corresponding thesis.

To wrap up, we describe some key takeaways from this project. First, there is no silver bullet. There are many different types of applications and various types of bias to correct for when trying to make NLP systems more fair. Second, bias can emanate from any stage of the machine learning pipeline. Therefore, identifying and mitigating bias at all stages of the machine learning pipeline is essential. Finally, we focus on solving this problem within an academic context for the natural language processing pipeline, but it cannot all be solved in academia.
For example, much of the unintended bias in a data set like the text corpus could come from decisions made upstream in data collection. Furthermore, unintended bias could come from decisions made when deploying the model into society. When the model is used in a way that does not align with how the data was collected in the first place, this could cause discrimination. An example of this is when data collected from a specific demographic population is used to make predictions that affect other demographics that were not taken into account during data collection. Finally, it is important to have efficient channels of feedback for these machine learning models.

The work presented in this module highlights why fairness is a very important concept. It is therefore critical for data scientists and engineers to measure and understand the performance of their models not just through accuracy, but also through fairness.

[MUSIC PLAYING]