It has been said that algorithms are “opinions embedded in code.” Few people understand the implications of that better than Abeba Birhane. Born and raised in Bahir Dar, Ethiopia, Birhane moved to Ireland to study: first psychology, then philosophy, then a PhD in cognitive science at University College Dublin.
During her doctorate, she found herself surrounded by software developers and data science students—immersed in the models they were building and the data sets they were using. But she started to realize that no one was really asking questions about what was actually in those data sets.
Artificial intelligence has infiltrated almost every aspect of our lives: It can determine whether you get hired, diagnose you with cancer, or make decisions about whether to release prisoners on parole. AI systems are often trained on gargantuan data sets, usually scraped from the web for cost-effectiveness and ease. But this means AI systems can inherit all the biases of the humans who design them, and any biases present in the data that feeds them. The end result mirrors society, with all the ugliness baked in.
Failing to recognize this risks causing real-world harm. AI has already been accused of underestimating the health needs of Black patients and of making it less likely that people of color will be approved for a mortgage.
Birhane redirected her research toward investigating the data sets that are increasingly shaping our world. She wants to expose their biases and hold the giant corporations that design and profit from them to account. Her work has garnered global recognition. In October 2022, she even got the opportunity to talk about the harms of Big Tech at a meeting with the Dalai Lama.
Often, Birhane only has to scratch the surface of a data set before the problems jump out. In 2020, she and colleague Vinay Prabhu audited two popular data sets. The first was “80 Million Tiny Images,” an MIT set that had been cited in hundreds of academic papers and used for more than a decade to teach machine learning systems how to recognize people and objects. It was full of offensive labels—including racist slurs for images of Black people. In the other data set, ImageNet, they found pornographic content, including upskirt images of women, which ostensibly did not require the individuals’ explicit consent because they had been scraped from the internet. Two days after the pair published their study, the MIT team apologized and took down the Tiny Images data set.
These problems come from the top. Machine learning research is overwhelmingly male and white, a demographic world away from the diverse communities it purports to help. And Big Tech firms don’t just offer online diversions—they hold enormous amounts of power to shape events in the real world.
Birhane and others have branded this “digital colonialism”—arguing that the power of Big Tech rivals the old colonial empires. Its harms will not affect us all equally, she argues: As technology is exported to the global south, it carries embedded Western norms and philosophies along with it. It’s sold as a way of helping people in underdeveloped nations, but it’s often imposed on them without consultation, pushing them further into the margins. “Nobody in Silicon Valley stays up worrying about the unbanked Black women in a rural part of Timbuktu,” Birhane says.
Birhane believes shifting public attitudes will be the most effective driver of change: Big Tech firms respond more to outrage than bureaucratic rule changes. But she has no desire to live in a permanent cloud of bile: As a Black woman doing critical work, she has faced pushback from day one. “I don’t know if I can live my life fighting,” she says. Birhane—who now combines lecturing with a senior fellowship at the Mozilla Foundation—would prefer to let her research do the work. “I am a big proponent of ‘show the data,’” she says.
But Birhane does not think that will be enough—she is not optimistic that Big Tech will self-correct. For every problematic data set that’s revealed and corrected, another lies waiting. Sometimes nothing even changes: In 2021, Birhane and colleagues published a paper about a data set of more than 400 million images, called the LAION-400M data set, which returned explicit pornography when prompted with even mildly feminine words such as “mummy” or “aunty.” The paper triggered outrage, but the data set still exists and has swelled to more than 5 billion images. It recently won an award.
There’s a reason nothing has changed. While creating data sets for AI is fairly simple—just trawl the internet—auditing them is time-consuming and expensive. “Doing the dirty work is just a lot harder,” Birhane says. There is no incentive to make a clean data set—only a profitable one. But this means all that dirty work falls on the shoulders of researchers like Birhane, for whom sifting through these data sets—having to spend hours looking at racist imagery or rape scenes—takes a toll. “It’s really depressing,” she says. “It really can be traumatizing, looking at these things.”
In an ideal world, change would be driven by the vast resources of the tech companies, not by independent researchers. But corporations are not likely to overhaul their ways without considerable pressure. “I want, in an ideal world, a civilized system where corporations will take accountability and responsibility and make sure that the systems they’re putting out are as accurate and as fair and just for everybody,” Birhane says. “But that just feels like it’s asking too much.”
This article appears in the March/April 2023 edition of WIRED UK magazine.