This New Way to Train AI Could Curb Online Harassment

Misogyny on the internet often slips through the filters of content moderators. A new method hopes to inject more nuance into the process.

For about six months last year, Nina Nørgaard met weekly for an hour with seven people to talk about sexism and violent language used to target women in social media. Nørgaard, a PhD candidate at IT University of Copenhagen, and her discussion group were taking part in an unusual effort to better identify misogyny online. Researchers paid the seven to examine thousands of Facebook, Reddit, and Twitter posts and decide whether they evidenced sexism, stereotypes, or harassment. Once a week, the researchers brought the group together, with Nørgaard as a mediator, to discuss the tough calls where they disagreed.

Misogyny is a scourge that shapes how women are represented online. A 2020 Plan International study, one of the largest ever conducted, found that more than half of women in 22 countries said they had been harassed or abused online. One in five women who encountered abuse said they changed their behavior—cut back or stopped use of the internet—as a result.

Social media companies use artificial intelligence to identify and remove posts that demean, harass, or threaten violence against women, but it’s a tough problem. Among researchers, there’s no standard for identifying sexist or misogynist posts; one recent paper proposed four categories of troublesome content, while another identified 23 categories. Most research is in English, leaving people working in other languages and cultures with even less of a guide for difficult and often subjective decisions.

So the researchers in Denmark tried a new approach, hiring Nørgaard and the seven people full-time to review and label posts, instead of relying on part-time contractors often paid by the post. They deliberately chose people of different ages and nationalities, with varied political views, to reduce the chance of bias from a single worldview. The labelers included a software designer, a climate activist, an actress, and a health care worker. Nørgaard’s task was to bring them to a consensus.

“The great thing is that they don't agree. We don't want tunnel vision. We don't want everyone to think the same,” says Nørgaard. She says her goal was “making them discuss between themselves or between the group.”

Nørgaard viewed her job as helping the labelers “find the answers themselves.” With time, she got to know each of the seven as individuals, and who, for example, talked more than others. She tried to make sure no individual dominated the conversation, because it was meant to be a discussion, not a debate.

The toughest calls involved posts with irony, jokes, or sarcasm; they became big topics of conversation. Over time, though, “the meetings became shorter and people discussed less, so I saw that as a good thing,” Nørgaard says.
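As a rough illustration of how such tough calls might be surfaced for a weekly meeting, here is a minimal sketch in Python. It is not the researchers' actual tooling: the function names, the label strings, and the sample data are all hypothetical, and it assumes each post simply receives one label from each annotator.

```python
from collections import Counter


def agreement(labels):
    """Fraction of annotators who chose the most common label for a post."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def discussion_queue(annotations):
    """Return IDs of posts without unanimous labels, most contested first."""
    contested = [
        (agreement(labels), post_id)
        for post_id, labels in annotations.items()
        if agreement(labels) < 1.0
    ]
    return [post_id for _, post_id in sorted(contested)]


# Hypothetical labels from seven annotators (illustrative data only).
annotations = {
    "post-1": ["abusive"] * 7,
    "post-2": ["abusive"] * 4 + ["not abusive"] * 3,
    "post-3": ["not abusive"] * 6 + ["abusive"],
}

print(discussion_queue(annotations))  # ['post-2', 'post-3']
```

In this sketch a unanimous post never reaches the meeting, and a four-to-three split is discussed before a six-to-one split. A real project would likely use a more careful agreement measure than a simple majority fraction.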

The researchers behind the project call it a success. They say the conversations led to more accurately labeled data to train an AI algorithm. The researchers say AI fine-tuned with the data set can recognize misogyny on popular social media platforms 85 percent of the time. A year earlier, a state-of-the-art misogyny detection algorithm was accurate about 75 percent of the time. In all, the team reviewed nearly 30,000 posts, 7,500 of which were deemed abusive.

The posts were written in Danish, but the researchers say their approach can be applied to any language. “I think if you're going to annotate misogyny, you have to follow an approach that has at least most of the elements of ours. Otherwise, you're risking low-quality data, and that undermines everything,” says Leon Derczynski, a coauthor of the study and an associate professor at IT University of Copenhagen.


The findings could be useful beyond social media. Businesses are beginning to use AI to screen job listings or publicly facing text like press releases for sexism. If women exclude themselves from online conversations to avoid harassment, that will stifle democratic processes.

“If you're going to turn a blind eye to threats and aggression against half the population, then you won't have as good democratic online spaces as you could have,” Derczynski said.

The survey of online sexism and harassment last year by the nonprofit Plan International found that attacks were most common on Facebook, followed by Instagram, WhatsApp, and Twitter. That survey found that online attacks against women tend to focus on abusive language, deliberate acts of embarrassment like body shaming, and threats of sexual violence.

In its State of Online Harassment report released in January, Pew Research said a higher percentage of respondents reported threats of sexual harassment and stalking last year than in a 2017 survey. Pew found that men are more likely to experience online harassment but that women are far more likely to experience stalking or sexual harassment, and more than twice as likely to come away from a harassment episode feeling extremely upset about the encounter. Approximately half of women surveyed said they encountered harassment based on their gender. Similar numbers of people surveyed who identify as Black or Latinx said they felt they were targeted because of their race or ethnicity.

Labeling data may seem banal, but labeled data is the fuel that makes machine-learning algorithms work. AI ethics and fairness researchers have called for the makers of AI to pay more attention to data sets used to train large language models like OpenAI’s text generator GPT-3 or the ImageNet model for recognizing objects in photos. Both models are widely known for advancing the field of AI, but they’ve been shown to produce racist and sexist content or classifications.

The Danish study is one of a series of recent works attempting to improve how people use AI to recognize and remove misogyny from online forums.

Researchers from the Alan Turing Institute and UK-based universities also trained annotators and a mediator to review more than 6,500 Reddit posts for a paper presented at a conference in April. The researchers said they focused on Reddit because it’s “increasingly home to numerous misogynistic communities.”

In the Turing Institute study, data labelers read through posts in chronological order to understand the context of a conversation, rather than drawing conclusions from a single post. As in the Danish study, the researchers convened meetings to seek consensus about how a post should be labeled. As a result, they claim 92 percent accuracy when identifying misogyny in online content using a language model fine-tuned with their data set.

Elisabetta Fersini is an associate professor at the University of Milan-Bicocca in Italy who has studied misogyny in social media since 2017. In collaboration with a Spanish university and Google’s Jigsaw unit, Fersini and some colleagues launched a competition this week to improve detection of online memes with objectification, violence, body shaming, or other types of misogyny. Facebook hosted a similar effort, the hateful meme challenge, last year.

Fersini called the Danish researchers’ approach a helpful contribution to labeling data and building robust AI models. She applauds the study for including posts from multiple social media networks, since many studies rely on data from a single network. But she thinks the research could have taken a more fine-grained approach to labeling data, like that used by researchers from the Turing Institute.

In her work, Fersini said she’s observed some commonalities in misogyny online. Insults like referring to a woman as a female dog, for example, are fairly universal, but misogyny is manifested differently in different languages. Online posts in Spanish, for instance, have a higher proportion of sexist content related to dominance, while Italian social media users lean toward stereotypes and objectification, and English speakers seek to discredit women more often than their Italian or Spanish counterparts, she says.

The grammatical structure of a language can also complicate matters. For example: Saying “You are beautiful” in English does not connote a specific gender, but the same sentence in a Romance language like Italian or Spanish can indicate it is being addressed to a woman. And languages like Finnish have gender-neutral pronouns.

“Misogyny depends on the culture and on the social demographic attributes of people seeing a specific image or text,” Fersini says. She advocates conducting research in multiple languages. “Our perception could be completely different, and this is because of many factors: where I live, level of education, type of education, and relationship with a specific religion.”

In the Danish research, for example, the most common form of misogyny detected was “neosexism,” which denies that misogyny exists, based on a belief that women have achieved equality. The concept of neosexism was first proposed in the 1990s by researchers in Canada. Subsequent research has revealed the presence of the phenomenon in Scandinavian nations like Denmark and Sweden. The Danish researchers say it’s unclear how common neosexism is in other societies, but they suggest future research include the term when labeling specific kinds of misogyny.

Pulkit Parikh, a doctoral student at the International Institute of Information Technology in Hyderabad, India, says that in his experience, annotators labeling sexism and misogyny often disagree. In 2019, Parikh and colleagues worked with data labelers to create a data set based on accounts witnessed or experienced by people around the world gathered from the Everyday Sexism Project.

Earlier this year that data set was used to create a methodology to detect sexism or misogyny, with 23 categories ranging from hypersexualization to hostile working environment to sexual harassment or “mansplaining.” Annotators found that nearly half of the posts reviewed could be defined as containing multiple forms of sexism or misogyny.

The Danish study offered other insights into improving AI. After the study was complete, researchers asked data labelers how they could improve their methodology. The most common response: more time to discuss disagreements over labels.

“That they needed more time tells you it's hard,” says Mary Gray, an anthropologist and senior principal researcher at Microsoft. She is coauthor of Ghost Work, a book published in 2018 about crowdworkers doing tasks like data labeling through platforms like Amazon's Mechanical Turk.

Spokespeople from Facebook and Twitter declined to respond to questions about how those companies label data used to train AI to detect misogyny online. Traditionally, Gray said, data labeling for social media companies training AI for content moderation is done by contractors looking at material that users have reported as harassment, with few insights into the context or nuance behind it. She says that approach is not helpful for assessing violent speech, which is “swimming in the world of ambiguity.”

“My colleagues in engineering and computer science in the commercial space don't know just how challenging this is, because they have such a reductive sense of humanity,” she says. Gray says the approaches taken by the Danish and Turing researchers have “a much more nuanced sense of humanity and individuals, but it's still thinking of individuals, and that's going to break the system eventually.”

She thinks using a mediator in the labeling process can be a step forward, but tackling online harassment requires more than a good algorithm. “What bothers me about that approach is that it assumes there could ever be a set of annotators that could look over a corpus and produce a classifier that applies to everybody in the world,” she says.

Multiple studies have found that misogyny is a common characteristic among people who carry out mass shootings. A review earlier this year by Bloomberg found that between 2014 and 2019, almost 60 percent of shooting incidents with four or more casualties involved an aggressor with a history of—or in the act of—domestic violence. Accusations of stalking and sexual harassment are also common among mass shooters.

Gray thinks posts considered potentially misogynistic should be flagged, then put in the hands of a mediator, rather than automating decision making through AI, which can lead to, for example, Black Lives Matter activists getting kicked off Facebook instead of white supremacists. That’s a challenge for social media companies, because it means technology alone cannot fix the problem.

“Most parents can't understand their teenagers,” she says. “I don't know why we're not using that same logic when we're talking about building a classifier to do anything that has to do with words online, let alone these very nuanced ways of delivering pain.” She says it’s naive to think “there's something easily classifiable about how humans and groups will express something as complicated as harassment.”

Earlier studies also tried to encourage consensus among data labelers as a way to overcome ambiguity. In a 2018 study, researchers from SAFElab, which combines social work, computer science, and the expertise of young people, worked with local experts in Chicago to label tweets associated with gang violence. That project found that AI analyzing tweets can recognize instances when a retaliatory shooting may occur after a homicide. Assembling that data set also called for consensus among annotators when labeling content.

“Having a process for which you study disagreement became really important,” says Desmond Patton, a professor at Columbia University and director of SAFElab. “You can learn from those disagreements how to improve your labeling process.”

Khari Johnson is a senior writer for WIRED covering artificial intelligence and the positive and negative ways AI shapes human lives. He was previously a senior writer at VentureBeat, where he wrote stories about power, policy, and novel or noteworthy uses of AI by businesses and governments.

