Megan Squire: Deep dive into data sheds light on toxic online communities

Professor of Computer Science Megan Squire delivered the Distinguished Scholar Lecture Thursday night about her research into connections among online communities. 

Explaining how she works with data to identify trends, relationships and networks, Professor of Computer Science Megan Squire turned to a heap of Legos — disconnected and colorful, and seemingly without order. 

Sharing what she said was her “favorite graphic,” she explained that data science is the process of taking that pile of Legos, sorting them by color, arranging them so they make sense, and then presenting them in single-color stacks that allow you to hear what the data is trying to tell you. 

“Hopefully, when you get down to the bottom of it, you’re telling a story that people actually care about,” Squire said Sept. 20 during her Distinguished Scholar Lecture in McKinnon Hall. 

The stories Squire has told through her research offer insight into the nature of the communications and connections among online communities, with recent work demonstrating overlapping memberships among neo-Nazi, white nationalist, anti-immigration and other extremist groups. Her large body of work has garnered national media attention, including an article in The New Yorker published this week that cited Squire’s earlier research into how celebrated programmer Linus Torvalds often uses abusive language when communicating with fellow programmers.

Squire’s lecture followed her selection for the Distinguished Scholar Award, an annual honor from Elon University for a faculty member whose research has earned the respect and praise of peers for making significant contributions to their field of study.

For her work on open online communities and open source software, Squire has earned international acclaim as a leader in the field. In 2004, she launched FLOSSmole, an online effort to gather, share and analyze data from free and open source software projects, which has offered insight into how computer programmers interact with each other and with online communities.

Squire explained that an early project centered on humor — how computer programmers joked with each other as they interacted online. The goal was to use data pulled from publicly available online communications to build a collection of jokes, and use that data to help teach a computer to generate and detect jokes. 

“We found we could pretty quickly build out a nice collection of jokes,” Squire said. 

In examining that data, Squire kept seeing the phrase “Aunt Tillie,” which she discovered had become widely used among software developers to mean “an old lady who doesn’t know how to use their computer.” That led her to examine how some language in these online communications was used to mock women and racial minorities. It was a first step into exploring toxicity in online interactions.

Her research would eventually lead her to Linus Torvalds, the programmer behind the Linux operating system, which Squire said supports half the internet. Squire says he is known as “one of the rudest people on the internet.” He has claimed that “his culture is cursing,” and he is known for insulting programmers he interacts with, Squire said.

Squire examined 46,000 publicly available emails from Torvalds and his second-in-command, Greg Kroah-Hartman, to test whether the messages’ authors could be identified solely from the language used. If an email contained the word “thanks,” there was a 98 percent chance it was written by Kroah-Hartman; if it contained an expletive, there was a 94 percent chance it was written by Torvalds. Similarly, an email containing “actually” had a 90 percent chance of being from Torvalds, while one containing “sorry” had a 95 percent chance of being from Kroah-Hartman.
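Percentages like these are conditional probabilities: of all emails containing a given word, what fraction came from each author. The short Python sketch below shows one simple way such a figure could be computed. It is an illustration only, not Squire’s actual method; the function name author_given_word, the tokenizing rule and the toy emails (attributed to placeholder authors “author_a” and “author_b”) are all hypothetical.

```python
from collections import Counter


def author_given_word(emails, word):
    """Estimate P(author | email contains word) from labeled emails.

    emails: list of (author, text) pairs.
    Matching is a simple case-insensitive whole-token check after
    stripping surrounding punctuation (an assumption for illustration).
    """
    counts = Counter()
    for author, text in emails:
        tokens = {t.strip(".,!?;:\"'()").lower() for t in text.split()}
        if word.lower() in tokens:
            counts[author] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # the word never appears, so no estimate is possible
    return {author: n / total for author, n in counts.items()}


# Hypothetical toy corpus; these are invented messages, not real emails.
emails = [
    ("author_b", "Thanks, applied to the stable tree."),
    ("author_b", "Sorry, my mistake. Thanks for catching it."),
    ("author_a", "This is actually wrong."),
    ("author_a", "Thanks, merged."),
]

print(author_given_word(emails, "thanks"))
print(author_given_word(emails, "actually"))
```

In this toy corpus, “thanks” appears in three emails, two of them from author_b, so the estimate for author_b is 2/3. A real study would use far more messages and would need to handle case, punctuation and word variants more carefully.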

It’s that data that factored into the recent article in The New Yorker, “After Years of Abusive E-mails, the Creator of Linux Steps Aside,” published Sept. 19 about how the culture that Torvalds had created would eventually lead to his departure. Squire was interviewed by the reporter and offered her insights into how Torvalds communicates online.

Each of these projects was leading Squire closer to research she would undertake in 2017 focused on the planned “Unite the Right” rally in Charlottesville, Virginia. Squire had seen a documented rise in extremism among tech communities, and in focusing on the planned Unite the Right rally, she began to examine how online communities of extremists were connected. 

“I had already seen these communities,” Squire said. “I’ve seen what they are like. I see what they do.”

Using data from Facebook that was publicly available at the time, Squire began creating a database of the members of Facebook groups associated with right-wing extremist groups. Squire then began to compare membership rosters between groups to see what overlap there was. For instance, were members of a neo-Confederate group also members of a neo-Nazi group? Were members of anti-immigration groups also members of white nationalist groups?
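Comparing membership rosters amounts to intersecting sets of member IDs for each pair of groups. The Python sketch below shows one way to do that under stated assumptions. The group names and integer IDs are invented and anonymized, and the choice to report Jaccard similarity (shared members divided by total distinct members) alongside the raw overlap count is ours, not a description of Squire’s published methodology.

```python
from itertools import combinations


def pairwise_overlap(rosters):
    """Return {(group1, group2): (shared_count, jaccard)} for every pair of groups.

    rosters maps a group name to a set of (anonymized) member IDs.
    Jaccard similarity = |A & B| / |A | B|, i.e. shared members divided
    by the number of distinct members across both groups.
    """
    result = {}
    for g1, g2 in combinations(sorted(rosters), 2):
        a, b = rosters[g1], rosters[g2]
        shared = len(a & b)
        union = len(a | b)
        result[(g1, g2)] = (shared, shared / union if union else 0.0)
    return result


# Hypothetical anonymized rosters for illustration only.
rosters = {
    "group_a": {1, 2, 3, 4},
    "group_b": {3, 4, 5},
    "group_c": {6, 7},
}

for pair, (shared, jaccard) in pairwise_overlap(rosters).items():
    print(pair, shared, round(jaccard, 2))
```

Here group_a and group_b share two of five distinct members (Jaccard 0.4), while group_c overlaps with neither. Working only with anonymized IDs keeps the focus on connections between groups rather than on individuals.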

The idea was to get a picture of the network of groups that would potentially be “united” at the rally in Charlottesville. Squire emphasizes that she has not publicly shared the identities of individual members of these groups and has no interest in publicizing the membership rolls; instead, she is focused on the connections between groups.

Her work eventually produced a dataset of 700,200 Facebook users who were members of 1,870 groups spanning 10 categories of radical right-wing ideology. She found that the Facebook event page for the “Unite the Right” rally was connected to a very broad range of these groups.
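One way to quantify how broadly an event page connects across ideologies is to count, for each category, how many groups share at least one member with the event. The Python sketch below illustrates that idea. The function name categories_reached, the category labels and all IDs are hypothetical, and the “at least one shared member” threshold is an assumption chosen for simplicity rather than the criterion Squire used.

```python
def categories_reached(event_attendees, rosters, categories):
    """Count, per category, how many groups share at least one member with an event.

    event_attendees: set of (anonymized) member IDs linked to the event page.
    rosters: group name -> set of member IDs.
    categories: group name -> ideology category label.
    """
    reached = {}
    for group, members in rosters.items():
        if event_attendees & members:  # at least one shared member
            cat = categories[group]
            reached[cat] = reached.get(cat, 0) + 1
    return reached


# Hypothetical data for illustration only.
rosters = {
    "group_a": {1, 2, 3, 4},
    "group_b": {3, 4, 5},
    "group_c": {6, 7},
    "group_d": {8, 9},
}
categories = {
    "group_a": "category_1",
    "group_b": "category_2",
    "group_c": "category_2",
    "group_d": "category_3",
}
event_attendees = {3, 6}

print(categories_reached(event_attendees, rosters, categories))
```

In this toy example the event touches three groups across two of the three categories; group_d, with no shared members, is not counted. The more categories an event reaches, the broader the coalition it appears to draw on.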

“I was very surprised to see this, but then again, I had been watching those numbers (of expected attendees at the rally) go up all summer,” Squire said. “Everybody is coming to this party. This is going to be a big deal.”

Squire traveled to Charlottesville the weekend of the rally to counterprotest those gathered for “Unite the Right,” and her research later drew national media attention, with articles examining her work on the connections among these extremist groups.

“Once you put yourself out there as being on a side, then the backlash starts,” Squire said. 

Because of her work, Squire has been targeted online by members of various groups she has studied and has received threats to her personal safety. That has not deterred her from her work.

She’s now focused on examining the language used in the growing area of racist podcasts to look for similarities in language and communication style. That research is supported by a small grant from Google that is allowing her to use the tech giant’s speech-to-text software. 

She regularly receives datasets from people looking at ways to examine toxic online communities. 

“You never know where the next cool data set is going to come from,” Squire said. “The data just comes, and I have to figure out what’s useful and figure out what to do with it.”