Details of the session
danah boyd is known for her incisive analysis of where our use of social networks is taking us. In her keynote talk at the WWW2010 conference in Raleigh on April 29, she took a new direction from her highly publicized speech this spring at the South By Southwest conference in Austin, where she made a big splash discussing the implications of sharing our personal information online. She delivered a rousing challenge to the Web engineers and entrepreneurs in the audience to think before they act.
Her WWW2010 talk, titled "Privacy and Publicity in the Context of Big Data," zeroed in on the theme of Web 2.0 and the influence of this era where "data is cheap, but making sense of it is not." She said that in our obsession with data-sharing online, "we've forgotten to ask some of the hard critical questions about what all this data means and how we should be engaging with it.
"Privacy concerns are not new; people have been talking about privacy - or the lack thereof - forever," she noted. "So what's different now? The difference is big data... the kinds of data that marketers and researchers and business folks are currently salivating over... The opportunities to scrape data - or more politely, 'leverage APIs' - are also unprecedented. And folks are buzzing around wondering what they can do with all of the data they've got their hands on."
Pointing out that the Internet has created new opportunities for people to produce and share data, interact with and remix data, aggregate and organize data, she said, "Data is the digital air in which we breathe and countless efforts are being put into trying to make sense of all of the data swirling around. When we talk about privacy and publicity in a digital age, we can't avoid talking about data."
She said the WWW conference is the perfect place to think critically about this issue and start a conversation on methodology and ethics, because understanding the challenges will allow everyone to better frame how to address concerns over privacy and publicity.
The noted ethnographer explained that she maps culture. "Using social science logic," she said, "I want to discuss four things that all working with big data must understand:"
1) Bigger Data are Not Always Better Data - "You need to know your dataset. Just because you’re seeing millions and millions of pieces of data doesn’t mean that your data is random or representative or anything. To make claims about your data, you need to know where the data comes from."
Boyd said she believes the number-one "destabilizer of privacy" today results from the obsession with big data. She added that biases, misinterpretations and the way they play out are affecting people's lives, noting "the Uncertainty Principle doesn't just apply to physics - the more you try to formalize and model social interactions, the more you disturb the balance of them. When you implement new features based on misinterpretations, you can hurt people."
She said the biggest methodological danger zone this presents is "just because data is accessible doesn't mean that using it is ethical" and added that this is why methodological ethics matters.
Boyd said the idea of privacy is a "collective understanding of a social situation's boundaries and knowing how to operate within them, it's about having control over a situation, it's about understanding the audience and knowing how far information will flow. It's about trusting the people, the situation, the context."
She pointed out that "people seek privacy so they can make themselves vulnerable in order to gain something: personal support, knowledge, friendship and so forth."
She said she wanted to emphasize five key points:
1) Security Through Obscurity Is a Reasonable Strategy - "In mediated settings like Facebook, recording and amplifying are now default. The very act of interacting with these systems involves accounting for the role of technology. As people make sense of each new system, they interpret the situation and try to act appropriately. When the system changes, when the context changes, people must adjust. But each transition can have consequences. People's encounters with social systems rely on their interpretation of the context. And they've come to believe that, even when their data is recorded, they're relatively obscure, just like they're obscure when they're in the ocean. And generally, that's pretty true. Just because technology can record things doesn't mean that it brings attention to them. So people rely on being obscure, even when technology makes that really uncertain. You may think that they shouldn't rely on being obscure, but asking everyone to be paranoid about everyone else in the world is a very very very unhealthy thing. People need to understand the context and they need to have a sense of boundaries to cope. Even in public situations, people regularly go out of their way to ignore others, to give them privacy in a public setting. Sociologist Erving Goffman refers to this as 'civil inattention.'"
Boyd noted that "publicity twists it all" - saying, "All five of these issues present ethical questions for big data. Just because we can rupture obscurity, should we? Just because we can publicize content, should we? Just because we can leverage PII, should we? Just because we can aggregate and redistribute data, should we? The answers to these questions aren't clear.
"Social norms can and are changing, but that doesn't mean that privacy has been thrown out the door. People care deeply about privacy, care deeply about maintaining context. But they also care about publicity, or the right to walk out in public and be seen. Technology has provided new opportunities for people to actively seek to distribute their content. They can and should have a right to leverage technology to demand a presence in public. And technology that helps them scale is beneficial. The problem is that it's hard to differentiate between publicly accessible data that is meant to be widely distributed and that which is meant to simply be accessible. It's hard to distinguish between the content that people want to share to be aggregated for their own gains and that which is never meant for any such thing. It's hard to distinguish between PII that is shared for social purposes and that which is shared as a self-branding exercise.
"This goes back to our methodological conundrum with big data. Not all data are created equal and it's really hard to make reasonable interpretations from 30,000 feet without understanding the context in which content is produced and shared. Treating data as arbitrary bytes is bound to get everyone into trouble. So we’re stuck with an ethical conundrum: do we err on the side of making sure that we care for those who are most likely to be hurt or do we accept the costs of exposing people?"
She talked in detail about Facebook's privacy-boundaries-pushing actions over the past several years. "Facebook has slowly dismantled the protective walls that made users trust Facebook. Going public is not inherently bad - there are plenty of websites out there where people are even more publicly accessible by default. But Facebook started out one way and has slowly changed, leaving users either clueless or confused or outright screwed. This is fundamentally how contexts get changed in ways that make people's lives really complicated. Facebook users are the proverbial boiling frog - they jumped in when the water was cold but the water has slowly been heating up and some users are getting cooked.
"...Healthy social interaction depends on effectively interpreting a social situation and knowing how to operate accordingly. This, along with an understanding of how information flows, is central to the process of privacy. When people cannot get a meaningful read on what's happening, people are likely to make numerous mistakes that are socially costly. Facebook does a great job of giving people lots of settings for adjusting content's visibility, but they do a terrible job of making them understandable. Even when they inform people that change is underway, they opt people in by default rather than doing the work of convincing people that a new feature might be valuable to them. The opt-out norm in Facebook - and on many other sites - is not in the better interest of people; it's in the better interest of companies."
She said Facebook could tell you all of the services that have accessed your data through its APIs and all of the accounts inspecting any item of content. People would like to have this feature, she noted, "but it's not in the company's better interest," because it is likely to stifle participation - people will recognize that they are being scrutinized and they may withdraw from sharing their information in public.
"It's easy to swing to extremes, preaching about the awesomeness of all of these new technologies or condemning them as evil," she said. "But we know that reality is much more complicated and that the pros and cons are intricately intertwined. Teasing out how to walk the tightrope of privacy and publicity is going to be a critical challenge of our era."
She directed her closing remarks at the crowd of Web engineers in her audience. "You have the technical and organizational chops to shape the future of code. What you choose to build and how you choose to engage with big data matters. What is possible is wide open, but so are the consequences of your decisions... Privacy will never be encoded in zeros and ones. It will always be a process that people are navigating. Your challenge is to develop systems and do analyses that balance the complex ways in which people are negotiating these systems. You are shaping the future. Build the future you want to inhabit."