WWW2010 keynote panel looks at movements to make data accessible
Brief description: The opening ceremony of WWW2010 included a plenary panel offering perspectives from both sides of the Atlantic on the move to make data more open and accessible on the Web (data.gov and data.gov.uk). The panel featured Sir Tim Berners-Lee, inventor of the World Wide Web; David Ferriero, archivist of the United States; James Hendler, a semantic Web expert from Rensselaer Polytechnic Institute and moderator for the discussion; Paul Jones, UNC professor and director of iBiblio.org; Andrew McLaughlin, deputy U.S. chief technology officer in the Executive Office of the US President; and Nigel Shadbolt, director of the Web Science Trust and Web Foundation. (No photographs or video of Tim Berners-Lee were allowed to be distributed to this site.)
Details of the session
The headline angle for this panel came in the final two minutes, when moderator James Hendler asked each panelist to take 30 seconds or less to offer a final thought and Tim Berners-Lee offered his “Five-Star” plan for open data. He said public information should be awarded a star rating based on the following criteria: one star for making the information public; a second is awarded if the information is machine-readable; a third star if the data is offered in a non-proprietary format; a fourth is given if it is in Linked Data format; a fifth if it has actually been linked.
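As an illustration only, the five criteria can be sketched as a small scoring function. The Dataset class, its field names and the rule that a star counts only when every earlier criterion is also met are assumptions made for this sketch; the panel did not describe an implementation.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published data set (field names are assumptions)."""
    is_public: bool = False               # star 1: the information is public
    machine_readable: bool = False        # star 2: it is machine-readable
    non_proprietary_format: bool = False  # star 3: offered in a non-proprietary format
    linked_data_format: bool = False      # star 4: it is in Linked Data format
    linked_to_other_data: bool = False    # star 5: it has actually been linked


def star_rating(ds: Dataset) -> int:
    """Count stars in order, stopping at the first criterion that is not met.

    Treating the stars as cumulative is an assumption of this sketch.
    """
    criteria = [
        ds.is_public,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.linked_data_format,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    # A public, machine-readable file in a proprietary format earns two stars.
    print(star_rating(Dataset(is_public=True, machine_readable=True)))
```

Under this reading, a data set that skips an earlier criterion stops earning stars at that point, which reflects the plan's emphasis on building toward data that is actually linked.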
The Berners-Lee five-star system was mentioned again and again by panelists in later sessions throughout the day. The catchy idea stuck.
Berners-Lee also opened the session by explaining the value of governments not only making their data public but making it public in a way that allows it to be combined with other data into powerful information sets that bring new ideas to light. “Data is much more powerful when you match it up with something else,” he said. “Putting the data on a website isn’t hard work,” he added, noting that the information is already paid for. “It’s of huge value, not only for government but for industry.”
He said a second advantage of putting public data online is that transparency makes institutions more accountable, forces them to be more open and can generate more positive change. “There will be much stronger checks and balances and people can give feedback to government,” he said.
Andrew McLaughlin brought up a third point to add to Berners-Lee’s first two. “Open data helps you do your job better and more effectively,” he said. “In the US, we set up data.gov about a year ago, and a big focus has been to explain why raw data feeds in and of themselves are valuable. Now the problem is how to change the culture of government agencies around data. That’s a much larger problem. To make the data free from the get-go is much harder.
“Scarcity of distribution instantiated in the form of control over data is getting disrupted. Companies and governments resist that change.”
McLaughlin said creating a datastream in which agencies presumptively make data available without people having to request it first is a goal to work toward. “That is the focus for next year and beyond,” he noted.
Nigel Shadbolt observed that standards in design are vital to allowing data sets to be meshed, shared and compared, and he pointed out the advantages of implementing Linked Data and Semantic Web principles and methods.
Hendler asked David Ferriero, archivist of the United States, whether the massive amounts of information becoming available today, with more to come, are weighing heavily as an issue. “What keeps me up at night is more the culture of government; it is not a technological problem,” Ferriero answered. “It is a social problem.” He said that he and others are working to encourage government agencies to create an open-government plan that specifically spells out how people are going to make information-sharing transparent, collaborative and participatory.
Ferriero noted that the new push for open documents online has “had tremendous transformational power,” saying that it has inspired people at the National Archives to address new ways to conduct the work of collecting all of the information generated by the government and making it available in perpetuity.
All of the panelists mentioned how important it is for the public to use the data provided by government in new ways, inventing new analytical applications, layering it to make new meaning and making mashups that teach us things we never knew about ourselves before we had such open access to information.
Paul Jones spoke several times throughout the session about local examples of open and closed government data. He said the Cedar Grove Institute for Sustainable Communities, based in Mebane, North Carolina, uses publicly available geographic information systems data to show where and when local governments have created exclusionary zones that deny certain communities sewer and water access. He noted that the local government in Mebane used anti-terror regulations to hide its data about water and sewer information. A positive example of web-based collective intelligence Jones cited was the city of Burlington, NC’s use of the Citizen Request Tracker, a GIS application for iPhone and iPod, allowing citizens to report non-emergency issues such as potholes.
There was a concentrated discussion of the flaws in data likely to surface when information flows online quickly and is subjected to much more scrutiny than ever before, especially in an environment where putting data online in a timely fashion is emphasized. The panelists said they value the contributions of the people in charge of gathering and disseminating statistical information and other government data and they expect that the new culture that emerges out of wider public use of such information will generally be supportive.
“We didn’t want people to learn the lesson that you put data out there and achieve the transparency we’ve been aiming for and everybody piles on you and makes your life hell and kind of blames you for all of the errors in the data where in fact there were literally tens of thousands or maybe hundreds of thousands of people reporting data in,” McLaughlin said.
“So the right attitude is to say, ‘Thank you for pointing out mistakes in the data,’ right? Like we want this to be a collective enterprise. We get the data, we put it up, you help us find the errors, we fix the errors, we all work on this together to achieve data accuracy. But for the culture-change project we can’t have this be about sacrificing quality and integrity and the authority of the data that comes out because, frankly, much of our economy hums because we can rely in some measure on the data that comes out. So this is a tension that has to be resolved… This is the turbulence that comes with the revolution. It is inevitable.”
He noted that as the process of governments making data public evolves, measures have to be taken for quality assurance. “One thing we have been talking about is metadata that speaks to the quality and reliability of the data itself. We know how to do that.”
Berners-Lee added, “If you want perfect data, you have 10-year-old data.” He suggested that there should be an ethos in which people are given “kudos” and credit for publishing useful data. He noted with satisfaction that the World Bank had just announced its own open data initiative on April 20, 2010.