I work with data. I love it. I love the puzzles, I lose myself in the tedium of data-cleaning, and I love being able to uncover something that was not known before. My work is a bit abstract; I help homeless people, but I am not in the street passing out blankets. I analyze information for strategic planning purposes and to show funders that our community’s efforts are worthy of their dollars. I spend a lot of time thinking about the potential of data to do good or reform society, and I keep coming to the same conclusion: data can help society when it is a tool of transparency. You cannot fix what you do not know exists. You cannot disprove a myth without evidence. Transparency has many layers, though, and it is not enough for the data to exist if the process of analyzing it is opaque. Open data is important, and open methods are equally important if any open data pursuit is to have a democratizing effect.
I think a lot about democracy, because I care about that as much as I care about creating a less-brutal capitalism. A functioning democracy is a leveling tool in a context of inequitable social strata. The elite have other currencies of power: money, connections, and system-savvy, i.e., they know how to navigate various bureaucracies to their benefit. Votes are nearly universal. (Caveat 1: if you’re not a member of a disenfranchised group. Caveat 2: ignoring the impact of lobbying, which is another rant for another day; in the meantime, check out LittleSis.org.) If society is going to be making decisions on behalf of itself, its members need to clearly understand its circumstances. I am frustrated by what I perceive to be opacity in my city’s and state’s government. I am also frustrated by positions advanced by special interest organizations without commitment to veracity or acknowledgement of nuance. I am very cynical – when a government official says that something is true, I really want to see the numbers behind the claims. What informed your projection that some project will save so much money, or create so many jobs? Why don’t political speeches have methodological appendices? Could you imagine a world where politicians and advocates had to back up everything they said with all the information that led them to their conclusion, as scientists do?
What if it were possible for you to go digging yourself? What if there was a problem that you were particularly passionate about? What if you wanted to learn more? The standard operating procedure right now is to consult various secondary sources: internet sites, books, or people involved in the field. In addition to learning about something you are interested in, there is also a socialization process in which you are taught to think about something in the ways that others have thought about it. You learn the vocabulary, the language, the best-practice (or at least heavily-adopted) strategies for accomplishing something. This is not a bad thing necessarily – best practices often are called such because they are the most effective. However, in the course of learning a new thing, you are being taught to think about it in an old way. This socialization process can be an innovation killer. For instance, it seems to me that in Buffalo, part of the socialization process is being taught that weary resignation is the most proper response to Buffalo’s dysfunction. It is as though being introduced to Buffalo politics is a slow process of being taught that expecting transparency is not worth your time. This is a problem. I suspect this contributes to why some of Buffalo’s issues have proven intractable (schools, grant-spending, etc.).
What I am describing is the difference between learning about the woods on your own, through walking and observing, and reading a book first. You may structure your understanding of the woods differently than botanists do. Botanists have a great reason for structuring the knowledge of the woods in the taxonomies that they do. Perhaps your goal of understanding is something else. And I guess that is my idealistic wonder for open data. If people can find out on their own, would we get different ways of thinking about things? Would we get a broader skill set and perspective set infused into trying to solve our social problems? The data has to be available first. You can’t walk through the woods if there is a fence around it.
Open data, as a political movement, argues that data should be available, largely as an anti-corruption tool. For a primer on Open Data, see the Sunlight Foundation. Open Data as an aggregation of information tends to refer to data that scientists collect and the data that the government uses. The former is frequently collected as a result of public funding (for instance, a National Science Foundation grant) and the latter exists to guide decisions being made on behalf of the public. All of my previous residences’ municipalities post their real-estate records online, for instance, though only Seattle posted their code-enforcement efforts. Some government-collected data is restricted because it contains personally identifiable information that would leave clients at risk of identity theft. I work with one of those databases, though we happily provide aggregate data if one asks.
I see incorporating open data into basic government practices as a small step towards changing citizens’ sense of civic entitlement. If it is available, and remains available for a long time, then it would come to be seen as normal and proper for anyone to know what the government does. Perhaps open data would be a way to let the light in, crack by crack.
A functioning democracy requires education. It is not enough to know what is going on; you have to understand it. There are nuances to this; above I mention that knowledge comes paired with a socialization process. With that said, best practices are often such because they are effective. This is true of data. How you clean data is going to completely change your output. How you analyze data is going to inform your conclusions. It is not enough for data to be open; the methods of analysis need to be just as transparent.
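To make the point about cleaning concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records are invented, the `average_stay` helper is hypothetical, and the 365-night cutoff is an arbitrary choice, not a recommended rule. It only shows that three defensible cleaning decisions give three different answers to the same question.

```python
from statistics import mean

# Invented records for illustration only: (client_id, nights_in_shelter).
records = [
    ("A", 10),
    ("A", 10),    # the same stay entered twice
    ("B", 30),
    ("C", None),  # length of stay was never recorded
    ("D", 400),   # a genuinely long stay, or a typo?
]

def average_stay(rows, drop_duplicates, max_nights=None):
    """Average nights per record under a given set of cleaning choices."""
    if drop_duplicates:
        # dict.fromkeys keeps the first copy of each identical row, in order.
        rows = list(dict.fromkeys(rows))
    # Missing values are always skipped; that is itself a cleaning decision.
    nights = [n for _, n in rows if n is not None]
    if max_nights is not None:
        # Treat anything above the cutoff as an entry error and drop it.
        nights = [n for n in nights if n <= max_nights]
    return mean(nights)

raw = average_stay(records, drop_duplicates=False)
deduped = average_stay(records, drop_duplicates=True)
trimmed = average_stay(records, drop_duplicates=True, max_nights=365)

print(raw, deduped, trimmed)
```

With these made-up numbers the "average stay" is 112.5 nights, about 146.7 nights, or 20 nights, depending only on how the data was cleaned. None of those choices is visible in the final number unless the method is published alongside it.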
The presentation of data needs to be clear if most people are going to understand it. Otherwise open data becomes a creator of a new elite class, where transparency is only available to those with a statistics or programming background. Some data presentation styles are easy to understand, like maps. Statistics are not as accessible. How many people can read a multiple regression table and understand what it means? I work in a field where many people claim to be “bad at math”. I have heard many say they freeze whenever they see numbers. The high school graduation rate in the city of Buffalo was only 56% last year, and they celebrated it because it was an 8-point rise. Open data will be useful only if the skills of interpretation are not uncommon. I would argue that explaining what a chart means, or explaining why a certain analytic tactic was used, is as important as the information being available.
The skills of creation are even less common. It is like everything else: we live in houses but not everyone has the skills and knowledge to build or fix them. I will never forget the time I made my first GIS map. It was part of a University of Washington graduate statistics workshop series. I sent it to a good friend who uses GIS professionally with a cheeky, “Hey, look what I did” caption. He replied with an itemized list of the basic ways I violated map-making best practices. I forgot to include a legend – what WAS this map that I sent him? Lest you think this is my mistake alone, I have now seen three open data maps passed around Twitter which lacked legends and any explanation of what they were beyond a title; sometimes they lacked even that. How was the data collected? How was it cleaned? Why should we believe it is accurate? Just because it exists? I bring up that last point because most people will uncritically believe what they read on the internet if it conforms to how they otherwise see the world. Open data could become another tool of obfuscation if it is not transparent.
I mentioned data-cleaning. If your data isn’t cleaned properly, the results will not be accurate. I had to go to graduate school to learn that skill. High schools do not teach it. Certain disciplines of undergraduate education will, so here is another skill restricted by social stratification and privilege. (Could you teach yourself? Sure. It’s possible. Would you know to teach it to yourself? Maybe.) You can have data that is open, but if the skills to analyze and interpret it are uncommon, it’s another tool of the elite.
Open data efforts should also be subject to some background research, though I concede that is very difficult given the norm of paywalls on scientific articles. Perhaps not necessarily as a first step – that would undo the “walk in the woods” benefit of open data efforts. It should be done at some point, though, because your data may not be as useful as it seems. For instance, a while ago I saw a map shared on Buffalo Rising that showed the East Side of Buffalo had less crime than other parts. Oh, really? This map, created by Buffalo Open Data, used police reports from crimereports.com as a proxy for crime. The way the commenter shared the map led me to believe he was quite confident in its veracity. It is possible crime rates are lower on the East Side. However, it appeared that he was probably unaware of the body of sociological research which finds that folks in the poor parts of the inner city frequently don’t call the police when they experience crime. The police are not seen as allies and are perceived as doing more harm than good. That is not to say that the police are never called, but they are not as frequently relied upon as they would be in better-off neighborhoods. The people who are most comfortable with police tend to be those who are members of privileged social strata, like the white or wealthy, and those folks don’t tend to live on the East Side of Buffalo.
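The proxy problem can be sketched with a little arithmetic. The numbers below are entirely invented and describe no real neighborhood or dataset; the neighborhood names, incident counts, and reporting percentages are all assumptions chosen only to show how a difference in reporting can flip a comparison.

```python
# Invented figures for illustration only; these are not real crime statistics.
# "incidents" is what actually happened; "pct_reported" is the share of
# incidents that residents report to the police.
neighborhoods = {
    "Neighborhood X": {"incidents": 100, "pct_reported": 30},
    "Neighborhood Y": {"incidents": 60, "pct_reported": 80},
}

def police_reports(incidents, pct_reported):
    """Number of incidents that end up in the police-report data."""
    return incidents * pct_reported // 100

reports = {
    name: police_reports(n["incidents"], n["pct_reported"])
    for name, n in neighborhoods.items()
}

for name, n in neighborhoods.items():
    print(name, "actual:", n["incidents"], "reported:", reports[name])
```

In this made-up case Neighborhood X has more incidents (100 versus 60) but fewer police reports (30 versus 48), so a map built from reports alone would show it as the safer of the two. Whether anything like this is happening in a real dataset is exactly what background research is for.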
I would be a jerk to expect every layperson to be up to date on sociology (though, I bet if the commenter had been a person of color, he might have known that because, well, racism). It is totally a fair critique of my argument to say that the best practice, a survey of citizens, is pretty far out of reach for someone not being paid. (That’s how statistics for under-reported crimes, like rape, are created.) It is not that the map is terrible; it is that it requires some sizable caveats which neither the commenter, nor anyone else from Buffalo Open Data, has ever shared. And it is to the credit of the commenter that he shared the methods of the map creation with enough specificity that I could rip it apart, or rather, use it as an example of the potential limits of open data efforts.
Sharing the procedure is so important. Open Data efforts, if they are to have reformatory potential, absolutely have to be Open Data Methods efforts too. It is not just what you did, but how you did it. Just as teaching a man to fish is more helpful than giving him one, teaching the process of public data analysis is an integral part of transparency. It doesn’t help if it is written only in R code or if it is completely disconnected from the publication of the result. This is a basic requirement of scientific papers for a reason: the findings have to earn your trust. I confess that I am advocating for a skill that I struggle with. Case in point: my analysis of length of first stay for people who experience homelessness once versus those who have multiple episodes. I tried to write it so a PhD was not required to understand it. I am not sure that I succeeded. It is hard. It is time-consuming. I recognize that this may be a burden for those who do this as a hobby and not a paid profession.
In conclusion, for Open Data to be a reformation tool towards effective democracy, it needs to be 1) available, 2) accessible, and 3) understood. Without these, it will never achieve its potential to reform civic entities. It would become another tool of the elite.