The Challenges of Open Data and Privacy Issues
Barbara J. Parker is city attorney for Oakland and can be reached at BParker@oaklandcityattorney.org. Kiran Jain is the former senior deputy city attorney for Oakland and can be reached at email@example.com.
In the normal course of business and operation throughout the nation, government agencies collect vast amounts of data about residents and businesses. For many years the City of Oakland and other cities have struggled to give their residents, on whose behalf a city provides its services and programs, easy access to information and data about city activities. Recent technological advances have made such access possible. They have also presented an unprecedented opportunity to improve services and to provide this information in formats that enable city residents and businesses to create or obtain programs that can manipulate and readily interpret and evaluate data made available or disclosed upon request.
Cities are at a crossroads. As public stewards of information in this “brave new world,” we must balance the public’s interest in and right to public information against our residents’ privacy rights and interests, protecting privacy while still providing open and transparent access to government information.
The Oakland City Attorney’s Office hosted a roundtable in November 2014 to begin a discussion about privacy rights in the digital age and how they intersect and potentially conflict with government agencies’ collection and use of data. Ultimately the city’s goals are to:
- Develop a comprehensive, citywide policy that balances government transparency and personal privacy rights; and
- Coordinate this policy with related regionwide, statewide and nationwide policies and standards.
The City Attorney’s Office partnered with the nonprofit Startup Policy Lab1 to convene the roundtable and moderate the discussion. Experts from nonprofits, government agencies, think tanks and other organizations participated in the thought-provoking discussion of these issues at City Hall. The discussion focused on Oakland’s increasing use of data to drive decision-making and how the city’s use and management of such data may raise privacy concerns. As the first in an anticipated series of discussions, the roundtable used Oakland as a real-world case to identify critical issues and then leverage those insights as a model for other cities. The series is intended to culminate in open data policy and standards that will guide the city and potentially other cities and government agencies.
The City Attorney’s Office’s efforts in this area support and are being coordinated with Oakland’s October 2013 Open Data Policy, sponsored by then-Council Member and now Mayor Libby Schaaf. That policy declares the city’s resolve to enhance access to and ease of use of city-collected data by providing raw data in machine-readable formats, using open data standards, that maximize the free sharing of certain data without typical controls like copyright and licensing schemes.2 With data easily accessible at data.oaklandnet.com, the local community is able to develop software applications and tools to collect, organize and share city data in ways that benefit both residents and the city.3 But as noted earlier, not all city-collected data is or should be accessible. In light of protections for data privacy under federal and state law, Oakland seeks to respect its position as the steward, not the owner, of such data.
Types of Data
A city might have access to three categories of data: infrastructural, public service and personal data.4 The first category, infrastructural data, is generally noncontroversial; it includes information “about the state of the world,” such as data on weather measurements and transportation networks.5 Public service data concerns the activities of government, such as a city’s performance statistics and budgets. While this data typically does not implicate privacy concerns, it can when the data is linked to individual users of public services.6 The last type of data is about individuals — or “personally identifiable information” (PII) — such as health data, and it is generally the type of data most strongly protected, either by private industry practice or by local, state and/or federal law, as applicable.7
Personally Identifiable Information
PII can be classified into four types of data, each of which might require a different tactic to “scrub” private information out of any given data set.8 These are:
- Unintentional PII — data compiled in government records that was not supposed to be included, such as Social Security numbers in census data;
- Unnecessary PII — data that is useful for government but is not essential to the utility of the data. Its inclusion, however, prevents the data from being shared. One solution is structuring the data so that the PII can be severed from the useful data, perhaps by creating separate data fields for PII;
- Necessary PII — data in which PII is necessary for the data’s best use. One solution here is to use trusted, skilled data intermediaries who can produce meaningful aggregates for the public without releasing PII; and
- Legally identified data — examples include criminal records, campaign finance reports and public employer data.
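The remedy suggested above for unnecessary PII, keeping PII in separate data fields so that it can be severed before release, can be sketched in a few lines of Python. This is only an illustration: the field names and the sample permit record are hypothetical and are not drawn from any Oakland data set.

```python
# Hypothetical list of fields treated as PII; a real data set would need
# its own list, reviewed by counsel and the data owner.
PII_FIELDS = {"name", "phone", "email"}

def split_record(record):
    """Split one record (a dict) into a publishable part and a PII part."""
    public = {k: v for k, v in record.items() if k not in PII_FIELDS}
    private = {k: v for k, v in record.items() if k in PII_FIELDS}
    return public, private

# Hypothetical sample record.
permit = {
    "permit_id": "B-0001",
    "permit_type": "roof repair",
    "name": "Jane Doe",
    "phone": "555-0100",
}

public, private = split_record(permit)
print(public)  # only the non-PII fields remain
```

Dropping named fields handles only the PII that is stored in clearly labeled fields. It does not by itself address the mosaic effect, discussed next.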
The Mosaic Effect
While not a type of data, the mosaic effect describes a phenomenon in which non-PII data can be combined with other available information in such a way as to pose a risk of identifying an individual.9 The U.S. government, in combatting this effect, requires agencies to consider other publicly available data to determine if some combination of data could allow an individual to be identified.10
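One way to screen a data set for this risk, offered here as an illustration and not as the federal requirement itself, is to count how many records share each combination of seemingly harmless attributes. Combinations held by very few records are the ones most easily linked to an individual. The column names, sample rows and threshold `k` below are assumptions made for the sketch.

```python
from collections import Counter

def rare_combinations(rows, columns, k=2):
    """Return each combination of values in `columns` shared by fewer than k rows."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical rows with no names or addresses, only indirect attributes.
rows = [
    {"zip": "94601", "birth_year": 1980, "gender": "F"},
    {"zip": "94601", "birth_year": 1980, "gender": "F"},
    {"zip": "94612", "birth_year": 1955, "gender": "M"},
]

risky = rare_combinations(rows, ["zip", "birth_year", "gender"], k=2)
print(risky)  # combinations that appear only once in this sample
```

A check like this looks only at the data set in hand. Weighing it against other publicly available data, as the federal guidance contemplates, still requires human judgment.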
The classification of data is the first step in creating meaningful and responsible data sets for public use. Despite its strong connection with technology, data privacy is both a legal and a policy issue. Without proper stewardship of city-collected data, controversies may erupt as data in or extracted from cumbersome public records becomes more accessible and linkable in unexpected ways. For example, a register of gun ownership in Westchester and Rockland counties in New York was transformed by a local newspaper into a data set, mapped and then published, creating a strong backlash.11 The publication triggered a number of concerns, which included the possibility that the map would identify potential targets for burglars.12
For the foregoing reasons, cities should endeavor to create privacy policies that not only comply with federal, state and local law but also balance the justifiable demand for open data with cities’ responsibilities as public stewards. These privacy policies should take into account the issues of scope, best use of data, policy versus technology, third-party interests, adaptability and providing notice.
Scope is the level of detail a given data set might have, and that level may set the parameters of uses to which the data can be applied. For example, crime data can be generated for a general area in a city, or it can be released with much more specific levels of detail, down to street address information. A Neighborhood Watch group might have different concerns than an insurer.
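To make the idea of scope concrete, the sketch below coarsens a street address to its hundred block before release, one possible way to publish crime data at less than full detail. The function name and the sample addresses are hypothetical, and real address data would need more careful parsing than this.

```python
import re

def to_block_level(address):
    """Replace a leading house number with its hundred block."""
    match = re.match(r"\s*(\d+)\s+(.*)", address)
    if not match:
        return address  # no leading house number; returned unchanged
    number, street = match.groups()
    block = (int(number) // 100) * 100
    return f"{block} block of {street}"

print(to_block_level("1234 Broadway"))
```

Whether block-level detail is coarse enough depends on the audience and on how many households share a block, which is a policy decision and not a coding one.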
The best use of data can be a concern that has public policy ramifications, especially when the mosaic effect is involved. Users of the data provided may seek more and more data, and the interconnectedness of all available data may collectively chip away at otherwise protected PII. For example, profit-seeking enterprises might be able to sell information obtained from building permit applications to construction supply businesses that in turn target the applicants with specific advertisements, creating what may be considered to be government-aided targeted advertising.
Policy versus technology raises the question: To what extent should privacy concerns be handled at a technological level as opposed to a policy level?13 For example, ShotSpotter — a technology deployed by local police departments that triggers recording of sound when a gunshot occurs — has the potential to inadvertently record private conversations.
Third-party interests might be considered in a manner similar to best use as described here, both when the city releases data and when private parties access the data. With regard to the former, interested parties who learn that mug shots are readily available, for example, might argue that such data should be made available for their use even though such information could lead to false conclusions. Regarding the latter, Oakland’s RecordTrac, which identifies the requestor of a public record by name, has garnered complaints from journalists that public access to information about their requests might let competitors find out what they are researching and scoop their stories.
Providing notice is an important responsibility, and one that may be challenging to fulfill. As Oakland produces data that it receives or collects in the course of its municipal duties, its city officials and staff are committed to ensuring that its residents are aware of the city-collected data that might be included in a data set. Disclosures, caveats, online terms of service and conditions of use are ubiquitous, often lengthy and written in “legalese” that at best requires careful study and at worst may be incomprehensible to the average reader. Many people breeze past this information, simply clicking “I agree.” Nevertheless, local officials should strive to make all residents fully aware of the implications of the government’s access to and potential disclosure of data.
Notice may be communicated to a party in a variety of ways, especially when information is collected online. Examples include click-through and browsewrap notice statements (such as hyperlinks at the bottom of a web page). Other possibilities include notice for each particular data entry field, perhaps via a hyperlink, or, once users have had time to acclimate, a series of symbols that quickly cues them in to how the data might be used (or is protected). The federal Chief Information Officers Council gives some guidance on the elements of a digital privacy notice in its Recommendations for Standardized Implementation of Digital Privacy Controls (Table 5).14
In 2015 we stand on the brink of an exciting and dynamic environment in which citizens will be able to perform the lion’s share of their work and other activities online. Everything from paying parking tickets to dissecting a city’s budget is at our fingertips. As stewards of the public’s information, it is our duty and responsibility to continue to ask questions about standards and ethical issues as well as privacy rights, so that we can provide transparency and promote accountability and also protect the privacy interests of our citizens. We have taken the first steps to accomplish these goals in Oakland, and we are committed to remaining vigilant and thoughtful in this process.
The National League of Cities’ newly released report, City Open Data Policies: Learning by Doing, examines what cities are currently doing with open data and what they could be doing far into the future. This publication is a resource for cities developing open data policies.
 Startup Policy Lab (www.startuppolicylab.org) connects the startup community to policy-makers through events and research.
 Oakland City Council Resolution No. 84659.
 Open Data Research Network, http://www.opendataresearch.org/content/2013/501open-data-privacy-discussion-notes
 Roundtable notes, p.3.
 Id., p. 5.
 OMB-13-13, p. 4.
 Open Data Research Network, supra note iii.
 See roundtable notes, p. 9.
 Federal Chief Information Officers Council, https://cio.gov/wp-content/uploads/downloads/2012/12/Standardized_Digital_Privacy_Controls.pdf.
About Legal Notes
This column is provided as general information and not as legal advice. The law is constantly evolving, and attorneys can and do disagree about what the law requires. Local agencies interested in determining how the law applies in a particular situation should consult their local agency attorneys.