We have modified the City-Owned Land Inventory dataset to remove records with a status of “In Acquisition” from the dataset. These properties are not yet owned by the City of Chicago and were included prematurely.
The Relocated Vehicles dataset is not updating. We are investigating the cause and a potential solution. In the meantime, we will do occasional manual updates but cannot maintain anything close to the normal 15-minute, automated update frequency.
As discussed in previous posts, we apply specific rules to our Taxi and Transportation Network Provider (TNP) trip data before publication to the Data Portal in order to protect privacy. One of these rules is that for any combination of a 15-minute time period and a Census Tract that has fewer than three trips, we remove the census tract before publication and show only the community area.
We have made available a new source of open data, which delivers real-time locations of scooters available for rent in Chicago. More information on Chicago’s scooter pilot is available at chicago.gov/scooters.
The data are accessible by API, and each of the 10 scooter companies hosts its own API feed as a requirement of participation. Details on the data specification can be found here. Updated URLs for GBFS feeds are available on GitHub and are also listed below.
Earlier this week, Divvy performed a system upgrade. As part of this upgrade, its real-time station status feed switched from the previous format to a new format, based on the General Bikeshare Feed Specification (GBFS).
We will be making extensive changes to the Building Permits dataset this week. These changes will provide additional and higher-quality data but may have the side effect of requiring corresponding changes by users, especially for any processes that ingest the data in an automated way.
We added data reported by the companies, covering January to March 2019, to the Transportation Network Provider datasets.
The City of Chicago prioritizes personal privacy in developing datasets for publication. Protecting individual privacy is a guiding tenet applied during the preparation of a dataset for public release. Specific to the Taxi and Transportation Network Provider (TNP or “ride-share”) Trips datasets, a deidentification and aggregation technique was developed and applied to reduce the risk of reidentification while allowing for beneficial public use of the data.
Taxi and TNP companies do not report identifying passenger characteristics directly to the City of Chicago. The City of Chicago neither requests nor obtains a passenger’s name, date of birth, zip code, phone number, gender identification, or any other attribute related to the individual except for the location and time of both the trip start and trip end.
It has been recognized in scientific literature and news reports that even data without directly identifying attributes can be reidentified using other data sources. Specifically, data about an individual’s location at certain points in time can create a “fingerprint” that can allow for reidentification, as long as there is a separate dataset available containing parts of the fingerprint along with identifying fields. While our mission to provide data transparency is essential, protecting individual passenger privacy is also extremely important. Therefore, the Taxi and TNP Trips datasets have been aggregated in a way that protects passenger personal privacy by avoiding reidentification, explained below.
- Aggregation by time: all trips are rounded to the nearest 15-minute interval.
- Aggregation by geographical space: latitude and longitude points are not provided; the census tract in which each trip started and ended is provided.
  - Chicago is split into approximately 800 census tracts, ranging in size from about 89,000 square feet to eight square miles.
  - As a result, no row of the dataset locates a trip more precisely than a 15-minute window and a census tract, the smallest of which covers about 89,000 square feet.
  - The precise location and time of a trip cannot be determined.
- Wider-ranging aggregation by geographical space: As the dataset does provide the approximate location of a trip, another layer of protection was added to avoid linking individuals’ trip location data to their identities.
  - If the above method resulted in any aggregation having two or fewer unique trips in the same census tract and 15-minute time window, the geographical space published was widened to the Community Area level for both ends of that trip.
  - Even if one acquires separate data about a trip location and trip time along with identifying information about a passenger/rider, the presence of at least three matching trips would inhibit isolating a specific trip’s census tracts in this dataset.
  - As a result of this protection, approximately a third of census tracts that would otherwise be shown in the initial dataset are blank. (Others are blank because of missing data or falling outside Chicago.) By removing the census tract from these particular records, we limit the location information that could be reidentified by providing only the Community Area in which the trip started and ended. On average, a Community Area covers 3 square miles of the City.
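
The steps above can be illustrated with a minimal Python sketch. It is not the City's actual pipeline: the field names (`start_time`, `end_time`, `pickup_tract`, `dropoff_tract`, `pickup_area`, `dropoff_area`) are hypothetical, and it assumes that pickups are counted by start window and dropoffs by end window, a detail the description above does not spell out.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)
MIN_TRIPS = 3  # a tract/window cell with fewer trips than this loses its tract


def round_to_window(ts: datetime) -> datetime:
    """Round a timestamp to the nearest 15-minute mark."""
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    windows = round((ts - day_start) / WINDOW)
    return day_start + windows * WINDOW


def deidentify(trips):
    """Return copies of the trips with rounded times and sparse tracts blanked.

    Each trip is a dict with the hypothetical keys named in the lead-in.
    Community Area fields are left untouched, so they remain available
    when the census tract is removed.
    """
    # Step 1: aggregation by time.
    rounded = [
        {**t,
         "start_time": round_to_window(t["start_time"]),
         "end_time": round_to_window(t["end_time"])}
        for t in trips
    ]

    # Step 2: count trips in each (census tract, 15-minute window) cell.
    # Assumption: pickups and dropoffs are counted separately.
    pickups = Counter((t["pickup_tract"], t["start_time"]) for t in rounded)
    dropoffs = Counter((t["dropoff_tract"], t["end_time"]) for t in rounded)

    # Step 3: if either end falls in a sparse cell, blank the tract at
    # both ends so only the Community Area remains.
    result = []
    for t in rounded:
        sparse = (
            pickups[(t["pickup_tract"], t["start_time"])] < MIN_TRIPS
            or dropoffs[(t["dropoff_tract"], t["end_time"])] < MIN_TRIPS
        )
        if sparse:
            t = {**t, "pickup_tract": None, "dropoff_tract": None}
        result.append(t)
    return result
```

In this sketch, three trips that share a pickup tract and window and a dropoff tract and window keep their census tracts, while a lone trip elsewhere is published with Community Areas only.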