We have updated the Transportation Network Provider datasets with data reported by the companies for January through March 2019.
The City of Chicago prioritizes personal privacy in developing datasets for publication. Protecting individual privacy is a guiding tenet applied during the preparation of a dataset for public release. Specific to the Taxi and Transportation Network Provider (TNP or “ride-share”) Trips datasets, a deidentification and aggregation technique was developed and applied to reduce the risk of reidentification while allowing for beneficial public use of the data.
Taxi and TNP companies do not report identifying passenger characteristics directly to the City of Chicago. The City of Chicago neither requests nor obtains a passenger's name, date of birth, zip code, phone number, gender identification, or any other attribute related to the individual except for the location and time of both the trip start and the trip end.
It has been recognized in scientific literature and news reports that even data without directly identifying attributes can be reidentified using other data sources. Specifically, data about an individual's location at certain points in time can create a "fingerprint" that allows for reidentification if a separate dataset is available that contains parts of the fingerprint along with identifying fields. While our mission to provide data transparency is essential, protecting individual passenger privacy is also extremely important. Therefore, the Taxi and TNP Trips datasets have been aggregated to reduce the risk of reidentification and protect passenger privacy, as explained below.
- Aggregation by time: trip start and end times are rounded to the nearest 15-minute interval.
- Aggregation by geographical space: latitude and longitude points are not provided; the census tract in which each trip started and ended is provided.
- Chicago is split into approximately 800 census tracts, ranging in size from about 89,000 square feet to eight square miles.
- As a result, no row of the dataset places a trip more precisely than a 15-minute window and a census tract of at least 89,000 square feet, so the precise location and time of a trip cannot be determined.
- Wider-ranging aggregation by geographical space: because the dataset still provides the approximate location of a trip, another layer of protection was added to avoid linking individuals' trip location data to their identities.
- If the above method resulted in any aggregation having two or fewer unique trips in the same census tract and 15-minute time window, the geographical space published was widened to the Community Area level for both ends of that trip.
- Even if someone acquires separate data about a trip's location and time along with identifying information about the passenger, at least three trips in this dataset would match, so a specific trip's census tracts could not be singled out.
- As a result of this protection, approximately a third of the census tract values that would otherwise appear in the dataset are blank. (Other values are blank because of missing data or because the location falls outside Chicago.) By removing the census tract from these particular records, we limit the location information that could be reidentified by providing only the Community Area in which the trip started and ended. On average, a Community Area covers about 3 square miles of the City.
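The two aggregation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the City's actual pipeline: the `Trip` record, tract IDs such as `"T1"`, and the helper names are hypothetical. We also assume that a group is defined by the rounded start time plus the pickup and dropoff census tracts, since the exact grouping key is not spelled out above, and that exact rounding ties (for example 8:07:30) round up.

```python
from collections import Counter
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(minutes=15)
MIN_TRIPS = 3  # groups with two or fewer trips get their census tracts blanked


@dataclass(frozen=True)
class Trip:
    start: datetime
    end: datetime
    pickup_tract: Optional[str]   # hypothetical census tract ID; None once suppressed
    dropoff_tract: Optional[str]
    pickup_area: int              # Community Area number, always published
    dropoff_area: int


def round_to_window(ts: datetime) -> datetime:
    """Round a timestamp to the nearest 15-minute boundary (exact ties round up)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # timedelta // timedelta yields an int; adding half a window first rounds to nearest
    n = (ts - midnight + WINDOW / 2) // WINDOW
    return midnight + n * WINDOW


def group_key(trip: Trip) -> tuple:
    # ASSUMPTION: a group is the rounded start time plus both census tracts.
    return (trip.start, trip.pickup_tract, trip.dropoff_tract)


def deidentify(trips: list[Trip]) -> list[Trip]:
    # Step 1: aggregate by time.
    rounded = [
        replace(t, start=round_to_window(t.start), end=round_to_window(t.end))
        for t in trips
    ]
    # Step 2: count trips sharing the same tracts and 15-minute window.
    counts = Counter(group_key(t) for t in rounded)
    # Step 3: widen small groups to the Community Area level by blanking both tracts.
    return [
        replace(t, pickup_tract=None, dropoff_tract=None)
        if counts[group_key(t)] < MIN_TRIPS
        else t
        for t in rounded
    ]


# Usage: three trips share a tract pair in the 8:00 window; one trip is alone.
day = datetime(2019, 3, 1)
trips = [
    Trip(day.replace(hour=8, minute=m), day.replace(hour=8, minute=m + 20), "T1", "T2", 8, 32)
    for m in (1, 4, 6)
]
trips.append(Trip(day.replace(hour=8, minute=3), day.replace(hour=8, minute=30), "T1", "T9", 8, 6))

for t in deidentify(trips):
    print(t.start, t.pickup_tract, t.dropoff_tract, t.pickup_area, t.dropoff_area)
```

In this example all four start times round to 8:00. The three trips between the same pair of tracts keep their census tracts, while the lone trip to a different tract is published with only its Community Areas. The threshold of three mirrors the "two or fewer" rule above and resembles a k-anonymity-style suppression with k = 3.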
Our Employee Indebtedness to the City of Chicago dataset was not updating properly from 12/15/2018 through 3/2/2019. The update job runs weekly on Saturdays. We discovered the problem today and believe it is fixed for future runs.
Our ordinance violations dataset is not updating properly and is showing no records modified more recently than 2/22/2019. We are looking into the problem and trying to fix it.
The following datasets were not updated yesterday (2/5/2019) or today. We have identified the common point of failure and are working to fix it. There is no current estimate. We will edit this post with any significant updates.
Due to an issue with the source data over the weekend, the Chicago Traffic Tracker - Congestion Estimates by Segments and Chicago Traffic Tracker - Congestion Estimates by Regions datasets did not update between 1/11/2019 10:50 pm and 1/14/2019 12:10 pm (today).