In September 2017, City staff discovered one of the taxi trips data sources appeared to be incomplete and paused the updates, with the last update being July 2017 trips (plus a small amount of spillover into August 2017, as occurs with each month's update). After investigating and confirming the issue, the City announced it and the pause in October 2017.
We did not expect that pause to continue into 2019, and this has been a high-priority item for us, especially given a number of inquiries from users of the data. However, the problem turned out to be harder than expected and to reveal additional issues with the source data. We felt it was more important to be thorough than to be quick. We are happy to announce that we have been able to resume updates to the dataset.
The natural question is, "What changed?"
We determined that one or more of the underlying problems affected the entire dataset. Most months from January 2013 to July 2017 have been updated with corrected data and all months from August 2017 to April 2019 have been added to the dataset for the first time.
The net changes per month and year are shown in Table 1.
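For readers who want to reproduce this kind of comparison against their own copy of the previous version, the following is a minimal sketch of the arithmetic. It assumes that "Trips Added" is the updated count minus the previous count and that "Trips Added %" is that difference relative to the previous count; the function name `net_change` and the sample counts are hypothetical and are not taken from Table 1.

```python
from typing import Optional, Tuple


def net_change(previous: int, updated: int) -> Tuple[int, Optional[float]]:
    """Return (trips added, percent added relative to the previous count)."""
    added = updated - previous
    # A month that was absent from the previous version has no baseline,
    # so the percentage is left undefined rather than divided by zero.
    pct = None if previous == 0 else 100.0 * added / previous
    return added, pct


# Hypothetical counts for illustration only; these are not values from Table 1.
example_counts = {
    "01/2013": (1_000_000, 1_050_000),
    "08/2017": (0, 900_000),
}

for month, (previous, updated) in example_counts.items():
    added, pct = net_change(previous, updated)
    pct_text = "n/a" if pct is None else f"{pct:.1f}%"
    print(f"{month}: added {added:,} trips ({pct_text})")
```

Months added to the dataset for the first time have no previous count, which is why the sketch reports their percentage as "n/a".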
The previous version of the dataset is downloadable in Comma-Separated Value (CSV) format (46 GB file). Please note that we cannot promise this file will be permanently available, so we recommend downloading it now if you think you may want it at some point.
The overall effect is that while, as commonly believed, taxi trip volume in Chicago has fallen from about 2015 on, the rate of decrease has been smaller than previously thought. We use fundamentally the same data internally that we share on the Data Portal so this was a significant discovery for many reasons.
We now believe we have acquired all available data – within the usual margins of error inherent to collecting real-world data from millions of taxi trips across thousands of vehicles – with one notable exception.
Despite multiple attempts, we have been unable to get full data for the Flash taxi fleet for November 2014 to December 2015. The problem is technical in nature, not a lack of willingness by any party or anything of that sort. We will continue to try to get the missing data and will add it to the dataset if we succeed at some point but, disappointingly to us as well, we do not expect that to be the case.
Trips by month for Flash are shown in Table 2.
It may be useful to explain the meaning of "all available data" above. Our taxi trip data come from the major taxi industry payment processing vendors. They have used a variety of hardware and software systems over time. Despite best efforts, some data that once existed may no longer be available years later when someone attempts to extract it. That is a general limitation where one may not even know what is missing. By contrast, for Flash in that 13-month period, there is a tightly defined set of trip records that appear to exist but cannot be retrieved, or even counted.
For many use cases, this may be a distinction without a difference. The summary message is that we are publishing what we have but, as with almost all data, there are limitations – some known and some unknown. However, this situation is different enough that we wanted to address it explicitly in case the information may be helpful to some users.
As mentioned, our main reason for keeping the dataset on pause for so long was a desire to get it right. However, collecting and processing this sort of data is complex, and further issues are always possible. What we can promise our users, though, is transparency. If we discover credible indications of material errors in the data in the future, we will announce them through the same channels used this time or similar ones.
Please direct any questions about the Taxi Trips dataset or the Data Portal in general to firstname.lastname@example.org or @ChicagoCDO. Please direct any subject matter questions about City of Chicago taxi operations to BACPPV@cityofchicago.org.
Table 1 – Net Changes per Month and Year
|Month||Previous Trip Count||Updated Trip Count||Trips Added||Trips Added %||Year or YTD Trips Added||Year or YTD Trips Added %|
Table 2 – Flash Trip Count by Month
|Month||Flash Trip Count||Trailing 13 Month Flash Trip Count|
|11/2014||71,139 (Partially Missing)|