July 16, 2015 / by Open Data Portal Team / In Open Data, Data Portal

What is a “deprecated” dataset?

The most basic answer is that it is not the dataset you want if you are looking for current data.

When possible, we try to keep historical records in a dataset with dates or other indicators that allow for filtering by time. However, in some cases (notably maps) the dataset is inherently a snapshot in time, and mixing historical and new data is impossible or potentially confusing. For minor changes in the data, we usually just overwrite the old data. For less frequent or more substantial changes, we often first create a copy of the old data and mark it “deprecated” in at least two ways: in the dataset title and by applying the “deprecated” topic/tag.
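For readers who work with a list of datasets programmatically, the two markers described above can be checked with a short filter. This is only a sketch: the record shape (a dict with `name` and `tags` keys) and the function names are assumptions made for illustration, not the Data Portal's actual API.

```python
def is_deprecated(dataset):
    """Return True if the title or any tag marks the dataset as deprecated.

    `dataset` is a hypothetical metadata record: a dict with a "name"
    string and an optional "tags" list of strings.
    """
    title = dataset.get("name", "")
    tags = dataset.get("tags") or []
    in_title = "deprecated" in title.lower()
    in_tags = any(tag.strip().lower() == "deprecated" for tag in tags)
    return in_title or in_tags


def split_catalog(datasets):
    """Split records into (current, deprecated) lists, preserving order."""
    current, deprecated = [], []
    for record in datasets:
        (deprecated if is_deprecated(record) else current).append(record)
    return current, deprecated
```

Checking both the title and the tag is deliberate: either marker alone is treated as enough, in case one of them was missed on a particular dataset.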

The deprecated copy is a newly created dataset, so the existing URL continues to point to the current data. As a result, the Created and Updated dates of a deprecated dataset can be misleading. The best indicators of the time period it covers are the date we put in the title and the Time Period shown under “About this Dataset.” Note that if there was a gap between when the data became outdated in a real-world sense and when we deprecated the dataset, the date in the title may refer to either. Our deprecation process has evolved as we have gained experience, and the meaning of the date should become clearer.

Separately, the Data Portal has a built-in feature that can, in some cases, also be used to see historical versions of data. Every dataset has a “More Views” button, normally used to see charts, maps, filtered views, and other presentations of the data. For tabular datasets, there is also a section labeled “Dataset Snapshots,” which often shows previous versions of the dataset. Note that the date on a snapshot is when that version of the data was replaced, not when it was created. As a general rule, the date indicates when the next-newer snapshot (or the current version, in the case of the newest snapshot) was created. Because of technical changes in how datasets are updated, many changes no longer create snapshots. We recommend against relying on snapshots, but they can be useful where they do exist.
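The snapshot date rule can be easier to follow with a small worked example. The sketch below assumes you have copied the snapshot dates for one dataset by hand; the function name is hypothetical, and the result only holds as far as the general rule above does.

```python
from datetime import date


def snapshot_periods(snapshot_dates):
    """Map each snapshot date to the (start, end) period its data was current.

    The end is the snapshot's own date, i.e. when that version was replaced.
    The start is the date of the next-older snapshot, which is when this
    version was created. For the oldest snapshot the start is None, because
    its creation date cannot be inferred from snapshot dates alone.
    """
    periods = {}
    previous = None
    for snapshot_date in sorted(snapshot_dates):
        periods[snapshot_date] = (previous, snapshot_date)
        previous = snapshot_date
    return periods


# Illustrative dates only, not taken from a real dataset.
example = snapshot_periods([date(2015, 3, 1), date(2014, 6, 1), date(2015, 7, 1)])
```

Here the snapshot dated 2015-03-01 holds the data that was current from 2014-06-01 until 2015-03-01, and the current version of the dataset dates from 2015-07-01.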