The complete dataset is available:

The format of these exports is explained here

Exports

You can also export the data from the map, which gives you many more filter options (per category, dataset, substance, etc.). However, this export excludes the points for which we only have the city and not the precise coordinates, as they are not shown on the map.

The map export can also produce a single column per substance. This is useful if you don’t want to deal with the pfas_values JSON array field. Be aware that this means one column per substance, so in some cases dozens or even hundreds of columns.
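If you start from an export that keeps pfas_values as a JSON array, the same "one column per substance" layout can be rebuilt by hand. The sketch below is only an illustration: the keys inside each array item ("substance" and "value"), the "city" field, and the sample rows are assumptions made up for the example, so check the export format description for the real field names.

```python
import json

def widen_pfas_values(rows):
    """Return copies of the rows with one key per substance instead of pfas_values."""
    wide_rows = []
    for row in rows:
        # Keep every field except the JSON array itself.
        wide = {key: val for key, val in row.items() if key != "pfas_values"}
        # Assumption: each array item looks like {"substance": ..., "value": ...}.
        for item in json.loads(row.get("pfas_values") or "[]"):
            wide[item["substance"]] = item.get("value")
        wide_rows.append(wide)
    return wide_rows

# Made-up sample rows, for illustration only.
rows = [
    {"city": "A", "pfas_values": '[{"substance": "PFOS", "value": 12.0}, {"substance": "PFOA", "value": 3.5}]'},
    {"city": "B", "pfas_values": '[{"substance": "PFOA", "value": 0.8}]'},
]
wide = widen_pfas_values(rows)
```

Rows that were not tested for a given substance simply lack that key, which is why the wide layout ends up with many sparse columns.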

You can also export the data by dataset from the datasets page.

Data API

A minimalistic API was developed for the map to work. If it is more convenient, feel free to use it to export our data. Be aware that the API can change without notice; if you plan on using it regularly, please let us know. It has an automatically generated documentation page where you can try the API directly: https://pdh.cnrs.fr/api/docs

Example notebook for data analysis

Please find here a Google Colab notebook with different examples of data filtering and data transformation. You can of course download it and run it somewhere other than Google Colab if you prefer.

Clustering the data

A recurring question is how to cluster our data in order to count polluted sites. Counting distinct coordinates will result in an overestimate, especially in zones that have been tested extensively, such as industrial zones in the Netherlands or in Belgium. In this notebook I show an example of clustering using the K-means method, with a minimum distance of 200 m between two clusters.

If you want to use this clustering method, you should try to change the parameters to see how it affects the results. Let me know if you have any questions.
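To get a feel for how the distance parameter changes the site count, here is a minimal standalone sketch. It is not the K-means approach from the notebook: it swaps in a simpler greedy "leader" grouping, where a point starts a new site only if it lies farther than the threshold from every site already created. The function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def count_sites(points, min_distance_m=200):
    """Greedy leader grouping: count points that are far from all earlier leaders."""
    leaders = []
    for lat, lon in points:
        # A point becomes a new site only if no existing leader is within the threshold.
        if all(haversine_m(lat, lon, l_lat, l_lon) > min_distance_m
               for l_lat, l_lon in leaders):
            leaders.append((lat, lon))
    return len(leaders)

# Made-up coordinates: three samples close together and one about 1 km away.
points = [
    (51.0000, 4.0000),
    (51.0005, 4.0000),
    (51.0000, 4.0010),
    (51.0100, 4.0000),
]

for threshold in (50, 200, 2000):
    print(threshold, count_sites(points, threshold))
```

The count from this greedy pass depends on the order of the points, so it is only a quick way to explore the effect of the threshold, not a replacement for a proper clustering method.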