---
title: "Population-weighted centres of local administrative units (LAU) in Europe"
description: |
  If you need the coordinates of a location, where do you put the centre?
author:
  - name: Giorgio Comai
    url: https://giorgiocomai.eu
    affiliation: OBCT/EDJNet
    affiliation_url: https://www.europeandatajournalism.eu/
site: distill::distill_website
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
__N.B. This repository includes datasets pre-processed to facilitate common operations and full correspondence with commonly used sources. Full correspondence is possible only through some compromise solutions: ensure that the data provided are fit for purpose for your specific use case. Please open [an issue](https://github.com/EDJNet/lau_centres/issues) if you find problems with the data or have suggestions for improvement.__
This repository presents datasets of the population-weighted centres of local administrative units in Europe, the scripts used to generate them with different approaches, as well as details on quality checks and peculiarities of the data.
It also tries to facilitate matching a given local administrative unit to its [corresponding NUTS region](https://ec.europa.eu/eurostat/web/nuts/local-administrative-units).
## Motivation
- when showing data points of a local administrative unit (or other territorial boundaries) we often need a specific point on a map to show relevant information. Where should that point fall?
- when taking values from a data grid it is often useful to have a specific set of coordinates: which should they be for local administrative units?
- the LAU/NUTS concordance tables distributed by Gisco have inconsistencies, making it a pain to adequately combine data (somewhat puzzlingly, each year something else is missing or does not quite match)
In reference to population-weighted centres, check out the following blog post for context:
[How to find the population-weighted centre of local administrative units](https://medium.com/european-data-journalism-network/how-to-find-the-population-weighted-centre-of-local-administrative-units-a0d198fc91f7)
The ~30 seconds [starting shortly after minute 3 of this video](https://youtu.be/cAzMzIepOC8?t=189) show visually how this works by overlaying the population grid on Google Earth imagery.
## Preliminary notes on terminology and data
- LAU stands for "local administrative unit" and usually coincides with municipalities
- NUTS stands for "nomenclature of territorial units for statistics"; there are NUTS of different levels, including NUTS 3 (mostly province/district/county or equivalent), NUTS 2 (larger regions), NUTS 1 (macro-regions).
See the [relevant Wikipedia page for details](https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics).
Every year, Eurostat releases an updated dataset of LAUs, including basic statistics about each of them as well as their spatial boundaries in formats that can be read by a number of software packages. The same is true for NUTS, except that they are not updated yearly: recent NUTS releases include 2021, 2016, and 2013.
Besides, the coverage of these datasets is not the same across all years, as some countries were not part of the NUTS system earlier on.
This discrepancy in release dates means that, as LAUs are merged or change their boundaries, sometimes across different NUTS regions, there may not be perfect concordance between, for example, LAU for 2019 and NUTS for 2016. Eurostat does distribute concordance tables, but these are not perfectly consistent: some LAUs are missing for some countries, some countries are missing altogether, etc. (see [examples of how they do not match](lau_nuts_botched.html)).
This forces users of these data to make a judgement call: should they use old data? Should they manually find where the LAUs missing from the concordance tables are located? Should they mix and match?
This repository outlines some possible solutions and provides consistently pre-processed data. The user should still be mindful of the structural imperfection of the results.
## Steps and intermediate datasets
Since fully consistent concordance tables between LAU and NUTS published by the EU are not available for most years, a first step in the data processing involves situating each LAU in a NUTS region. This is achieved by calculating the spatial overlap between the relevant geometries.
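
As a minimal sketch of this step, assume the overlap areas have already been computed with a geospatial library and stored in a table with one row per intersecting LAU/NUTS pair. The column names (`lau_id`, `nuts_id`, `area`) and the values are hypothetical and are not taken from the files in this repository. Situating each LAU then amounts to keeping the NUTS region with the largest overlap:

```r
# Hypothetical overlap table: one row per LAU/NUTS pair that intersects,
# with the area of the intersection (arbitrary units, made-up values).
overlap <- data.frame(
  lau_id  = c("LAU_A", "LAU_B", "LAU_B", "LAU_C", "LAU_C"),
  nuts_id = c("N1",    "N1",    "N2",    "N2",    "N3"),
  area    = c(4.0,     0.5,     3.5,     1.0,     9.0),
  stringsAsFactors = FALSE
)

# Share of each LAU's total overlapping surface that falls in each NUTS region
overlap$share <- overlap$area / ave(overlap$area, overlap$lau_id, FUN = sum)

# Sort so that, within each LAU, the largest overlap comes first,
# then keep only the first row for each LAU
ordered <- overlap[order(overlap$lau_id, -overlap$area), ]
largest <- ordered[!duplicated(ordered$lau_id), ]
rownames(largest) <- NULL

largest
```

Keeping the `share` column makes it easy to flag LAUs that straddle a NUTS boundary (for example, those with a share well below 1) for manual checks.
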
This process results in three sets of datasets:
- one with the overlap of each LAU in a given year, with the NUTS of a given year (this can be useful, for example, for quality checks or for caching data when processing in bulk) - it is available in the [`lau_nuts_area` folder in this repository](https://github.com/EDJNet/lau_centres/tree/main/lau_nuts_area) for multiple combinations of LAU and NUTS based on data published between 2016 and 2021
- one with full concordance between LAU and NUTS, where each LAU is attributed to the NUTS region where it has most of its surface - it is available in the [`lau_nuts_concordance_by_geo` folder in this repository](https://github.com/EDJNet/lau_centres/tree/main/lau_nuts_concordance_by_geo) (N.B. the resulting dataset may not be fully consistent with official statistics, as the possibility that a tiny number of LAUs may be misattributed to a neighbouring NUTS cannot be excluded)
- a third and probably most useful dataset, based on official concordance tables and falling back to the above solution when the concordance tables are incomplete. It includes a column clarifying if the matching for the given LAU was based on official concordance tables or on geo-datasets. It is available in the [`lau_nuts_concordance_combo` folder in this repository](https://github.com/EDJNet/lau_centres/tree/main/lau_nuts_concordance_combo)
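
The fallback logic behind the third dataset can be sketched roughly as follows. This is an illustration under stated assumptions, not the script used in this repository: the tables, the column names (`nuts_official`, `nuts_geo`, `source`) and the labels `"official"` and `"geo"` are all hypothetical.

```r
# Hypothetical inputs: the full list of LAUs, an official concordance table
# with a gap (LAU_B is missing), and a concordance derived from spatial overlap.
lau <- data.frame(lau_id = c("LAU_A", "LAU_B", "LAU_C"),
                  stringsAsFactors = FALSE)
official <- data.frame(lau_id = c("LAU_A", "LAU_C"),
                       nuts_official = c("N1", "N3"),
                       stringsAsFactors = FALSE)
by_geo <- data.frame(lau_id = c("LAU_A", "LAU_B", "LAU_C"),
                     nuts_geo = c("N1", "N2", "N3"),
                     stringsAsFactors = FALSE)

# Left joins keep every LAU, with NA where the official table has no match
combo <- merge(lau, official, by = "lau_id", all.x = TRUE)
combo <- merge(combo, by_geo, by = "lau_id", all.x = TRUE)

# Prefer the official match; fall back to the geographic one, and record which
use_official <- !is.na(combo$nuts_official)
combo$nuts_id <- ifelse(use_official, combo$nuts_official, combo$nuts_geo)
combo$source  <- ifelse(use_official, "official", "geo")

combo[, c("lau_id", "nuts_id", "source")]
```

Recording the origin of each match in its own column lets users filter out, or double-check, the rows that did not come from official tables.
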
It is finally possible to get back to the original objective of this endeavour: having a ready-made dataset with the population-weighted centres of all LAUs across Europe. As detailed [in the page explaining how population-weighted centres are calculated](how_is_this_computed.html), there are various elements determining the final coordinates. At this stage, datasets with the following characteristics are available:
- LAU for all years between 2018 and 2020 based on NUTS for 2016
- LAU for 2020 and NUTS 2021
- all of the above are calculated based on the most recent population grid at the time of writing (with 2018 population data)
- datasets available in separate files by country as well as for all of the EU in a single file
- all datasets are based on "adjusted intersection", i.e. only a share of the residents of cells crossing the relevant administrative boundary is included, proportional to the part of the cell that falls into the given administrative unit. Under most circumstances, this should be better than the alternative of including all residents of cells that just intersect the boundary, perhaps with the partial exception of coastal towns.
- in order to "push" the centre towards more densely populated areas, the population-weighted centre is calculated by raising to the power of two the number of residents in each cell. This seems a good solution for most use cases; the same can easily be calculated with other parameters using the scripts included in this repository.
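
The last two points can be illustrated with a small sketch. It assumes that each grid cell is represented by the coordinates of its centre, that the adjustment for partial cells is applied before raising to the chosen power, and that the centre is the weighted mean of the cell coordinates; the function name, column names, and values are hypothetical, and the scripts in this repository may differ in the details.

```r
# Hypothetical grid cells intersecting one LAU: cell-centre coordinates,
# residents in the cell, and the share of the cell's area inside the LAU.
cells <- data.frame(
  x            = c(0, 1, 3),
  y            = c(0, 0, 1),
  residents    = c(100, 300, 200),
  share_inside = c(1, 1, 0.5)
)

weighted_centre <- function(cells, power = 2) {
  # "Adjusted intersection": count only the residents assumed to live in
  # the part of the cell that falls inside the administrative unit
  adjusted <- cells$residents * cells$share_inside
  # Raising to a power > 1 gives extra weight to densely populated cells
  w <- adjusted^power
  c(x = sum(w * cells$x) / sum(w),
    y = sum(w * cells$y) / sum(w))
}

weighted_centre(cells, power = 2)  # weights based on squared residents
weighted_centre(cells, power = 1)  # plain population-weighted mean
```

In this toy example, the result with `power = 2` lies closer to the most populated cell, at (1, 0), than the plain weighted mean obtained with `power = 1`.
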
All datasets are available in the [`lau_centres` folder in this repository](https://github.com/EDJNet/lau_centres/tree/main/lau_centres)
## "I just need a dataset with LAU centres, what should I download?"
The most recent, consistent, and complete datasets are probably the following:
- dataset based on LAU for 2020, NUTS for 2016, population-grid for 2018, available [following this link](https://github.com/EDJNet/lau_centres/tree/main/lau_centres/lau_2020_nuts_2016_pop_2018_p_2_adjusted_intersection.csv).
- dataset based on LAU for 2020, NUTS for 2021, population-grid for 2018, available [following this link](https://github.com/EDJNet/lau_centres/tree/main/lau_centres/lau_2020_nuts_2021_pop_2018_p_2_adjusted_intersection.csv).
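
The two file names above follow a regular pattern, which the sketch below reproduces. The helper function is hypothetical, and it is an assumption that files for other combinations of years follow the same pattern and that they can be read through GitHub's raw-content URL; check the `lau_centres` folder for the files actually available.

```r
# Hypothetical helper: builds a file name following the pattern of the two
# files linked above.
lau_centres_filename <- function(lau_year, nuts_year, pop_year = 2018, power = 2) {
  sprintf("lau_%d_nuts_%d_pop_%d_p_%d_adjusted_intersection.csv",
          as.integer(lau_year), as.integer(nuts_year),
          as.integer(pop_year), as.integer(power))
}

# Assumed base URL for raw file contents (not stated in this document)
base_url <- "https://raw.githubusercontent.com/EDJNet/lau_centres/main/lau_centres"
file_url <- paste(base_url, lau_centres_filename(2020, 2016), sep = "/")

# Reading the file requires network access, so it is left commented out:
# centres <- read.csv(file_url)
```
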
## Future and forthcoming
This repository may be updated to include datasets based on more recent data, to add coverage for other European countries not included in Eurostat datasets, or to recalculate the population-weighted centres based on high-resolution population grids.
## Data availability
This repository covers local administrative units as distributed by the European Union's [Gisco services](https://gisco-services.ec.europa.eu/distribution/v2/lau/download/), and tentatively those in neighbouring jurisdictions.
The main determinants of data availability are:
- inclusion of relevant country in the [LAU datasets](https://gisco-services.ec.europa.eu/distribution/v2/lau/download/) for the relevant year
- coverage by the EU population grid [as distributed by the EU's Gisco services](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography/geostat)
Additional sources include [High Resolution Population Density Maps distributed by Facebook](https://data.humdata.org/organization/facebook) for the population grid and [GADM](https://gadm.org/) for administrative boundaries.
## Sources
### Local Administrative Units (LAU)
The original dataset of Local Administrative Units (LAU) can be downloaded from the following link:
https://gisco-services.ec.europa.eu/distribution/v2/lau/download/
LAU can be matched to NUTS via concordance tables:
https://ec.europa.eu/eurostat/web/nuts/local-administrative-units
### Population grid
The population grid can be downloaded from the following link:
https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography/geostat
Details on how the population grid was generated, as well as context on the reliability of the data, are included in the factsheet that can be downloaded with the latest available dataset (at the time of writing), based on 2018 data and released in 2021.
## Copyright and licensing
The data included in this repository are based on various datasets that come with different licensing. Geographic data are not included in this repository and should instead be downloaded from the sources, as current licensing does not allow for redistribution of geographic data nor for their use for commercial purposes (see details below). Indeed, we do not include any of the geo-spatial data provided in the original datasets, and include only standard identifiers of NUTS regions and LAUs as well as basic statistics distributed by Eurostat.
See below and links to the sources for more details on copyright and licensing.
© EuroGeographics for the administrative boundaries (see [full licensing details for LAUs](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units))
Licensing for the population grid dataset varies depending on the year and the country originally providing the data. See [the relevant page for full details](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography). The population grid for 2018 has been created by the European Commission, Joint Research Centre (B.3) and DG REGIO (B.1); ownership: European Commission.
[Correspondence tables](https://ec.europa.eu/eurostat/web/nuts/local-administrative-units) do not seem to carry additional licensing, besides what is standard to all contents distributed by Eurostat.
Besides considering all of the above, if you use these data or the scripts that generate them you are encouraged, but not required, to credit [EDJNet](https://europeandatajournalism.eu/) and, if relevant, to link to this repository.