-
Notifications
You must be signed in to change notification settings - Fork 0
/
NHS_Borders_cancer_incidence.Rmd
420 lines (372 loc) · 17.9 KB
/
NHS_Borders_cancer_incidence.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
---
title: "Incidence of cancer in NHS Borders"
subtitle: "Data Analysis using R"
author: "Lenka Rozborilova"
date: "4/2/2021"
output:
html_document: default
pdf_document: default
df_print: paged
---
```{r setup, include=FALSE}
# Don't include messages and warnings in all of knitted file
# by setting warning = F, and message = F in the setup chunk below
# can add echo = F if don't want any code
knitr::opts_chunk$set(echo = F,
warning = F, message = F,
results = F)
```
```{r}
library(tidyverse)
library(janitor)
library(here)
library(lubridate)
```
Following analysis is based on data available from The Scottish Health and Social Care Open Data platform, https://www.opendata.nhs.scot/dataset/annual-cancer-incidence
```{r}
# read in data, downloaded from https://www.opendata.nhs.scot/dataset/annual-cancer-incidence/resource/3aef16b7-8af6-4ce0-a90b-8a29d6870014
incidence_all_HB <- read_csv(here("data/new_cancer_incidence.csv")) %>% clean_names()
# filter Borders Health Board, Code S08000016
incidence_borders <- incidence_all_HB %>%
filter(hb == "S08000016")
head(incidence_borders)
```
```{r}
# list all years of diagnosis (1994-2018)
years <- unique(incidence_borders$year) %>% sort()
```
Upward trend in the incidences of cancer in NHS Borders between years 1994 and 2018:
```{r}
# total number of new cancer registrations (=incidences) in each year (Borders only)
# All cancer types are excluding Non-melanoma skin cancer, need to add it
total_new_cancers_registrations_years <- incidence_borders %>%
filter(cancer_site == "All cancer types" | cancer_site == "Non-melanoma skin cancer") %>%
filter(sex %in% c("Female", "Male")) %>%
group_by(year) %>%
summarise(total_new_cancers_registrations = sum(incidences_all_ages))
# visualize time series
# we can clearly see there is an upward trend in registered incidences of cancer in NHS HB Borders
total_new_cancers_registrations_years %>%
ggplot() +
aes(x = year, y = total_new_cancers_registrations) +
geom_line() +
geom_point() +
theme_bw() +
scale_x_continuous(labels = years, breaks = years) +
theme(axis.text.x = element_text(angle = 90)) +
labs(
x = "",
y = "new cancer registrations",
title = "Incidence of cancer in NHS Borders, 1994-2018",
subtitle = "All cancer types, incl. non-melanoma skin cancer"
)
```
Number of all cancer registrations (incl. non-melanoma skin cancer) between years 1994 and 2018 (period of 25 years) was 23385.
```{r}
# total of all cancer registrations over the period of 25 years was 23385
# calculated as total of All cancer types excluding Non-melanoma skin cancer plus Non-melanoma skin cancer (ICD10 codes "C00-C97, excluding C44" + "C44")
# C00-C97, excluding C44 - All cancer types
# C44 - Non-melanoma skin cancer
# 17644 + 5741 = 23385
total_new_cancers_registrations_years %>%
summarise(total_1994_2018 = sum(total_new_cancers_registrations))
```
\newpage
Summary of all incidences of cancer diagnosed in NHS Borders, broken down by cancer sites for all ages and for both sexes throughout the years 1994 to 2018 (32 categories):
```{r}
# list all the ICD10 codes and related diagnosed cancer sites (51 + 1 (all cancer types))
ICD10_codes <- incidence_borders %>%
distinct(cancer_site_icd10code, cancer_site) %>%
arrange(cancer_site_icd10code) %>%
as_tibble()
# BUT! some of the categories are subsets of other categories!
# We need to properly assign these subset categories to superset categories
incidence_borders_with_superset_categories <- incidence_borders %>%
filter(cancer_site != "All cancer types") %>%
#select(-c(id, hb, sex_qf, crude_rate:sir_upper95pc_confidence_interval)) %>%
mutate(
cancer_site_superset = case_when(
cancer_site_icd10code %in% c("C00-C14", "C01-C02", "C01-C06", "C01, C02.4, C05.1, C05.2, C09, C10", "C03-C06", "C07-C08") ~ "C00-C14, C30-C32", #Head and neck
cancer_site_icd10code %in% c("C18", "C19-C20") ~ "C18-C20", #Colorectal cancer
cancer_site_icd10code %in% c("C44, M-8050-8078, M-8083-8084", "C44, M-8090-8098") ~ "C44", #Non-melanoma skin cancer
cancer_site_icd10code %in% c("C53", "C54") ~ "C53-C55", #Uterus
cancer_site_icd10code %in% c("C70-C72, C75.1-C75.3", "C71", "D18.0, D32-D33,
D35.2-D35.4, D42-D43, D44.3-D44.5") ~ "C70-C72, C75.1-C75.3, D18.0, D32-D33, D35.2-D35.4, D42-D43, D44.3-D44.5", #All brain and CNS tumours (malignant and non-malignant)
cancer_site_icd10code %in% c("C91.0", "C91.1", "C92.0", "C92.1-C92.2") ~ "C91-C95", #Leukaemias
cancer_site_icd10code %in% c("ICD-10 C47+C49") ~ "ICD-10 C40-C41, C47+C49", #Bone and connective tissue
TRUE ~ cancer_site
)
) %>%
mutate(
superset = case_when(
cancer_site_superset %in% c("C00-C14, C30-C32", "C18-C20", "C44", "C53-C55",
"C70-C72, C75.1-C75.3, D18.0, D32-D33, D35.2-D35.4, D42-D43, D44.3-D44.5", "C91-C95", "ICD-10 C40-C41, C47+C49") ~ FALSE,
TRUE ~ TRUE
)
)
# list all the superset categories (there is 32 of them)
all_superset_categories <- incidence_borders_with_superset_categories %>%
filter(superset == TRUE) %>%
distinct(cancer_site)
# Total of incidences of each cancer superset category in all ages for both sexes throughout the years 1994 to 2018
# # Non-melanoma skin cancer - 5741
# # Breast - 2483
# # Trachea, bronchus and lung - 2441
# # Colorectal cancer - 2436
# # Prostate - 2119
# # Carcinoma in situ of the cervix uteri - 1386
incidence_borders_with_superset_categories_totals <- incidence_borders_with_superset_categories %>%
filter(sex %in% c("Female", "Male")) %>%
#filter(sex == "All") %>%
filter(superset == TRUE) %>%
group_by(cancer_site) %>%
summarise(total = sum(incidences_all_ages)) %>%
arrange(desc(total)) %>%
as_tibble()
# visualize (cancer categories on y axis)
# The number of new cancer registrations for each cancer site between years 1994 and 2018
incidence_borders_with_superset_categories_totals %>%
ggplot() +
aes(x = reorder(cancer_site, total), y = total) %>%
geom_col(stat = "identity", fill = "#f68060", width = 0.4) +
theme_bw() +
coord_flip() +
labs(
x = "",
y = "number of incidences",
title = "Overview of the frequency of cancer sites",
subtitle = "NHS Borders, 1994-2018"
)
```
Note: Some of the cancer site categories also include sub-categories, please see the footnote for more detail:^[Non-melanoma skin cancer (incl. Squamous cell carcinoma of the skin, Basal cell carcinoma of the skin);
Colorectal cancer (incl. Colon, Rectum and rectosigmoid junction);<br>
Uterus (incl. Cervix uteri, Corpus uteri);<br>
Leukaemias (incl. Acute lymphoblastic leukaemia, Chronic lymphocytic leukaemia, Acute myeloid leukaemia, Chronic myeloid leukaemia);<br>
All brain and CNS tumours (malignant and non-malignant) (incl. Malig brain ca (incl pit. gland, cranio. duct, pineal gland), Malignant brain cancer, Non-malig brain ca (incl pit.gland,cranio.duct,pineal gland));<br>
Head and neck (incl. Lip, oral cavity and pharynx, Tongue, Oropharyngeal cancers, Mouth (IARC definition), Salivary glands);<br>
Bone and connective tissue (incl. Connective tissue)]
<details>
<summary>**Number of cancer incidences for the six most common cancer sites diagnosed between years 1994 and 2018**</summary>
```{r results = T}
incidence_borders_with_superset_categories_totals %>%
head(6) %>%
as_tibble()
# # Non-melanoma skin cancer - 5741
# # Breast - 2483
# # Trachea, bronchus and lung - 2441
# # Colorectal cancer - 2436
# # Prostate - 2119
# # Carcinoma in situ of the cervix uteri - 1386
```
</details>
\newpage
Non-melanoma skin cancer was the most diagnosed cancer site in the NHS Borders with Malignant melanoma being the 7th most diagnosed cancer site.
Following graph is showing the Crude Rate^[calculated by dividing the number of new cancers observed during each year by the corresponding number of people in the population at risk] of both types of skin cancer:
```{r}
# rates: crude_rate, easr, standardised_incidence_ratio
# skin cancer
skin_cancer_rates <- incidence_borders_with_superset_categories %>%
filter(superset == TRUE) %>%
filter(cancer_site %in% c("Non-melanoma skin cancer",
"Malignant melanoma of the skin")) %>%
select(cancer_site, sex, year, incidences_all_ages, crude_rate, easr, standardised_incidence_ratio, cancer_site_superset, superset) %>%
group_by(cancer_site)
# visualize skin
skin_cancer_rates %>%
filter(sex == "All") %>%
group_by(year) %>%
ggplot() +
aes(x = year, y = crude_rate, colour = cancer_site) +
geom_line(aes(group = cancer_site)) +
theme_bw() +
scale_colour_discrete("Type of skin cancer") +
scale_x_continuous(labels = years, breaks = years) +
theme(axis.text.x = element_text(angle = 90)) +
labs(
x = "",
y = "rate (per 100,000 persons)",
title = "Crude rate of skin cancer in NHS Borders, 1994-2018"
)
```
\newpage
Crude rate and Standardised Incidence Ratio^[obtained by dividing the observed number of cases of cancer by the “expected” number of cases, tells us if the number of observed cancer cases in a particular geographic area is higher or lower than expected] for the six **most common cancer sites** diagnosed between years 1994 and 2018:
```{r}
# vector of top 6 cancer sites with the highest frequency of incidence between years 1994 and 2018
top_6_cancer_sites <- incidence_borders_with_superset_categories_totals %>%
head(6) %>%
select(cancer_site) %>%
pull()
# top 6 cancer sites with the highest frequency of incidence between years 1994 and 2018, with all details
top_six_cancer_sites_rates <- incidence_borders_with_superset_categories %>%
filter(superset == TRUE) %>%
filter(cancer_site %in% top_6_cancer_sites) %>%
select(cancer_site, sex, year, incidences_all_ages, crude_rate, easr, standardised_incidence_ratio, cancer_site_superset, superset) %>%
group_by(cancer_site)
# To change plot order of facet wrap, need to change the order of variable levels with factor()
top_six_cancer_sites_rates$cancer_site <- factor(top_six_cancer_sites_rates$cancer_site, levels = c("Non-melanoma skin cancer",
"Breast",
"Trachea, bronchus and lung",
"Colorectal cancer",
"Prostate",
"Carcinoma in situ of the cervix uteri"))
# as for instance cancer sites Breast or Prostate has value 0 of variable incidences_all_ages in sex category All,
# I need to use both sexes and calculate and adjust the rates proportionally
top_six_cancer_sites_rates_adjusted <- top_six_cancer_sites_rates %>%
pivot_longer(
cols = c(crude_rate, standardised_incidence_ratio),
names_to = "metric",
values_to = "metric_value"
) %>%
filter(sex %in% c("Female", "Male")) %>%
group_by(year, sex, cancer_site, incidences_all_ages, metric) %>%
summarise(total = sum(metric_value)) %>%
#filter(cancer_site == "Breast") %>%
as_tibble() %>%
mutate(
total_gender = incidences_all_ages * total
) %>%
group_by(year, cancer_site, metric) %>%
summarise(metric_value_both_gender = sum(total_gender)/sum(incidences_all_ages))
# visualize top 6 cancer sites, multi-panel plots, looking at Crude rate and Standardised Incidence Ratio (SIR) throughout 1994-2018
# explain what both metrics mean and what SIR higher than 100% means
## Non-melanoma skin cancer is the most commonly occurring cancer in the UK, followed by breast cancer.
## In 2018, there were XY new cases
top_six_cancer_sites_rates_adjusted %>%
ggplot() +
aes(x = year, y = metric_value_both_gender, group = metric, colour = metric) +
geom_line() +
theme_bw() +
scale_x_continuous(labels = years, breaks = years) +
theme(axis.text.x = element_text(angle = 90)) +
facet_wrap(~cancer_site, ncol = 2) +
scale_color_manual(name = "Metric",
labels = c("crude_rate" = "CR", "standardised_incidence_ratio" = "SIR"),
values = c("crude_rate" = "#F8766D", "standardised_incidence_ratio" = "#00BFC4")) +
labs(
x = "",
y = "ratio",
title = "Crude Rate (CR) and Standardised Incidence Ratio (SIR), \nNHS Borders, 1994-2018",
subtitle = "For top 6 cancer sites by incidences in last 25 years"
)
```
```{r}
## close look at 2018
# # Non-melanoma skin cancer crude_rate 336.33996
# # Non-melanoma skin cancer standardised_incidence_ratio 108.23833
# # Breast crude_rate 163.62747
# # Breast standardised_incidence_ratio 78.47573
top_six_cancer_sites_rates_adjusted_2018 <- top_six_cancer_sites_rates %>%
filter(year == 2018) %>%
pivot_longer(
cols = c(crude_rate, standardised_incidence_ratio),
names_to = "metric",
values_to = "metric_value"
) %>%
filter(sex %in% c("Female", "Male")) %>%
group_by(sex, cancer_site, incidences_all_ages, metric) %>%
summarise(total = sum(metric_value)) %>%
mutate(
total_gender = incidences_all_ages * total
) %>%
group_by(cancer_site, metric) %>%
summarise(metric_value_both_gender = sum(total_gender)/sum(incidences_all_ages))
```
\newpage
Crude rate and Standardised Incidence Ratio for the six cancer sites with the **most extreme SIR** in last 5 years (2013-2018) and visualised the rates again
```{r}
# look at the most extreme SIR in last 5 years
SIR_2013_2018 <- incidence_borders_with_superset_categories %>%
filter(year >= 2013) %>%
filter(standardised_incidence_ratio > 0) %>%
filter(superset == TRUE) %>%
pivot_longer(
cols = c(crude_rate, standardised_incidence_ratio),
names_to = "metric",
values_to = "metric_value"
) %>%
filter(sex %in% c("Female", "Male")) %>%
group_by(year, sex, cancer_site, incidences_all_ages, metric) %>%
summarise(total = sum(metric_value)) %>%
mutate(
total_gender = incidences_all_ages * total
) %>%
group_by(year, cancer_site, metric) %>%
summarise(metric_value_both_gender = sum(total_gender)/sum(incidences_all_ages)) %>%
filter(metric == "standardised_incidence_ratio") %>%
group_by(cancer_site) %>%
summarise(SIR_avg_2013_2018 = sum(metric_value_both_gender)/5) %>%
arrange(desc(SIR_avg_2013_2018))
```
```{r}
# top 6 cancer sites with the highest SIR in last 5 years (2013-2018)
# # Thyroid 227.4784
# # Hodgkin lymphoma 195.9399
# # Bone and articular cartilage 187.9386
# # Multiple myeloma and malignant plasma cell neoplasms 173.2608
# # Non-Hodgkin lymphoma 160.5566
# # Malignant melanoma of the skin 147.4512
SIR_2013_2018 %>%
head(6)
top_6_cancer_sites_SIR <- SIR_2013_2018 %>%
head(6) %>%
select(cancer_site) %>%
pull()
# top 6 cancer sites with the highest SIR in last 5 years (2013-2018), with all details
top_six_cancer_sites_SIR_rates <- incidence_borders_with_superset_categories %>%
filter(superset == TRUE) %>%
filter(cancer_site %in% top_6_cancer_sites_SIR) %>%
select(cancer_site, sex, year, incidences_all_ages, crude_rate, easr, standardised_incidence_ratio, cancer_site_superset, superset) %>%
group_by(cancer_site)
# To change plot order of facet wrap, need to change the order of variable levels with factor()
top_six_cancer_sites_SIR_rates$cancer_site <- factor(top_six_cancer_sites_SIR_rates$cancer_site, levels = c("Thyroid",
"Hodgkin lymphoma",
"Bone and articular cartilage",
"Multiple myeloma and malignant plasma cell neoplasms",
"Non-Hodgkin lymphoma",
"Malignant melanoma of the skin"))
# as for instance cancer sites Breast or Prostate has value 0 of variable incidences_all_ages in sex category All,
# I need to use both sexes and calculate and adjust the rates proportionally
top_six_cancer_sites_SIR_rates_adjusted <- top_six_cancer_sites_SIR_rates %>%
pivot_longer(
cols = c(crude_rate, standardised_incidence_ratio),
names_to = "metric",
values_to = "metric_value"
) %>%
filter(sex %in% c("Female", "Male")) %>%
group_by(year, sex, cancer_site, incidences_all_ages, metric) %>%
summarise(total = sum(metric_value)) %>%
#filter(cancer_site == "Breast") %>%
as_tibble() %>%
mutate(
total_gender = incidences_all_ages * total
) %>%
group_by(year, cancer_site, metric) %>%
summarise(metric_value_both_gender = sum(total_gender)/sum(incidences_all_ages))
# visualize top 6 cancer sites with the highest SIR in last 5 years, multi-panel plots, looking at Crude rate and Standardised Incidence Ratio (SIR) throughout 1994-2018
# explain what both metrics mean and what SIR higher than 100% means
top_six_cancer_sites_SIR_rates_adjusted %>%
ggplot() +
aes(x = year, y = metric_value_both_gender, group = metric, colour = metric) +
geom_line() +
theme_bw() +
scale_x_continuous(labels = years, breaks = years) +
theme(axis.text.x = element_text(angle = 90)) +
facet_wrap(~cancer_site, ncol = 2) +
scale_color_manual(name = "Metric",
labels = c("crude_rate" = "CR", "standardised_incidence_ratio" = "SIR"),
values = c("crude_rate" = "#F8766D", "standardised_incidence_ratio" = "#00BFC4")) +
labs(
x = "",
y = "ratio",
title = "Crude Rate (CR) and Standardised Incidence Ratio (SIR), \nNHS Borders, 1994-2018",
subtitle = "For top 6 cancer sites with the highest SIR in last 5 years"
)
```
**Conclusion:**
Based on the data analysis on incidences of cancer diagnosed in NHS Borders over the period of 25 years (1994-2018), I suggest to allocate future
provision of cancer treatment services in NHS Borders focusing on cancer sites: \newline
* Thyroid \newline
* Hodgkin and Non-Hodgkin lymphoma \newline
* Bone and articular cartilage \newline
* Multiple myeloma and malignant plasma cell neoplasm.\newline
Concidering the high frequency of skin cancer being diagnosed, I would also suggest to focus on increasing the awareness of this type of cancer among public as these are, in many cases, preventable types of cancer.