First load the package. We also load several other packages to help quickly explore the data.
library(getTBinR)
library(ggplot2)
library(knitr)
library(magrittr)
library(dplyr)
Get TB burden data with a single function call. This downloads the data if it has not been accessed before and saves a local copy to R’s temporary directory (see tempdir()). If a local copy exists from the current session, it is loaded instead.
tb_burden <- get_tb_burden()
#> Loading data from: /tmp/RtmpnHmmdl/TB_burden.rds
#> Loading data from: /tmp/RtmpnHmmdl/MDR_TB.rds
#> Joining TB burden data and MDR TB data.
tb_burden
#> # A tibble: 3,850 x 68
#> country iso2 iso3 iso_numeric g_whoregion year e_pop_num e_inc_100k
#> <chr> <chr> <chr> <int> <chr> <int> <int> <dbl>
#> 1 Afghan… AF AFG 4 Eastern Me… 2000 20093756 190
#> 2 Afghan… AF AFG 4 Eastern Me… 2001 20966463 189
#> 3 Afghan… AF AFG 4 Eastern Me… 2002 21979923 189
#> 4 Afghan… AF AFG 4 Eastern Me… 2003 23064851 189
#> 5 Afghan… AF AFG 4 Eastern Me… 2004 24118979 189
#> 6 Afghan… AF AFG 4 Eastern Me… 2005 25070798 189
#> 7 Afghan… AF AFG 4 Eastern Me… 2006 25893450 189
#> 8 Afghan… AF AFG 4 Eastern Me… 2007 26616792 189
#> 9 Afghan… AF AFG 4 Eastern Me… 2008 27294031 189
#> 10 Afghan… AF AFG 4 Eastern Me… 2009 28004331 189
#> # … with 3,840 more rows, and 60 more variables: e_inc_100k_lo <dbl>,
#> # e_inc_100k_hi <dbl>, e_inc_num <int>, e_inc_num_lo <int>,
#> # e_inc_num_hi <int>, e_tbhiv_prct <dbl>, e_tbhiv_prct_lo <dbl>,
#> # e_tbhiv_prct_hi <dbl>, e_inc_tbhiv_100k <dbl>,
#> # e_inc_tbhiv_100k_lo <dbl>, e_inc_tbhiv_100k_hi <dbl>,
#> # e_inc_tbhiv_num <int>, e_inc_tbhiv_num_lo <int>,
#> # e_inc_tbhiv_num_hi <int>, e_mort_exc_tbhiv_100k <dbl>,
#> # e_mort_exc_tbhiv_100k_lo <dbl>, e_mort_exc_tbhiv_100k_hi <dbl>,
#> # e_mort_exc_tbhiv_num <int>, e_mort_exc_tbhiv_num_lo <int>,
#> # e_mort_exc_tbhiv_num_hi <int>, e_mort_tbhiv_100k <dbl>,
#> # e_mort_tbhiv_100k_lo <dbl>, e_mort_tbhiv_100k_hi <dbl>,
#> # e_mort_tbhiv_num <int>, e_mort_tbhiv_num_lo <int>,
#> # e_mort_tbhiv_num_hi <int>, e_mort_100k <dbl>, e_mort_100k_lo <dbl>,
#> # e_mort_100k_hi <dbl>, e_mort_num <int>, e_mort_num_lo <int>,
#> # e_mort_num_hi <int>, cfr <dbl>, cfr_lo <dbl>, cfr_hi <dbl>,
#> # c_newinc_100k <dbl>, c_cdr <dbl>, c_cdr_lo <dbl>, c_cdr_hi <dbl>,
#> # source_rr_new <chr>, source_drs_coverage_new <chr>,
#> # source_drs_year_new <int>, e_rr_pct_new <dbl>, e_rr_pct_new_lo <dbl>,
#> # e_rr_pct_new_hi <dbl>, e_mdr_pct_rr_new <int>, source_rr_ret <chr>,
#> # source_drs_coverage_ret <chr>, source_drs_year_ret <int>,
#> # e_rr_pct_ret <dbl>, e_rr_pct_ret_lo <dbl>, e_rr_pct_ret_hi <dbl>,
#> # e_mdr_pct_rr_ret <int>, e_inc_rr_num <int>, e_inc_rr_num_lo <int>,
#> # e_inc_rr_num_hi <int>, e_mdr_pct_rr <int>,
#> # e_rr_in_notified_pulm <int>, e_rr_in_notified_pulm_lo <int>,
#> # e_rr_in_notified_pulm_hi <int>
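The caching behaviour described above (download once, then reload from the temporary directory) can be sketched in base R. This is only an illustration under stated assumptions: `cached_load()` and `toy_fetch()` are hypothetical names, not part of getTBinR, and the package's internals may differ.

```r
# Hypothetical helper (not part of getTBinR): load from a cache directory
# if a saved copy exists, otherwise fetch the data and save it first.
cached_load <- function(name, fetch, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    message("Loading data from: ", path)
    return(readRDS(path))
  }
  message("Fetching data and saving to: ", path)
  data <- fetch()
  saveRDS(data, path)
  data
}

# Toy stand-in for a download; the counter records how often it runs.
n_fetches <- 0
toy_fetch <- function() {
  n_fetches <<- n_fetches + 1
  data.frame(country = c("A", "B"), e_inc_100k = c(190, 12))
}

# Use a fresh directory so the first call always has to fetch.
cache_dir <- tempfile("tb_cache_")
dir.create(cache_dir)

first  <- cached_load("toy_tb_burden", toy_fetch, dir = cache_dir)
second <- cached_load("toy_tb_burden", toy_fetch, dir = cache_dir)
```

The second call reads the saved file instead of calling `toy_fetch()` again. Because `tempdir()` is cleared when the R session ends, a cache of this kind lasts only for the current session.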
The WHO provides a large, detailed data dictionary for use with the TB burden data. However, searching through this dataset can be tedious. To streamline this process, getTBinR provides a search function to find the definitions of one or more variables. As before, the first call downloads the data dictionary to the temporary directory, and subsequent calls load the local copy.
vars_of_interest <- search_data_dict(var = c("country",
                                             "e_inc_100k",
                                             "e_inc_100k_lo",
                                             "e_inc_100k_hi"))
#> Loading data from: /tmp/RtmpnHmmdl/TB_data_dict.rds
#> 4 results found for your variable search for country, e_inc_100k, e_inc_100k_lo, e_inc_100k_hi
knitr::kable(vars_of_interest)
variable_name | dataset | code_list | definition |
---|---|---|---|
country | Country identification | | Country or territory name |
e_inc_100k | Estimates | | Estimated incidence (all forms) per 100 000 population |
e_inc_100k_hi | Estimates | | Estimated incidence (all forms) per 100 000 population, high bound |
e_inc_100k_lo | Estimates | | Estimated incidence (all forms) per 100 000 population, low bound |
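One use for a search result like this is to reduce the burden data to the variables of interest. The sketch below uses small made-up data frames (`toy_burden` and `toy_vars` are illustrative stand-ins, and their values are invented) rather than the real `tb_burden` and `vars_of_interest` objects.

```r
# Toy stand-ins (made-up values) for tb_burden and a dictionary search result.
toy_burden <- data.frame(
  country       = c("A", "A", "B"),
  year          = c(2016L, 2017L, 2017L),
  e_inc_100k    = c(190, 189, 12),
  e_inc_100k_lo = c(120, 122, 10),
  e_inc_100k_hi = c(270, 268, 14),
  e_mort_100k   = c(40, 39, 1),
  stringsAsFactors = FALSE
)
toy_vars <- data.frame(
  variable_name = c("country", "e_inc_100k", "e_inc_100k_hi", "e_inc_100k_lo"),
  stringsAsFactors = FALSE
)

# Keep only the columns named in the search result, guarding against
# names that are missing from the data.
keep <- intersect(toy_vars$variable_name, names(toy_burden))
toy_subset <- toy_burden[, keep, drop = FALSE]
```

With the real objects, the same pattern would presumably be `tb_burden[, intersect(vars_of_interest$variable_name, names(tb_burden))]`.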
We might also want to search the variable definitions for key phrases, for example "mortality".
defs_of_interest <- search_data_dict(def = c("mortality"))
#> Loading data from: /tmp/RtmpnHmmdl/TB_data_dict.rds
#> 9 results found for your definition search for mortality
knitr::kable(defs_of_interest)
variable_name | dataset | code_list | definition |
---|---|---|---|
e_mort_100k | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population |
e_mort_100k_hi | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population, high bound |
e_mort_100k_lo | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population, low bound |
e_mort_exc_tbhiv_100k | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV) per 100 000 population |
e_mort_exc_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, high bound |
e_mort_exc_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, low bound |
e_mort_tbhiv_100k | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population |
e_mort_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, high bound |
e_mort_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, low bound |
Finally, we can search for a known variable and for key phrases in variable definitions in the same call.
vars_defs_of_interest <- search_data_dict(var = c("country"),
                                          def = c("mortality"))
#> Loading data from: /tmp/RtmpnHmmdl/TB_data_dict.rds
#> 1 results found for your variable search for country
#> 9 results found for your definition search for mortality
knitr::kable(vars_defs_of_interest)
variable_name | dataset | code_list | definition |
---|---|---|---|
country | Country identification | | Country or territory name |
e_mort_100k | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population |
e_mort_100k_hi | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population, high bound |
e_mort_100k_lo | Estimates | | Estimated mortality of TB cases (all forms) per 100 000 population, low bound |
e_mort_exc_tbhiv_100k | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV) per 100 000 population |
e_mort_exc_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, high bound |
e_mort_exc_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases (all forms, excluding HIV), per 100 000 population, low bound |
e_mort_tbhiv_100k | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population |
e_mort_tbhiv_100k_hi | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, high bound |
e_mort_tbhiv_100k_lo | Estimates | | Estimated mortality of TB cases who are HIV-positive, per 100 000 population, low bound |
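To make the two kinds of search concrete, here is a minimal base R sketch on a four-row excerpt of the dictionary. `toy_search()` is a hypothetical re-implementation written for illustration; the matching rules it uses (exact names, case-insensitive literal phrases) are assumptions, and `search_data_dict()` may behave differently.

```r
# Excerpt of the data dictionary, copied from the tables above.
toy_dict <- data.frame(
  variable_name = c("country", "e_inc_100k", "e_mort_100k", "e_mort_100k_hi"),
  definition = c(
    "Country or territory name",
    "Estimated incidence (all forms) per 100 000 population",
    "Estimated mortality of TB cases (all forms) per 100 000 population",
    "Estimated mortality of TB cases (all forms) per 100 000 population, high bound"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical search helper (not the getTBinR implementation).
toy_search <- function(dict, var = NULL, def = NULL) {
  # Exact matches on variable names.
  by_var <- dict$variable_name %in% var
  # Case-insensitive, literal phrase matches on definitions.
  by_def <- rep(FALSE, nrow(dict))
  for (phrase in def) {
    by_def <- by_def |
      grepl(tolower(phrase), tolower(dict$definition), fixed = TRUE)
  }
  # A row is returned if either search matched it.
  dict[by_var | by_def, , drop = FALSE]
}

hits <- toy_search(toy_dict, var = "country", def = "mortality")
hits$variable_name
```

Combining the two logical vectors with `|` returns the union of both searches, which mirrors the combined table above: the `country` row plus every row whose definition mentions mortality.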
To start exploring the WHO TB data, we map the most recently available global TB incidence rates. Mapping data can help identify spatial patterns.
getTBinR::map_tb_burden(metric = "e_inc_100k")
#> Loading data from: /tmp/RtmpnHmmdl/TB_burden.rds
#> Loading data from: /tmp/RtmpnHmmdl/MDR_TB.rds
#> Joining TB burden data and MDR TB data.
#> Loading data from: /tmp/RtmpnHmmdl/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
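Restricting the data to the most recently available year can be sketched as below. It is assumed here that this means the latest year present in the data. The data frame is a made-up toy, and how `map_tb_burden()` selects the year internally is not shown in this document.

```r
# Toy data with invented values; column names follow the WHO data.
toy_burden <- data.frame(
  country    = c("A", "A", "B", "B"),
  year       = c(2016L, 2017L, 2016L, 2017L),
  e_inc_100k = c(190, 189, 12, 11),
  stringsAsFactors = FALSE
)

# Assumed rule: keep only rows from the latest year present in the data.
latest_year <- max(toy_burden$year, na.rm = TRUE)
latest <- toy_burden[toy_burden$year == latest_year, ]
```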
To showcase how quickly we can go from no data to informative graphs, we plot incidence rates for all countries in the WHO data.
getTBinR::plot_tb_burden_overview(metric = "e_inc_100k",
                                  interactive = FALSE)
#> Loading data from: /tmp/RtmpnHmmdl/TB_burden.rds
#> Loading data from: /tmp/RtmpnHmmdl/MDR_TB.rds
#> Joining TB burden data and MDR TB data.
#> Loading data from: /tmp/RtmpnHmmdl/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
Another way to compare incidence rates between countries is to look at the annual percentage change. The plot below shows only countries whose minimum incidence rate over the study period is above 5 per 100,000, that is, countries that stayed above this level in every year.
higher_burden_countries <- tb_burden %>%
  group_by(country) %>%
  summarise(e_inc_100k = min(e_inc_100k)) %>%
  filter(e_inc_100k > 5) %>%
  pull(country) %>%
  unique()
getTBinR::plot_tb_burden_overview(metric = "e_inc_100k",
                                  interactive = FALSE,
                                  annual_change = TRUE,
                                  countries = higher_burden_countries)
#> Loading data from: /tmp/RtmpnHmmdl/TB_burden.rds
#> Loading data from: /tmp/RtmpnHmmdl/MDR_TB.rds
#> Joining TB burden data and MDR TB data.
#> Loading data from: /tmp/RtmpnHmmdl/TB_data_dict.rds
#> 1 results found for your variable search for e_inc_100k
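For readers who want to see what an annual percentage change looks like numerically, the sketch below computes it per country in base R on made-up data. The definition used (change relative to the previous year's value) is an assumption, as is the helper name `annual_change()`; the exact calculation behind `annual_change = TRUE` may differ.

```r
# Toy data with invented values; column names follow the WHO data.
toy_burden <- data.frame(
  country    = rep(c("A", "B"), each = 3),
  year       = rep(2015:2017, times = 2),
  e_inc_100k = c(200, 190, 171, 10, 11, 11),
  stringsAsFactors = FALSE
)

# Assumed definition: (this year - last year) / last year.
# The first year of each series has no predecessor, so it is NA.
annual_change <- function(x) c(NA, diff(x) / head(x, -1))

# Order by country and year so that diff() compares consecutive years.
toy_burden <- toy_burden[order(toy_burden$country, toy_burden$year), ]

# Apply the calculation within each country.
toy_burden$change <- ave(toy_burden$e_inc_100k, toy_burden$country,
                         FUN = annual_change)
toy_burden$change_pct <- round(100 * toy_burden$change, 1)
```

In this toy example, country A falls by 5% and then 10%, while country B rises by 10% and then stays flat.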