Introduction

Calculate comorbidities, Charlson scores, perform fast and accurate validation, conversion, manipulation, filtering and comparison of ICD-9 and ICD-10 codes. Common ambiguities and code formats are handled. This package enables a work flow from raw lists of ICD codes in hospital billing databases to comorbidities. ICD-9 and ICD-10 comorbidity mappings from Quan (Deyo and Elixhauser versions), Elixhauser and AHRQ included. This package replaces ‘icd9’, which should be uninstalled.

When calculating which patients have which comorbidities, the data are typically structured in long or wide formats:

# long format ICD-9-CM codes, with present-on-arrival flags
patients_icd9
#>   visit_id  icd9  poa
#> 1     1000 40201    Y
#> 2     1000  2258 <NA>
#> 3     1000  7208    N
#> 4     1000 25001    Y
#> 5     1001 34400    X
#> 6     1001  4011    Y
#> 7     1002  4011    E

# long format ICD-10 codes, real mortality data
uranium_pathology[1:5, ]
#>   case icd10
#> 1    1 F17.9
#> 2    1 I21.9
#> 3    1 K75.9
#> 4    1   R55
#> 5    2 I25.1

# wide format, real ICD-9 discharge diagnoses
vermont_dx[1:5, c(1, 6:15)]
#>   visit_id   DX1   DX2   DX3   DX4   DX5   DX6   DX7   DX8   DX9  DX10
#> 1        7 27801 03842 51881 41519 99591 42842  5849  5609  6826  5853
#> 2       10 71526 25000 42830  4280  4019  4241   311 49390  2724 73300
#> 3       13 71535 59651 78052 27800 V8537   311  4019 53081 56400      
#> 4       16 71535 49390 53081 27800  V140  V141  V142  V160 V8538      
#> 5       37 71536  4241  2859  2720  4414 53081 V5866

In real life, there are often problems with the data, such is NA entries, out-of-order visit_ids, non-existent or invalid ICD codes, etc.. Although standard R tools can be used to clean the data, knowing the specific validation rules for ICD-9 and ICD-10 codes, as well as the standardized structure of the data enables faster and more accurate data cleaning.

One use of this package is to identify diagnoses present on admission to hospital (POA). If a POA field is present in the data, it can be used to filter the diagnosis codes. Let’s walk through an example for some ICD-9 data:

# use AHRQ revision of Elixhauser comorbidities, show only first eight columns
icd9_comorbid_ahrq(patients_icd9)[, 1:8]
#>        CHF Valvular  PHTN   PVD  HTN Paralysis NeuroOther Pulmonary
#> 1000  TRUE    FALSE FALSE FALSE TRUE     FALSE      FALSE     FALSE
#> 1001 FALSE    FALSE FALSE FALSE TRUE      TRUE      FALSE     FALSE
#> 1002 FALSE    FALSE FALSE FALSE TRUE     FALSE      FALSE     FALSE

Things work beautifully using magrittr %>% to chain functions together. I use magrittr in the package itself, and I recommend trying it to clarify chains of commands, such as the following:

# find Elixhauser comorbidities present-on-arrival
patients_icd9 %>% icd_filter_poa %>% icd9_comorbid_elix %>% head
#>        CHF Arrhythmia Valvular  PHTN   PVD   HTN Paralysis NeuroOther
#> 1000 FALSE      FALSE    FALSE FALSE FALSE FALSE     FALSE      FALSE
#> 1001 FALSE      FALSE    FALSE FALSE FALSE  TRUE     FALSE      FALSE
#>      Pulmonary    DM  DMcx Hypothyroid Renal Liver   PUD   HIV Lymphoma
#> 1000     FALSE  TRUE FALSE       FALSE FALSE FALSE FALSE FALSE    FALSE
#> 1001     FALSE FALSE FALSE       FALSE FALSE FALSE FALSE FALSE    FALSE
#>       Mets Tumor Rheumatic Coagulopathy Obesity WeightLoss FluidsLytes
#> 1000 FALSE FALSE     FALSE        FALSE   FALSE      FALSE       FALSE
#> 1001 FALSE FALSE     FALSE        FALSE   FALSE      FALSE       FALSE
#>      BloodLoss Anemia Alcohol Drugs Psychoses Depression
#> 1000     FALSE  FALSE   FALSE FALSE     FALSE      FALSE
#> 1001     FALSE  FALSE   FALSE FALSE     FALSE      FALSE

# same as above, then summarize first five:
patients_icd9 %>% 
  icd_filter_poa %>% 
  icd9_comorbid_elix %>% 
  extract(, 1:5) %>% 
  apply(2, as.integer) %>% 
  summary
#>       CHF      Arrhythmia    Valvular      PHTN        PVD   
#>  Min.   :0   Min.   :0    Min.   :0   Min.   :0   Min.   :0  
#>  1st Qu.:0   1st Qu.:0    1st Qu.:0   1st Qu.:0   1st Qu.:0  
#>  Median :0   Median :0    Median :0   Median :0   Median :0  
#>  Mean   :0   Mean   :0    Mean   :0   Mean   :0   Mean   :0  
#>  3rd Qu.:0   3rd Qu.:0    3rd Qu.:0   3rd Qu.:0   3rd Qu.:0  
#>  Max.   :0   Max.   :0    Max.   :0   Max.   :0   Max.   :0

# convert vermont discharge data to wide format, 
# find comorbidities, convert TRUE to 1 and show first few
vermont_dx %>% icd_wide_to_long  %>% icd9_comorbid_quan_deyo  %>% apply(2, as.integer) %>%  head
#>      MI CHF PVD Stroke Dementia Pulmonary Rheumatic PUD LiverMild DM DMcx
#> [1,]  0   1   0      0        0         0         0   0         0  0    0
#> [2,]  0   1   0      0        0         1         0   0         0  1    0
#> [3,]  0   0   0      0        0         0         0   0         0  0    0
#> [4,]  0   0   0      0        0         1         0   0         0  0    0
#> [5,]  0   0   1      0        0         0         0   0         0  0    0
#> [6,]  0   0   0      0        0         0         0   0         0  0    0
#>      Paralysis Renal Cancer LiverSevere Mets HIV
#> [1,]         0     1      1           0    0   0
#> [2,]         0     0      0           0    0   0
#> [3,]         0     0      0           0    0   0
#> [4,]         0     0      0           0    0   0
#> [5,]         0     0      0           0    0   0
#> [6,]         0     0      0           0    0   0

The above can be rewritten in classic R with many parentheses:

head(apply(icd9_comorbid_quan_deyo(icd_wide_to_long(vermont_dx)), 2, as.integer))

Specifying data types

icd will guess the type and form of input data when possible, but there are ambiguities:

icd_is_valid("100") # valid ICD-9 code
#> [1] TRUE
icd_is_valid("A1001") # valid ICD-10 code
#> [1] TRUE
icd_is_valid(c("100", "A1001")) # they can't both be valid
#> [1]  TRUE FALSE

You can let icd guess types, or specify the type of your data explicitly:

# decimal format ICD-10 codes
codes <- c("A10.01", "L40.50", "Z77.098")
# set class to be icd10cm (and implicitly icd10)
as.icd10cm(codes)
#> [1] "A10.01"  "L40.50"  "Z77.098"
#> attr(,"class")
#> [1] "icd10cm"   "icd10"     "character"
# set class to indicate decimal code and icd10 (not necessarily icd10cm)
codes %>% as.icd_decimal_diag %>% as.icd10
#> [1] "A10.01"  "L40.50"  "Z77.098"
#> attr(,"icd_short_diag")
#> [1] FALSE
#> attr(,"class")
#> [1] "icd10"     "character"

Doing this avoids mistakes in guessing type. For example code V10 is valid in both ICD-9 and ICD-10.

Vectors of codes, data frames and matrices can have a class set. Conflicting classes are not allowed, e.g. if a data frame has columns with both ICD-9 and ICD-10 codes, it doesn’t make sense to set an ICD version for the data frame.

df <- data.frame(i9 = as.icd9(c("100", "001")), 
                 i10 = as.icd10(c("Z771", "Z87820")))

# demonstrate that an error is thrown for trying to do this:
try(df %>% as.icd9 %>% as.icd10)

Converting ICD-9 codes between types

ICD-9 codes are usually presented in decimal format (beware, for this is not a number), e.g. 003.21, whereas most electronic records seem to use the short form without a decimal place. These are not interchangeable simply by removing the decimal place, and great care is taken to do this correctly. The functions were also designed to deal with the common problem of incorrectly formatted ICD-9 codes. The assumption is made that short codes of three or fewer characters are describing only the ‘major’ part: there is no other reasonable interpretation. For example, 020 must be taken to mean 20, not 2.0 or even 0.20. In most cases, when icd works on ICD-9 codes, it will convert any codes of fewer than three characters into zero-padded three-digit codes.

icd_decimal_to_short(c("1", "10.20", "100", "123.45"))
#> [1] "001"   "01020" "100"   "12345"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
icd_short_to_decimal(c("1", "22", "2244", "1005"))
#> [1] "001"   "022"   "224.4" "100.5"
#> attr(,"icd_short_diag")
#> [1] FALSE
#> attr(,"class")
#> [1] "icd9"      "character"

# similar operations with magrittr, also showing invalid codes
codes <- as.icd9(c("87.65", "9999", "Aesop", -100, "", NA))
icd_decimal_to_short(codes)
#> [1] "08765" ""      ""      ""      ""      NA     
#> attr(,"class")
#> [1] "icd9"      "character"
#> attr(,"icd_short_diag")
#> [1] TRUE

Validation of ICD-9 codes

# guess both ICD version (9, but could be 10?), and decimal vs short form
icd_is_valid("V10.2")
#> [1] TRUE

# state we are using short or decimal codes:
icd_is_valid(c("099.17", "-1"), short_code = TRUE)
#> [1] FALSE FALSE
icd_is_valid(c("099.17", "-1.1"), short_code = FALSE)
#> [1]  TRUE FALSE
icd_is_valid(c("1", "001", "100", "123456", "003.21"), short_code = TRUE)
#> [1]  TRUE  TRUE  TRUE FALSE FALSE

Ranges of ICD-9 codes

These functions generate syntactically valid ICD-9 codes, without including parent codes when the range limit would subset the parent. E.g. "100.99" %i9da% "101.01" does not include 100 or 100.0, both of which imply larger subsets than requested by the range command (i.e. every code up to 100.99). The shorter forms %i9s% and %i9d% return only real codes (i.e. listed in the CMS definitions as either three-digit codes or diagnoses), whereas %i9sa% and %i9da% return all possible syntactically valid ICD-9 codes:

# get all possible codes
#"003" %i9sa% "0033" %>% head(9) # show first 9 of 111 values
# just get the ones which correspond to diagnoses (keeping the 3-digit chapters)
#"494" %i9s% "4941"

#"10099" %i9sa% "10101"
#"V10" %i9da% "V10.02"
"E987" %i9da% "E988.1"
#>  [1] "E987"   "E987.0" "E987.1" "E987.2" "E987.3" "E987.4" "E987.5"
#>  [8] "E987.6" "E987.7" "E987.8" "E987.9" "E988.0" "E988.1"
#> attr(,"icd_short_diag")
#> [1] FALSE
#> attr(,"class")
#> [1] "icd9"      "character"

# can't range between different types:
# "V10" %i9s% "E800" # throws an error

This is used internally to interpret ranges of ICD-9 codes specified in the literature. Sometimes it is not clear exactly what an ICD-9 range presented in a paper means, but at least we can explicitly decide what should be included in our interpretation, and the ranges can be reused even when the underlying codes may be different, as codes are added and removed from time-to-time, and although the original papers would have been based on their ICD-9 ranges resolving to a specific set of codes, they are likely to be valid for new diagnoses in the given subgroups. Ultimately, this needs detailed attention, but the strategy in is to give a good best guess, given these limitations.

Another way of specifying ranges are to use function calls. These are exactly equivalent to the %i9s% and %i9d% range operators. This example shows the result when the user specifies a range which would include parents but not all their children:

icd_expand_range("4820", "4823") # default, equivalent to %i9s%
#> [1] "4820"  "4821"  "4822"  "4823"  "48230" "48231" "48232" "48239"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
icd_expand_range("4820", "4823", defined = FALSE)
#>  [1] "4820"  "48200" "48201" "48202" "48203" "48204" "48205" "48206"
#>  [9] "48207" "48208" "48209" "4821"  "48210" "48211" "48212" "48213"
#> [17] "48214" "48215" "48216" "48217" "48218" "48219" "4822"  "48220"
#> [25] "48221" "48222" "48223" "48224" "48225" "48226" "48227" "48228"
#> [33] "48229" "4823"  "48230" "48231" "48232" "48233" "48234" "48235"
#> [41] "48236" "48237" "48238" "48239"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
# see the first few differences (which are by definition not 'real' codes):
setdiff(icd_expand_range("4820", "4823", defined = FALSE),
        icd_expand_range("4820", "4823")) %>% head
#> [1] "48200" "48201" "48202" "48203" "48204" "48205"

It is easy to find the children of a higher-level ICD-9 code:

icd_children(as.icd9("391"))
#> [1] "391"  "3910" "3911" "3912" "3918" "3919"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
# mid-level code
icd:::icd_children.icd9("0032")
#> [1] "0032"  "00320" "00321" "00322" "00323" "00324" "00329"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
# leaf node has no children
# be explicit about the type of code:
test_code <- as.icd9(as.icd_short_diag("00321"))
icd_children(test_code)
#> [1] "00321"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
# or the same, but guessing the characteristics
icd_children("00321")
#> [1] "00321"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
# pneumococcal pneumonia is a three-digit ICD-9 code with no descendants
icd_children("481")
#> [1] "481"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"

By adding defined = TRUE, all syntactically valid ICD-9 codes are returned, even if not defined by CMS as diagnoses. This is relevant because of minor coding errors, or coding in a different year to the master list. A planned feature is to allow testing of an ICD-9 code against the valid codes for the year it was entered, but at present only the 2014 master list is used. This means that some older valid codes may no longer be on the list. However, there have been very few changes to ICD-9-CM in the last five years with ICD-10-CM in the wings.

# first ten possible ICD-9 child codes from 391
icd_children("391", defined = FALSE)[1:10]
#>  [1] "391"   "3910"  "39100" "39101" "39102" "39103" "39104" "39105"
#>  [9] "39106" "39107"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"

Decoding ICD-9 codes to descriptions

There are various ways of extracting the description of the condition described by an ICD-9 code. the icd_explain group of functions return a data frame with a column for the ICD-9 code, a column for the full length Diagnosis, and a column for the short Description.

icd_explain("1.0") # 'decimal' format code inferred
#> [1] "Cholera due to vibrio cholerae"
icd_explain("0019") # 'short' format code inferred
#> [1] "Cholera, unspecified"
# we can be explicit about short vs decimal
icd_explain("434.00", short_code = FALSE)
#> [1] "Cerebral thrombosis without mention of cerebral infarction"
icd_explain(c("43410", "43491"), short_code = TRUE)
#> [1] "Cerebral embolism without mention of cerebral infarction"       
#> [2] "Cerebral artery occlusion, unspecified with cerebral infarction"
#explain top level code with children
"391" %>% icd_explain # single three-digit code
#> [1] "Rheumatic fever with heart involvement"
"391" %>% icd_children # let's see the child codes
#> [1] "391"  "3910" "3911" "3912" "3918" "3919"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
"391" %>% icd_children %>% icd_explain # children condensed to parent code
#> [1] "Rheumatic fever with heart involvement"
"391" %>% icd_children %>% icd_explain(condense = FALSE) # prevent condense
#> [1] "Rheumatic fever with heart involvement"    
#> [2] "Acute rheumatic pericarditis"              
#> [3] "Acute rheumatic endocarditis"              
#> [4] "Acute rheumatic myocarditis"               
#> [5] "Other acute rheumatic heart disease"       
#> [6] "Acute rheumatic heart disease, unspecified"

Arbitrary named list(s) of codes:

icd_explain(list(somecodes = as.icd9(c("001", "391")),
                 morecodes = as.icd9cm(c("001.1", "001.9"))))
#> $somecodes
#> [1] "Cholera"                               
#> [2] "Rheumatic fever with heart involvement"
#> 
#> $morecodes
#> [1] "Cholera due to vibrio cholerae el tor"
#> [2] "Cholera, unspecified"

001 (Cholera) isn’t itself a diagnostic code, i.e. leaf node in the hierarchy, but 390 (Rheumatic fever without heart involvement) is. Both are explained correctly:

icd_explain(list(cholera = "001", rheumatic_heart = "390"))
#> $cholera
#> [1] "Cholera"
#> 
#> $rheumatic_heart
#> [1] "Rheumatic fever without mention of heart involvement"

Now try to explain on a non-existent (but ‘valid’) ICD-9 code:

s <- icd_explain("001.5") # gives warning

As we have just seen, icd_explain can convert lists of ICD-9 or ICD-10 codes to a human-readable format. Let’s apply the icd_explain to a list of comorbidity ICD-9 codes in one of the commonly-used mappings. This makes comprehending a complicated list much easier. Taking the list for dementia:

length(icd9_map_quan_deyo[["Dementia"]]) # 133 possible ICD-9 codes
#> [1] 133
length(icd10_map_quan_deyo[["Dementia"]]) # the ICD-10 map is different
#> [1] 20
# icd_explain summarizes these to just two groups:
icd9_map_quan_deyo[["Dementia"]] %>% icd_explain(warn = FALSE)
#> [1] "Dementias"                                  
#> [2] "Dementia in conditions classified elsewhere"
#> [3] "Senile degeneration of brain"
# contrast with:
icd9_map_quan_deyo[["Dementia"]] %>% icd_explain(condense = TRUE, warn = FALSE)
#> [1] "Dementias"                                  
#> [2] "Dementia in conditions classified elsewhere"
#> [3] "Senile degeneration of brain"

Use a range with more than two hundred ICD-9 codes (most of them not real):

length("390" %i9da% "392.1")
#> [1] 244
"390" %i9da% "392.1" %>% icd_explain(warn = FALSE)
#> [1] "Rheumatic fever without mention of heart involvement"
#> [2] "Rheumatic fever with heart involvement"

The warnings here are irrelevant because we know that `%i9da% produces codes which do not correspond to diagnoses. However, in other usage, the user would typically expect the ICD-9 codes he or she is using to be diagnostic, hence the default to warn.

Filtering by Present-on-Arrival

This flag is recorded with each ICD-9 code, indicating whether that diagnosis was present on admission. With some caution, codes flagged specifically not POA can be treated as new diseases during an admission.

Present-on-arrival (POA) is typically a factor, or vector of values such as “Y”, “N”, “X”, “E”, or NA. Intermediate codes, such as “exempt”, “unknown” and NA mean that “yes” is not the same as “not no.” This requires four functions to cover the possibilities stored in icd_poa_choices:

#> [1] "yes"    "no"     "notYes" "notNo"

Filter for present-on-arrival being “Y”

patients_icd9 %>% icd_filter_poa_yes
#>   visit_id  icd9
#> 1     1000 40201
#> 4     1000 25001
#> 6     1001  4011

Show that yes is not equal to not no (e.g. due to NA in poa field)

patients_icd9 %>% icd_filter_poa_not_no
#>   visit_id  icd9
#> 1     1000 40201
#> 2     1000  2258
#> 4     1000 25001
#> 5     1001 34400
#> 6     1001  4011
#> 7     1002  4011

Comorbidities

The comorbidities from different sources are provided as lists. At present only the most recent mapping of ICD-9 codes to comorbidities is provided. See these github issues.

This package contains ICD-9-CM to comorbidity mappings from several sources, based on either the Charlson or Elixhauser lists of comorbidities. Updated versions of these lists from AHRQ and Quan et al are included, along with the original Elixhauser mapping . Since some data is provided in SAS source code format, this package has internal functions to parse this SAS source code and generate R data structures. This processing is limited to what is needed for this purpose, although may be generalizable and useful in other contexts. Other lists are transcribed directly from the published articles, but interpretation of SAS code used for the original publications is preferable.

AHRQ comorbidity classification

The AHRQ keeps an updated version of the Elixhauser classification of ICD-9-CM codes into comorbidities, useful for research. They provide the data in the form of SAS code. The names of the comorbidities derived from ICD-9 and ICD-10 codes are the same. Maps contain the ICD code to comorbidity mappings; the functions that apply those mappings are called things like icd10_comorbid_ahrq.

#icd9_map_ahrq <- icd:::sas_parse_ahrq() # user doesn't need to do this
names(icd9_map_ahrq)
#>  [1] "CHF"          "Valvular"     "PHTN"         "PVD"         
#>  [5] "HTN"          "HTNcx"        "Paralysis"    "NeuroOther"  
#>  [9] "Pulmonary"    "DM"           "DMcx"         "Hypothyroid" 
#> [13] "Renal"        "Liver"        "PUD"          "HIV"         
#> [17] "Lymphoma"     "Mets"         "Tumor"        "Rheumatic"   
#> [21] "Coagulopathy" "Obesity"      "WeightLoss"   "FluidsLytes" 
#> [25] "BloodLoss"    "Anemia"       "Alcohol"      "Drugs"       
#> [29] "Psychoses"    "Depression"
icd9_map_ahrq$CHF[1:5]
#> [1] "39891" "40201" "40211" "40291" "40401"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
icd10_map_ahrq$CHF[1:5]
#> [1] "I0981" "I501"  "I5020" "I5021" "I5022"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd10"     "character"

Elixhauser comorbidities

Elixhauser originally devleoped this set of comorbidities to predict long term mortality based on hospital ICD-9-CM coding records. The AHRQ comorbidities are an updated version of this, however the original Elixhauser have been used in many publications. The ICD-9-CM codes have changed slightly over the years.

# the names of the comorbidities in each map are available as named lists:
icd_names_elix[1:5]
#> $`01`
#> [1] "Congestive heart failure"
#> 
#> $`02`
#> [1] "Cardiac arrhythmias"
#> 
#> $`03`
#> [1] "Valvular disease"
#> 
#> $`04`
#> [1] "Pulmonary circulation disorders"
#> 
#> $`05`
#> [1] "Peripheral vascular disorders"
unlist(unname(icd_names_elix))
#>  [1] "Congestive heart failure"                       
#>  [2] "Cardiac arrhythmias"                            
#>  [3] "Valvular disease"                               
#>  [4] "Pulmonary circulation disorders"                
#>  [5] "Peripheral vascular disorders"                  
#>  [6] "Hypertension, combined"                         
#>  [7] "Paralysis"                                      
#>  [8] "Other neurological disorders"                   
#>  [9] "Chronic pulmonary disease"                      
#> [10] "Diabetes, uncomplicated"                        
#> [11] "Diabetes, complicated"                          
#> [12] "Hypothyroidism"                                 
#> [13] "Renal failure"                                  
#> [14] "Liver disease"                                  
#> [15] "Peptic ulcer disease excluding bleeding"        
#> [16] "HIV/AIDS"                                       
#> [17] "Lymphoma"                                       
#> [18] "Metastatic cancer"                              
#> [19] "Solid tumor without metastasis"                 
#> [20] "Rheumatoid arthritis/collagen vascular diseases"
#> [21] "Coagulopathy"                                   
#> [22] "Obesity"                                        
#> [23] "Weight loss"                                    
#> [24] "Fluid and electrolye disorders"                 
#> [25] "Blood loss anemia"                              
#> [26] "Deficiency anemias"                             
#> [27] "Alcohol abuse"                                  
#> [28] "Drug abuse"                                     
#> [29] "Psychoses"                                      
#> [30] "Depression"
# The map contents have ICD codes with the class set
icd9_map_elix$HTNcx
#> [1] "40210" "40290" "40410" "40490" "40511" "40519" "40591" "40599"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd9"      "character"
icd10_map_elix$HTNcx
#> [1] "I11" "I12" "I13" "I15"
#> attr(,"icd_short_diag")
#> [1] TRUE
#> attr(,"class")
#> [1] "icd10"     "character"

Quan

Quan’s paper looked at indices using both ICD-10 and ICD-9-CM. Quan generated updated ICD-9-CM codes for all 30 of Elixhauser and all 17 of Charlson/Deyo’s comorbidities. Thus there are two ‘Quan’ comorbidity mappings.

names(icd10_map_quan_deyo)
#>  [1] "MI"          "CHF"         "PVD"         "Stroke"      "Dementia"   
#>  [6] "Pulmonary"   "Rheumatic"   "PUD"         "LiverMild"   "DM"         
#> [11] "DMcx"        "Paralysis"   "Renal"       "Cancer"      "LiverSevere"
#> [16] "Mets"        "HIV"
names(icd10_map_quan_elix)
#>  [1] "CHF"          "Arrhythmia"   "Valvular"     "PHTN"        
#>  [5] "PVD"          "HTN"          "HTNcx"        "Paralysis"   
#>  [9] "NeuroOther"   "Pulmonary"    "DM"           "DMcx"        
#> [13] "Hypothyroid"  "Renal"        "Liver"        "PUD"         
#> [17] "HIV"          "Lymphoma"     "Mets"         "Tumor"       
#> [21] "Rheumatic"    "Coagulopathy" "Obesity"      "WeightLoss"  
#> [25] "FluidsLytes"  "BloodLoss"    "Anemia"       "Alcohol"     
#> [29] "Drugs"        "Psychoses"    "Depression"

Examples

Filter patients and create comorbidities

Take my patients, find the ones where there definitely or maybe was a diagnosis present on admission, then generate comorbidities based on the AHRQ mapping. N.b. NotNo is not the same as Yes because of some exempt, unclassifiable conditions, or NA values for the present-on-admission flag.

patients_icd9 %>%
  icd_filter_poa_not_no %>%
  icd9_comorbid_ahrq %>%
  extract(1:9)
#> [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Compare two comorbidity definitions

We will find the differences between some categories of the original Elixhauser and the updated version by Quan. Just taking the select few comorbidity groups for brevity:

difference <- icd_diff_comorbid(icd9_map_elix, icd9_map_quan_elix,
                 all_names = c("CHF", "PHTN", "HTN", "Valvular"))
#> Comorbidity CHF: 
#> icd9_map_quan_elix has 70 codes not in icd9_map_elix. First few are: 'Heart failure' 'Mal hypert hrt dis w hf' 'Mal hyp ht/kd I-IV w hf' 'Mal hyp ht/kd stg V w hf' 'Prim cardiomyopathy NEC' 
#> Comorbidity PHTN: 
#> icd9_map_quan_elix has 45 codes not in icd9_map_elix. First few are: 'Acute pulmonary heart disease' 'Chronic pulmonary heart disease' 'Arterioven fistu pul ves' 'Pulmon circulat dis NEC' 
#> Comorbidity HTN: 
#> icd9_map_quan_elix has 89 codes not in icd9_map_elix. First few are: 'Essential hypertension' 
#> Comorbidity Valvular: 
#> icd9_map_quan_elix has 106 codes not in icd9_map_elix. First few are: 'Diseases of mitral valve' 'Diseases of other endocardial structures' 'Other diseases of endocardium' 'Syphilitic endocarditis'
# reuslts also returned as data
str(difference)
#> List of 4
#>  $ CHF     :List of 3
#>   ..$ both  : chr [1:117] "39891" "40211" "40291" "40411" ...
#>   ..$ only.x: chr(0) 
#>   ..$ only.y: chr [1:70] "40201" "40401" "40403" "4254" ...
#>  $ PHTN    :List of 3
#>   ..$ both  : chr [1:121] "4160" "41600" "41601" "41602" ...
#>   ..$ only.x: chr(0) 
#>   ..$ only.y: chr [1:45] "4150" "41500" "41501" "41502" ...
#>  $ HTN     :List of 3
#>   ..$ both  : chr [1:22] "4011" "40110" "40111" "40112" ...
#>   ..$ only.x: chr(0) 
#>   ..$ only.y: chr [1:89] "401" "4010" "40100" "40101" ...
#>  $ Valvular:List of 3
#>   ..$ both  : chr [1:526] "09320" "09321" "09322" "09323" ...
#>   ..$ only.x: chr(0) 
#>   ..$ only.y: chr [1:106] "0932" "09325" "09326" "09327" ...

Which pulmonary hypertension codes are only in Quan’s version?

difference$PHTN$only.y %>% icd_get_defined %>% icd_explain
#> [1] "Acute pulmonary heart disease"                    
#> [2] "Chronic pulmonary heart disease"                  
#> [3] "Arteriovenous fistula of pulmonary vessels"       
#> [4] "Other specified diseases of pulmonary circulation"

(Passing through icd_get_defined stops icd_explain complaining that some of the input codes don’t exist. This is because the comorbidity mappings have every possible numerical ICD-9 code, not just the official ones. Could also use warn = FALSE option in icd_explain)

Find comorbidities for a large number of patients.

I understand that comorbiditity assignment using SAS is a lengthy business. Let’s generate 100,000 patients with a random selection of comorbidities:

# codes selected from AHRQ mapping
many_patients <- icd:::generate_random_pts(1e7) 
system.time(
  icd999999999_comorbid_ahrq(many_patients)
  )[["elapsed"]] 

This is not run to avoid unnecessary load when CRAN builds the package. I get about 5 seconds for 10 million rows of comorbidities on a moderately powerful workstation.

Arbitrary ICD-9 mapping

The user can provide any ICD-9, ICD-10 or other code mapping to comorbidities they wish. Submissions of other peer-reviewed published mappings could be included in this package, if their license permits. Create an issue in github or email me at jack@jackwasey.com) Included in this package is a small data set called icd9_chapters, which lists the ICD-9-CM (and indeed ICD-9) Chapters. These can easily be expanded out and used as a mapping, so instead of a comorbidity, we see which patients have codes in each chapter of the ICD-9 defintion.

names(icd9_chapters)[c(1:5, 14)]
#> [1] "Infectious And Parasitic Diseases"                                    
#> [2] "Neoplasms"                                                            
#> [3] "Endocrine, Nutritional And Metabolic Diseases, And Immunity Disorders"
#> [4] "Diseases Of The Blood And Blood-Forming Organs"                       
#> [5] "Mental Disorders"                                                     
#> [6] "Congenital Anomalies"
my_map <- icd:::icd9_chapters_to_map(icd9_chapters[c(2, 5, 14)])
icd9_comorbid(patients_icd9, my_map) # no positive 
#>      Neoplasms Mental Disorders Congenital Anomalies
#> 1000      TRUE            FALSE                FALSE
#> 1001     FALSE            FALSE                FALSE
#> 1002     FALSE            FALSE                FALSE

Reduce comorbidity mapping from possible values to defined diagnostic codes.

Suppose we want to exact match only real ICD-9 codes when looking up comorbdities for some patients. E.g. if the coder accidentally omitted a trailing zero, e.g. code 003.20 (Localized salmonella infection, unspecified) might have been written as 003.2 which has a heading (Localized salmonella infections) but is not itself billable. Use of ICD-9 codes for comorbidities generally assumes the codes are either right or wrong. How do we match only real codes, for a strict interpretation of comorbidities? It’s one line or R code:

ahrq_strict <- lapply(icd9_map_ahrq, icd_get_defined)
str(icd9_map_ahrq[1:5]) # first five of the original:
#> List of 5
#>  $ CHF     :Classes 'icd9', 'character'  atomic [1:121] 39891 40201 40211 40291 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE
#>  $ Valvular:Classes 'icd9', 'character'  atomic [1:554] 0932 09320 09321 09322 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE
#>  $ PHTN    :Classes 'icd9', 'character'  atomic [1:133] 4151 41510 41511 41512 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE
#>  $ PVD     :Classes 'icd9', 'character'  atomic [1:598] 440 4400 44000 44001 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE
#>  $ HTN     :Classes 'icd9', 'character'  atomic [1:33] 4011 40110 40111 40112 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE
str(icd9_map_ahrq[1:5]) # and first five of the result:
#> List of 5
#>  $ CHF     :Classes 'icd9', 'character'  atomic [1:121] 39891 40201 40211 40291 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE
#>  $ Valvular:Classes 'icd9', 'character'  atomic [1:554] 0932 09320 09321 09322 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE
#>  $ PHTN    :Classes 'icd9', 'character'  atomic [1:133] 4151 41510 41511 41512 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE
#>  $ PVD     :Classes 'icd9', 'character'  atomic [1:598] 440 4400 44000 44001 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE
#>  $ HTN     :Classes 'icd9', 'character'  atomic [1:33] 4011 40110 40111 40112 ...
#>   .. ..- attr(*, "icd_short_diag")= logi TRUE

Note the much smaller numbers of codes in each group, now we have discarded all the ones which are not defined as diagnoses.

Which three-character ICD-9 codes have no child codes

The ICD-9-CM scheme is structured as follows: - Chapter - Sub-chapter - Major part (three-digit codes) - sub-division (first decimal place) - sub-sub-division (second decimal place)

For most combinations of zero to nine, nothing is defined. Sometimes, nodes at one level in the hierarchy are descriptive only of their children (branch nodes), whereas some are themselves billable. For this example, let’s find those numeric-only codes which have no children, and by implication are themselves directly billable codes. Here are the first ten:

icd9cm_hierarchy$code %>% icd_get_defined -> all_real
# select the non-V and non-E codes
three_digit_real <- all_real[icd9_is_n(all_real)]
# display
three_digit_df <- data.frame(code = three_digit_real, description = icd_explain(three_digit_real, condense = FALSE))
print(three_digit_df[1:10, ], row.names = FALSE)
#>  code                                  description
#>   001                                      Cholera
#>  0010               Typhoid and paratyphoid fevers
#>  0011                  Other salmonella infections
#>  0019                                  Shigellosis
#>   002             Other food poisoning (bacterial)
#>  0020                                    Amebiasis
#>  0021          Other protozoal intestinal diseases
#>  0022 Intestinal infections due to other organisms
#>  0023            Ill-defined intestinal infections
#>  0029                Primary tuberculous infection

Which ICD-9 codes have changed between versions of ICD-9?

new_since_27 <- setdiff(icd9cm_billable[["32"]][["code"]],
                         icd9cm_billable[["27"]][["code"]]) %>% head
lost_since_27 <- setdiff(icd9cm_billable[["27"]][["code"]],
                         icd9cm_billable[["32"]][["code"]]) %>% tail
# we know this is an ICD-9-CM code, so declare this using nice magrittr motif:
lost_since_27 %<>% as.icd9cm
lost_since_27 %<>% as.icd9cm

# these are a few which were gained since v27
data.frame(code = new_since_27, desc = new_since_27 %>% icd_explain)
#>    code                                          desc
#> 1 04141                     Escherichia coli [E.coli]
#> 2 04142 Unspecified malignant neoplasm of skin of lip
#> 3 04143           Basal cell carcinoma of skin of lip
#> 4 04149                     Escherichia coli [E.coli]
#> 5 17300 Unspecified malignant neoplasm of skin of lip
#> 6 17301           Basal cell carcinoma of skin of lip
# these are a few which were lost since v27
data.frame(code = lost_since_27, desc = lost_since_27 %>% icd_explain)
#>   code
#> 1 V122
#> 2 V138
#> 3 V191
#> 4 V251
#> 5 V403
#> 6 V854
#>                                                                      desc
#> 1                            Endocrine, metabolic, and immunity disorders
#> 2                                                Other specified diseases
#> 3                                                     Other eye disorders
#> 4 Encounter for insertion or removal of intrauterine contraceptive device
#> 5                                               Other behavioral problems
#> 6                                      Body Mass Index 40 and over, adult

Conclusion

This package allows fluid, fast and accurate manipulation of ICD-9 and ICD-10 codes, especially when combined with magrittr. Suggestions, contributions and comments are welcome via github.