Introducing europepmc, an R interface to Europe PMC RESTful API

Najko Jahn

2017-12-11

What is searched?

Europe PMC is a repository of life science literature. Europe PMC ingests all PubMed content and extends its index with other sources, including Agricola, a bibliographic database of citations to the agricultural literature, and Biological Patents.

Index coverage

For more background on Europe PMC, see:

https://europepmc.org/About

Europe PMC: a full-text literature database for the life sciences and platform for innovation. (2014). Nucleic Acids Research, 43(D1), D1042–D1048. http://doi.org/10.1093/nar/gku1061

How to search Europe PMC with R?

This client supports the Europe PMC search syntax. If you are unfamiliar with searching Europe PMC, check out the Europe PMC query builder, a tool that helps you create your queries. To use your Europe PMC queries in R, copy and paste the search string into the search functions of this package.

The following examples show how to search Europe PMC.

Managing search results

By default, 100 records are returned, but the number of results can be expanded or limited with the limit parameter.

europepmc::epmc_search('"Human malaria parasites"', limit = 10)
#> # A tibble: 10 x 27
#>            id source     pmid                           doi
#>         <chr>  <chr>    <chr>                         <chr>
#>  1   29109165    MED 29109165          10.1128/aac.01161-17
#>  2   28902970    MED 28902970             10.1111/cmi.12789
#>  3   27894375    MED 27894375     10.1017/s0031182016002110
#>  4   28900620    MED 28900620          10.1155/2017/2847548
#>  5   28525963    MED 28525963 10.1080/14760584.2017.1333426
#>  6   27748213    MED 27748213                          <NA>
#>  7 PMC5576395    PMC     <NA>                          <NA>
#>  8   27381764    MED 27381764  10.1016/j.ijpara.2016.05.008
#>  9   28531172    MED 28531172  10.1371/journal.pone.0177304
#> 10   27667688    MED 27667688     10.1016/j.dci.2016.09.012
#> # ... with 23 more variables: title <chr>, authorString <chr>,
#> #   journalTitle <chr>, pubYear <chr>, journalIssn <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>,
#> #   hasLabsLinks <chr>, hasTMAccessionNumbers <chr>,
#> #   firstPublicationDate <chr>, issue <chr>, journalVolume <chr>,
#> #   pageInfo <chr>, pmcid <chr>, hasSuppl <chr>

Results are sorted by relevance by default. Other orderings can be selected via the sort parameter.
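A minimal sketch of two ways to change the ordering. The first function passes sort = "cited" to epmc_search(); that this value returns the most cited records first is an assumption to check against ?europepmc::epmc_search. The second, hypothetical helper by_citations() reorders results you have already downloaded by their citedByCount column. Note that it only ranks the records you retrieved (which were selected by relevance), not the whole result set.

```r
# Server-side ordering. Assumption: sort = "cited" returns the most cited
# records first (check ?europepmc::epmc_search for the accepted values).
most_cited <- function(query, n = 10) {
  europepmc::epmc_search(query, sort = "cited", limit = n)
}

# Client-side ordering of results that were already downloaded.
# Hypothetical helper: relies only on the citedByCount column shown above.
by_citations <- function(results) {
  results[order(results$citedByCount, decreasing = TRUE), ]
}

# Usage (requires network access):
# malaria <- europepmc::epmc_search('"Human malaria parasites"', limit = 10)
# by_citations(malaria)
# most_cited('"Human malaria parasites"')
```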

Loop over queries

Sometimes you want to send several searches to Europe PMC at once. A simple solution is plyr::ldply():

my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9"
)
# Run one search per DOI and row-bind the results into a single data frame
plyr::ldply(my_dois, function(x) {
  europepmc::epmc_search(paste0("DOI:", x))
})
#>         id source     pmid                          doi
#> 1 28957815    MED 28957815            10.1159/000479962
#> 2 28941317    MED 28941317         10.1002/sctm.17-0081
#> 3 29018132    MED 29018132 10.1161/strokeaha.117.018077
#> 4 28623611    MED 28623611    10.1007/s12017-017-8447-9
#>                                                                                                                                 title
#> 1 Clinical Relevance of Patent Foramen Ovale and Atrial Septum Aneurysm in Stroke: Findings of a Single-Center Cross-Sectional Study.
#> 2                                 Concise Review: Extracellular Vesicles Overcoming Limitations of Cell Therapies in Ischemic Stroke.
#> 3                                                 One-Stop Management of Acute Stroke Patients: Minimizing Door-to-Reperfusion Times.
#> 4              Deferiprone Rescues Behavioral Deficits Induced by Mild Iron Exposure in a Mouse Model of Alpha-Synuclein Aggregation.
#>                                                                                                    authorString
#> 1                                 Schnieder M, Siddiqui T, Karch A, Bähr M, Hasenfuss G, Liman J, Schroeter MR.
#> 2                                                                    Doeppner TR, Bähr M, Hermann DM, Giebel B.
#> 3 Psychogios MN, Behme D, Schregel K, Tsogkas I, Maier IL, Leyhe JR, Zapf A, Tran J, Bähr M, Liman J, Knauth M.
#> 4                                     Carboni E, Tatenhorst L, Tönges L, Barski E, Dambeck V, Bähr M, Lingor P.
#>            journalTitle issue journalVolume pubYear            journalIssn
#> 1            Eur Neurol   5-6            78    2017 0014-3022; 1421-9913; 
#> 2 Stem Cells Transl Med    11             6    2017 2157-6564; 2157-6580; 
#> 3                Stroke    11            48    2017 0039-2499; 1524-4628; 
#> 4    Neuromolecular Med   2-3            19    2017 1535-1084; 1559-1174; 
#>    pageInfo                             pubType isOpenAccess inEPMC inPMC
#> 1   264-269                     journal article            N      N     N
#> 2 2044-2052           review; journal article;             N      N     N
#> 3 3152-3155   clinical trial; journal article;             N      N     N
#> 4   309-321 research-article; journal article;             Y      Y     N
#>   hasPDF hasBook citedByCount hasReferences hasTextMinedTerms
#> 1      N       N            0             N                 N
#> 2      N       N            0             N                 N
#> 3      N       N            0             N                 N
#> 4      Y       N            0             Y                 Y
#>   hasDbCrossReferences hasLabsLinks hasTMAccessionNumbers
#> 1                    N            Y                     N
#> 2                    N            Y                     N
#> 3                    N            Y                     N
#> 4                    N            Y                     Y
#>   firstPublicationDate      pmcid hasSuppl
#> 1           2017-09-28       <NA>     <NA>
#> 2           2017-09-23       <NA>     <NA>
#> 3           2017-10-10       <NA>     <NA>
#> 4           2017-06-16 PMC5570801        Y
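If you prefer not to depend on plyr, the same loop can be sketched with base R's lapply() and dplyr::bind_rows(). Like ldply(), bind_rows() fills columns that are missing from some results with NA, which seems to be why pmcid and hasSuppl are NA for three of the four articles above. The helper name search_all() is hypothetical, and the single OR query at the end assumes that Europe PMC accepts OR between DOI: clauses, as its query syntax generally does.

```r
my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9"
)

# One "DOI:" query per DOI, built in a single vectorised call
doi_queries <- paste0("DOI:", my_dois)

# Hypothetical helper: run every query and stack the results.
# dplyr::bind_rows() fills columns missing from some results with NA.
search_all <- function(queries) {
  dplyr::bind_rows(lapply(queries, europepmc::epmc_search))
}

# Alternative sketch: a single request that ORs all DOI clauses together.
# Assumption: Europe PMC evaluates OR between DOI: clauses.
doi_query <- paste(doi_queries, collapse = " OR ")

# Usage (requires network access):
# search_all(doi_queries)
# europepmc::epmc_search(doi_query)
```

The single OR query sends one request instead of four, which is kinder to the API when the DOI list grows.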

Output options

By default, a non-nested data frame, printed as a tibble, is returned. Other formats are output = "id_list", which returns a list of IDs and sources, and output = "raw", which returns the full metadata as a list. Please be aware that these lists can become very large.

More advanced options to search Europe PMC

Text-mined terms

Europe PMC parses article metadata for various concepts and terms, grouped into the following semantic types:

Semantic type   Description/examples
accession       A unique identifier given to a DNA or protein sequence record
chemical        e.g. Granzymes, Peptides, Hydrogen
disease         e.g. dysthymias, gid, icterohemorrhagic
efo             Experimental Factor Ontology (EFO) term, e.g. generation, health, mortality rate, scale, findings, genome
gene_protein    e.g. atp, cl-43, ecoriir, gng11, ipt1, mlks
go_term         A Gene Ontology (GO) term, e.g. annealing, neuroblasts
organism        e.g. pneumocystidomycetes, sarus, terebratulide

Here’s how to search for publications about meningitis:

europepmc::epmc_search('disease:meningitis')
#> # A tibble: 100 x 27
#>            id source     pmid      pmcid                           doi
#>         <chr>  <chr>    <chr>      <chr>                         <chr>
#>  1   29095907    MED 29095907 PMC5667755  10.1371/journal.pone.0187466
#>  2   29084241    MED 29084241 PMC5662171  10.1371/journal.pone.0186985
#>  3   29038446    MED 29038446 PMC5643306    10.1038/s41598-017-13605-8
#>  4 PMC5631614    PMC     <NA> PMC5631614                          <NA>
#>  5   29207725    MED 29207725 PMC5713747 10.7860/jcdr/2017/28114.10532
#>  6   29057217    MED 29057217 PMC5635059      10.3389/fcimb.2017.00436
#>  7   29148389    MED 29148389 PMC5708259        10.3201/eid2312.171107
#>  8   29051603    MED 29051603 PMC5648924    10.1038/s41598-017-13234-1
#>  9 PMC5632230    PMC     <NA> PMC5632230                          <NA>
#> 10 PMC5631130    PMC     <NA> PMC5631130                          <NA>
#> # ... with 90 more rows, and 22 more variables: title <chr>,
#> #   authorString <chr>, journalTitle <chr>, issue <chr>,
#> #   journalVolume <chr>, pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> #   pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>,
#> #   hasPDF <chr>, hasBook <chr>, hasSuppl <chr>, citedByCount <int>,
#> #   hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstPublicationDate <chr>
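Semantic-type clauses can be combined with AND like any other Europe PMC query terms. The sketch below uses a hypothetical helper, tm_query(), that checks the type against the semantic types from the table above before building a "type:term" clause. Quoting multi-word terms so they are searched as a phrase is an assumption about the search syntax, not something this package does for you.

```r
# Semantic types listed in the table above
semantic_types <- c("accession", "chemical", "disease", "efo",
                    "gene_protein", "go_term", "organism")

# Hypothetical helper: build a "<type>:<term>" clause, rejecting unknown types.
tm_query <- function(type, term) {
  type <- match.arg(type, semantic_types)
  # Assumption: multi-word terms are searched as a phrase when quoted
  if (grepl(" ", term, fixed = TRUE)) term <- paste0('"', term, '"')
  paste0(type, ":", term)
}

q <- paste(tm_query("disease", "meningitis"),
           tm_query("organism", "Neisseria meningitidis"),
           sep = " AND ")
cat(q, "\n", sep = "")
#> disease:meningitis AND organism:"Neisseria meningitidis"

# Usage (requires network access):
# europepmc::epmc_search(q)
```

Using match.arg() means a typo such as "diseases" fails locally with an informative error instead of silently returning no hits.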

To see which other terms were text-mined for a given article, use the europepmc::epmc_tm() function.

Data integrations

Another useful feature of Europe PMC is searching for cross-references between Europe PMC and other databases. For instance, to get publications first published in 2016 that are cited by entries in the Protein Data Bank in Europe:

europepmc::epmc_search('(HAS_PDB:y) AND FIRST_PDATE:2016')
#> # A tibble: 100 x 27
#>          id source     pmid                           doi
#>       <chr>  <chr>    <chr>                         <chr>
#>  1 28089452    MED 28089452     10.1016/j.str.2016.12.006
#>  2 28089448    MED 28089448     10.1016/j.str.2016.12.005
#>  3 28065506    MED 28065506     10.1016/j.str.2016.12.001
#>  4 28039433    MED 28039433       10.1073/pnas.1611577114
#>  5 28036383    MED 28036383  10.1371/journal.pone.0168832
#>  6 28039325    MED 28039325           10.1093/nar/gkw1310
#>  7 28035004    MED 28035004       10.1074/jbc.m116.749713
#>  8 28034958    MED 28034958           10.1093/nar/gkw1307
#>  9 28034013    MED 28034013 10.1080/07391102.2016.1278038
#> 10 28031486    MED 28031486       10.1073/pnas.1616198114
#> # ... with 90 more rows, and 23 more variables: title <chr>,
#> #   authorString <chr>, journalTitle <chr>, issue <chr>,
#> #   journalVolume <chr>, pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> #   pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>,
#> #   hasPDF <chr>, hasBook <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>,
#> #   hasLabsLinks <chr>, hasTMAccessionNumbers <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>, hasSuppl <chr>

Besides the Protein Data Bank in Europe, cross-references to several other databases are supported.

To retrieve metadata about these external database links, use europepmc::epmc_db().

Citations and reference sections

Europe PMC also lets us obtain citation metadata and reference sections. To retrieve metadata about the articles citing a given article, use:

europepmc::epmc_citations("9338777", limit = 500)
#> # A tibble: 211 x 12
#>          id source
#>       <chr>  <chr>
#>  1 10221475    MED
#>  2 10342317    MED
#>  3 10440384    MED
#>  4  9696842    MED
#>  5  9703304    MED
#>  6  9728974    MED
#>  7  9728985    MED
#>  8  9728986    MED
#>  9  9728987    MED
#> 10  9756815    MED
#> # ... with 201 more rows, and 10 more variables: citationType <chr>,
#> #   title <chr>, authorString <chr>, journalAbbreviation <chr>,
#> #   pubYear <int>, volume <chr>, issue <chr>, pageInfo <chr>,
#> #   citedByCount <int>, text <chr>
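The returned tibble can be summarised with base R. As a sketch, the hypothetical helper below counts citing articles per publication year, relying only on the integer pubYear column listed in the output above.

```r
# Hypothetical helper: number of citing articles per publication year.
# `cites` is assumed to be the tibble returned by europepmc::epmc_citations().
citations_per_year <- function(cites) {
  counts <- table(pubYear = cites$pubYear)  # NA years are dropped
  data.frame(pubYear = as.integer(names(counts)),
             n = as.integer(counts))
}

# Usage (requires network access):
# cites <- europepmc::epmc_citations("9338777", limit = 500)
# citations_per_year(cites)
```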

To retrieve the reference section of an article:

europepmc::epmc_refs("28632490", limit = 200)
#> # A tibble: 169 x 19
#>          id source    citationType
#>       <chr>  <chr>           <chr>
#>  1 12002480    MED JOURNAL ARTICLE
#>  2 18795164    MED JOURNAL ARTICLE
#>  3 18556606    MED JOURNAL ARTICLE
#>  4 17683018    MED JOURNAL ARTICLE
#>  5 15273108    MED JOURNAL ARTICLE
#>  6 18207219    MED JOURNAL ARTICLE
#>  7 17007908    MED JOURNAL ARTICLE
#>  8 26948762    MED JOURNAL ARTICLE
#>  9 23192912    MED JOURNAL ARTICLE
#> 10 25837385    MED JOURNAL ARTICLE
#> # ... with 159 more rows, and 16 more variables: title <chr>,
#> #   authorString <chr>, journalAbbreviation <chr>, issue <chr>,
#> #   pubYear <int>, volume <chr>, pageInfo <chr>, citedOrder <int>,
#> #   match <chr>, essn <chr>, issn <chr>, publicationTitle <chr>,
#> #   publisherLoc <chr>, publisherName <chr>, externalLink <chr>, doi <chr>
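References indexed in MEDLINE carry the source "MED", so their IDs can be passed on to other functions of this package. The helper ref_pmids() below is a hypothetical sketch that uses only the id and source columns shown above; chaining it into epmc_citations() assumes those IDs are accepted in the same way as the PMID "9338777" in the previous example.

```r
# Hypothetical helper: unique IDs of references whose source is MED.
# `refs` is assumed to be the tibble returned by europepmc::epmc_refs().
ref_pmids <- function(refs) {
  unique(refs$id[!is.na(refs$source) & refs$source == "MED"])
}

# Usage (requires network access):
# refs <- europepmc::epmc_refs("28632490", limit = 200)
# lapply(head(ref_pmids(refs), 3), europepmc::epmc_citations)
```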

Fulltext access

Europe PMC provides access not only to metadata but also to full texts. Adding AND (OPEN_ACCESS:y) to your search query returns only those articles for which Europe PMC also holds the full text.
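A minimal sketch of this filter as a reusable function. The name oa_only() is hypothetical; the function wraps the original query in parentheses so that any OR clauses inside it are not split by the added AND.

```r
# Hypothetical helper: restrict a Europe PMC query to articles whose
# full text is available in Europe PMC.
oa_only <- function(query) {
  paste0("(", query, ") AND (OPEN_ACCESS:y)")
}

cat(oa_only("disease:meningitis"), "\n", sep = "")
#> (disease:meningitis) AND (OPEN_ACCESS:y)

# Usage (requires network access):
# europepmc::epmc_search(oa_only("disease:meningitis"), limit = 10)
```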

Full texts in XML format can be accessed via the PubMed Central ID (PMCID):

europepmc::epmc_ftxt("PMC3257301")
#> {xml_document}
#> <article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
#> [1] <front>\n  <journal-meta>\n    <journal-id journal-id-type="nlm-ta"> ...
#> [2] <body>\n  <sec id="s1">\n    <title>Introduction</title>\n    <p>Atm ...
#> [3] <back>\n  <ack>\n    <p>We would like to thank Dr. C. Gourlay and Dr ...
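Because the result is an xml2 document, the usual xml2 functions can be used to navigate it. As a sketch, the hypothetical helper below lists the titles of the top-level body sections (such as "Introduction" in the output above). The XPath assumes the body/sec/title layout visible in that output and no default XML namespace on the article element.

```r
# Hypothetical helper: titles of the top-level sections in the article body.
# `doc` is assumed to be the xml_document returned by europepmc::epmc_ftxt().
section_titles <- function(doc) {
  xml2::xml_text(xml2::xml_find_all(doc, "//body/sec/title"))
}

# Usage (requires network access):
# section_titles(europepmc::epmc_ftxt("PMC3257301"))
```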