2. Data Search and Discovery

2021-07-26

Searching for data within Dataverse is done with the dataverse_search() function. The simplest search consists of a single query string:

library("dataverse")
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")
dataverse_search("Gary King")[c("name")]
## 10 of 1519 results retrieved
##                                                                                                                   name
## 1                                                                            004_informal_food_retail_Nigeria_2018.tab
## 2                                                                                00698McArthur-King-BoxCoverSheets.pdf
## 3                                                                               00698McArthur-King-MemoOfAgreement.pdf
## 4                                                                              00698McArthur-King-StudyDescription.pdf
## 5  01 ReadMe Unlocking history through automated virtual unfolding of sealed documents imaged by X-ray microtomography
## 6                                     03 Brienne Collection letterlocking data: Images folder 02/16, DB-0874_2–DB-0903
## 7                                    03 Brienne Collection letterlocking data: Images folder 04/16, DB-0988–DB-1109_03
## 8                                 03 Brienne Collection letterlocking data: Images folder 06/16, DB-1241_02–DB-1339_06
## 9                                 03 Brienne Collection letterlocking data: Images folder 08/16, DB-1455_02–DB-1564_01
## 10                                   03 Brienne Collection letterlocking data: Images folder 12/16, DB-1868–DB-1963_03
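An unrestricted query like this one mixes object types, as the names ending in .tab and .pdf suggest. The following is a minimal sketch of tabulating the type column of a result. The data frame res is a hand-built stand-in for a search result, so the snippet runs without network access: the names are copied from the output above, while the type values are assumptions rather than retrieved values.

```r
# Stand-in for the data frame returned by dataverse_search("Gary King").
# Names are taken from the output above; the `type` values are assumed.
res <- data.frame(
  name = c(
    "004_informal_food_retail_Nigeria_2018.tab",
    "00698McArthur-King-BoxCoverSheets.pdf",
    "01 ReadMe Unlocking history through automated virtual unfolding of sealed documents imaged by X-ray microtomography"
  ),
  type = c("file", "file", "dataset"),
  stringsAsFactors = FALSE
)

# Count how many results fall into each object type
type_counts <- table(res$type)
print(type_counts)

# Keep only the rows that describe datasets
datasets <- res[res$type == "dataset", , drop = FALSE]
```

The same two steps should apply unchanged to a real result, since type is among the fields returned by a search.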

The results are paginated, so users can rely on the per_page and start arguments to request subsequent pages of results. Because start is a zero-based offset, start = 6 skips the first six results, so the query below returns the last four results from the previous query plus 16 more (due to per_page = 20):

dataverse_search("Gary King", start = 6, per_page = 20)[c("name")]
## 20 of 1519 results retrieved
##                                                                                                                           name
## 1                                            03 Brienne Collection letterlocking data: Images folder 04/16, DB-0988–DB-1109_03
## 2                                         03 Brienne Collection letterlocking data: Images folder 06/16, DB-1241_02–DB-1339_06
## 3                                         03 Brienne Collection letterlocking data: Images folder 08/16, DB-1455_02–DB-1564_01
## 4                                            03 Brienne Collection letterlocking data: Images folder 12/16, DB-1868–DB-1963_03
## 5                                            03 Brienne Collection letterlocking data: Images folder 14/16, DB-2064_01–2155_03
## 6                                                                                07 Letterlocking Categories and Formats Chart
## 7                                                                                                             077_mod1_s2m.tab
## 8  10 Foldable: Launch Little Book of Locks (UH6089), with Categories and Formats Chart. Letterlocking Instructional Resources
## 9                                                                                       10 Million International Dyadic Events
## 10                                                                     12070002_Wolfville T and Kings Subd D SC 2016-92640.pdf
## 11                                                                                     12070005-Kings Subd C SC 2016-92640.pdf
## 12                                                                           1479 data points of covid19 policy response times
## 13                                                             1998 Jewish Community Study of the Coachella Valley, California
## 14                                                                                               2002 State Legislative Survey
## 15                                                                          2007 White Sands Dune Field lidar topographic data
## 16                                                                          2008 White Sands Dune Field lidar topographic data
## 17                                                                                                         2012 STATA Data.tab
## 18                                                                                                                2012pres.tab
## 19                                                                                                          2014 SPSS Data.tab
## 20                                                                                                         2014 STATA Data.tab
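To collect more than one page, the start offset can be advanced in steps of per_page. The sketch below is one way to do this, not part of the package: page_starts(), collect_names(), and fake_search() are hypothetical helpers introduced for illustration. The search function is passed in as an argument so that the paging logic can be exercised offline with a stand-in.

```r
# Zero-based `start` offsets needed to cover the first `total` results.
page_starts <- function(total, per_page) {
  stopifnot(total >= 0, per_page > 0)
  if (total == 0) return(numeric(0))
  seq(from = 0, to = total - 1, by = per_page)
}

# Fetch the first `total` results page by page and stack their names.
# `fetch` must accept (query, start =, per_page =) and return a data frame
# with a `name` column, as dataverse_search() does in the examples above.
collect_names <- function(query, total, per_page, fetch) {
  pages <- lapply(page_starts(total, per_page), function(s) {
    n <- min(per_page, total - s)  # the last page may be shorter
    fetch(query, start = s, per_page = n)["name"]
  })
  do.call(rbind, pages)
}

# Offline stand-in for a server that holds 45 results named r1 ... r45
fake_search <- function(query, start, per_page) {
  idx <- seq(start + 1, min(start + per_page, 45))
  data.frame(name = paste0("r", idx), stringsAsFactors = FALSE)
}

all_names <- collect_names("Gary King", total = 45, per_page = 20,
                           fetch = fake_search)
```

With the package loaded and a server configured, passing fetch = dataverse_search in place of the stand-in should page through live results in the same way, network access permitting.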

More complicated searches can specify metadata fields like author and title, and restrict results to a specific type of Dataverse object (a “dataverse”, “dataset”, or “file”):

ei <- dataverse_search(author = "Gary King", title = "Ecological Inference", type = "dataset", per_page = 20)
## 20 of 1367 results retrieved
# fields returned
names(ei)
##  [1] "name"                    "type"                    "url"                     "global_id"              
##  [5] "description"             "published_at"            "publisher"               "citationHtml"           
##  [9] "identifier_of_dataverse" "name_of_dataverse"       "citation"                "storageIdentifier"      
## [13] "keywords"                "subjects"                "fileCount"               "versionId"              
## [17] "versionState"            "majorVersion"            "minorVersion"            "createdAt"              
## [21] "updatedAt"               "contacts"                "authors"                 "publications"           
## [25] "geographicCoverage"     
# names of datasets
ei$name
##  [1] "01 ReadMe Unlocking history through automated virtual unfolding of sealed documents imaged by X-ray microtomography"        
##  [2] "03 Brienne Collection letterlocking data: Images folder 02/16, DB-0874_2–DB-0903"                                           
##  [3] "03 Brienne Collection letterlocking data: Images folder 04/16, DB-0988–DB-1109_03"                                          
##  [4] "03 Brienne Collection letterlocking data: Images folder 06/16, DB-1241_02–DB-1339_06"                                       
##  [5] "03 Brienne Collection letterlocking data: Images folder 08/16, DB-1455_02–DB-1564_01"                                       
##  [6] "03 Brienne Collection letterlocking data: Images folder 12/16, DB-1868–DB-1963_03"                                          
##  [7] "03 Brienne Collection letterlocking data: Images folder 14/16, DB-2064_01–2155_03"                                          
##  [8] "07 Letterlocking Categories and Formats Chart"                                                                              
##  [9] "10 Foldable: Launch Little Book of Locks (UH6089), with Categories and Formats Chart. Letterlocking Instructional Resources"
## [10] "10 Million International Dyadic Events"                                                                                     
## [11] "1479 data points of covid19 policy response times"                                                                          
## [12] "2016 Census of Population: ADA and DA Maps for Kings County Nova Scotia"                                                    
## [13] "3D Dust map from Green et al. (2015)"                                                                                       
## [14] "3D dust map from Green et al. (2017)"                                                                                       
## [15] "3D dust map from Green et al. (2019)"                                                                                       
## [16] "A 1D Lyman-alpha Profile Camera for Plasma Edge Neutral Studies  on the DIII-D Tokamak"                                     
## [17] "A Comparative Analysis of Brazil's Foreign Policy Drivers Towards the USA: Comment on Amorim Neto (2011)"                   
## [18] "A Critique of Dyadic Design"                                                                                                
## [19] "A Framework to Quantify the Signs of Abandonment in Online Digital Humanities Projects"                                     
## [20] "A Lexicial Index of Electoral Democracy"
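Many of the names listed above do not contain the search phrase, so it can help to narrow a result locally after retrieving it. The sketch below filters on the name column with base R's grepl(). The data frame ei_stub is a hand-built stand-in holding three names copied from the output above, and the pattern "dyadic" is chosen only for illustration.

```r
# Stand-in for a search result; names are copied from the output above.
ei_stub <- data.frame(
  name = c(
    "10 Million International Dyadic Events",
    "3D Dust map from Green et al. (2015)",
    "A Critique of Dyadic Design"
  ),
  stringsAsFactors = FALSE
)

# Keep rows whose name contains the pattern, ignoring case
keep <- grepl("dyadic", ei_stub$name, ignore.case = TRUE)
matches <- ei_stub[keep, , drop = FALSE]
print(matches$name)
```

Applied to a real result such as ei, the same subsetting keeps every other column (for example global_id) alongside the matching names.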

Once datasets and files are identified, it is easy to download and use them directly in R. See the “Data Download” vignette for details.