The rppo package contains just two functions: one to query terms from the Plant Phenology Ontology (PPO) and another to query the Global Plant Phenology Data Portal (PPO data portal). The three sections below illustrate their use: the first two cover the ppo_terms and ppo_data functions individually, and the third shows how to use the two functions together.

ppo_terms function

It is often useful to browse the list of present and absent terms contained in the PPO. The ppo_terms function returns present terms, absent terms, or both, with columns containing a termID, label, definition, and full URI for each term. Use the termIDs returned by this function to query terms with the ppo_data function. The following example stores the present terms in a “present_terms” data frame and prints a sample slice of it.

present_terms <- ppo_terms(present = TRUE)
# print the first five rows, with just the termIDs and labels
print(present_terms[1:5,c("termID","label")])
#>            termID                            label
#> 1 obo:PPO_0002359  abscised cones or seeds present
#> 2 obo:PPO_0002358 abscised fruits or seeds present
#> 3 obo:PPO_0002357          abscised leaves present
#> 4 obo:PPO_0002311       breaking leaf buds present
#> 5 obo:PPO_0002346                    cones present
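
Since ppo_data expects a termID rather than a label, it can help to look up the termID for a label of interest. The following is a minimal sketch, not part of rppo: it uses a mock data frame copied from the five rows printed above (a real call to ppo_terms needs network access), and the helper name find_term_ids is hypothetical.

```r
# Mock of the first rows returned by ppo_terms(present = TRUE), copied from
# the output above so the example runs without network access.
present_terms <- data.frame(
  termID = c("obo:PPO_0002359", "obo:PPO_0002358", "obo:PPO_0002357",
             "obo:PPO_0002311", "obo:PPO_0002346"),
  label = c("abscised cones or seeds present",
            "abscised fruits or seeds present",
            "abscised leaves present",
            "breaking leaf buds present",
            "cones present"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not an rppo function): return the termIDs whose label
# matches the given regular expression, ignoring case.
find_term_ids <- function(terms, pattern) {
  hits <- grepl(pattern, terms$label, ignore.case = TRUE)
  terms$termID[hits]
}

leaf_ids <- find_term_ids(present_terms, "leaf buds")
print(leaf_ids)
#> [1] "obo:PPO_0002311"
```

The returned value can then be passed as the termID argument of ppo_data.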

ppo_data function

The ppo_data function queries the PPO Data Portal, passing the supplied values to the database and returning the matching results. The results are returned as a list with five elements: 1) a data frame containing the data, 2) a readme string containing usage information and some statistics about the query itself, 3) a citation string containing information about proper citation, 4) a number_possible integer giving the total number of results that match the query, which is useful when a limit has been specified, and 5) a status code returned from the service.

The “df” variable below is populated with results from the data element in the results list, with an example slice of data showing the first record.

results <- ppo_data(genus = "Quercus", fromYear = 2013, toYear = 2013, fromDay = 100, toDay = 110, termID = 'obo:PPO_0002313', limit = 10)
df <- results$data
print(df[1, ])
#>   dayOfYear year   genus specificEpithet latitude longitude
#> 1       106 2013 Quercus          lobata 34.67545 -120.0407
#>                                                                                                                                                                                                                                            termID
#> 1 obo:PPO_0002316,obo:PPO_0002322,obo:PPO_0002018,obo:PPO_0002318,obo:PPO_0002022,obo:PPO_0002312,obo:PPO_0002313,obo:PPO_0002014,obo:PPO_0002024,obo:PPO_0002015,obo:PPO_0002000,obo:PPO_0002017,obo:PPO_0002020,obo:PPO_0002320,obo:PPO_0002315
#>    source                               eventId
#> 1 USA-NPN http://n2t.net/ark:/21547/Amg22054478
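
The termID field of each record is a single comma-separated string, so it usually needs to be split before individual terms can be inspected. The sketch below assumes a shortened copy of the termID value shown above (three of the fifteen terms) and checks whether the queried term is among them.

```r
# A shortened copy of the termID field from the record printed above;
# only three of the fifteen terms are kept for readability.
term_field <- "obo:PPO_0002316,obo:PPO_0002322,obo:PPO_0002313"

# strsplit returns a list with one element per input string,
# so take the first element to get a character vector of termIDs.
record_terms <- strsplit(term_field, ",", fixed = TRUE)[[1]]

# Check that the term used in the query is present on this record.
has_query_term <- "obo:PPO_0002313" %in% record_terms
print(has_query_term)
#> [1] TRUE
```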

The readme and citation text returned in the results list can be accessed through the readme and citation elements. Note that the file “citation_and_data_use_policies.txt” referred to in the readme can be viewed using cat(results$citation).

cat(results$readme)
#> The following contains information about your download from the Global Plant 
#> Phenology Database.  Please refer to the citation_and_data_use_policies.txt 
#> file for important information about data usage policies, licensing, and 
#> citation protocols for each dataset.  This file contains summary information 
#> about the query that was run.  
#> 
#> data file = data.csv
#> date query ran = Tue Jun 05 2018 19:44:07 GMT-0400 (EDT)
#> query = +genus:Quercus AND +plantStructurePresenceTypes:"http://purl.obolibrary.org/obo/PPO_0002313" AND +year:>=2013 AND +year:<=2013 AND +dayOfYear:>=100 AND +dayOfYear:<=110 AND source:USA-NPN,NEON
#> fields returned = dayOfYear,year,genus,specificEpithet,latitude,longitude,source,eventId
#> user specified limit = 10
#> total results possible = 518
#> total results returned = 0

The results list also reports the number of possible results, which is useful when the submitted query had a limit. For example, the query above sets the limit to 10, but we may want to know how many records would have been returned had no limit been set.

cat(results$number_possible)
#> 518
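
Comparing number_possible with the number of rows actually returned is one way to detect that a limit truncated the result set. The following is a minimal sketch under stated assumptions: the results list is mocked with only the data and number_possible elements described above, and is_truncated is a hypothetical helper, not an rppo function.

```r
# Mock results list with the two elements needed here; the values mirror
# the query above (limit = 10, 518 possible records).
results <- list(
  data = data.frame(dayOfYear = 101:110, year = 2013),
  number_possible = 518L
)

# Hypothetical helper: TRUE when fewer rows were returned than the
# service reports as possible.
is_truncated <- function(results) {
  nrow(results$data) < results$number_possible
}

if (is_truncated(results)) {
  message(sprintf("Returned %d of %d possible records.",
                  nrow(results$data), results$number_possible))
}
```

When the check is TRUE, rerunning the query with a larger limit (or none) would retrieve the remaining records.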

Working with terms and data together

Here we generate a data frame showing the frequency of “present” and “absent” terms for a particular query: genus = “Quercus” and latitude > 47. For each row in the returned data frame, ppo_data typically returns multiple terms in the termID field, corresponding to phenological stages as defined by the PPO. In this example, we build a frequency table of the number of times each “present” or “absent” term occurs in the entire returned dataset. Note that the termID field returned by ppo_data contains “presence” terms in addition to “present” and “absent” terms, while the ppo_terms function returns only “present” and “absent” terms. Our frequency distribution therefore counts only the “present” and “absent” terms (for an in-depth discussion of the difference between “presence”, “present”, and “absent”, see https://www.frontiersin.org/articles/10.3389/fpls.2018.00517/full). Finally, since termIDs are returned as URI identifiers rather than easily readable text, this example maps termIDs to labels. The resulting data frame has two columns: 1) the term label, and 2) the number of times that label appeared in the result set.

###############################################################################
# Generate a frequency data frame showing the number of times each termID
# is populated for genus equals "Quercus" above latitude of 47
# Note that all latitude/longitude queries need to be in the format of a
# bounding box
###############################################################################
df <- ppo_data(
  genus = "Quercus", 
  bbox="47,-180,90,180")
#> sending request for data ...
#> https://www.plantphenology.org/api/v2/download/?q=%2Bgenus:Quercus+AND+%2Blatitude:>=47+AND+%2Blatitude:<=90+AND+%2Blongitude:>=-180+AND+%2Blongitude:<=180+AND+source:USA-NPN,NEON&source=latitude,longitude,year,dayOfYear,termID
# return just the termID column
t1 <- df$data[,c('termID')]
# paste each cell into one string
t2<-paste(t1, collapse = ",")
# split strings at ,
t3<-strsplit(t2, ",")
# create a frequency table as a data frame
freqFrame <- as.data.frame(table(t3))

# create a new data frame that we want to populate
resultFrame <- data.frame(
  label = character(), 
  frequency = integer(), 
  stringsAsFactors = FALSE)

###############################################################################
# Replace termIDs with labels in frequency frame
###############################################################################
# fetch "present" and "absent" terms using `ppo_terms`
termList <- ppo_terms(absent = TRUE, present = TRUE);
#> sending request for terms ...

# loop over all "present" and "absent" terms
for (term in 1:nrow(termList)) {
  termListTermID<-termList[term,'termID'];
  termListLabel<-termList[term,'label'];
  # loop all rows that have a frequency generated
  for (row in 1:nrow(freqFrame)) {
    freqFrameTermID = freqFrame[row,'t3']
    freqFrameFrequency = freqFrame[row,'Freq']
    # Populate resultFrame with matching "present" or "absent" labels.
    # In this step, we will ignore "presence" terms
    # found in the frequency frame since the ppo_terms only returns
    # "present" and "absent" terms. 
    if (freqFrameTermID == termListTermID) {
      resultFrame[nrow(resultFrame)+1,] <- c(termListLabel,freqFrameFrequency)
    }
  }
}

# print results, showing term labels and a frequency count
print(resultFrame)
#>                                                 label frequency
#> 1                      abscised cones or seeds absent       365
#> 2                     abscised fruits or seeds absent       365
#> 3                              abscised leaves absent       365
#> 4                             abscised leaves present         4
#> 5                           breaking leaf buds absent       159
#> 6                          breaking leaf buds present        32
#> 7                expanded immature true leaves absent       159
#> 8                       expanding true leaves present        54
#> 9               expanding unfolded true leaves absent       159
#> 10             expanding unfolded true leaves present        22
#> 11                          floral structures present        16
#> 12                                    flowers present         7
#> 13                                     fruits present        12
#> 14               immature unfolded true leaves absent       159
#> 15              immature unfolded true leaves present        22
#> 16                                  leaf buds present        32
#> 17                          mature true leaves absent       159
#> 18 new above-ground shoot-borne shoot systems present        32
#> 19                           new shoot system present        32
#> 20                      non-dormant leaf buds present        32
#> 21              non-senesced floral structures absent       175
#> 22             non-senesced floral structures present         9
#> 23                       non-senesced flowers present         7
#> 24          non-senescing unfolded true leaves absent       159
#> 25         non-senescing unfolded true leaves present        22
#> 26                      open floral structures absent       175
#> 27                           open flower heads absent       181
#> 28                                open flowers absent       181
#> 29                               open flowers present         7
#> 30          pollen-releasing floral structures absent       533
#> 31               pollen-releasing flower heads absent       358
#> 32                    pollen-releasing flowers absent       358
#> 33                   pollen-releasing flowers present         4
#> 34                    reproductive structures present        28
#> 35                                 ripe fruits absent       362
#> 36                             ripening fruits absent       176
#> 37                            ripening fruits present        12
#> 38                       senescing true leaves absent       342
#> 39                      senescing true leaves present         6
#> 40                                true leaves present       101
#> 41                        unfolded true leaves absent       159
#> 42                       unfolded true leaves present        69
#> 43                      unfolding true leaves present        32
#> 44                  unopened floral structures absent       175
#> 45                               unripe fruits absent       176
#> 46                            vascular leaves present       101
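
The nested loop used above to map termIDs to labels can also be written with base R's match(), which avoids growing the data frame row by row and keeps the frequency column as an integer. The sketch below is an alternative under stated assumptions, not the package authors' method: freqFrame and termList are small mock objects with the same column names as above, and obo:PPO_0002000 stands in for a term that is not in the mock termList.

```r
# Mock frequency frame with the same columns (t3, Freq) as the one built above.
tokens <- c("obo:PPO_0002357", "obo:PPO_0002311",
            "obo:PPO_0002311", "obo:PPO_0002000")
freqFrame <- as.data.frame(table(t3 = tokens), stringsAsFactors = FALSE)

# Mock term list with the termID and label columns returned by ppo_terms.
termList <- data.frame(
  termID = c("obo:PPO_0002357", "obo:PPO_0002311"),
  label = c("abscised leaves present", "breaking leaf buds present"),
  stringsAsFactors = FALSE
)

# For each counted termID, find its row in termList (NA when not listed).
idx <- match(freqFrame$t3, termList$termID)
keep <- !is.na(idx)

# Keep only the terms that have a label; unlisted terms are dropped.
resultFrame <- data.frame(
  label = termList$label[idx[keep]],
  frequency = freqFrame$Freq[keep],
  stringsAsFactors = FALSE
)
print(resultFrame)
```

Rows come out in the order of the frequency table (sorted by termID) rather than in the order of termList, so sort by label afterwards if the ordering matters.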