Google Polylines

D Cooley

2018-05-25

The goal of googlePolylines is to encode and decode coordinates using Google’s polyline encoding algorithm

Polyline encoding is a lossy compression algorithm that allows you to store a series of coordinates as a single string.

Encoded polylines are used by Google Maps to draw lines and polygons, and are therefore supported in the googleway package.

I am intending to update googleway to support plotting sf objects using these encoded polylines.

Caution

The word lossy is important to keep in mind, as the encoding process could reduce precision of your data. If you are after highly accurate coordinates this process probably isn’t for you.

However, if you need to reduce the size of spatial objects/data, and want quicker plots (see the Benchmarking section), then this could help.

Encoding

Encoding is split across two functions

vectors

Given two vectors of longitude and latitude coordinates:

## [1] "xgweFcsysZToA?g@f@oAPsD?oApFf@|Hz@`DmAvViJf@jC"

data.frame

The encode() function will attempt to find the lon & lat columns inside a data.frame using regex matching. However, you can also specify the columns of coordinates:

## [1] "xgweFcsysZToA?g@f@oAPsD?oApFf@|Hz@`DmAvViJf@jC"

sf

encode() will currently work on sf objects with geometry types

It will not work on GEOMETRYCOLLECTION objects.

## Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
## Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.5/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
## Simple feature collection with 100 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
## Simple feature collection with 5 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -81.74107 ymin: 36.07282 xmax: -75.77316 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
##    AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
## 1 0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
## 2 0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
## 3 0.143     1.630  1828    1828       Surry 37171  37171       86  3188
## 4 0.070     2.968  1831    1831   Currituck 37053  37053       27   508
## 5 0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
##   SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
## 1     1      10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
## 2     0      10   542     3      12 MULTIPOLYGON (((-81.23989 3...
## 3     5     208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
## 4     1     123   830     2     145 MULTIPOLYGON (((-76.00897 3...
## 5     9    1066  1606     3    1197 MULTIPOLYGON (((-77.21767 3...

When used on an sf object, an sfencoded object is returned, with the encoded polylines replacing the sf::st_geometry column.

## Classes 'sfencoded' and 'data.frame':    100 obs. of  15 variables:
##  $ AREA     : num  0.114 0.061 0.143 0.07 0.153 0.097 0.062 0.091 0.118 0.124 ...
##  $ PERIMETER: num  1.44 1.23 1.63 2.97 2.21 ...
##  $ CNTY_    : num  1825 1827 1828 1831 1832 ...
##  $ CNTY_ID  : num  1825 1827 1828 1831 1832 ...
##  $ NAME     : Factor w/ 100 levels "Alamance","Alexander",..: 5 3 86 27 66 46 15 37 93 85 ...
##  $ FIPS     : Factor w/ 100 levels "37001","37003",..: 5 3 86 27 66 46 15 37 93 85 ...
##  $ FIPSNO   : num  37009 37005 37171 37053 37131 ...
##  $ CRESS_ID : int  5 3 86 27 66 46 15 37 93 85 ...
##  $ BIR74    : num  1091 487 3188 508 1421 ...
##  $ SID74    : num  1 0 5 1 9 7 0 0 4 1 ...
##  $ NWBIR74  : num  10 10 208 123 1066 ...
##  $ BIR79    : num  1364 542 3616 830 1606 ...
##  $ SID79    : num  0 3 6 2 3 5 2 2 2 5 ...
##  $ NWBIR79  : num  19 12 260 145 1197 ...
##  $ geometry :encoded_column of length 100; first element: List of 1
##   ..$ : chr "u_d|EtsgpNmmFphLyEbcCibLf{Lk~H`bT}rNmjGihHd[kvL_lEzgBkl~@lyE`MnvCimCbmEqfAxnGieH~gEeTd_DmiCxvA_D|oAdwCidAtsKr_A"| __truncated__
##   .. ..- attr(*, "sfc")= chr  "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "encoded_column")= chr "geometry"
##  - attr(*, "sfAttributes")=List of 7
##   ..$ type: chr "MULTIPOLYGON"
##   ..$ dim : chr "XY"
##   ..$ bbox: 'bbox' Named num  -84.3 33.9 -75.5 36.6
##   .. ..- attr(*, "names")= chr  "xmin" "ymin" "xmax" "ymax"
##   ..$ epsg: int 4267
##   ..$ proj: chr "+proj=longlat +datum=NAD27 +no_defs"
##   ..$ prec: num 0
##   ..$ n_em: int 0
## [1] "geometry"
## [[1]]
## [1] "u_d|EtsgpNmmFphLyEbcCibLf{Lk~H`bT}rNmjGihHd[kvL_lEzgBkl~@lyE`MnvCimCbmEqfAxnGieH~gEeTd_DmiCxvA_D|oAdwCidAtsKr_AjiEdwA~YztFggAjbCfrAwf@hrF~mBdd@bh@zsB`cCpgCt_@d{B"
## attr(,"sfc")
## [1] "XY"           "MULTIPOLYGON" "sfg"         
## 
## attr(,"class")
## [1] "list"

As you can see, the geometry attributes are kept on the encoded object. However, you can remove them by specifying strip = TRUE

## Classes 'sfencodedLite' and 'data.frame':    100 obs. of  15 variables:
##  $ AREA     : num  0.114 0.061 0.143 0.07 0.153 0.097 0.062 0.091 0.118 0.124 ...
##  $ PERIMETER: num  1.44 1.23 1.63 2.97 2.21 ...
##  $ CNTY_    : num  1825 1827 1828 1831 1832 ...
##  $ CNTY_ID  : num  1825 1827 1828 1831 1832 ...
##  $ NAME     : Factor w/ 100 levels "Alamance","Alexander",..: 5 3 86 27 66 46 15 37 93 85 ...
##  $ FIPS     : Factor w/ 100 levels "37001","37003",..: 5 3 86 27 66 46 15 37 93 85 ...
##  $ FIPSNO   : num  37009 37005 37171 37053 37131 ...
##  $ CRESS_ID : int  5 3 86 27 66 46 15 37 93 85 ...
##  $ BIR74    : num  1091 487 3188 508 1421 ...
##  $ SID74    : num  1 0 5 1 9 7 0 0 4 1 ...
##  $ NWBIR74  : num  10 10 208 123 1066 ...
##  $ BIR79    : num  1364 542 3616 830 1606 ...
##  $ SID79    : num  0 3 6 2 3 5 2 2 2 5 ...
##  $ NWBIR79  : num  19 12 260 145 1197 ...
##  $ geometry :encoded_column of length 100; first element: List of 1
##   ..$ : chr "u_d|EtsgpNmmFphLyEbcCibLf{Lk~H`bT}rNmjGihHd[kvL_lEzgBkl~@lyE`MnvCimCbmEqfAxnGieH~gEeTd_DmiCxvA_D|oAdwCidAtsKr_A"| __truncated__
##  - attr(*, "encoded_column")= chr "geometry"
## [[1]]
## [1] "u_d|EtsgpNmmFphLyEbcCibLf{Lk~H`bT}rNmjGihHd[kvL_lEzgBkl~@lyE`MnvCimCbmEqfAxnGieH~gEeTd_DmiCxvA_D|oAdwCidAtsKr_AjiEdwA~YztFggAjbCfrAwf@hrF~mBdd@bh@zsB`cCpgCt_@d{B"
## 
## attr(,"class")
## [1] "list"

The benefit of stripping the attributes is to reduce the size of the object, which can be useful for web plotting if bandwidth/data transfer speeds are an issue.

##         nc        enc    encLite 
## "140.9 Kb"  "92.1 Kb"  "54.1 Kb"

Well-known Text

The two functions polyline_wkt and wkt_polylne can be used to convert sfencoded objects to and from well-known text.

wkt <- polyline_wkt(enc)
wkt[1, ]
##    AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
## 1 0.114     1.442  1825    1825 Ashe 37009  37009        5  1091     1
##   NWBIR74 BIR79 SID79 NWBIR79                          geometry
## 1      10  1364     0      19 MULTIPOLYGON (((-81.4727 36.23...
enc2 <- wkt_polyline(wkt)
enc2[1, ]
##    AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
## 1 0.114     1.442  1825    1825 Ashe 37009  37009        5  1091     1
##   NWBIR74 BIR79 SID79 NWBIR79                              geometry
## 1      10  1364     0      19 MULTIPOLYGON: i_d|EjsgpNymFrhL{Enc...

I’ve provided these functions to enable the conversion of encoded polylines into other geometry formats, should they be required by other packages.

Converting to sf

sf can read well-known text, so you can convert the wkt object back to sf / sfc objects

## Geometry set for 100 features 
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.3238 ymin: 33.882 xmax: -75.457 ymax: 36.5896
## epsg (SRID):    NA
## proj4string:    NA
## First 5 geometries:
## MULTIPOLYGON (((-81.4727 36.2343, -81.5408 36.2...
## MULTIPOLYGON (((-81.2399 36.3654, -81.2407 36.3...
## MULTIPOLYGON (((-80.4563 36.2425, -80.4764 36.2...
## MULTIPOLYGON (((-76.009 36.3196, -76.0173 36.33...
## MULTIPOLYGON (((-77.2177 36.241, -77.2346 36.21...
## Simple feature collection with 6 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -81.7411 ymin: 36.0728 xmax: -75.7731 ymax: 36.5896
## epsg (SRID):    NA
## proj4string:    NA
##    AREA PERIMETER                       geometry
## 1 0.114     1.442 MULTIPOLYGON (((-81.4727 36...
## 2 0.061     1.231 MULTIPOLYGON (((-81.2399 36...
## 3 0.143     1.630 MULTIPOLYGON (((-80.4563 36...
## 4 0.070     2.968 MULTIPOLYGON (((-76.009 36....
## 5 0.153     2.206 MULTIPOLYGON (((-77.2177 36...
## 6 0.097     1.670 MULTIPOLYGON (((-76.7451 36...

Decoding

Use decode() to decode polylines into coordinates. This function will return a list of data.frames with lon/lat column.

polylines <- c(
 "ohlbDnbmhN~suq@am{tAw`qsAeyhGvkz`@fge}A",
 "ggmnDt}wmLgc`DesuQvvrLofdDorqGtzzV"
)

decode(polylines)
## [[1]]
##      lat       lon
## 1 26.774 -80.18999
## 2 18.466 -66.11799
## 3 32.321 -64.75700
## 4 26.774 -80.18999
## 
## [[2]]
##      lat       lon
## 1 28.745 -70.57899
## 2 29.570 -67.51400
## 3 27.339 -66.66800
## 4 28.745 -70.57899

Benchmarking

You will note that the encoded strings are different between the two enc objects created earlier. For example

enc[1, 'geometry'][[1]] == enc2[1, 'geometry'][[1]]
## [1] FALSE

This results from the lossy-ness of the encoding. However, the general shape of the information is preserved. These two maps are plots of the enc and enc2 objects respectively:

library(googleway)

## You'll need a Google Map API key to run this code
mapKey <- "your_api_key"

google_map(key = mapKey) %>%
  add_polygons(data = nc, polyline = "geometry", fill_colour = "#00FF00", fill_opacity = 0.2)
map_enc

map_enc

google_map(key = mapKey) %>%
  add_polygons(data = enc2, polyline = "geometry", fill_colour = "#FF00FF", fill_opacity = 0.2)
map_enc2

map_enc2

Plotting

This benchmark compares plotting an sf object through leaflet vs plotting the encoded object through googleway


library(microbenchmark)
library(leaflet)

microbenchmark(

  goog = { 
    google_map(key = mapKey) %>%
      add_polygons(data = enc, polyline = "polyline")
      },
      
  leaf = { 
    leaflet() %>%
      addTiles() %>%
      addPolygons(data = nc)
    }, 
    times = 25
)

Unit: milliseconds
 expr       min        lq      mean    median        uq        max neval
 goog  5.457699  5.972203  6.379053  6.162676  6.467225   8.876789   100
 leaf 29.915151 32.093100 34.251388 33.088324 34.629769 106.124047   100
 

These benchmarks don’t account for the time taken for the browswer to render the maps