Google Cloud Translation API

Mark Edmondson

2017-11-16

The Google Cloud Translation API provides a simple programmatic interface for translating an arbitrary string into any supported language. Translation API is highly responsive, so websites and applications can integrate with Translation API for fast, dynamic translation of source text from the source language to a target language (e.g. French to English).

Read more on the Google Cloud Translation Website

You can detect the language of a text via gl_translate_detect, or translate it and detect its language in one step via gl_translate.

Language Translation

Translate text via gl_translate. Note this is a lot more refined than the free version on Google’s translation website.

library(googleLanguageR)

text <- "to administer medicine to animals is frequently a very difficult matter, and yet sometimes it's necessary to do so"
## translate British English into Danish
gl_translate(text, target = "da")$translatedText

You choose the target language via the target argument. If you do not supply a source argument, the function detects the source language automatically and reports it alongside the translation. Because detection within gl_translate costs the same as a separate call to gl_translate_detect, it is usually cheaper to detect and translate in one step.

You can pass a vector of text. The function first attempts to translate it in a single API call; if that fails because the request exceeds the API limits, it tries again by splitting the vector across several API calls. This makes more calls and is slower, but costs the same, because you are charged per character translated, not per API call.
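The try-once-then-split behaviour described above can be illustrated with a minimal sketch. This is not the library's internal code: translate_with_fallback and stub_translate are hypothetical names, and the stub stands in for a real API call so the example runs offline. A real gl_translate call returns a tibble, so the results would need combining row-wise rather than with unlist.

```r
# Hypothetical helper: try one batched call, fall back to one call per element.
# `translate_fn` stands in for any function that accepts a character vector.
translate_with_fallback <- function(texts, translate_fn) {
  tryCatch(
    translate_fn(texts),
    error = function(e) {
      # Batched call failed: retry element by element and combine the results
      unlist(lapply(texts, translate_fn), use.names = FALSE)
    }
  )
}

# Stub that rejects requests above 20 characters in total (illustration only)
stub_translate <- function(x) {
  if (sum(nchar(x)) > 20) stop("request too large")
  toupper(x)
}

texts <- c("the cat sits", "on the mat")
result <- translate_with_fallback(texts, stub_translate)

# Billing is per character, so the cost basis is the same on either path
chars_billed <- sum(nchar(texts))
```

Here the batched request fails in the stub, each element is then sent on its own, and the character count that would be billed does not change.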

HTML support

You can also supply HTML from a web page and set format = 'html', which handles the HTML tags to give you a cleaner translation.

Consider first removing anything that does not need translating, such as JavaScript and CSS, using the tools in rvest. An example is shown below:

# translate webpages
library(rvest)
library(googleLanguageR)

my_url <- "http://www.dr.dk/nyheder/indland/greenpeace-facebook-og-google-boer-foelge-apples-groenne-planer"

## in this case the content to translate is matched by the CSS selector .wcms-article-content
read_html(my_url) %>% # read html
  html_node(css = ".wcms-article-content") %>%   # select article content
  html_text() %>% # extract text (this drops the HTML tags)
  gl_translate(format = "html") %>% # translate with html flag
  dplyr::select(translatedText) # show translatedText column of output tibble

Language Detection

This function only detects the language:

## which language is this?
gl_translate_detect("katten sidder på måtten")

The more text it has, the better. And it helps if it's not Danish…

It may be better to use cld2 to detect the language offline first, to avoid charges if translation is unnecessary (e.g. the text is already in English). You could then verify online for the more uncertain cases.

cld2::detect_language("katten sidder på måtten")
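One way to act on the offline result is to send only the texts whose detected language is missing or differs from the target. The sketch below assumes that cld2::detect_language returns an ISO language code per element and NA when it is unsure; needs_translation and stub_detect are hypothetical names, and the stub detector is there only so the example runs without cld2 installed.

```r
# Hypothetical helper: flag which texts still need translating.
# `detect` defaults to cld2::detect_language (assumed to return NA when unsure).
needs_translation <- function(texts, target = "en", detect = cld2::detect_language) {
  detected <- detect(texts)
  # Translate when the offline guess is missing or differs from the target
  is.na(detected) | detected != target
}

# Stub detector so the sketch runs offline (illustration only)
stub_detect <- function(x) ifelse(grepl("katten", x), "da", "en")

texts <- c("katten sidder på måtten", "the cat sits on the mat")
to_send <- texts[needs_translation(texts, detect = stub_detect)]
```

Only to_send would then be passed to gl_translate, so text that is already in the target language incurs no charge.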

Translation API limits

The API enforces three limits: characters per day, characters per 100 seconds, and API requests per 100 seconds. All can be set in the API manager in the Google Cloud console: https://console.developers.google.com/apis/api/translate.googleapis.com/quotas

The library limits its API calls to respect the characters and API requests per 100 seconds quotas. It automatically retries if you are making requests too quickly, and pauses to make sure you send no more than 100000 characters per 100 seconds.
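To make the character budget concrete, the sketch below groups a vector of texts into batches that each stay within a character limit. It is an illustration of the idea only, not the library's pacing code: batch_by_chars is a hypothetical helper, and the 100000 default simply mirrors the figure quoted above.

```r
# Hypothetical helper: group texts into batches of at most `max_chars` characters
batch_by_chars <- function(texts, max_chars = 100000) {
  batch_id <- integer(length(texts))
  current <- 1L
  used <- 0
  for (i in seq_along(texts)) {
    n <- nchar(texts[i])
    # Start a new batch when adding this text would exceed the budget
    if (used > 0 && used + n > max_chars) {
      current <- current + 1L
      used <- 0
    }
    batch_id[i] <- current
    used <- used + n
  }
  split(texts, batch_id)
}

# Small limit so the grouping is easy to follow
batches <- batch_by_chars(c("aaaa", "bbb", "cc", "ddddd"), max_chars = 7)
```

A single text longer than the limit still ends up in a batch of its own, so a caller doing its own pacing would need to handle that case separately.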