Parsing command-line arguments by Getopt::Long

Author: Zuguang Gu ( z.gu@dkfz.de )

Date: 2017-03-06


There are already several R packages that parse command-line arguments, such as getopt and the Python-style optparse. GetoptLong is another command-line argument parser. It wraps the powerful Perl module Getopt::Long and adds some adaptations for easier use in R.

Using GetoptLong is simple, especially for Perl users, because the specification is almost the same as in Perl. The original Getopt::Long documentation is always your best reference.

Workflow of the wrapping

The following figure shows how the R package parses command-line arguments.

A quick example

library(GetoptLong)

cutoff = 0.05
GetoptLong(
    "number=i", "Number of items, integer, mandatory option",
    "cutoff=f", "cutoff to filter results, optional, default (0.05)",
    "verbose",  "print messages"
)

The number of arguments passed to GetoptLong() should be even: each specification must be paired with a description.

Save the code as test.R and we can execute the R script as:

~> Rscript test.R --number 4 --cutoff 0.01 --verbose
~> Rscript test.R -n 4 -c 0.01 -v
~> Rscript test.R -n 4 --verbose

In the above example, number is a mandatory option and must be an integer. cutoff is optional because it already has a default value. verbose is a logical option. If parsing succeeds, two variables named number and verbose are imported into the working environment with the specified values, and the value of cutoff is updated if it is given on the command line.

Customize your options

Each specifier in options consists of two parts: the name specification and the argument specification:

length|size|l=i@

Here length|size|l is a list of alternative names separated by |. The remaining part is the argument specification, which defines the mode and number of arguments. The argument specification is optional.

Any one of the alternative option names can be used on the command line, and it doesn't matter whether one or two dashes are put in front of the option name. You don't even need to type the complete option name, as long as the partial name matches a single option. If the partial match is not unique, an error is thrown. For the above example, the argument can be specified as:

~> Rscript foo.R --length 1
~> Rscript foo.R -len 1
~> Rscript foo.R --size 1
~> Rscript foo.R -l 1
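
The unique-prefix rule can be illustrated in base R. This is only a sketch of the idea and not how Getopt::Long implements it. The helper name resolve_option and the extra option name "level" are hypothetical, added here to produce an ambiguous case.

```r
# Hypothetical helper: resolve a possibly abbreviated option name against
# the known names. An exact match wins; otherwise the prefix must be unique.
resolve_option = function(given, known) {
    if (given %in% known) {
        return(given)
    }
    hits = known[startsWith(known, given)]
    if (length(hits) == 0) {
        stop("Unknown option: ", given)
    }
    if (length(hits) > 1) {
        stop("Option ", given, " is ambiguous (", paste(hits, collapse = ", "), ")")
    }
    hits
}

# "level" is made up for this sketch so that "le" becomes ambiguous
known = c("length", "size", "l", "level")

resolve_option("len", known)   # resolves to "length"
resolve_option("l", known)     # exact match, resolves to "l"
resolve_option("si", known)    # resolves to "size"
# resolve_option("le", known) stops with an error: it matches "length" and "level"
```

In the sketch an exact name always takes precedence, so the one-letter alias l stays usable even though it is also a prefix of length.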

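Options for argument specification are:

no argument specification: the option takes no argument and is logical.
!: the option takes no argument and is logical; it can be negated with a no- prefix (e.g. tag! accepts --tag and --no-tag).
=type[desttype][repeat]: the option takes an argument.

Please note that :[type][desttype] is not supported here (if you don't know what it is, just ignore it). Mandatory and optional options are defined in another way.

Available type options are:

s: strings
i: integers
f: real numbers
o: extended integers (octal with a leading 0, hexadecimal with a leading 0x, or binary with a leading 0b)

Available desttype settings are:

@: array, the option can be given more than once
%: hash, the option takes arguments of the form name=value
nothing: scalar, a single value for the option

Available repeat settings are formatted as {\d,\d}. Note there is no blank character inside:

{2}: exactly two arguments
{2,}: at least two arguments
{,4}: at most four arguments
{2,5}: between two and five arguments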

Note that although @ and {\d,\d} both produce arrays, they are used differently. If the option is specified as tag=i@, only --tag 1 --tag 2 is valid. If the option is specified as tag=i{2}, only --tag 1 2 is valid.

The following table contains detailed examples for each type of option specification:

Options    Command-line arguments  Value of tag
tag=i      --tag 1                 1
           --tag 1 --tag 2         2, only the last value is taken
           --tag 0.1               Error: Value "0.1" invalid for option tag (number expected)
           --tag a                 Error: Value "a" invalid for option tag (number expected)
           --tag                   Error: Option tag requires an argument
           no argument             tag is mandatory, please specify it
tag=s      --tag 1                 "1". Double quotes are used because the value is a string.
           --tag 0.1               "0.1"
           --tag a                 "a"
tag=f      --tag 1                 1
           --tag 0.1               0.1
           --tag a                 Error: Value "a" invalid for option tag (real number expected)
tag=o      --tag 1                 1
           --tag 0b001001          9
           --tag 0721              465
           --tag 0xaf2             2802
           --tag 0.1               Error: Value "0.1" invalid for option tag (extended number expected)
           --tag a                 Error: Value "a" invalid for option tag (extended number expected)
tag        --tag 1                 TRUE
           --tag 0                 TRUE, the value given to the option is ignored
           --tag 0.1               TRUE
           --tag a                 TRUE
           --tag                   TRUE
           no argument             FALSE
tag!       --tag                   TRUE
           --no-tag                FALSE
tag=i@     --tag 1                 1
           --tag 1 --tag 2         c(1, 2)
tag=i%     --tag 1                 Error: Option tag, key "1", requires a value
           --tag name=1            tag$name = 1, tag is a list
tag=i{2}   --tag 1                 Error: Insufficient arguments for option tag
           --tag 1 2               c(1, 2)
           --tag 1 --tag 2         Error: Value "--tag" invalid for option tag
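
The values listed for tag=o can be checked by hand in base R with strtoi(). This sketch only verifies the arithmetic behind the table. It says nothing about how the package or Getopt::Long performs the conversion, and the prefixes 0b, 0 and 0x are stripped here so that only the digits are passed to strtoi().

```r
# Convert the digits of the extended-integer examples with an explicit base.
from_binary = strtoi("001001", base = 2L)   # digits of 0b001001
from_octal  = strtoi("721", base = 8L)      # digits of 0721
from_hex    = strtoi("af2", base = 16L)     # digits of 0xaf2

print(c(from_binary, from_octal, from_hex))
# 9 465 2802, matching the table rows for tag=o
```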

Set default value and import options as variables

Options are imported into the user's environment as R variables by default. The first name among an option's alternative names is taken as the variable name (e.g. for the specification length|size=s, length is used as the variable name), which means it must be a valid R variable name. Any definition of these variables before GetoptLong() is called is treated as the default value for the corresponding option. Options that already have default values are optional on the command line. If a variable is defined as a function before GetoptLong() is called, it is treated as undefined. Please note that option names should not start with a dot: although this is valid for R variables, it is not allowed by the Getopt::Long module.
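
The rule for default values can be sketched in base R. This is an illustration of the rule as described above, not the package's actual code. The helper has_default and the environment env are hypothetical names introduced for the sketch.

```r
# Hypothetical helper: does `name` already hold a usable default in `envir`?
# A variable counts as a default only if it exists and is not a function.
has_default = function(name, envir = parent.frame()) {
    exists(name, envir = envir, inherits = FALSE) &&
        !is.function(get(name, envir = envir, inherits = FALSE))
}

env = new.env()
assign("cutoff", 0.05, envir = env)            # plain value: acts as a default
assign("length", function(x) 1, envir = env)   # a function: treated as undefined

has_default("cutoff", env)   # TRUE, so --cutoff would be optional
has_default("length", env)   # FALSE, so --length would be mandatory
has_default("number", env)   # FALSE, the variable was never defined
```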

Help and version options

help and version are two universal options. By default, these two options are inferred from the user's source code.

By default, GetoptLong() only provides descriptions of all specified options. Users can set the head and foot arguments to add information for a complete help message. The version is taken from the VERSION variable defined in the user's environment (of course, VERSION should be defined before GetoptLong() is called).

VERSION = "0.0.1"
GetoptLong(
    "tag=i", "this is a description of tag which is long long and very long and extremly long...", 
    head = 'An example to show how to use the packages',
    foot = 'Please contact author@gmail.com for comments'
)

Then you can specify --help:

~> Rscript command.R --help
An example to show how to use the packages
Usage: Rscript test.R [options]

  --tag integer
    this is a description of tag which is long long and very long and extremly
    long...

  --help
    Print help message and exit

  --version
    Print version information and exit

Please contact author@gmail.com for comments

Or print version of your script:

~> Rscript command.R --version
0.0.1

Configuring Getopt::Long

Configuration of Getopt::Long can be set by GetoptLong.options("config"):

GetoptLong.options("config" = "bundling")
GetoptLong.options("config" = c("no_ignore_case", "bundling"))

With a different configuration, more forms of option specification are supported:

-a -b -c  -abc
-s 24 -s24 -s=24

Please refer to the Getopt::Long documentation for more information.

Specify path of Perl in command line

In some cases the Perl binary is not on your PATH environment variable and you do not have permission to modify PATH. You can then specify the path to Perl on the command line:

~> Rscript test.R -a -b -c -- /your/perl/bin/perl

Since arguments following -- are ignored by Getopt::Long, the first argument after -- is taken as the user-specified Perl path.
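
The splitting at -- can be sketched in base R as follows. This is a minimal illustration of the convention, not the package's internal code. The helper name split_perl_path is hypothetical.

```r
# Hypothetical helper: separate ordinary options from a Perl path given after "--".
split_perl_path = function(args) {
    i = match("--", args)   # position of the first "--", NA if absent
    if (is.na(i)) {
        return(list(options = args, perl = NULL))
    }
    # the first argument after "--", if any, is taken as the Perl path
    perl = if (i < length(args)) args[i + 1] else NULL
    list(options = args[seq_len(i - 1)], perl = perl)
}

args = c("-a", "-b", "-c", "--", "/your/perl/bin/perl")
parsed = split_perl_path(args)
parsed$options   # "-a" "-b" "-c"
parsed$perl      # "/your/perl/bin/perl"
```

In a real script the args vector would typically come from commandArgs(trailingOnly = TRUE) rather than being written out by hand.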

Specify command-line options within R session

In an interactive R session, arguments can be set when calling GetoptLong:::source(), so it is convenient to control variables even when you are working interactively:

GetoptLong:::source("foo.R", argv = "--cutoff 0.01 --input file=foo.txt --verbose")

Session info

sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.3
## 
## locale:
## [1] C/en_US.UTF-8/C/C/C/C
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.15.1   markdown_0.7.7
## 
## loaded via a namespace (and not attached):
## [1] magrittr_1.5  tools_3.3.2   stringi_1.1.2 stringr_1.1.0 evaluate_0.10