Numbers in engineering format

Richard Layton

2017-03-09

This vignette demonstrates the use of two functions from the docxtools package, format_engr() and align_pander().

format_engr()

The primary goal of format_engr() is to present numeric variables in a data frame in engineering format, that is, scientific notation with exponents that are multiples of 3. Compare:

Syntax                        Expression
conventional computer syntax  \(1.011e+5\)
mathematical syntax           \(1.011\times10^{5}\)
engineering format            \(101.1\times10^{3}\)
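
The conversion from scientific to engineering notation comes down to rounding the exponent down to a multiple of 3 and rescaling the coefficient. The following is a minimal sketch in base R, assuming a nonzero input; the helper name engr_parts() is hypothetical and is not part of docxtools.

```r
# Hypothetical helper for illustration; not a docxtools function.
# Splits a nonzero number into a coefficient and a power-of-ten exponent
# that is a multiple of 3.
engr_parts <- function(x) {
  e  <- floor(log10(abs(x)))   # exponent in ordinary scientific notation
  e3 <- 3 * floor(e / 3)       # round down to the nearest multiple of 3
  list(coefficient = x / 10^e3, exponent = e3)
}

parts <- engr_parts(1.011e+5)
parts$coefficient  # 101.1
parts$exponent     # 3
```

Rounding the exponent down, rather than to the nearest multiple, keeps the coefficient between 1 and 1000 in magnitude.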

This example uses a small temperature-pressure data set to compute air density and display the results in a table. Density is computed using the ideal gas law, \(\rho = p / (RT)\).

Start by loading packages,

library(knitr)
opts_knit$set(root.dir = "../")
suppressPackageStartupMessages(library(dplyr))
library(docxtools)

Create some data for the example,

# temperature in K
T_K  <- c(294.05, 294.15, 294.65, 293.35, 293.85)
# convert pressure from hPa to Pa
p_Pa <- c(1011, 1010, 1011, 1010, 1011) * 100
# gas constant in J / (kg K)
R    <- 287
# collect the inputs; density (kg / m^3) is computed in the next step
density_data <- data.frame(T_K, p_Pa, R)

Compute air density for each observation,

density_data <- mutate(density_data, density = p_Pa / (R * T_K))
knitr::kable(density_data)
T_K p_Pa R density
294.05 101100 287 1.197976
294.15 101000 287 1.196384
294.65 101100 287 1.195536
293.35 101000 287 1.199647
293.85 101100 287 1.198791
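
As a check on units and arithmetic, the first row can be reproduced by hand. With \(p\) in Pa (N/m^2), \(R\) in J/(kg K), and \(T\) in K, the ratio \(p / (RT)\) has units of kg/m^3. The variable names in this sketch are chosen only to avoid overwriting the vectors created above.

```r
# Values taken from the first observation in the table
p1    <- 1011 * 100   # 1011 hPa expressed in Pa
R_air <- 287          # J / (kg K)
T1    <- 294.05       # K

rho1 <- p1 / (R_air * T1)   # kg / m^3
round(rho1, 6)
```

The result should agree with the first entry of the density column.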

The first argument of format_engr() is the data frame. The second, sigdig, is a vector giving the number of significant digits for each column, in column order.

engr_density_data <- format_engr(density_data, sigdig = c(5, 4, 0, 5))

Data frame values are returned as character strings with math formatting. Setting sigdig = 0 for R leaves the third column in its original form.

str(engr_density_data)
## 'data.frame':    5 obs. of  4 variables:
##  $ T_K    : chr  "$294.05$" "$294.15$" "$294.65$" "$293.35$" ...
##  $ p_Pa   : chr  "${101.1}\\times 10^{3}$" "${101.0}\\times 10^{3}$" "${101.1}\\times 10^{3}$" "${101.0}\\times 10^{3}$" ...
##  $ R      : chr  "$287$" "$287$" "$287$" "$287$" ...
##  $ density: chr  "$1.1980$" "$1.1964$" "$1.1955$" "$1.1996$" ...
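
The doubled backslash in the str() output is only R's escaped display of a single backslash; the strings themselves contain ordinary inline math delimited by dollar signs. As a rough sketch of how such a string could be assembled in base R (the helper as_math_string() is hypothetical and makes no claim about how docxtools builds its output):

```r
# Hypothetical helper for illustration; not a docxtools function.
# Wraps a coefficient and exponent in inline-math markup.
as_math_string <- function(coef, exp3, decimals) {
  coef_chr <- formatC(coef, format = "f", digits = decimals)
  paste0("${", coef_chr, "}\\times 10^{", exp3, "}$")
}

s <- as_math_string(101.1, 3, decimals = 1)
print(s)  # escaped display, with a doubled backslash
cat(s)    # the actual characters, with a single backslash
```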

The math formatting is applied when the data frame is printed in the output document.

knitr::kable(engr_density_data)
T_K p_Pa R density
\(294.05\) \({101.1}\times 10^{3}\) \(287\) \(1.1980\)
\(294.15\) \({101.0}\times 10^{3}\) \(287\) \(1.1964\)
\(294.65\) \({101.1}\times 10^{3}\) \(287\) \(1.1955\)
\(293.35\) \({101.0}\times 10^{3}\) \(287\) \(1.1996\)
\(293.85\) \({101.1}\times 10^{3}\) \(287\) \(1.1988\)

Comments:

align_pander()

This function uses pander() to print a table and panderOptions('table.alignment.default') to align columns. Usage is: align_pander(x, align_idx = NULL, caption = NULL)

align_pander(engr_density_data, align_idx = "cccc")
T_K p_Pa R density
\(294.05\) \({101.1}\times 10^{3}\) \(287\) \(1.1980\)
\(294.15\) \({101.0}\times 10^{3}\) \(287\) \(1.1964\)
\(294.65\) \({101.1}\times 10^{3}\) \(287\) \(1.1955\)
\(293.35\) \({101.0}\times 10^{3}\) \(287\) \(1.1996\)
\(293.85\) \({101.1}\times 10^{3}\) \(287\) \(1.1988\)
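
The alignment argument is a compact string with one letter per column. One plausible way to expand such a string into per-column alignment names is sketched below; the helper expand_align() and its lookup table are assumptions made for illustration, not a description of what align_pander() does internally.

```r
# Hypothetical helper for illustration; not a docxtools function.
# Expands a string such as "rcrcr" into one alignment name per column.
expand_align <- function(align_idx) {
  codes  <- strsplit(align_idx, "")[[1]]               # one letter per column
  lookup <- c(l = "left", c = "centre", r = "right")   # assumed letter codes
  stopifnot(all(codes %in% names(lookup)))             # reject unknown letters
  unname(lookup[codes])
}

expand_align("rcrcr")
```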

Finally, the heading can be edited for presentation. The heading row, like the numeric variables, is formatted in R Markdown math format.

names(engr_density_data) <- c("$T\\text{ (K)}$"
    , "$p\\text{ (Pa)}$"
    , "$R\\text{ (J kg}^{-1}\\text{ K}^{-1}\\text{)}$"
    , "$\\rho\\text{ (kg/m}^{3}\\text{)}$"
    )
align_pander(engr_density_data, "cccc", caption = "Air density measurements")
Air density measurements
\(T\text{ (K)}\) \(p\text{ (Pa)}\) \(R\text{ (J kg}^{-1}\text{ K}^{-1}\text{)}\) \(\rho\text{ (kg/m}^{3}\text{)}\)
\(294.05\) \({101.1}\times 10^{3}\) \(287\) \(1.1980\)
\(294.15\) \({101.0}\times 10^{3}\) \(287\) \(1.1964\)
\(294.65\) \({101.1}\times 10^{3}\) \(287\) \(1.1955\)
\(293.35\) \({101.0}\times 10^{3}\) \(287\) \(1.1996\)
\(293.85\) \({101.1}\times 10^{3}\) \(287\) \(1.1988\)

non-numeric variables

Create some alphanumeric test data,

# create test input arguments
set.seed(20161221)
n  <- 5
a  <- sample(letters, n)
b  <- sample(letters, n)
x  <- runif(n, min =  -5, max =  50) * 1e+5
y  <- runif(n, min = -25, max =  40) / 1e+3
z  <- runif(n, min =  -5, max = 100)
alpha_num <- data.frame(z, b, y, a, x, stringsAsFactors = FALSE)

Format the entire data frame with the default 4 significant digits.

engr_alpha_num <- format_engr(alpha_num)
align_pander(engr_alpha_num, "rcrcr")
z b y a x
\(6.501\) c \({1.052}\times 10^{-3}\) q \({2.847}\times 10^{6}\)
\(28.37\) o \({347.6}\times 10^{-6}\) y \({4.874}\times 10^{6}\)
\(-3.850\) i \({4.600}\times 10^{-3}\) g \({-111.7}\times 10^{3}\)
\(44.50\) a \({-3.045}\times 10^{-3}\) a \({1.315}\times 10^{6}\)
\(92.41\) x \({-1.069}\times 10^{-3}\) i \({417.4}\times 10^{3}\)

Variables can be re-ordered in the usual way, e.g.,

alpha_num <- select(alpha_num, a, b, x, y, z)
engr_alpha_num <- format_engr(alpha_num)
align_pander(engr_alpha_num, "ccrrr")
a b x y z
q c \({2.847}\times 10^{6}\) \({1.052}\times 10^{-3}\) \(6.501\)
y o \({4.874}\times 10^{6}\) \({347.6}\times 10^{-6}\) \(28.37\)
g i \({-111.7}\times 10^{3}\) \({4.600}\times 10^{-3}\) \(-3.850\)
a a \({1.315}\times 10^{6}\) \({-3.045}\times 10^{-3}\) \(44.50\)
i x \({417.4}\times 10^{3}\) \({-1.069}\times 10^{-3}\) \(92.41\)

significant zeros

Leading zeros are not significant.

Trailing zeros generally should be significant. For example, isolate a number from the y column:

y2 <- alpha_num$y[2]
y2
## [1] 0.000347614

Formatting y2 with different numbers of significant digits using format_engr(y2, sigdig) yields the table below. With 3 digits, y2 has 3 unambiguous significant digits. However, reducing the number of digits to 2 would produce a coefficient of \(350\) with an ambiguous zero before the decimal point. In such cases, format_engr() changes the exponent to produce \({0.35}\times 10^{-3}\) with two unambiguous significant digits.

sigdig \(y_2\)
4 \({347.6}\times 10^{-6}\)
3 \({348}\times 10^{-6}\)
2 \({0.35}\times 10^{-3}\)
1 \({0.3}\times 10^{-3}\)
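
The exponent shift in the last two rows can be expressed as a simple rule: if the coefficient would have more digits before the decimal point than there are significant digits, raise the exponent by 3. The sketch below reproduces the table in base R under that assumption. The helper engr_shift() is hypothetical, assumes a nonzero input that is written with a power of ten, and is not the docxtools implementation.

```r
# Hypothetical helper for illustration; not the docxtools implementation.
engr_shift <- function(x, sigdig) {
  x  <- signif(x, sigdig)
  e  <- floor(log10(abs(x)))          # scientific-notation exponent
  e3 <- 3 * floor(e / 3)              # engineering exponent
  int_digits <- e - e3 + 1            # digits before the decimal point: 1, 2, or 3
  if (int_digits > sigdig) {          # coefficient would need placeholder zeros
    e3 <- e3 + 3                      # shift to the next multiple of 3
    int_digits <- int_digits - 3
  }
  coef_chr <- formatC(x / 10^e3, format = "f", digits = sigdig - int_digits)
  paste0("${", coef_chr, "}\\times 10^{", e3, "}$")
}

y2_value <- 0.000347614
shifted  <- vapply(4:1, function(s) engr_shift(y2_value, s), character(1))
cat(shifted, sep = "\n")
```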

There are exceptions to the significant-trailing-zero rule. Consider \(z_2\) from the data frame,

z2 <- alpha_num$z[2]
z2
## [1] 28.37409

For numbers like z2 that in scientific notation would have an exponent of 0, 1, or 2, format_engr() forgoes power-of-ten notation.

sigdig \(z_2\)
4 \(28.37\)
3 \(28.4\)
2 \(28\)
1 \(30\)

As we reduce the number of significant digits, we can eventually obtain a trailing zero whose significance is ambiguous, as in \(z_2 =\) \(30\). In such cases, format_engr() leaves the ambiguous significant zero instead of imposing scientific notation that the audience might find distracting.
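
For numbers in this range, the displayed value is just the number rounded to the requested significant digits and printed without an exponent. A minimal base R sketch that reproduces the \(z_2\) table follows; the helper plain_sig() is hypothetical and assumes a nonzero input.

```r
# Hypothetical helper for illustration; not a docxtools function.
# Rounds to sigdig significant digits and prints without a power of ten.
plain_sig <- function(x, sigdig) {
  x <- signif(x, sigdig)
  int_digits <- floor(log10(abs(x))) + 1          # digits before the decimal point
  formatC(x, format = "f", digits = max(sigdig - int_digits, 0))
}

z2_value <- 28.37409
plain    <- vapply(4:1, function(s) plain_sig(z2_value, s), character(1))
plain
```

With one significant digit the result is "30", which is where the ambiguous trailing zero appears.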

scalars and vectors are returned as data frames

The preferred input to format_engr() is a data frame. If the input is a numeric scalar or vector, it is formatted and returned as a data frame with a single variable named value.

For example, using y2 again,

str(y2)
##  num 0.000348

Formatting the number now returns a data frame with one row and one column,

engr_y2 <- format_engr(y2, 4)
str(engr_y2)
## 'data.frame':    1 obs. of  1 variable:
##  $ value: chr "${347.6}\\times 10^{-6}$"

which can be used in an inline code chunk, e.g.,

$y_2 =$ `r engr_y2$value`

to produce \(y_2 =\) \({347.6}\times 10^{-6}\).

A numeric vector is also returned as a data frame.

# the x array
cat(x, sep = "\n")
## 2846529
## 4874357
## -111651.4
## 1314716
## 417385
class(x)
## [1] "numeric"

# formatted 
engr_x <- format_engr(x, 3)
class(engr_x)
## [1] "data.frame"
engr_x
##                    value
## 1 ${2.85}\\times 10^{6}$
## 2 ${4.87}\\times 10^{6}$
## 3 ${-112}\\times 10^{3}$
## 4 ${1.31}\\times 10^{6}$
## 5  ${417}\\times 10^{3}$

The input to format_engr() must have at least one numeric variable; otherwise an error is thrown. For example, running format_engr(a) on the character vector a produces the error

Error: m_numeric_cols > 0 is not TRUE Execution halted.
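
To avoid halting a document build, the input can be checked before it is formatted. The guard below is a base R sketch; the helper has_numeric() is hypothetical, and the assumption that the error corresponds to a count of numeric columns is inferred only from the message text.

```r
# Hypothetical helper for illustration; not a docxtools function.
# TRUE if x is a numeric vector or a data frame with a numeric column.
has_numeric <- function(x) {
  if (is.data.frame(x)) {
    any(vapply(x, is.numeric, logical(1)))
  } else {
    is.numeric(x)
  }
}

has_numeric(c("q", "y", "g"))                            # character vector
has_numeric(data.frame(a = c("q", "y"), x = c(1.5, 2)))  # mixed data frame
```

A call to format_engr() could then be made conditional on has_numeric() returning TRUE.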

conclusion

These two functions provide a means of consistently rendering numbers with the desired number of significant digits, including trailing zeros, and of aligning them in output tables without affecting character data in the same data frame.