Using GetITRData to obtain quarterly financial reports from Bovespa

Marcelo Perlin

2017-10-21

Financial statements of companies traded at B3 (formerly Bovespa), the Brazilian stock exchange, are available in its website. Accessing the data for a single company is straightforward. In the website one can find a simple interface for accessing this dataset. An example is given here. However, gathering and organizing the data for a large scale research, with many companies and many dates, is painful. Financial reports must be downloaded or copied individually and later aggregated. Changes in the accounting format thoughout time can make this process slow, unreliable and irreproducible.

Package GetITRData provides a R interface to all financial statements available in the website. It not only downloads the data but also organizes it in a tabular format and allows the use of inflation indexes. Users can select companies and a time period to download all available data. Several information about current companies, such as sector and available quarters are also at reach. The main purpose of the package is to make it easy to access quarterly financial statements in large scale research, facilitating the reproducibility of corporate finance studies with B3 data.

Installation

The package is available in CRAN (release version) and in Github (development version). You can install any of those with the following code:

# Release version in CRAN
install.packages('GetITRData') # not in CRAN yet

# Development version in Github
devtools::install_github('msperlin/GetITRData')

How to use GetITRData

The starting point of GetITRData is to find the official names of companies in Bovespa. Function gitrd.search.company serves this purpose. Given a string (text), it will search for a partial matches in companies names. As an example, let’s find the official name of Petrobras, one of the largest companies in Brazil:

library(GetITRData)
library(tibble)

gitrd.search.company('petrobras')
## 
## Reading info file from github
## Found 33351 lines for 626 companies  [Actives =  474  Inactives =  152 ]
## Last file update:  2017-10-19
## Caching RDATA into tempdir()
## 
## Found 1 companies:
## PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS | situation = ATIVO | first date = 1998-09-30 | last date - 2017-06-30

Its official name in Bovespa records is PETRÓLEO BRASILEIRO S.A. - PETROBRAS. Data for quarterly and annual statements are available from 1998 to 2017. The situation of the company, active or canceled, is also given. This helps verifying the availability of data.

The content of all available quarterly statements can be accessed with function gitrd.get.info.companies. It will read and parse a .csv file from my github repository. This will be periodically updated for new quarterly statements. Let’s try it out:

df.info <- gitrd.get.info.companies(type.data = 'companies')
## 
## Reading info file from github
## Found 33351 lines for 626 companies  [Actives =  474  Inactives =  152 ]
## Last file update:  2017-10-19
## Caching RDATA into tempdir()
glimpse(df.info)
## Observations: 627
## Variables: 9
## $ name.company <chr> "521 PARTICIPAÇOES S.A. - EM LIQUIDAÇÃO EXTRAJUDI...
## $ id.company   <int> 16330, 16284, 21725, 19313, 18350, 18970, 18104, ...
## $ situation    <chr> "ATIVO", "ATIVO", "ATIVO", "ATIVO", "CANCELADA", ...
## $ main.sector  <chr> NA, "Financeiro e Outros", "Consumo não Cíclico",...
## $ sub.sector   <chr> NA, "Outros", "Produtos de Uso Pessoal e de Limpe...
## $ segment      <chr> NA, "Outros", "Produtos de Uso Pessoal", "Energia...
## $ tickers      <chr> NA, "QVQP3B", NA, "AELP3", NA, "TIET11;TIET3;TIET...
## $ first.date   <date> 1998-12-31, 2001-09-30, 2008-12-31, 2001-12-31, ...
## $ last.date    <date> 2015-03-31, 2017-06-30, 2017-03-31, 2017-06-30, ...

This file includes several information that are gathered from Bovespa: names of companies, sectors, dates quarterly statements and, most importantly, the links to download the files. The resulting dataframe can be used to filter and gather information for large scale research such as downloading financial data for a specific sector.

Downloading financial information for ONE company

All you need to download financial data with GetITRData are the official names of companies, which can be found with gitrd.search.company, the desired starting and ending dates and the type of financial information (individual or consolidated). Let’s try it for PETROBRAS:

name.companies <- 'PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS'
first.date <- '2004-01-01'
last.date  <- '2006-01-01'
type.statements <- 'individual'
periodicy.fin.report <- 'annual'

df.reports <- gitrd.GetITRData(name.companies = name.companies, 
                               periodicy.fin.report = periodicy.fin.report, 
                               first.date = first.date,
                               last.date = last.date,
                               type.info = type.statements)
## Found cache file. Loading data..
## 
## Downloading data for 1 companies
## Reach of financial reports: individual
## Periodicy of financial reports: annual (dfp system)
## First Date: 2004-01-01
## Laste Date: 2006-01-01
## Inflation index: dollar
## 
## Downloading dollar data using BETS   Done
## 
## 
## WARNING: For data before 2009, the cash flow statements are not available
## 
## Inputs looks good! Downloading data:
## PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS
##  Available periods: 2005-12-31   2004-12-31
## 
## 
## Processing PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS, Date = 2005-12-31 Downloading | Reading file
## Processing PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS, Date = 2004-12-31 Downloading | Reading file

The resulting object is a tibble, a data.frame type of object that allows for list columns. Let’s have a look in its content:

glimpse(df.reports)
## Observations: 1
## Variables: 13
## $ company.name      <chr> "PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS"
## $ company.code      <int> 9512
## $ type.info         <chr> "individual"
## $ min.date          <date> 2004-12-31
## $ max.date          <date> 2005-12-31
## $ n.periods         <int> 2
## $ stock.holders     <list> [<c("Bndes + Fps", "Caixa Previdência Funci...
## $ stock.composition <list> [<c("Ordinárias", "Preferenciais", "Total")...
## $ dividends.history <list> [<c("ON", "ON", "ON", "ON", "ON", "ON", "ON...
## $ fr.assets         <list> [<# A tibble: 28 x 6,                      ...
## $ fr.liabilities    <list> [<# A tibble: 46 x 6,                      ...
## $ fr.income         <list> [<# A tibble: 48 x 6,                      ...
## $ fr.cashflow       <list> [<character(0), character(0), character(0),...

Object df.reports only has one row since we only asked for data of one company. The number of rows increases with the number of companies, as we will soon learn with the next example. All financial statements for the different years are available within df.reports. For example, the income statements for all desired years of PETROBRAS are:

df.income.long <- df.reports$fr.income[[1]]

glimpse(df.income.long)
## Observations: 48
## Variables: 6
## $ company.name       <chr> "PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS", "...
## $ ref.date           <date> 2005-12-31, 2005-12-31, 2005-12-31, 2005-1...
## $ acc.number         <chr> "3.01", "3.02", "3.03", "3.04", "3.05", "3....
## $ acc.desc           <chr> "Receita Bruta de Vendas e/ou Serviços", "D...
## $ acc.value          <int> 143665730, -37843204, 105822526, -57512113,...
## $ acc.value.infl.adj <dbl> 61377250.40, -16167472.98, 45209777.42, -24...

The resulting dataframe is in the long format, ready for processing. In the long format, financial statements of different quarters are stacked. In the wide format, we have the quarters as dates. If you want the wide format, which I believe is most common in financial analysis, you can use function gitrd.convert.to.wide. See an example next:

df.income.wide <- gitrd.convert.to.wide(df.income.long)

knitr::kable(df.income.wide )
acc.number acc.desc company.name 2004-12-31 2005-12-31
3.01 Receita Bruta de Vendas e/ou Serviços PETRÓLEO BRASILEIRO S.A. - PETROBRAS 120024727 143665730
3.02 Deduções da Receita Bruta PETRÓLEO BRASILEIRO S.A. - PETROBRAS -34450292 -37843204
3.03 Receita Líquida de Vendas e/ou Serviços PETRÓLEO BRASILEIRO S.A. - PETROBRAS 85574435 105822526
3.04 Custo de Bens e/ou Serviços Vendidos PETRÓLEO BRASILEIRO S.A. - PETROBRAS -48607576 -57512113
3.05 Resultado Bruto PETRÓLEO BRASILEIRO S.A. - PETROBRAS 36966859 48310413
3.06 Despesas/Receitas Operacionais PETRÓLEO BRASILEIRO S.A. - PETROBRAS -11110540 -14810467
3.06.01 Com Vendas PETRÓLEO BRASILEIRO S.A. - PETROBRAS -2858630 -4195157
3.06.02 Gerais e Administrativas PETRÓLEO BRASILEIRO S.A. - PETROBRAS -2599552 -3453753
3.06.03 Financeiras PETRÓLEO BRASILEIRO S.A. - PETROBRAS -1019901 126439
3.06.04 Outras Receitas Operacionais PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0 0
3.06.05 Outras Despesas Operacionais PETRÓLEO BRASILEIRO S.A. - PETROBRAS -5982336 -9070019
3.06.06 Resultado da Equivalência Patrimonial PETRÓLEO BRASILEIRO S.A. - PETROBRAS 1349879 1782023
3.07 Resultado Operacional PETRÓLEO BRASILEIRO S.A. - PETROBRAS 25856319 33499946
3.08 Resultado Não Operacional PETRÓLEO BRASILEIRO S.A. - PETROBRAS -550694 -199982
3.08.01 Receitas PETRÓLEO BRASILEIRO S.A. - PETROBRAS 46611 1256194
3.08.02 Despesas PETRÓLEO BRASILEIRO S.A. - PETROBRAS -597305 -1456176
3.09 Resultado Antes Tributação/Participações PETRÓLEO BRASILEIRO S.A. - PETROBRAS 25305625 33299964
3.10 Provisão para IR e Contribuição Social PETRÓLEO BRASILEIRO S.A. - PETROBRAS -5199166 -8581490
3.11 IR Diferido PETRÓLEO BRASILEIRO S.A. - PETROBRAS -1692288 -422392
3.12 Participações/Contribuições Estatutárias PETRÓLEO BRASILEIRO S.A. - PETROBRAS -660000 -846000
3.12.01 Participações PETRÓLEO BRASILEIRO S.A. - PETROBRAS -660000 -846000
3.12.02 Contribuições PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0 0
3.13 Reversão dos Juros sobre Capital Próprio PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0 0
3.15 Lucro/Prejuízo do Exercício PETRÓLEO BRASILEIRO S.A. - PETROBRAS 17754171 23450082

Downloading financial information for SEVERAL companies

If you are doing serious research, it is likely that you need financial statements for more than one company. Package GetITRData is specially designed for handling large scale download of data. Let’s build a case with 3 randomly selected companies:

set.seed(2)
my.companies <- sample(unique(df.info$name.company), 5)

first.date <- '2008-01-01'
last.date  <- '2010-01-01'
type.statements <- 'individual'
periodicy.fin.report <- 'annual'

df.reports <- gitrd.GetITRData(name.companies = my.companies, 
                               periodicy.fin.report = periodicy.fin.report,
                               first.date = first.date,
                               last.date = last.date,
                               type.info = type.statements)
## Found cache file. Loading data..
## 
## Downloading data for 5 companies
## Reach of financial reports: individual
## Periodicy of financial reports: annual (dfp system)
## First Date: 2008-01-01
## Laste Date: 2010-01-01
## Inflation index: dollar
## 
## Downloading dollar data using BETS   Done
## 
## 
## WARNING: Cant find available dates for BRZ INVESTIMENTOS S/A
## WARNING: For data before 2009, the cash flow statements are not available
## 
## Inputs looks good! Downloading data:
## BRF S.A.
##  Available periods: 2009-12-31   2008-12-31
## KARSTEN SA
##  Available periods: 2009-12-31   2008-12-31
## ODONTOPREV S/A
##  Available periods: 2009-12-31   2008-12-31
## ULTRAPAR PARTICIPAÇÕES SA
##  Available periods: 2009-12-31   2008-12-31
## 
## 
## Processing BRF S.A., Date = 2009-12-31   Downloading | Reading file
## Processing BRF S.A., Date = 2008-12-31   Downloading | Reading file
## Processing KARSTEN SA, Date = 2009-12-31 Downloading | Reading file
## Processing KARSTEN SA, Date = 2008-12-31 Downloading | Reading file
## Processing ODONTOPREV S/A, Date = 2009-12-31 Downloading | Reading file
## Processing ODONTOPREV S/A, Date = 2008-12-31 Downloading | Reading file
## Processing ULTRAPAR PARTICIPAÇÕES SA, Date = 2009-12-31  Downloading | Reading file
## Processing ULTRAPAR PARTICIPAÇÕES SA, Date = 2008-12-31  Downloading | Reading file

And now we can check the resulting tibble:

glimpse(df.reports)
## Observations: 4
## Variables: 13
## $ company.name      <chr> "BRF S.A.", "KARSTEN SA", "ODONTOPREV S/A", ...
## $ company.code      <int> 16292, 4146, 20125, 18465
## $ type.info         <chr> "individual", "individual", "individual", "i...
## $ min.date          <date> 2008-12-31, 2008-12-31, 2008-12-31, 2008-12-31
## $ max.date          <date> 2009-12-31, 2009-12-31, 2009-12-31, 2009-12-31
## $ n.periods         <int> 2, 2, 2, 2
## $ stock.holders     <list> [<c("Standard Life Aberdeen Plc", "Petros -...
## $ stock.composition <list> [<c("Ordinárias", "Preferenciais", "Total")...
## $ dividends.history <list> [<c("ON", "ON", "ON", "ON", "ON", "ON", "ON...
## $ fr.assets         <list> [<# A tibble: 18 x 6,    company.name   ref...
## $ fr.liabilities    <list> [<# A tibble: 42 x 6,    company.name   ref...
## $ fr.income         <list> [<# A tibble: 52 x 6,    company.name   ref...
## $ fr.cashflow       <list> [<character(0), character(0), character(0),...

Every row of df.reports will provide information for one company. Metadata about the corresponding dataframes such as min/max dates is available in the first columns. Keeping a tabular structure facilitates the organization and future processing of all financial data. We can use tibble df.reports for creating other dataframes in the long format containing data for all companies. See next, where we create dataframes with the assets and liabilities of all companies:

df.assets <- do.call(what = rbind, args = df.reports$fr.assets)
df.liabilities <- do.call(what = rbind, args = df.reports$fr.liabilities)

df.assets.liabilities <- rbind(df.assets, df.liabilities)

As an example, let’s use the resulting dataframe for calculating and analyzing a simple liquidity index of a company, the total of current (liquid) assets (Ativo circulante) divided by the total of current short term liabilities (Passivo Circulante), over time.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
my.tab <- df.assets.liabilities %>%
  group_by(company.name, ref.date) %>%
  summarise(Liq.Index = acc.value[acc.number == '1.01']/ acc.value[acc.number == '2.01'])

my.tab
## # A tibble: 8 x 3
## # Groups:   company.name [?]
##                company.name   ref.date Liq.Index
##                       <chr>     <date>     <dbl>
## 1                  BRF S.A. 2008-12-31 0.8705760
## 2                  BRF S.A. 2009-12-31 1.4471633
## 3                KARSTEN SA 2008-12-31 0.8535343
## 4                KARSTEN SA 2009-12-31 1.3657104
## 5            ODONTOPREV S/A 2008-12-31 1.7151491
## 6            ODONTOPREV S/A 2009-12-31 0.9576243
## 7 ULTRAPAR PARTICIPAÇÕES SA 2008-12-31 0.6841688
## 8 ULTRAPAR PARTICIPAÇÕES SA 2009-12-31 1.2383783

Now we can visualize the information using ggplot2:

library(ggplot2)

p <- ggplot(my.tab, aes(x = ref.date, y = Liq.Index, fill = company.name)) +
  geom_col(position = 'dodge' )
print(p)

As we can see, we could not find available data for all companies. Nonetheless, JPSP is the company with highest liquidity, being able to pay its short term debt with the current assets in all years. We can certainly do a lot more interesting studies based on this data set.

Exporting financial data

The package includes function gitrd.export.ITR.data for exporting the financial data to an Excel file. Users can choose between the long and wide format. See next:

my.basename <- 'MyExcelData'
my.format <- 'xlsx' # only supported so far
gitrd.export.ITR.data(data.in = df.reports, 
                      base.file.name = my.basename,
                      type.export = my.format,
                      format.data = 'long')

The resulting Excel file contains all data available in df.reports.