corpus: Text Corpus Analysis

Text corpus data analysis, with full support for international text (Unicode). Includes functions for reading data from newline-delimited JSON files, normalizing and tokenizing text, searching for term occurrences, and computing term occurrence frequencies, including n-grams.
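
The description above maps onto a handful of exported functions. The following is a minimal sketch of that workflow, not an excerpt from the package documentation: the toy records, the temporary file, and the field name "text" are assumptions made for illustration, and argument names should be checked against the reference manual for the installed version.

```r
library(corpus)

# Hypothetical toy data, written as newline-delimited JSON (one record per line).
# The field name "text" is an assumption made for this example.
path <- tempfile(fileext = ".json")
writeLines(c('{"text": "The quick brown fox."}',
             '{"text": "The lazy dog; the quick cat."}'),
           path)

# Read the records; each JSON object becomes one row.
data <- read_ndjson(path)

# Normalize and tokenize each text.
text_tokens(data$text)

# Search for occurrences of a term, shown with surrounding context.
text_locate(data$text, "quick")

# Term occurrence frequencies, first for single terms, then for bigrams.
term_stats(data$text)
term_stats(data$text, ngrams = 2)

# Remove the temporary file.
unlink(path)
```

No output is shown because the exact normalization (case folding, punctuation handling) depends on the text filter defaults of the installed version.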

Version: 0.9.2
Depends: R (≥ 2.10)
Imports: stats
Suggests: knitr, Matrix, quanteda, testthat, tm
Published: 2017-09-20
Author: Patrick O. Perry [aut, cph, cre], Finn Årup Nielsen [cph, dtc] (AFINN Sentiment Lexicon), Martin Porter and Richard Boulton [ctb, cph, dtc] (Snowball Stemmer and Stopword Lists), Carlo Strapparava and Alessandro Valitutti [cph, dtc] (WordNet-Affect Lexicon), Unicode, Inc. [cph, dtc] (Unicode Character Database)
Maintainer: Patrick O. Perry <pperry at>
License: Apache License (== 2.0) | file LICENSE
NeedsCompilation: yes
Materials: README NEWS
CRAN checks: corpus results


Reference manual: corpus.pdf
Vignettes:
  Chinese text handling
  Introduction to corpus
  Text data in Corpus and other packages
  Unicode: Emoji, accents, and international text
Package source: corpus_0.9.2.tar.gz
Windows binaries: r-devel:, r-release:, r-oldrel:
OS X El Capitan binaries: r-release: corpus_0.9.2.tgz
OS X Mavericks binaries: r-oldrel: corpus_0.9.1.tgz
Old sources: corpus archive


