How to Build R Packages

This is a refresher and/or crash course on the essentials of building packages for the R language and programming environment, complete with the documentation necessary for distributing these packages on GitHub and CRAN.

Rather than aiming to build a perfect R package, this tutorial provides only the minimal details needed to build a functional one. The main goal is to create a repository for custom functions, document them well enough to make the package useful to others, and publish the package on CRAN. The result should be a collection of custom functions that you can rely on to save time and improve the reproducibility of your data analyses.

Why Package R Code?

  • Just like many other programming languages for scientific computing, R makes use of a modular system for distributing code, which makes the creation and maintenance of specialized R code both organized and manageable.

  • In R, a variety of especially useful tools have been developed to ease the transition from writing custom functions to building packages consisting of customized R code (and distributing the resulting packages).

  • Custom R code allows both the fundamental capabilities and the idiosyncratic behavior of R to be modified, and a package makes this code more easily accessible to you and to others who may find it useful.

Step I: Necessary Tools and Dependencies

  • In R, install the packages devtools and roxygen2, like so:
  install.packages(c("devtools", "roxygen2"))
  • The package roxygen2 is necessary for generating proper documentation for the package manual.

  • The package devtools provides numerous utilities that make building packages significantly easier, including devtools::document(), devtools::build(), and devtools::check(), to name but a few.

Step II: Building the Package Repository

  1. Navigate to the parent directory of the package you would like to create (using cd DIR in a shell or setwd("DIR") in R).

  2. Build the skeleton for the new package – generating a new directory in the process – using the R command devtools::create("MYPKG").

    • The above will generate a directory with: (1) a subdirectory R/ for R code, (2) a file NAMESPACE, which will later be populated with function and requirement names, (3) a file DESCRIPTION for required package metadata, and (4) an RStudio project file MYPKG.Rproj (recent versions may create this only when run from within RStudio).
  3. Next, navigate into the package directory and set it up as a Git repository using git init (note: this is not strictly necessary but is a good practice for any project).

    • Make sure to regularly use Git version control with the repository contents (git add, git commit, git push) along with GitHub, as the latter will provide public access to the package revision history via GitHub’s site.
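
The skeleton step above can be sketched in R roughly as follows. This is a minimal illustration under stated assumptions: devtools is installed, MYPKG is a placeholder name, and the package is created under a temporary directory (pkg_path is a name chosen here) so that nothing in your real working directory is touched.

```r
# Placeholder location: a scratch directory, so the example leaves no
# lasting side effects. Replace with a real path for actual work.
pkg_path <- file.path(tempdir(), "MYPKG")

# Only attempt the skeleton if devtools is available (assumption: it was
# installed in Step I) and the directory does not exist yet.
if (requireNamespace("devtools", quietly = TRUE) && !dir.exists(pkg_path)) {
  devtools::create(pkg_path)

  # Inspect what was generated; expect at least DESCRIPTION, NAMESPACE
  # and the R/ subdirectory.
  print(list.files(pkg_path, all.files = TRUE, no.. = TRUE))
}
```

From there, git init can be run inside the new directory as described in step 3.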

Step III: Custom Functions and Documentation

  • Following one of several style guides, write your custom functions in .R files in the R/ subdirectory of the package; best practice is to group thematically related functions in the same file rather than placing everything in one file.

  • In particular, I recommend following the stylistic advice in Hadley Wickham’s comprehensive book R packages (this book also contains a wealth of other information and tips for writing R packages).

  • After setting up the desired functions in .R files, add the minimal documentation that roxygen2 requires.

  • Note that documentation must be placed directly above each function definition in the .R files, using roxygen2 comment syntax (lines beginning with #'); this saves time in the long run by allowing auto-generation of manual pages.
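
As an illustration of the format described above, here is a small documented function. The function rescale01 and its arguments are hypothetical, invented for this example; the tags shown (@param, @return, @examples, @export) are standard roxygen2 tags, but which ones you need depends on your package.

```r
#' Rescale a numeric vector to the unit interval
#'
#' Linearly maps the values of x so that the smallest value becomes 0 and
#' the largest becomes 1.
#'
#' @param x A numeric vector.
#' @param na.rm Logical; should missing values be ignored when computing
#'   the range?
#' @return A numeric vector of the same length as x.
#' @examples
#' rescale01(c(0, 5, 10))
#' @export
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)      # smallest and largest observed values
  (x - rng[1]) / (rng[2] - rng[1])    # shift to 0, then scale to width 1
}
```

Saved as, say, R/rescale.R, this comment block is what devtools::document() later turns into a manual page.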

Step IV: Unit Testing with testthat + devtools

  1. Formal automated testing of code is an important step in ensuring that work is reproducible. Specifically, unit testing helps catch bugs early, makes code more robust, and encourages a clearer structure.

  2. To begin writing unit tests for a package, in the package directory, run usethis::use_testthat() (older devtools versions exposed this helper as devtools::use_testthat(); it now lives in the usethis package). This will create a subdirectory tests/testthat to store individual unit tests for each function, as well as a file tests/testthat.R to perform all tests when running R CMD check.

  3. Write individual test files for each function in the package, with multiple test_that statements checking various use cases. For advice on organizing and writing tests, see this helpful chapter by Hadley Wickham.

  4. After writing appropriate tests for each function in the tests/testthat subdirectory, ensure that all tests are working by using devtools::test().

  5. Repeat the above step as necessary to remove any problems brought to light in the testing process. Once devtools::test() runs successfully without catching any errors, move on to the final steps of building and releasing the package.
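
A test file along the lines described above might look like the following sketch. It assumes testthat is installed; the function rescale01 is a hypothetical example defined inline so the snippet is self-contained, whereas in a real package it would live in R/ and the file would be saved under a name such as tests/testthat/test-rescale.R.

```r
library(testthat)  # inside tests/testthat/, tests/testthat.R normally loads this

# Hypothetical function under test; in a real package it is defined in R/.
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

test_that("rescale01 maps the observed range onto [0, 1]", {
  expect_equal(rescale01(c(0, 5, 10)), c(0, 0.5, 1))
  expect_equal(range(rescale01(c(-3, 2, 7))), c(0, 1))
})

test_that("rescale01 keeps missing values in place", {
  out <- rescale01(c(1, NA, 3))
  expect_true(is.na(out[2]))
  expect_equal(out[c(1, 3)], c(0, 1))
})
```

Each test_that block groups expectations about one behavior, which makes it easier to see what broke when devtools::test() reports a failure.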

Step V: Documentation and Building the Package

  1. Once all desired custom functions, and proper comments for documentation, have been set up in the .R files in the R/ subdirectory, use devtools::document() to generate package documentation and manual.

  2. Running devtools::document() will generate (1) a subdirectory man/ for manual pages and (2) a number of .Rd files inside it, one for each documented function.

  3. After the documentation has been properly generated, the package can be built and tested: in R, use devtools::build() while in the main package directory; this will produce a compressed source bundle (a .tar.gz file) in the parent directory. The same can be done from the command line by running R CMD build MYPKG in the parent directory.

  4. To ensure that the package is working appropriately, use either (1) R CMD check MYPKG.tar.gz on the built version of the package (the actual file name includes the version number, in the form MYPKG_<version>.tar.gz); or (2) while in the package directory, from R, run devtools::check().

  5. Ensure that the package is working as intended by resolving all issues marked as WARNING or ERROR in the results produced by running the check. For a CRAN submission, it is also worth addressing as many NOTE entries as possible.
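
The document, build, and check sequence above can be wrapped in a small helper, sketched below. The function name build_and_check and its return structure are hypothetical, made up for this illustration; the sketch assumes devtools is installed and that pkg points at a package directory. The error_on = "never" argument is passed so that the check results are returned for inspection rather than raising an error.

```r
# Hypothetical helper: document, build, and check the package at `pkg`.
build_and_check <- function(pkg = ".") {
  devtools::document(pkg)            # regenerate man/*.Rd from roxygen2 comments
  tarball <- devtools::build(pkg)    # path to the built .tar.gz bundle

  # Run R CMD check and keep the findings instead of stopping on problems.
  results <- devtools::check(pkg, error_on = "never")

  # Summarize how many issues of each kind were reported.
  counts <- c(
    errors   = length(results$errors),
    warnings = length(results$warnings),
    notes    = length(results$notes)
  )
  list(tarball = tarball, counts = counts)
}

# Usage, run from the package directory:
# out <- build_and_check()
# out$counts
```

Repeating this until the error and warning counts are both zero corresponds to step 5 above.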

Step VI: Publish the Package to GitHub and CRAN

  1. Assuming that Git was used with the repository and the commits have been pushed to a public GitHub repository, the package will be available from GitHub, and may be installed using devtools::install_github("USER/REPO") within R.

  2. Submit the package to CRAN (this can also be done with devtools::submit_cran() in R); after it is accepted, the package will be available for download with install.packages("MYPKG").

Useful Commands for Building/Publishing R Packages

  • devtools::create("MYPKG") - generates a package skeleton as described above.

  • devtools::document() - generates package documentation using the roxygen2 style comments preceding each function in the various .R files.

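  • usethis::use_build_ignore("FILES") (formerly devtools::use_build_ignore()) - adds named files to .Rbuildignore with proper syntax. Use this for files that belong in the repository but should not be included in the built package, since R CMD check flags non-standard top-level files.

  • usethis::use_testthat() (formerly devtools::use_testthat()) - adds a subdirectory tests/testthat for writing individual tests and a file tests/testthat.R to run all tests when R CMD check is used.

  • usethis::use_travis() (formerly devtools::use_travis()) - adds a .travis.yml config file to the repository to be used with Travis CI. Note that recent usethis releases have deprecated this helper in favor of GitHub Actions helpers such as usethis::use_github_action().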

  • devtools::test() - runs all of the available tests that are present in the subdirectory tests/testthat to ensure that any functions with tests are working as intended.

  • devtools::check() - updates the documentation, builds the package, and runs R CMD check on the result, reporting any errors, warnings, and notes. It is a convenience wrapper around R CMD check rather than a replacement for it.

  • devtools::build() - bundles the package source, ultimately resulting in a compressed (.tar.gz) package file in the parent directory.

  • devtools::check_win_devel() (named devtools::build_win() in older devtools versions) - builds and submits the package to CRAN win-builder for checking, with a status report generated roughly 20 minutes later. This conveniently checks against r-devel.

  • devtools::release() - builds the package, performs R CMD check, asks various questions, then uploads the bundle to CRAN. Preferable to devtools::submit_cran() since this is more thorough.

  • devtools::submit_cran() - builds and submits the package to CRAN, avoiding the (annoying) interface.

  • R CMD build MYPKG - (from the command line) builds the package when run in the parent directory, generating a zipped (.tar.gz) package file.

  • R CMD check MYPKG.tar.gz - (from the command line) runs necessary checks on a built package, pointing out any warnings and errors that need correction.

  • R CMD check --as-cran MYPKG.tar.gz - (from the command line) runs checks similar to the above but with additional requirements specific to CRAN that are necessary for successful submission.

Further reading/resources