Install R packages using docker file

I have installed R using the line below in my Dockerfile. How do I now specify the packages to be installed in the Dockerfile?

RUN yum -y install R-core R-devel

I'm doing something like this:

RUN R -e "install.packages('methods',dependencies=TRUE, repos='http://cran.rstudio.com/')"\
    && R -e "install.packages('jsonlite',dependencies=TRUE, repos='http://cran.rstudio.com/')" \
    && R -e "install.packages('tseries',dependencies=TRUE, repos='http://cran.rstudio.com/')" 

Is this the right way to do it?



Solution 1:[1]

As @Cameron Kerr's comment points out, Rscript does not give you a build failure when a package fails to install. As of now, the recommended way is to do as the question suggests:

RUN R -e "install.packages('methods',dependencies=TRUE, repos='http://cran.rstudio.com/')"
RUN R -e "install.packages('jsonlite',dependencies=TRUE, repos='http://cran.rstudio.com/')"
RUN R -e "install.packages('tseries',dependencies=TRUE, repos='http://cran.rstudio.com/')" 

If you're fairly certain that no package will fail to install, use this single command:

RUN R -e "install.packages(c('methods', 'jsonlite', 'tseries'), \
                           dependencies=TRUE, \
                           repos='http://cran.rstudio.com/')"

EDIT: If you don't use the r-base image, you can use rocker-org's r-ver, rstudio, or tidyverse images. Here's the repo. Here's an example Dockerfile:

FROM rocker/tidyverse:latest

# Install R packages
RUN install2.r --error \
    methods \
    jsonlite \
    tseries

The --error flag is optional. It makes install2.r exit with an error if a package installation fails, which in turn causes the docker build command to fail. By default, install.packages() only throws a warning, which means that a Dockerfile can build successfully even if a package failed to install.

The rocker-org images basically install the littler package to provide the install2.r functionality.
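
Where install2.r is not available, a similar build failure can be forced with plain R. The following is only a sketch: the r-base image and the cloud.r-project.org mirror are choices made here for illustration, not part of the answer. Setting options(warn = 2) turns warnings into errors, so the warning that install.packages() raises on a failed installation halts R with a non-zero exit status and the build step fails.

```dockerfile
FROM r-base

# options(warn = 2) promotes warnings to errors. In a non-interactive session
# an error halts R with a non-zero exit status, which fails this RUN step.
RUN R -e "options(warn = 2); install.packages(c('jsonlite', 'tseries'), repos = 'https://cloud.r-project.org')"
```

This is stricter than --error: any other warning raised while install.packages() runs will also abort the build.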

Solution 2:[2]

Yes, your solution should work. I came across the same problem and found the solution here https://github.com/glamp/r-docker/blob/master/Dockerfile.

In short, use: RUN Rscript -e "install.packages('PACKAGENAME')". I have tried it and it works.

As others have mentioned in the comments, this solution will not raise an error if the package could not be installed.

Solution 3:[3]

This is ugly but it works; see below for a real-world example of why it's worth doing.

# Install packages and check that each one loads; install.packages() itself does not report failures
RUN R -e "install.packages('RMySQL');     if (!library(RMySQL, logical.return=T)) quit(status=10)" \
 && R -e "install.packages('devtools');   if (!library(devtools, logical.return=T)) quit(status=10)" \
 && R -e "install.packages('data.table'); if (!library(data.table, logical.return=T)) quit(status=10)" \
 && R -e "install.packages('purrr');      if (!library(purrr, logical.return=T)) quit(status=10)" \
 && R -e "install.packages('tidyr');      if (!library(tidyr, logical.return=T)) quit(status=10)"

Real-world example: the devtools install starts failing because it suddenly needs libgit2-dev. install.packages() prints useful information about the failure, but without a non-zero exit code that output just scrolls away as docker build continues.
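
The same check can be written once for a whole list of packages. This is a sketch under assumptions: the package list is taken from the example above, the mirror URL is an arbitrary choice, and requireNamespace() stands in for library() as the load test.

```dockerfile
# Install all packages in one R session, then exit non-zero if any of them
# cannot be loaded, so that docker build stops at this step.
RUN R -e "pkgs <- c('RMySQL', 'devtools', 'data.table', 'purrr', 'tidyr'); \
          install.packages(pkgs, repos = 'https://cloud.r-project.org'); \
          ok <- sapply(pkgs, requireNamespace, quietly = TRUE); \
          if (!all(ok)) { print(ok); quit(status = 10, save = 'no') }"
```

Because everything runs in a single layer, a failure in one package means the whole step is repeated on the next build, whereas separate commands make it easier to see which package broke.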

Solution 4:[4]

The R -e "install.packages..." approach does not always produce an error when package installation fails.

I wrote a script based on Cameron Kerr's answer here, which produces an error if the package cannot be loaded, and interrupts the Docker build process. It installs packages from an R package repo, from GitHub, or from source given a full URL. It also prints the time taken to install, to help plan which packages to group together in one command.

Example usage in Dockerfile:

# Install from CRAN repo:
RUN Rscript install_packages_or_die.R https://cran.rstudio.com/ Cairo
RUN Rscript install_packages_or_die.R Cairo # Uses default CRAN repo
RUN Rscript install_packages_or_die.R jpeg png tiff # Multiple packages

# Install from GitHub:
RUN Rscript install_packages_or_die.R github ramnathv/htmlwidgets
RUN Rscript install_packages_or_die.R github timelyportfolio/htmlwidgets_spin spin

# Install from source given full URL of package:
RUN Rscript install_packages_or_die.R https://cran.r-project.org/src/contrib/Archive/curl/curl_4.0.tar.gz curl

Here's the script:

#!/usr/bin/env Rscript

# Install R packages or fail with error.
#
# Arguments:
#   - First argument (optional) can be one of:
#       1. repo URL
#       2. "github" if installing from GitHub repo (requires that package 'devtools' is
#          already installed)
#       3. full URL of package from which to install from source; if used, provide package
#          name in second argument (e.g. 'curl')
#     If this argument is omitted, the default repo https://cran.rstudio.com/ is used.
#   - Remaining arguments are either:
#       1. one or more R package names, or
#       2. if installing from GitHub, the path containing username and repo name, e.g.
#          'timelyportfolio/htmlwidgets_spin', optionally followed by the package name (if
#          it differs from the GitHub repo name, e.g. 'spin').

arg_list = commandArgs(trailingOnly=TRUE)

if (length(arg_list) < 1) {
  print("ERROR: Too few arguments.")
  quit(status=1, save='no')
}

if (arg_list[1] == 'github' || grepl("^https?://", arg_list[1], perl=TRUE)) {
  if (length(arg_list) == 1) {
    print("ERROR: No package name provided.")
    quit(status=1, save='no')
  }
  repo = arg_list[1]
  packages = arg_list[-1]
} else {
  repo = 'https://cran.rstudio.com/'
  packages = arg_list
}

for(i in seq_along(packages)){
    p = packages[i]

    start_time <- Sys.time()
    if (grepl("^https?://[A-Za-z0-9.-]+/.+\\.tar\\.gz$", repo, perl=TRUE)) {
      # If 'repo' is URL with path after domain name, treat it as full path to a package
      # to be installed from source.
      install.packages(repo, repos=NULL, type="source")
    } else if (repo == "github") {
      # Install from GitHub.
      github_path = p
      elems = strsplit(github_path, '/')
      if (lengths(elems) != 2) {
        print("ERROR: Invalid GitHub path.")
        quit(status=1, save='no')
      }
      username = elems[[1]][1]
      github_repo_name = elems[[1]][2]
      if (!is.na(packages[i+1])) {
        # Optional additional argument was given specifying the R package name.
        p = packages[i+1]
      } else {
        # Assume R package name is the same as GitHub repo name.
        p = github_repo_name
      }

      library(devtools)
      install_github(github_path)
    } else {
      # Install from R package repository.
      install.packages(p, dependencies=TRUE, repos=repo)
    }
    end_time <- Sys.time()

    if ( ! library(p, character.only=TRUE, logical.return=TRUE) ) {
      quit(status=1, save='no')
    } else {
      cat(paste0("Time to install ", p, ":\n"))
      print(end_time - start_time)
    }

    if (repo == "github") {
      break
    }
}
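
For the RUN lines in the usage example to work, the script has to be inside the image before it is called. A minimal sketch, assuming the script is saved next to the Dockerfile; the r-base image and the /opt/r-setup directory are hypothetical choices that the answer does not specify:

```dockerfile
FROM r-base

# Copy the installer script into the image and run the following steps from there.
# /opt/r-setup is an arbitrary directory chosen for this sketch.
WORKDIR /opt/r-setup
COPY install_packages_or_die.R .

# Each RUN fails the build if the named package cannot be loaded afterwards.
RUN Rscript install_packages_or_die.R jsonlite
RUN Rscript install_packages_or_die.R tseries
```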

Solution 5:[5]

You could write an R script with the desired install commands and then run it using Docker, if I'm reading this documentation correctly (https://hub.docker.com/_/r-base/).

FROM r-base
COPY . /usr/local/src/myscripts
WORKDIR /usr/local/src/myscripts
CMD ["Rscript", "myscript.R"]

Build your image with the command below, where the path is the directory that contains the Dockerfile:

$ docker build -t myscript /path/to/Dockerfile

Where myscript.R contains the appropriate package installation commands.
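
One caveat about the Dockerfile above: CMD only runs when a container is started, so packages installed by myscript.R would not be stored in the image. A sketch of a build-time variant, assuming myscript.R sits in the build context and exits with a non-zero status when an installation fails:

```dockerfile
FROM r-base
COPY . /usr/local/src/myscripts
WORKDIR /usr/local/src/myscripts
# RUN executes during docker build, so the installed packages become part of the image.
RUN Rscript myscript.R
```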

Solution 6:[6]

The best solution I found is with install2.r from the littler package.

  • First install littler
RUN R -e "install.packages('littler', dependencies=TRUE)"
  • Then you can use it from bash in your Dockerfile
RUN install2.r --error --deps TRUE methods
RUN install2.r --error --deps TRUE jsonlite
RUN install2.r --error --deps TRUE tseries

The --error flag makes the build quit if the package has not been installed correctly. The --deps TRUE flag automatically installs the package's dependencies.
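
Installing littler from CRAN does not by itself put install2.r on the PATH: the script ships in the package's examples directory, and it needs the docopt package and littler's r binary. A sketch of the extra setup, assuming /usr/local/bin is on the PATH and does not already contain files named r or install2.r, and using an arbitrarily chosen CRAN mirror. system.file() looks up the installed locations so they are not hard-coded:

```dockerfile
# Install littler plus docopt (used by install2.r for argument parsing).
RUN R -e "install.packages(c('littler', 'docopt'), repos = 'https://cloud.r-project.org')"

# Link littler's r binary and the install2.r example script into the PATH.
RUN ln -s "$(Rscript -e "cat(system.file('bin', 'r', package = 'littler'))")" /usr/local/bin/r \
 && ln -s "$(Rscript -e "cat(system.file('examples', 'install2.r', package = 'littler'))")" /usr/local/bin/install2.r

# Several packages can then be installed in one step.
RUN install2.r --error --deps TRUE jsonlite tseries
```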

Solution 7:[7]

I would like to recommend the rocker/tidyverse image, on top of which you can install other packages like this:

RUN R -e "install.packages('bigrquery',dependencies=TRUE, repos='http://cran.rstudio.com/')"

The same installation on top of r-base ran into an issue with Rserve, which was probably preinstalled in the r-base image. I found nothing about this on the r-base page, so I do not recommend r-base as an easy solution.

R packages can also be installed with apt-get install r-cran-*, but the maintainers of rocker/tidyverse do not recommend this for that particular image because it leads to the installation of another R version. However, you may try it and find that it is fine for your task.

Solution 8:[8]

Are these repositories a solution to this problem?

My solution in this repository is to create two Docker images:

  • The "install image": The first image consists only of the prerequisites for the projects. When running a container from this image it can install R packages in the format it needs inside the container and save them to {renv}'s cache on the host through a mount.
  • The "final image": The second image copies the project along with dependencies from the host into the image.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Terry Brown
Solution 4
Solution 5 Damian
Solution 6 battgo827
Solution 7 Daryna Ivaskevych
Solution 8 Takuro Ikeda