R

Versions of R

The latest version of R on Great Lakes is 4.6.0. You can check what version you are currently set up to use with the command R --version 
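
The same check can be made from inside an R session. The lines below are a minimal sketch; the required version shown is a hypothetical value chosen for illustration, not a site requirement.

```r
# Print the full version banner string of the running R.
print(R.version.string)

# getRversion() returns a version object, so comparisons are done
# component by component rather than as plain strings.
running <- getRversion()
required <- "4.4.0"  # hypothetical minimum, for illustration only

if (running < required) {
  message("Running R ", as.character(running), " is older than ", required)
} else {
  message("Running R ", as.character(running), " satisfies the minimum of ", required)
}
```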

A full list of available R versions and R packages can be found here: https://umich-arc.github.io/arc-software-catalog/

The Library Search Path

An R library is a directory containing one or more packages, each of which is a subdirectory immediately inside the library.

Libraries are located in R by searching a set of directories in a well-defined order. The core R installation only contains a very small set of packages, always in the default search path. R also checks for a personal R library in your home directory and includes that, if it exists. The location is generically

<<home directory>>/R/<<architecture>>/<<R-version>>

e.g. ~/R/x86_64-pc-linux-gnu-library/4.5 (where “~” represents your home directory, “x86_64-pc-linux-gnu-library” is an architecture label for 64-bit Intel, and “4.5” is the R version).

The .libPaths() function is used to view or modify the set of directories that will be searched, in order, for packages. You can add any library directory that you have read access to, which is how a shared R library can be set up for a lab or for a specific R application. The library search path can also be initialized using environment variables (R_LIBS) or a .Rprofile file.
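
The personal library location described above can be reconstructed from the running R's own metadata. This is a sketch that assumes the conventional Linux layout shown in this section; the value R actually uses is held in the R_LIBS_USER environment variable, which is printed for comparison.

```r
# Platform label of the running R, e.g. "x86_64-pc-linux-gnu".
platform <- R.version$platform

# R.version$minor looks like "4.0"; keep only the part before the dot
# so the directory name is major.minor, e.g. "4.4".
version_dir <- paste(R.version$major,
                     sub("\\..*$", "", R.version$minor),
                     sep = ".")

# Assemble the conventional personal library path.
personal_lib <- file.path("~", "R", paste0(platform, "-library"), version_dir)
print(personal_lib)

# What R itself is configured to use, and whether the directory exists yet.
print(Sys.getenv("R_LIBS_USER"))
print(dir.exists(path.expand(personal_lib)))
```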

Be careful not to mix packages built under different versions of R. The results will be very confusing.
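
One way to spot mixed libraries is to compare the R version each installed package was built under with the version that is running. The following is a minimal sketch, not an official check; it treats a different major.minor version as a mismatch, which is an assumption about what counts as "mixed".

```r
# Major.minor of the running R, e.g. "4.4".
running_mm <- paste(R.version$major,
                    sub("\\..*$", "", R.version$minor),
                    sep = ".")

# installed.packages() returns a character matrix; the "Built" column
# holds the R version each package was built under.
pkgs <- installed.packages()[, c("Package", "LibPath", "Built"), drop = FALSE]
built_mm <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", pkgs[, "Built"])

# Keep only rows whose build version differs from the running version.
is_mismatch <- !is.na(built_mm) & built_mm != running_mm
mismatched <- pkgs[is_mismatch, , drop = FALSE]

if (nrow(mismatched) == 0) {
  message("All installed packages were built under R ", running_mm, ".x")
} else {
  print(mismatched)
}
```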

This is how you could add a directory on Great Lakes to the front of the existing list of directories, using the R/4.4.0 module version:

$ module load R/4.4.0 
$ R 
R version 4.4.0 (2024-04-24) -- "Puppy Cup"
Copyright (C) 2024 The R Foundation for Statistical Computing
  (startup messages omitted)
> .libPaths()
[1] "<<home directory>>/R/x86_64-pc-linux-gnu-library/4.4"
[2] "/nfs/turbo/some/path/R/4.4.0/lib/R/library"                

> .libPaths(c("/nfs/turbo/some/path/Rlibs/4.4", .libPaths()))  

Alternatively, running export R_LIBS=/some/path/containing/R/packages before starting R will have the same effect.
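
A detail worth knowing when scripting this: .libPaths() only keeps directories that already exist, so a mistyped or not-yet-created path is dropped without an error. The sketch below uses a temporary directory as a stand-in for a shared library location; the directory name is hypothetical.

```r
# Hypothetical shared library location; a temporary directory stands in
# for something like /nfs/turbo/some/path/Rlibs/4.4.
shared_lib <- file.path(tempdir(), "Rlibs-demo")

# Create the directory first, otherwise .libPaths() will ignore it.
dir.create(shared_lib, recursive = TRUE, showWarnings = FALSE)

# Put the new directory in front of the existing search path.
old_paths <- .libPaths()
.libPaths(c(shared_lib, old_paths))

# The new directory should now be listed first.
print(.libPaths())
```

Verifying the first element of .libPaths() after the call is a cheap way to confirm the directory was accepted.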

Compilation options

R packages may include code in various compiled languages; C++ is most common, and we occasionally see C and Fortran as well. When a package is installed, R will use the C++ compiler it finds in your $PATH. Since packages may be written to a newer C++ standard than the default compiler on RHEL supports, it is sometimes necessary to arrange for a newer compiler suite to be used. On the Great Lakes cluster, this is done by activating one of the GNU/GCC compiler modules using Lmod.

For example, to use GCC 10.3.0:

$ module load gcc/10.3.0

This is needed only when installing; it is usually not needed for subsequent runtime use.
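
Before starting a long installation, it can help to confirm from inside R which compiler will be picked up. This is a rough sketch that assumes the compiler is invoked as g++, as in the Makevars example in this section.

```r
# Sys.which() returns the full path of a command found on PATH,
# or an empty string if the command is not found.
cxx_path <- unname(Sys.which("g++"))

if (nzchar(cxx_path)) {
  # The first line of `g++ --version` names the compiler release.
  banner <- system2(cxx_path, "--version", stdout = TRUE)
  message("Using ", cxx_path, ": ", banner[1])
} else {
  message("No g++ found on PATH; load a gcc module before installing.")
}
```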

Special options passed to the compilers during the build are stored in the file ~/.R/Makevars. Very occasionally it may be necessary to modify those options. A typical Makevars file contains the following, which tells the build process how to compile code labelled as, for example, the C++17 standard.

CXX11=g++
CXX11FLAGS=-O2 -march=native -mtune=native -fPIC -std=c++11
CXX11STD=-std=c++11
CXX14=g++
CXX14FLAGS=-O3 -march=native -mtune=native -fPIC -std=c++14
CXX17=g++
CXX17STD=-std=c++17
CXX17FLAGS=-O3 -mtune=native -fPIC -std=c++17

An additional step you should take if you use conda or mamba is to deactivate any environments, including the miniconda base environment. If you do not, R may try to use shared libraries from your conda environments, which may not work. Once a package is installed into your R library, you can use your conda environments again without affecting the R package.

$ conda deactivate

$ mamba deactivate
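
If you want a guard inside an installation script, the check below is one possible sketch. It assumes that conda and mamba set the CONDA_PREFIX environment variable while an environment is active; the function name is hypothetical.

```r
# Hypothetical helper: TRUE when a conda/mamba environment appears active.
# Assumes CONDA_PREFIX is set only while an environment is activated.
conda_is_active <- function(prefix = Sys.getenv("CONDA_PREFIX")) {
  nzchar(prefix)
}

if (conda_is_active()) {
  message("A conda environment is active (", Sys.getenv("CONDA_PREFIX"),
          "); deactivate it before installing R packages.")
} else {
  message("No active conda environment detected.")
}
```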

install.packages() and CRAN

The default source for R packages is CRAN, the Comprehensive R Archive Network, a network of mirrored servers around the world with freely available code and documentation.

The R function to download, compile, test and install a new package from CRAN into a library is install.packages(). It has many optional features, but most often you will only need to specify the package name(s), where to get them from (the repository), and perhaps where to install them to. The default destination is the first writable directory in the library path (see above). If you do not have a personal R library yet, install.packages() will ask whether it should create one.

A package specifies which other packages it depends on. install.packages() builds a list of all the missing dependencies, then downloads and installs them in the appropriate order. A complex package may have many dependencies, resulting in a long download and build cycle. install.packages() may also complain that some pre-existing packages are out of date, and offer to upgrade them.

The prompting only happens if you are running R interactively. You can install packages with a single shell command, using Rscript, but in that case there is no prompting, so some updates may be skipped or the creation of a personal library may fail. Another very useful option to speed things up is Ncpus=N, where N is the number of CPUs that you want the installation process to use.

Example:

$ R
> install.packages("ggraph", repos="https://cloud.r-project.org", Ncpus=4)
Installing package into '/<<home_directory>>/R/x86_64-pc-linux-gnu-library/R'
(as 'lib' is unspecified)
trying URL 'https://cloud.r-project.org/src/contrib/ggraph_2.0.5.tar.gz'
Content type 'application/x-gzip' length 3217051 bytes (3.1 MB)
==================================================
downloaded 3.1 MB

* installing *source* package 'ggraph' ...
** package 'ggraph' successfully unpacked and MD5 sums checked
** using staged installation
** libs

  (compilation and installation messages omitted)

> quit()

or as a single command:

Rscript -e 'install.packages("ggraph", repos="https://cloud.r-project.org", Ncpus=4)'

(similar output omitted)

Example of installing into an explicitly named library. The directory must exist and be writable. Prerequisite packages found elsewhere in the library path will not be reinstalled, so beware of prerequisites satisfied by a personal library, while trying to set up a shared lab library.

Rscript -e 'install.packages("ggraph", repos="https://cloud.r-project.org", lib="/<<custom-location>>/R/4.4" )'
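
For non-interactive use, where install.packages() cannot prompt, a small wrapper can create the destination library up front and skip packages that are already present. This is a minimal sketch rather than a recommended site procedure; the function names are hypothetical, and the default destination assumes R_LIBS_USER is set, as it normally is at R startup.

```r
# Hypothetical helper: which of `pkgs` are absent from every library
# on the search path?
missing_packages <- function(pkgs, lib_paths = .libPaths()) {
  installed <- rownames(installed.packages(lib.loc = lib_paths))
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only what is missing, creating the
# destination library first because Rscript cannot prompt for it.
install_missing <- function(pkgs,
                            lib = path.expand(Sys.getenv("R_LIBS_USER")),
                            repos = "https://cloud.r-project.org",
                            ncpus = 2) {
  todo <- missing_packages(pkgs)
  if (length(todo) == 0) {
    message("Nothing to install.")
    return(invisible(character(0)))
  }
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  install.packages(todo, lib = lib, repos = repos, Ncpus = ncpus)
  invisible(todo)
}

# "stats" and "utils" ship with R, so this call does not touch the network.
result <- install_missing(c("stats", "utils"))
```

Because the check looks across the whole search path, the caveat above still applies: a prerequisite found in a personal library will be treated as present even when the target is a shared lab library.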

Checking what packages are installed

Use the installed.packages() function to list all packages that are available in each library listed in your .libPaths(). The output includes the version, which library each package came from, dependencies, and licensing information.

Use the sessionInfo() function to show which packages have actually been loaded in the current R session, their versions and where they were loaded from, plus other information.
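
The full output of installed.packages() is wide, so it is often easier to keep a few columns. The sketch below shows one way to summarise it and to pull the package lists out of sessionInfo(); the choice of columns is only an illustration.

```r
# Keep three columns of the installed.packages() matrix as a data frame.
pkgs <- as.data.frame(
  installed.packages()[, c("Package", "Version", "LibPath"), drop = FALSE],
  stringsAsFactors = FALSE
)
row.names(pkgs) <- NULL
print(head(pkgs))

# Count how many packages live in each library directory.
print(table(pkgs$LibPath))

# sessionInfo() separates base packages, other attached packages,
# and packages that are loaded but not attached.
info <- sessionInfo()
print(info$basePkgs)
print(names(info$otherPkgs))
print(names(info$loadedOnly))
```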

Bioconductor and other repositories

Another major repository is Bioconductor, aimed at bioinformaticians and data scientists. Packages may also be downloaded from other collections, or from individual git repositories (e.g. hosted on GitHub). If the installation instructions for a package tell you to install from Bioconductor, you must first install the BiocManager package from CRAN, and then use the installer it provides to install packages from Bioconductor.

e.g., to install “rtracklayer” from Bioconductor

install.packages("BiocManager", repos="https://cloud.r-project.org")
BiocManager::install("rtracklayer")

Other code sources use the devtools package from CRAN (which has many prerequisites of its own). It provides an installer for GitHub repositories.

e.g., to install the package “leidenbase” from the GitHub repository cole-trapnell-lab/leidenbase:

install.packages("devtools", repos="https://cloud.r-project.org")
devtools::install_github('cole-trapnell-lab/leidenbase')

Special HPC Examples

Special Considerations

Always read the installation instructions written by the developer of the package you are trying to install. Sometimes there are known dependency errors, requiring you to manually install some packages before you attempt to install the primary target package. System libraries (normally installed as Red Hat RPMs or equivalent) cannot be installed by R, so an install may fail because of a missing library or header file; the error message should give a hint about what needs to be installed. If this is something supported by the Linux distribution, we may be able to add it.

There may also be a requirement for non-standard system libraries. If the special software has been installed and made available as a module, it must be loaded before you try to build an R package which depends on it. For example, some R packages need the GEOS software, which can be made available by running module load geos.
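
A quick pre-flight check from R can confirm that a required module has been loaded before a long build starts. The sketch below assumes that the GEOS module puts the geos-config helper on your PATH; the function name and the second tool name are hypothetical.

```r
# Hypothetical helper: report which command-line tools are on PATH.
check_tools <- function(tools) {
  found <- nzchar(Sys.which(tools))
  names(found) <- tools
  found
}

# "no-such-tool-xyz" is a made-up name included to show a negative result.
status <- check_tools(c("geos-config", "no-such-tool-xyz"))
print(status)

if (!status[["geos-config"]]) {
  message("geos-config not found; try 'module load geos' before installing.")
}
```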