Less (unnecessary packages loaded) is more (RAM for your other analyses)
Rosalie Bruel
October 14th, 2019



1. Problem

Last year I asked Twitter for a way to identify packages that are no longer needed in a script.

User @jmcphers recommended packrat, which I used for one of my projects. packrat is great for keeping track of the package versions needed to run a script, but I don’t like that it creates a copy of each package: I had to re-download every package needed for that specific project. It is just too much for my needs.

I thought about this issue again today, and realized there might be another easy solution.

I wrote the function lessismore() to hopefully tackle this. I don’t have time to package it, but you can load it using the code below:

source("https://raw.githubusercontent.com/rosalieb/miscellaneous/master/R/lessismore.R")

2. Rapid overview

The function requires the package sos.

I edited the code from Nicholas Cooper’s package NCmisc. He has a lot of great functions in there. I didn’t reuse his function list.functions.in.file() as is because I wanted to keep the information on how many times each function was used. His code also does not work with .Rmd files, so it required some work anyway.

For each function that was not associated with a package, I ran a second search with sos::findFn(), which uses RSiteSearch to look for matches in vignettes, help pages, or task views. This search takes longer, which is why I did not use it systematically and first went through the option coded in base R.

lessismore() will also return any function that was not matched. This may happen because some of the code was incorrectly identified as a function.
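To illustrate the counting step, here is a minimal sketch in base R. It is not the actual code of lessismore(): the helper name count_function_calls is made up for this example, and it only assumes that function calls can be picked out of the parse data by their "SYMBOL_FUNCTION_CALL" token.

```r
# Count how often each function is called in a piece of R code.
# `count_function_calls` is a hypothetical helper, not part of lessismore().
count_function_calls <- function(code) {
  # keep.source = TRUE is needed so that parse data is recorded
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  # Keep only the tokens that the parser flagged as function calls
  calls <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
  # Tally the calls and put the most-used functions first
  sort(table(calls), decreasing = TRUE)
}

code <- c(
  "x <- c(1, 2, 3)",
  "y <- c(mean(x), sd(x))",
  "plot(x)"
)
counts <- count_function_calls(code)
print(counts)
```

Keeping the full table, rather than only the unique names, is what preserves the number of times each function is used.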

Arguments of the function are:

| Argument | Description | Example |
|---|---|---|
| packages | vector of package names to check | .packages() |
| path2file | path to the file. Extensions can be .R or .Rmd. If .Rmd, the code is extracted from the chunks using the knitr library | "my_wd/file.R" / "my_wd/file.Rmd" |
| plot_output | whether to display a plot showing the most-used functions. If TRUE, ggplot2 and plotly are also required | FALSE |
| thresh | threshold, for packages that may reuse the names of very frequent functions | 2 |

I added the threshold argument (thresh) at the last minute. I noticed that some function names appear in so many packages that matching on the name alone is no longer informative. For example, plot() is defined in the packages raster, sp, changepoint, graphics, and in base R. I’m not sure this “threshold” solves the whole problem, but it might help a bit.
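The name-clash problem can be made concrete with a short base R sketch. The helpers packages_exporting and is_ambiguous are hypothetical names for this example, and the rule used in is_ambiguous is only one possible reading of a threshold, not necessarily how lessismore() applies thresh.

```r
# For each candidate package, check whether it exports a given function name.
# `packages_exporting` is a hypothetical helper for illustration.
packages_exporting <- function(fn, packages) {
  hits <- vapply(
    packages,
    function(pkg) fn %in% getNamespaceExports(pkg),
    logical(1)
  )
  packages[hits]
}

candidates <- c("stats", "utils", "graphics")
packages_exporting("median", candidates)
packages_exporting("head", candidates)

# Assumed interpretation of a threshold: treat a function name as ambiguous
# when it is exported by `thresh` or more of the candidate packages.
is_ambiguous <- function(fn, packages, thresh = 2) {
  length(packages_exporting(fn, packages)) >= thresh
}
is_ambiguous("median", candidates)
```

A name exported by a single candidate package can be attributed to it safely, while a name flagged as ambiguous says little about which package is really needed.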

3. Example

I tried the script on an R Markdown file I am currently working on. Here, I am trying to see whether all the packages I have loaded for this session (.packages()) are needed in the script whose path is “R/EwE_model_param_LakeChamplain.Rmd”, and I want to plot the functions I used the most (plot_output = TRUE).

tmp <- lessismore(packages = .packages(), 
                   path2file = "R/EwE_model_param_LakeChamplain.Rmd",
                   plot_output = TRUE)

Here is the result:

I found that 14 packages are loaded but not used in this script. I am using some of these in other scripts I worked on today, but I don’t need readtext for example (I loaded it as I was writing the lessismore() function).

However, I am using FSAdata! At some point I call the dataset FSAdata::TroutperchLM1. The function does not work for objects whose token is different from “SYMBOL_FUNCTION_CALL”. If I or someone else ever wants to update the function, that could be an aspect to improve.
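One possible way to catch cases like this is to also look at the "SYMBOL_PACKAGE" token, which the R parser assigns to the package name in pkg::name. The sketch below is only an idea for an improvement, not part of lessismore(); the helper name namespaced_packages is made up, and the column d$tl in the sample code is a placeholder.

```r
# List packages referenced through `pkg::name` or `pkg:::name`.
# `namespaced_packages` is a hypothetical helper, not part of lessismore().
namespaced_packages <- function(code) {
  # Parsing does not evaluate the code, so the packages need not be installed
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  unique(tokens$text[tokens$token == "SYMBOL_PACKAGE"])
}

code <- c(
  "d <- FSAdata::TroutperchLM1",   # dataset access, not a function call
  "m <- stats::median(d$tl)"       # `tl` is a placeholder column name
)
namespaced_packages(code)
```

This only covers objects written with an explicit pkg:: prefix; a dataset used by its bare name after library(FSAdata) would still go undetected.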

4. Summing up

In the end, instead of looking at all the packages I have loaded (with .packages()), I should use a vector of the packages I load at the beginning of my R Markdown file.
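Such a vector could be built from the script itself. Here is a minimal sketch, assuming the packages are attached with plain top-level library() or require() calls; attached_in_script is a hypothetical helper, and calls nested in functions or using character.only = TRUE are not handled.

```r
# Collect package names from top-level library()/require() calls.
# `attached_in_script` is a hypothetical helper for illustration.
attached_in_script <- function(code) {
  exprs <- parse(text = code)
  pkgs <- character(0)
  for (e in as.list(exprs)) {
    is_attach_call <- is.call(e) &&
      is.name(e[[1]]) &&
      as.character(e[[1]]) %in% c("library", "require") &&
      length(e) >= 2
    if (is_attach_call) {
      # as.character() handles both library(ggplot2) and library("ggplot2")
      pkgs <- c(pkgs, as.character(e[[2]]))
    }
  }
  unique(pkgs)
}

code <- c(
  "library(ggplot2)",
  "library(\"plotly\")",
  "x <- c(1, 2, 3)"
)
pkgs <- attached_in_script(code)
print(pkgs)
```

The resulting vector could then be passed as the packages argument instead of .packages(). For an .Rmd file, the code would first have to be extracted from the chunks.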

Bonus, you can have a look at the most-used functions in your script:

My most-used function is c(), used to create vectors — nothing fancy here 😒.

Let me know if you end up using the function, if it’s working or not, and if you have ideas to make the function more efficient!
