Tuto7_getobjects.knit

Rosalie Bruel
June 10^th, 2022

1. Motivation

I appreciate the functionality in RStudio that keeps in memory the objects created at previous steps, even after closing RStudio. I find this functionality useful when I have a script that takes a while to run, for which the output is useful but not necessarily that useful that I want to permanently save it to a local file.

However, on occasion, I have scripts I need to run once and that’s it. At the end of said script, I would like to be able to remove from the environment any object that was created by the script. In other words, I would like to be able to use rm(), but instead of doing it for the whole environment (`ls()`), I would like to do it for the environment of a specific script.

I wrote the function get.objects.from.script() to tackle this. You can load the function using the code below:

source("https://raw.githubusercontent.com/rosalieb/miscellaneous/master/R/get.objects.from.script.R")

2. Easy alternatives and why they did not fit my needs (but might fit yours)

Skipping the alternative of changing my settings to “do not save/restore the workspace”, below are two alternatives with existing code. Neither met all of my requirements.

2.1. Alternative 1: list objects before and after running the script

One easy way could be to list the files in the environment at the beginning of a script, and list the files after having ran the script, and do the differential of objects. For example, see the script below:

# Start script
initial_objects <- ls()

# Create a bunch of objects
temp <- LETTERS[1:3]
data <- data.frame(x = 1:10, y = 10:1)
p1 <- ggplot(data, aes(x, y)) + geom_point()

# Remove only the objects that were created since initial_objects was last run
# 1. See what is returned: should include "initial_objects" "temp" "data" "p1"  
ls()[!ls() %in% initial_objects]
# 2. Remove these objects
rm(list = ls()[!ls() %in% initial_objects])

However, there are at least three scenarios I can think of in which the listed files (in initial_objects) wouldn’t be the right ones:

If by mistake I forgot to run the first line `initial_objects <- ls()`, it will be “too late” to go back, and I will end up having to list the objects manually.
A variant: If by mistake I run the first line multiple times, after running the script a first time (for example, if I run everything above a certain line, or if I run “all chunks above” in a .Rmd).
If there was already an element in the environment with the same name as an object created within the script. I frequently use some names across scripts, for objects I will not keep. For example, I use “temp” for a temporary object, “p1” for a plot, “out” for an output object that I would populate within a loop, etc. If I already have an object “temp” in the environment, it won’t be removed using the method above. However, it is not something I need to keep per my requirements stated in the section “problem”.

2.2. Alternative 2: run the script in its own environment

A second option would be to run the script in its own environment, which is possible when sourcing a file.

my_env <- new.env()  
source("myscript.R", my_env)  
rm(my_env)

However, this would only work for a script I do not need to run line by line to visualize the output. This solution does not fit my exact needs either.

3. A rapid overview of the function get.objects.from.script()

The function is pretty simple, and follows three steps:

Screening through the current script (although you could also set the path to another script if you want). This part relies on the function getParseData() from the library utils.
Keep only the objects (token = “SYMBOL”).
Find and return which objects are in the Global Environment.

The function returns all the objects created in the script. You can then run rm(), except that instead of removing everything from the environment (rm(list = ls())), you can choose to remove only the objects created within the current script:

rm(list = get.objects.from.script())

Arguments of the function are:

Arguments	Description	Example
path2file	path to file. Extensions can be .R or .Rmd. If .Rmd, will extract the code from the chunks using the knitr library	“my_wd/file.R” / “my_wd/file.Rmd”
exception	any objects that you wish to preserve. Default = NULL	c(“output”, “p1”)
source	logical argument, indicating whether to source the script before running the function. It is a necessary condition to run the script first so that objects appear in the environment. Default = FALSE.	FALSE
message	logical argument, indicating whether to print a message or not. Default = TRUE	TRUE

4. Example

Here is an example. Note that the function get.objects.from.script() will only work on a saved script, because it requires an existing path. If you want to run the example below, you must copy and paste it (see the little copy link in the top right corner of the chunk for a quick copy!) in an R script, and save that script to your computer.

source("https://raw.githubusercontent.com/rosalieb/miscellaneous/master/R/get.objects.from.script.R")

# Create a bunch of objects
temp <- LETTERS[1:3]
data <- data.frame(x = 1:10, y = 10:1)
p1 <- ggplot(data, aes(x, y)) + geom_point()

# List the objects in the current script
get.objects.from.script()

# Number of objects in the environment before the function:
length(ls()) # 158
# Remove all the objects in the script (as obtained by 
#   get.objects.from.script(), with the exception of "p1"
#   as specified by the argument 'exception = "p1"'.
rm(list = get.objects.from.script(exception = "p1",
                      message = FALSE))
# Number of objects in the environment after running the function
length(ls()) # 156 (we decided to keep "p1", so only two objects were removed)

Here is the what it looks like if you run it:

It work well: with get.objects.from.script(), I do find the objects I just created. On the following line, I am able to remove these objects, but also include exception (e.g., if I want to keep “p1” for example.)

Furthermore, the function can be useful to those of you helping out colleagues/students. By simply adding rm(list = get.objects.from.script()) at the end of the colleague/student’s script, you can remove any objects not relevant to your current workspace, without having to navigate between workspace. This solution is less “nuclear” than the rm(list = ls()).

it suggests a clean workspace is the exception rather than the rule;
also, if someone’s giving some “drive by” help with your code, this clobbers their workspace (I run lots of student code)
— Jenny Bryan (@JennyBryan) December 11, 2017

is loading comments…

Get a vector of objects created within in an .R or .Rmd file