Get a vector of objects created within an .R or .Rmd file
Rosalie Bruel
June 10th, 2022
I appreciate the functionality in RStudio that keeps in memory the objects created at previous steps, even after closing RStudio. I find it useful when I have a script that takes a while to run and whose output is worth keeping around, but not so valuable that I want to save it permanently to a local file.
However, on occasion, I have scripts I need to run once and that’s it. At the end of said script, I would like to be able to remove from the environment any object that was created by the script. In other words, I would like to be able to use rm(), but instead of doing it for the whole environment (`ls()`), I would like to do it for the environment of a specific script.
I wrote the function get.objects.from.script() to tackle this. You can load the function using the code below:
source("https://raw.githubusercontent.com/rosalieb/miscellaneous/master/R/get.objects.from.script.R")
Setting aside the option of changing my settings to “do not save/restore the workspace”, here are two alternatives that rely on existing code. Neither met all of my requirements.
One easy way would be to list the objects in the environment at the beginning of a script, list them again after having run the script, and take the difference between the two lists. For example, see the script below:
# Start script
initial_objects <- ls()
# Create a bunch of objects (ggplot() requires the ggplot2 package)
library(ggplot2)
temp <- LETTERS[1:3]
data <- data.frame(x = 1:10, y = 10:1)
p1 <- ggplot(data, aes(x, y)) + geom_point()
# Remove only the objects that were created since initial_objects was last run
# 1. See what is returned: should include "initial_objects" "temp" "data" "p1"
ls()[!ls() %in% initial_objects]
# 2. Remove these objects
rm(list = ls()[!ls() %in% initial_objects])
However, there are at least three scenarios I can think of in which the listed objects (in initial_objects) wouldn’t be the right ones:
If I forget to run the first line `initial_objects <- ls()`, it will be “too late” to go back, and I will end up having to list the objects manually.
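A variant: if by mistake I run the first line again after having run the script once (for example, if I run everything above a certain line, or if I run “all chunks above” in a .Rmd), initial_objects will already contain the objects created by the script, so they will no longer show up in the difference.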
If there was already an element in the environment with the same name as an object created within the script. I frequently use some names across scripts, for objects I will not keep. For example, I use “temp” for a temporary object, “p1” for a plot, “out” for an output object that I would populate within a loop, etc. If I already have an object “temp” in the environment, it won’t be removed using the method above. However, it is not something I need to keep per my requirements stated in the section “problem”.
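The last two scenarios can be reproduced with a small sketch. To avoid touching a real workspace, it uses a throwaway environment (`env`, a hypothetical stand-in for the global environment), so unlike in the script above, `initial_objects` itself does not appear in the listing.

```r
# Sketch only: 'env' stands in for the global environment.
env <- new.env()

# An object named "temp" already exists before the script starts
assign("temp", "left over from another script", envir = env)

# Start of the script: snapshot of the existing objects
initial_objects <- ls(env)

# The script overwrites "temp" and creates a new object "out"
assign("temp", LETTERS[1:3], envir = env)
assign("out", 1:3, envir = env)

# Name collision: "temp" was in the snapshot, so only "out" is listed
to_remove <- setdiff(ls(env), initial_objects)
print(to_remove)

# Variant: the snapshot is taken again after the script has run,
# so nothing is left in the difference
initial_objects <- ls(env)
to_remove_after_rerun <- setdiff(ls(env), initial_objects)
print(to_remove_after_rerun)
```

In both cases, objects created by the script stay in the environment even though the removal step ran without error.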
A second option would be to run the script in its own environment, which is possible when sourcing a file.
my_env <- new.env()
source("myscript.R", my_env)
rm(my_env)
However, this would only work for a script I do not need to run line by line to visualize the output. This solution does not fit my exact needs either.
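For completeness, here is a self-contained sketch of that second option. The script content and the object names (`temp`, `out`, `kept_value`) are made up for illustration, and the script is written to a temporary file so the example can be run as is.

```r
# Write a small, hypothetical script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("temp <- LETTERS[1:3]",
             "out <- length(temp)"), script)

# Run the script in its own environment
my_env <- new.env()
source(script, local = my_env)

# The objects live in my_env, not in the calling environment
created <- ls(my_env)
print(created)

# Copy out anything worth keeping, then drop everything else at once
kept_value <- my_env$out
rm(my_env)
```

The objects can still be inspected through `my_env` before it is removed, but only after the whole script has been sourced, which is the limitation described above.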
The function is pretty simple, and follows three steps:
Screen the current script (you can also set the path to another script if you want). This part relies on the function getParseData() from the utils package.
Keep only the objects (token = “SYMBOL”).
Find and return which objects are in the Global Environment.
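The three steps above can be sketched as follows. This is not the code of get.objects.from.script(), only a minimal illustration of the idea: the helper name `symbols_defined_in` is hypothetical, it takes the code as text rather than a file path, and it checks an environment passed as an argument (the global environment by default).

```r
# Hypothetical helper illustrating the three steps; not the author's function.
symbols_defined_in <- function(code, envir = globalenv()) {
  # Step 1: parse the code and get its tokens
  # (keep.source = TRUE is required for getParseData() to work)
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)

  # Step 2: keep only the objects (token == "SYMBOL")
  symbols <- unique(tokens$text[tokens$token == "SYMBOL"])

  # Step 3: keep the symbols that exist in the target environment
  found <- vapply(symbols, exists, logical(1),
                  envir = envir, inherits = FALSE)
  symbols[found]
}

# Usage with a throwaway environment in which only "temp" exists
env <- new.env()
assign("temp", LETTERS[1:3], envir = env)
res <- symbols_defined_in("temp <- LETTERS[1:3]\nplot(x, y)", envir = env)
print(res)
```

Symbols such as `x`, `y` or `LETTERS` are found in step 2 but dropped in step 3 because they are not defined in the environment being checked.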
The function returns all the objects created in the script. You can then run rm(), except that instead of removing everything from the environment (rm(list = ls())), you can choose to remove only the objects created within the current script:
rm(list = get.objects.from.script())
Arguments of the function are:
Argument | Description | Example |
---|---|---|
path2file | path to the file. The extension can be .R or .Rmd. If .Rmd, the code is extracted from the chunks using the knitr package | "my_wd/file.R" / "my_wd/file.Rmd" |
exception | any objects that you wish to preserve. Default = NULL | c("output", "p1") |
source | logical argument, indicating whether to source the script before running the function. The script must have been run for its objects to appear in the environment. Default = FALSE | FALSE |
message | logical argument, indicating whether to print a message or not. Default = TRUE | TRUE |
Here is an example. Note that the function get.objects.from.script() will only work on a saved script, because it requires an existing path. If you want to run the example below, you must copy and paste it (see the little copy link in the top right corner of the chunk for a quick copy!) in an R script, and save that script to your computer.
source("https://raw.githubusercontent.com/rosalieb/miscellaneous/master/R/get.objects.from.script.R")
# Create a bunch of objects (ggplot() requires the ggplot2 package)
library(ggplot2)
temp <- LETTERS[1:3]
data <- data.frame(x = 1:10, y = 10:1)
p1 <- ggplot(data, aes(x, y)) + geom_point()
# List the objects in the current script
get.objects.from.script()
# Number of objects in the environment before the function:
length(ls()) # 158
# Remove all the objects in the script (as obtained by
# get.objects.from.script()), with the exception of "p1",
# as specified by the argument 'exception = "p1"'.
rm(list = get.objects.from.script(exception = "p1",
message = FALSE))
# Number of objects in the environment after running the function
length(ls()) # 156 (we decided to keep "p1", so only two objects were removed)
Here is what it looks like if you run it:
It works well: with get.objects.from.script(), I do find the objects I just created. On the following line, I am able to remove these objects while making exceptions (e.g., if I want to keep “p1”).
Furthermore, the function can be useful to those of you helping out colleagues/students. By simply adding rm(list = get.objects.from.script()) at the end of the colleague/student’s script, you can remove any objects not relevant to your current workspace, without having to navigate between workspaces.
This solution is less “nuclear” than rm(list = ls()).
“it suggests a clean workspace is the exception rather than the rule; also, if someone’s giving some “drive by” help with your code, this clobbers their workspace (I run lots of student code)”
Jenny Bryan (@JennyBryan), December 11, 2017