Everyone develops their own coding habits and style. Some people put
a lot of effort into making their source code readable,
while others don’t bother at all. Working together with different people
is easier when everyone uses the same standard. The checklist
package defines a set of standards and provides
tools to validate whether your project or R package adheres to these
standards. You can run these tools interactively on your machine. You
can also add these checks as GitHub actions, which
run them automatically after every push to the repository
on GitHub.
Coding style
The most visible part of a coding style is the naming convention.
People use many different styles of naming conventions within the R
ecosystem (Bååth 2012). Popular ones are `alllowercase`,
`period.separated`, `underscore_separated`, `lowerCamelCase` and
`UpperCamelCase`. We picked `underscore_separated`
because that is the naming convention of the tidyverse packages. It
is also the default setting in the lintr package, which we use to do
the static code analysis.
At first this seems a lot to memorise. RStudio makes things easier
when you activate all diagnostic options (Tools > Global
options > Code > Diagnostics). This
highlights several problems by showing a squiggly line and/or a warning
icon at the line number. Instead of learning all the rules by heart, run
`check_lintr()` regularly and fix any issues that come up. Do
this in every project you work on. Doing so forces you to
use the same coding style consistently, which makes it easy to learn
and apply.
Rules for coding style
- `underscore_separated` names for functions, parameters and variables.
- A line of code or comments must be no longer than 80 characters. Pro tip: have RStudio display this margin in the editor. Tools > Global options > Code > Display > Show margin
- Object names must not be longer than 30 characters.
- Start a new line
  - after the pipe (`%>%`)
  - never before but always after `{`
  - before `}`
- Use spaces instead of tabs. Pro tip: make RStudio place 2 spaces when you hit the tab key. Tools > Global options > Code > Editing > Insert spaces for tabs
- Use spaces consistently
  - Use exactly one space before and after
    - assignments `<-`, `->`, `=`
    - operators like `+`, `-`, `*`, `/`, …
  - No space before and one space after `,`
  - No space after or before `(` or `[`
    - except in constructs like `if ()`, `for ()`, `while ()`
  - One space between `)` and `{`, e.g. `function () {`
- Use double quotes (`"`) to define character strings.
- No trailing whitespace
  - spaces at the end of a line
  - blank lines at the end of the script
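As a rough illustration of the layout rules above, the sketch below shows a small function written in that style. The function `rescale_values()` and its arguments are hypothetical and exist only for this example. They are not part of the checklist package.

```r
# Hypothetical helper, written only to illustrate the layout rules:
# underscore_separated names, lines under 80 characters, 2-space
# indentation, one space around `<-` and operators, and a new line
# after `{` and before `}`.
rescale_values <- function(raw_values, lower_bound = 0, upper_bound = 1) {
  value_range <- range(raw_values, na.rm = TRUE)
  # constant input: avoid dividing by zero and return the lower bound
  if (value_range[1] == value_range[2]) {
    return(rep(lower_bound, length(raw_values)))
  }
  scaled <- (raw_values - value_range[1]) / (value_range[2] - value_range[1])
  lower_bound + scaled * (upper_bound - lower_bound)
}

rescaled <- rescale_values(c(2, 4, 6), lower_bound = 0, upper_bound = 10)
rescaled
```

Running a linter on a file like this is a quick way to confirm that the layout matches the rules.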
Static code analysis checks
- Is an object defined before you use it?
- Do you use an object after you defined it?
- Use `<-` or `->` to assign something. Only use `=` to pass arguments to a function (e.g. `check_package(fail = TRUE)`).
- Use `is.na(x)` instead of `x == NA`.
- Use `seq_len()` or `seq_along()` instead of `1:length(x)`, `1:nrow(x)`, … Advantage: when `length(x) == 0`, `1:length(x)` yields `c(1, 0)`, whereas `seq_along(x)` would yield an empty vector.
- Don’t store code in comments. If you don’t want to lose code, use version control systems like git. If it is code that you need to run only under special circumstances, then either put the code in a separate script and run it manually or write an if-else where you run the code automatically when needed.
- Avoid code with lots of nested loops or if statements. If the code is too complex, you’ll get a warning that the cyclomatic complexity is too high. Tips for reducing the code complexity:
  - Use `assertthat::assert_that()` to validate objects or conditions instead of `if() stop()`.
  - See if you can use `ifelse()` instead of `if()`.
  - Split the main function of your code over sub functions.
  - Don’t use `else` when it is not strictly necessary.
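Two of the checks above are easiest to understand by running them. The following minimal sketch uses only base R. The object names are made up for the example, and the discouraged forms appear only to show why they are flagged.

```r
x <- c(4, NA, 7)

# Comparing with NA never yields TRUE or FALSE, only NA.
compared <- x == NA
# is.na() gives the answer you actually want.
missing_flag <- is.na(x)

empty <- character(0)

# 1:length() counts down from 1 to 0 when the vector is empty.
risky_index <- 1:length(empty)
# seq_along() returns an empty index, so a loop over it never runs.
safe_index <- seq_along(empty)

visited <- 0
for (i in safe_index) {
  visited <- visited + 1
}
visited
```

With `risky_index` the loop body would run twice on an empty vector, which is a common source of subscript errors.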
File name conventions
To make this easier to remember, we chose the same naming conventions
for file names as for objects. We acknowledge that these rules sometimes
clash with requirements from other sources
(e.g. `DESCRIPTION` in an R package, `README.md` on GitHub,
`.gitignore` for git, …). In such cases we allow
the file names as required by R, git or GitHub. If
`check_filename()` wrongly rejects a file or
folder name, please open an issue on GitHub and
explain why it should be allowed.
Rules for folder names
- Folder names should only contain lower case letters, numbers and underscore (`_`).
- They can start with a single dot (`.`).
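The two folder rules can be expressed as a single regular expression. The sketch below is an assumption about how such a test could look. The helper `is_valid_folder_name()` is hypothetical and is not how `check_filename()` is implemented.

```r
# Hypothetical helper: TRUE when a folder name has an optional single
# leading dot followed only by lower case letters, numbers and underscores.
is_valid_folder_name <- function(folder_name) {
  grepl("^\\.?[a-z0-9_]+$", folder_name)
}

candidates <- c("data_raw", ".github", "My Folder", "..hidden")
valid <- is_valid_folder_name(candidates)
valid
```

Here the first two names pass, while `My Folder` fails on the capitals and the space, and `..hidden` fails because it starts with two dots.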
Bundling your code in a package
Most users think of an R package as a collection of generic functions that they can use to run their analysis. However, an R package is a useful way to bundle and document a stand-alone analysis too! Suppose you want to pass your code to a collaborator or your future self who is working on a different computer. If you have a project folder with a bunch of files, people will need to get to know your project structure, find out what scripts to run and which dependencies they need. Unless you documented everything well they (including your future self!) will have a hard time figuring out how things work.
Having the analysis as a package and running `check_package()`
to ensure a minimal quality standard makes
things a lot easier for the user. Agreed, it will take a bit more time
to create the analysis, especially with the first few projects. In the
long run you save time because your code is of better quality. When you
want to learn how to write a package, start by packaging a recurrent
analysis or a standardised report. Once you have some experience, it is
little overhead to do it for smaller analyses. Keep in mind that you
seldom run an analysis exactly once.
Benefits
- The package itself is a way to cite (a specific version of) the analysis in a report or paper.
- You have to list all dependencies on other R packages. This makes installing your code as simple as running `remotes::install_github("inbo/packagename")`.
- You must split your analysis into a set of functions. Say goodbye to scripts with thousands of lines of code.
- Functions make it easy to re-use code. Need to run the same thing with a different parameter value? Add the parameter as an argument to the function and run the function once for every different parameter value. This avoids the need to copy-paste large chunks of scripts and replace a few values. The copy-paste workflow typically results in long, hard-to-read scripts. Imagine you made a mistake in the code and copy-pasted that mistake several times before you found it. You would have to check your entire project and fix the mistake in every copy. Having it as a function reduces the workload to fixing only the function.
- Packages require that every object is either defined within the package or imported from another package. Global variables are not allowed. The user only needs to load your package and run the function with the required arguments. The results will depend neither on other loaded packages nor on user-defined objects like vectors or data frames (unless the user passes them explicitly as arguments to a function).
- A package gives the opportunity to add documentation to your code. Afterwards you can simply consult this documentation rather than having to dig into your code to find out what it is actually doing. Every function needs at least a title and an entry for every argument.
- Most likely you would still need a short script that combines a few high level functions of your package to run the analysis. The `inst` folder is an ideal place to bundle such scripts within the package. You can also use it to store small (!) datasets or rmarkdown reports.
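To make the points about re-use and documentation concrete, here is a minimal sketch. It assumes the documentation is written as roxygen2-style comments, which the text above does not state. The function `count_above()` and the data are hypothetical.

```r
#' Count the values above a threshold
#'
#' @param values A numeric vector with the observations.
#' @param threshold A single number; values above it are counted.
#' @return The number of non-missing values larger than `threshold`.
count_above <- function(values, threshold) {
  sum(values > threshold, na.rm = TRUE)
}

observations <- c(1, 3, 5, 7, 9)
thresholds <- c(2, 6)

# Run the same function once per parameter value instead of
# copy-pasting the calculation for every threshold.
counts <- vapply(
  thresholds,
  function(threshold) {
    count_above(observations, threshold = threshold)
  },
  FUN.VALUE = integer(1)
)
counts
```

If the calculation turns out to be wrong, only `count_above()` needs fixing, and every call picks up the correction.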