
Coding Standards

Steve Linberg edited this page Oct 27, 2022 · 3 revisions

We are working to make the code conform to both the Tidyverse style guide and Google's R style guide. The latter extends the former with a few (good) additions:

  • don't use attach
  • use explicit returns
  • fully qualify namespaces for non-base functions

The last point above is important, and (unfortunately) complicated.

Qualifying namespaces

Qualifying namespaces (e.g. dplyr::mutate instead of just mutate) is a very good idea with strong justification. It avoids conflicts between packages that have identically named functions and makes explicit which one is meant. For example, base R, dplyr and igraph all provide a union() function, and which one a bare union() call reaches depends on which package was attached most recently. It is also good for learning (quick: is vcount an igraph function or a network function?).
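The masking problem can be reproduced without installing any packages. In this minimal sketch, a hypothetical union() defined in the global environment stands in for a package-supplied one: it is found before base::union() when R looks up the bare name, so the unqualified call silently picks it up while the qualified call does not.

```r
# Hypothetical stand-in for a package-supplied union(): anything defined in
# the global environment, or in a package attached later, is found before
# base::union() when R searches for the bare name.
union <- function(x, y) "not base::union"

masked <- union(1:3, 2:5)           # resolved via the search path
qualified <- base::union(1:3, 2:5)  # always the base R set union

print(masked)     # [1] "not base::union"
print(qualified)  # [1] 1 2 3 4 5

rm(union)  # drop the hypothetical mask again
```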

Unfortunately, it gets complicated in some cases, notably with S3 generics like print() that hand the work off to class-specific methods. A full discussion can be found in Advanced R, but when print is called with a network object, as in:

```r
print(karate.stat)
```

What actually gets called, via dispatch, is:

```r
print.network(karate.stat)
```

This might lead one to conclude that the correct way to call this function is to fully qualify it, but you can't say

```r
network::print(karate.stat)
```

because print isn't a function in network. And although you can say

```r
network::print.network(karate.stat)
```

Hadley Wickham specifically says not to do this in Advanced R:

...note that S3 methods are functions with a special naming scheme, generic.class(). For example, the factor method for the print() generic is called print.factor(). You should never call the method directly, but instead rely on the generic to find it for you.
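A small self-contained sketch of that advice, using a hypothetical class toy_network as a stand-in for a statnet network object: the code calls only the generic, and when it needs to know which method dispatch will pick, it asks utils::getS3method() instead of calling the method by name.

```r
# Hypothetical class "toy_network", standing in for a statnet network object.
print.toy_network <- function(x, ...) {
  cat("toy_network with", length(x$vertices), "vertices\n")
  invisible(x)  # print methods conventionally return their input invisibly
}

g <- structure(list(vertices = 1:5), class = "toy_network")

# Call the generic; UseMethod("print") finds print.toy_network for us.
print(g)  # toy_network with 5 vertices

# To see which method dispatch selects, look it up rather than calling it.
chosen <- utils::getS3method("print", "toy_network")
identical(chosen, print.toy_network)  # TRUE
```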

This brings us back to just:

```r
print(karate.stat)
```

which appears to violate Google's otherwise very sensible rule about fully qualifying namespaces. The rule could be amended to say "qualify all functions' namespaces except S3-dispatched methods," but it can be surprisingly difficult to determine whether that applies to any given function.

This leaves us in the uncomfortable position of not having a hard-and-fast rule here, and so the amended style guide we'll go with is something along the lines of:

Fully qualify most namespaces, except for base functions and S3 generics, like print, whose dot-named methods are found by dispatch.

Even this rule is not always easy to apply. union, for instance, is an S3 generic:

```
> sloop::ftype(union)
[1] "S3"      "generic"
```

...with no dispatched methods:

```
> s3_methods_generic("union")
# A tibble: 0 × 4
# … with 4 variables: generic <chr>, class <chr>, visible <lgl>, source <chr>
# ℹ Use `colnames()` to see all variable names
```
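sloop is not the only way to check. As a minimal sketch, base R's utils::isS3stdGeneric() reports whether a function's body is a UseMethod() call, and utils::methods() lists the methods registered for a generic. isS3stdGeneric() inspects the function object it is given, so for a bare name like union the answer depends on which package's version is found first at that moment.

```r
# isS3stdGeneric() returns TRUE, named after the generic, when the function
# body is a UseMethod() call; otherwise FALSE.
utils::isS3stdGeneric(print)
utils::isS3stdGeneric(paste)  # FALSE: paste() is an ordinary function

# methods() lists the registered methods of a generic; when printed,
# methods that are not visible (not exported) are flagged with an asterisk.
print_methods <- utils::methods("print")
"print.data.frame" %in% as.character(print_methods)  # TRUE
```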

mutate is also an S3 generic:

```
> ftype(mutate)
[1] "S3"      "generic"
```

...with what appears to be one dispatched method:

```
> s3_methods_generic("mutate")
# A tibble: 1 × 4
  generic class      visible source
  <chr>   <chr>      <lgl>   <chr>
1 mutate  data.frame FALSE   registered S3method
```

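However, mutate.data.frame is not a function you can call by name. dplyr registers it as an S3 method (hence visible is FALSE above) but does not export it, so neither mutate.data.frame nor dplyr::mutate.data.frame works; in normal code it is reached only through dispatch from mutate().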
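Base R has the same pattern, which makes it easy to explore without dplyr. As a sketch: stats registers t.test.default as the default method of the t.test() generic but does not export it, so the name is not visible on the search path, yet dispatch (and getS3method()) still finds it.

```r
# t.test() is an S3 generic; its default method is registered, not exported.
exists("t.test.default")  # FALSE: not visible on the search path

# Dispatch, and getS3method(), can still find the registered method.
m <- utils::getS3method("t.test", "default")
is.function(m)  # TRUE

# The intended way to use it is still through the generic.
res <- t.test(c(5.1, 4.9, 5.3, 5.0, 5.2), mu = 5)
class(res)  # "htest"
```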

We will still endeavor to qualify the use of functions most of the time, except where it creates problems. This means that we will turn the block

```{r size-solution}
# Find network size (vertex and edge count): igraph
vcount(airport.ig)
ecount(airport.ig)

# Find network size (vertex and edge count): statnet
print(airport.stat)
```

into:

```{r size-solution}
# Find network size (vertex and edge count): igraph
igraph::vcount(airport.ig)
igraph::ecount(airport.ig)

# Find network size (vertex and edge count): statnet
print(airport.stat)
```

but not into:

```{r size-solution}
# Find network size (vertex and edge count): igraph
igraph::vcount(airport.ig)
igraph::ecount(airport.ig)

# Find network size (vertex and edge count): statnet
network::print.network(airport.stat)
```

Dots in names

Dots in names are bad, especially in variable names. Don't use them. Base R uses dots inconsistently in some names (like data.frame), a habit inherited from S and from very old code that predates modern conventions, but in R dots should be reserved for S3 methods and dispatch. Dotted names also confuse readers coming from languages like Python and Java, where a dot accesses a property or separates an object from its method.
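The S3 clash is concrete: R treats any function named generic.class as a method. In this sketch, a hypothetical helper called print.stat, intended simply as "print statistics", silently becomes the print() method for any object whose class is "stat". Object names like karate.stat are not dispatched on, since they are not functions, but the same naming habit applied to a function name does have this effect.

```r
# Hypothetical helper meant as "print some statistics", named with a dot.
print.stat <- function(x, ...) {
  cat("print.stat was called\n")
  invisible(x)
}

# Any object that happens to have class "stat" now dispatches to it,
# including at the console, where auto-printing calls print() implicitly.
obj <- structure(list(value = 1), class = "stat")
print(obj)  # print.stat was called
```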

We use many names like karate.stat and karate.ig to distinguish the statnet and igraph "versions" of a network. These need to be renamed to something like karate_stat and karate_ig. Unfortunately this also affects the (extensive) setup code, but it needs to be done.
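For the mechanical part of the rename, a regular-expression pass over source lines is one option. This is a minimal sketch with a hypothetical helper, rename_dotted(); it is regex-based rather than parse-aware, so it will also rewrite matches inside strings and comments, and its output should be reviewed before committing.

```r
# Hypothetical helper: rewrite dotted names such as karate.stat or airport.ig
# to snake_case (karate_stat, airport_ig) in a character vector of lines.
rename_dotted <- function(lines, prefixes, suffixes = c("stat", "ig")) {
  pattern <- sprintf("\\b(%s)\\.(%s)\\b",
                     paste(prefixes, collapse = "|"),
                     paste(suffixes, collapse = "|"))
  gsub(pattern, "\\1_\\2", lines, perl = TRUE)
}

src <- c("print(karate.stat)",
         "igraph::vcount(airport.ig)",
         "df <- data.frame(a = 1)")  # unrelated dotted name is left alone

renamed <- rename_dotted(src, prefixes = c("karate", "airport"))
# renamed is c("print(karate_stat)", "igraph::vcount(airport_ig)",
#              "df <- data.frame(a = 1)")
```

To apply it to a file, one could read the file with readLines(), pass the lines through rename_dotted(), and write them back with writeLines(); the prefixes list is an assumption and would need to cover every dotted object name used in the setup code.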
