Abdoul Blog - Base R Essentials

This third part of the series is dedicated to data creation and wrangling.
Wrangling is the entry point to most data analysis workflows. Although the {tidyverse} ecosystem offers a bunch of features to filter, mutate, etc., particularly via the {dplyr} package, you can combine a few base R functions to achieve the same results.
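
As a rough illustration of that claim, the sketch below filters rows and adds a column using only base R indexing. The data frame `scores` and its columns are made up for this example, and the {dplyr} calls named in the comments are only there for comparison.

```r
# Hypothetical data, invented for this illustration
scores <- data.frame(
  name = c("Oliver", "Frank", "Mohsen"),
  age  = c(15, 25, 21)
)

# Roughly what dplyr::filter(scores, age > 18) does:
# keep the rows for which the condition is TRUE
adults <- scores[scores$age > 18, ]

# Roughly what dplyr::mutate(adults, age_next_year = age + 1) does:
# add a column derived from an existing one
adults$age_next_year <- adults$age + 1

adults
```

The functions listed in the rest of this post are the building blocks for this kind of manipulation.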

Data creation

Functions	Tasks / Examples
`c()`	The generic function that combines its arguments into an *atomic* vector¹. Ex: `x <- c(1, 2, 3)`, `x <- c("Paris", "Bordeaux", "Le Mans")`, `x <- c(TRUE, TRUE, FALSE)`
`from:to`	generates a sequence of integers. Ex: `1:3` produces the vector `c(1,2,3)`; `0:-2` produces the vector `c(0,-1,-2)`.
`seq(from, to, by = 1)`	generates a sequence from `from` to `to` with the step passed to `by`. Ex: `seq(1,8,by=2)` produces the vector `c(1,3,5,7)`. The function has other arguments that you can use instead of `by`. Ex: `seq(1,10,length.out = 3)` produces the vector `c(1.0, 5.5, 10.0)`. The `length.out` argument asks for a sequence of `length.out` equally spaced values from `from` to `to`.
`seq_along(x)`	where `x` is a vector, produces the integer sequence `1, 2, ..., length(x)`. Ex: `seq_along(c(4,5,6,7))` produces the vector `c(1,2,3,4)`. The function is very helpful in `for` loops.
`seq_len(length.out)`	generates the integer sequence `1, 2, ..., length.out`, except when `length.out = 0`, in which case it generates `integer(0)`. Ex: `seq_len(4)` produces the vector `c(1,2,3,4)`.
`rep(x, times)`	repeats the vector `x` `times` times. Ex: `rep(c(2,3), 2)` produces `c(2,3,2,3)`. Use the `each` argument, instead of or together with `times`, to specify how many times each element of `x` should be repeated. Ex: `rep(c(2,3), each=3)` produces `c(2,2,2,3,3,3)`.
`factor(x, levels)`	transforms a vector `x` into a factor². Ex: `factor(c("Apple", "Strawberry", "Raspberry", "Apple"))` creates a factor with 3 categories: `Apple`, `Strawberry` and `Raspberry`.
`list(...)`	creates a list from named or unnamed arguments, which can have different lengths and types. Ex: `list(x = 1:2, b = "Myriam", c = TRUE)`
`data.frame()`	creates a data frame from named or unnamed vectors. Ex: `data.frame(student = c("Adam Kennington", "Pamela Ritchie"), mark = c("A-", "A+"))`. Shorter vectors are recycled to the length of the longest, provided that length is an exact multiple of theirs.
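`rbind()`	combines vectors, matrices or data frames by row (stacks them on top of each other).
`cbind()`	combines vectors, matrices or data frames by column (places them side by side).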
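
To see the functions of this table working together, here is a minimal sketch. The variable names (`cities`, `fruits`, `students`, `extra`, `all_students`) and the extra student and `year` column are invented for the illustration. The output comments follow the examples given in the table.

```r
# Sequences
1:3                          # 1 2 3
seq(1, 8, by = 2)            # 1 3 5 7
seq(1, 10, length.out = 3)   # 1.0 5.5 10.0

# seq_along() in a for loop: if the vector were empty, seq_along()
# would return integer(0) and the loop body would simply be skipped
cities <- c("Paris", "Bordeaux", "Le Mans")
for (i in seq_along(cities)) {
  cat(i, cities[i], "\n")
}

# Repetition
rep(c(2, 3), times = 2)      # 2 3 2 3
rep(c(2, 3), each = 3)       # 2 2 2 3 3 3

# Factor: by default the levels are the sorted unique values
fruits <- factor(c("Apple", "Strawberry", "Raspberry", "Apple"))
levels(fruits)               # "Apple" "Raspberry" "Strawberry"

# Data frame, then one more row with rbind() and one more column with cbind()
students <- data.frame(
  student = c("Adam Kennington", "Pamela Ritchie"),
  mark    = c("A-", "A+")
)
extra <- data.frame(student = "Jane Doe", mark = "B+")  # hypothetical student
all_students <- rbind(students, extra)
all_students <- cbind(all_students, year = 2023)        # made-up column, recycled
all_students
```

Note that `rbind()` on data frames expects the same column names in both arguments, which is why `extra` is built with the same `student` and `mark` columns.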

Data wrangling

Functions	Tasks / Examples
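`x[i]`	returns the `i`th element of a vector. If `x` is a list, `x[i]` returns a sub-list holding the selected element(s), and `i` can also be a name. Ex: with `x <- list(a = "")`, `x["a"]` returns `list(a = "")`, whereas `x[["a"]]` returns the element itself, `""`.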
`x[[n]]`	returns the `n`th element of a list³.
`x[-n]`	returns all the elements of the vector `x` except the `n`th.
`x[1:n]`	returns the first `n` elements of the vector `x`.
`x[-(1:n)]`	returns all the elements of the vector `x` except the first `n`.
`x[c(1,3)]`	returns the 1st and the 3rd elements of the vector `x`.
`x[-c(1,3)]`	returns all the elements of the vector `x` except the 1st and the 3rd.
`x[["name"]]` / `x$name`	returns the column named `name` when `x` is a data frame.
`as.data.frame(x)`, `as.numeric(x)`, `as.logical(x)`, `as.character(x)` … etc	converts `x` respectively to type `data.frame`, `numeric`, `logical` or `character`. You can view all conversion methods with `methods(as)`.
`is.na(x)`, `is.null(x)`, `is.array(x)`, `is.data.frame(x)`, `is.numeric(x)`, `is.logical(x)`, `is.character(x)` … etc	returns `TRUE` or `FALSE` depending on whether `x` is of the given type. Ex: `is.logical(4)` returns `FALSE`, `is.double(4.4)` returns `TRUE`. Here too, you can view all type checking methods with `methods(is)`.
`nchar(x)`	takes a character⁴ vector and returns a vector with the number of characters of each element. Ex: `nchar(c("banana", "strawberry", 27))` returns `c(6,10,2)`.
`length(x)`	gets or sets the length of vectors (including lists). Ex: `length(1:4)` returns `4`. `length(list(1:5,1:4))` returns `2`.
`lengths(x)`	gets the length of each element of a list or atomic vector. Ex: `lengths(list(1:3,1:5))` returns `c(3,5)`.
`append(x, y, after = length(x))`	adds the elements of the vector `y` to the vector `x` after the position `after`. Ex: `append(1:8, 0:1, after = 2)` inserts the vector `c(0,1)` after the second element of the vector `c(1,2,3,4,5,6,7,8)` and returns `c(1,2,0,1,3,4,5,6,7,8)`.
`nrow(x)`/`NROW(x)`	returns the number of rows of `x` when `x` is a data frame or a matrix. When `x` is a vector, `NROW` treats it as a one-column matrix and returns the number of elements.
`ncol(x)`/`NCOL(x)`	same as `nrow`/`NROW` but for columns. When `x` is a vector, `NCOL` returns `1`.
`which.min(x)`	returns the index of the (first) minimum of a numeric (or logical) vector. Ex: `which.min(c(1,0,4,5,0))` returns `2`, which corresponds to the index of the first `0` (the minimum of the numeric vector).
`which.max(x)`	same as `which.min` but for the maximum. Ex: `which.max(c(5,3,-1,2,5))` returns `1`.
`which(x == a)`	returns the indices of `x` for which the result of the logical operation `x == a` is `TRUE`. Ex: with `x <- c(0,1,3,4,2,4,5)`, `which(x %% 2 == 0)` returns `c(1,4,5,6)`, which corresponds to the indices of the even elements of the numeric vector.
`rev(x)`	returns the reversed version of a vector or a reversible object `x`. Ex: `rev(c("Roger","Rafa","Novak"))` returns `c("Novak","Rafa","Roger")`.
`sort(x, decreasing = FALSE)`	sorts a vector or factor into ascending (`decreasing = FALSE`) or descending (`decreasing = TRUE`) order. Ex: `sort(c(1,4,3,7,-1,8), decreasing = TRUE)` returns `c(8,7,4,3,1,-1)`.
`order(..., decreasing = FALSE)`	returns a permutation which rearranges its first argument into ascending or descending order, breaking ties by further arguments. Ex:`my_df <- data.frame(name = c("Oliver","Frank", "Mohsen"), age = c(15, 25, 21))` `my_df[order(my_df$age),]` returns `data.frame(name = c("Oliver","Mohsen","Frank"), age = c(15,21,25))`
`rank(x)`	returns the sample ranks of the values in a vector. Ex: `rank(c(8,4,2,5))` returns `c(4,2,1,3)`. Equal and missing values can be handled via the `ties.method` and `na.last` arguments. See `?rank` for the available options.
`cut(x, breaks)`	converts the numeric vector `x` to a factor by dividing the range of `x` into intervals and coding each value according to the interval it falls into. Ex: `cut(1:5, 3)` returns the factor vector `c((0.996,2.33],(0.996,2.33],(2.33,3.67],(3.67,5],(3.67,5])` where the levels are: `(0.996,2.33] (2.33,3.67] (3.67,5]`. `cut(1:5, breaks = c(0,2,3,5))` returns the factor vector `c((0,2],(0,2],(2,3],(3,5],(3,5])` where the levels are: `(0,2] (2,3] (3,5]`. There are many other arguments, for instance to set the labels of the result or to include the lowest value.
`unique(x)`	returns `x` with duplicate elements (or duplicate rows, for a data frame) removed.
`table(...)`	returns a contingency table of the counts at each combination of factor levels.
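
The sketch below chains a few of the functions from this table on small inputs. The list `person`, the break points passed to `cut()` and the names `oldest_first` and `age_group` are assumptions made for the illustration; the other values are taken from the examples in the table.

```r
x <- c(0, 1, 3, 4, 2, 4, 5)

# Indexing a vector
x[2]          # 1
x[c(1, 3)]    # 0 3
x[-(1:3)]     # 4 2 4 5

# Indexing a list: `[` keeps a list, `[[` extracts the element itself
person <- list(name = "Myriam", scores = 1:2)  # hypothetical list
person["name"]     # a list of length 1
person[["name"]]   # "Myriam"

# Locating values
which(x %% 2 == 0)  # 1 4 5 6
which.min(x)        # 1
which.max(x)        # 7

# Sorting the rows of a data frame with order()
my_df <- data.frame(
  name = c("Oliver", "Frank", "Mohsen"),
  age  = c(15, 25, 21)
)
oldest_first <- my_df[order(my_df$age, decreasing = TRUE), ]
oldest_first

# Binning a numeric column with cut(), then counting with table()
age_group <- cut(my_df$age, breaks = c(0, 18, 30))  # made-up break points
table(age_group)
```

Because `order()` returns row positions instead of sorted values, it can be used to reorder a whole data frame, which `sort()` cannot do.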

Well, we have come to the end of this third and penultimate part of the series, on data structures and wrangling with base R. Once again, I do not claim to cover all the existing functions. That is not the goal. I am only listing a few that I consider essential to know for basic data wrangling with R. Some readers may notice the absence of functions for mathematical operations. They will be covered in the last part of the series.

Footnotes

¹ R has six basic atomic vector types: logical, integer, double, character, complex and raw.
² Factors are also known as categorical or enumerated types.
³ The notation works for a vector too, but it is more convenient to write `x[n]`.
⁴ If it is not a character vector, it is coerced to one.

Citation

BibTeX citation:

@online{issabida2023,
  author = {Abdoul ISSA BIDA},
  title = {Base {R} {Essentials} - {Part} 3},
  date = {2023-02-18},
  url = {https://www.abdoulblog.com},
  langid = {en}
}

For attribution, please cite this work as:

Abdoul ISSA BIDA. 2023. “Base R Essentials - Part 3.” February 18, 2023. https://www.abdoulblog.com.