Some Basics of the R Language

Thomas Petzoldt
2017-11-01

Prerequisites

R 3.x is installed
Recent version of RStudio

Basic intuitive experience with R
vectors, arithmetic
basic plotting
function arguments

Expressions and Assignments

Expression

1 - pi + exp(1.7)

[1] 3.332355

is printed to the screen
the [1] indicates that the value shown at the beginning of the line is the first (and here the only) element

Assignment

a <- 1 - pi + exp(1.7)

The value of the expression on the right-hand side is assigned to the variable on the left.
The arrow is read as “a gets …”

Assignments

Assignment of a constant and a variable

x <- 1.3
y <- "hello"
a <- x

Assignment in opposite direction (rarely used)

x -> b

Multiple assignment (evaluated from right to left)

x <- a <- b

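The equal sign works similarly to <- at the top level, but it is less powerful: inside a function call it names an argument instead of creating a variable.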

x = a

Super Assignment (for special cases)

x <<- 2
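
As a minimal sketch of the difference between <- and <<-, the two functions below try to change a variable counter that was defined outside of them. The names counter, local_update and global_update are made up for this illustration and are not part of R.

```r
counter <- 0

# <- inside a function creates a local variable; the outer counter is untouched
local_update <- function() {
  counter <- counter + 1
  invisible(NULL)
}

# <<- assigns to the variable found in the enclosing environment
global_update <- function() {
  counter <<- counter + 1
  invisible(NULL)
}

local_update()
after_local <- counter    # still 0

global_update()
after_global <- counter   # now 1
```

Because <<- changes objects outside the function, it makes code harder to follow and is best kept for the special cases mentioned above.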

The elements of the R language

A short classification of R's language elements:

objects
constants
variables
operators
functions

Objects, constants, variables

Everything stored in R's memory is an object. Objects are specialized data structures that can be simple or very complex.

Objects can be constant or variable.

constants: 1, 123, 5.6, 5e7, “hello”

variables: can change their value and are referenced by variable names (symbols)

x <- 2 # x is a variable, 2 a constant

Valid variable names: A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.

Special characters, except _ and . (underscore and dot) are not allowed.

International characters (e.g. German umlauts ä, ö, ü, …) are possible, but not recommended.

Allowed and disallowed variable names

correct:

x, y, X, x1, i, j, k,
value, test, myVariableName, do_something,
.hidden, .x1

forbidden:

1x (starts with a number), .1x (dot followed by a number)
!, @, $, #, space, comma, semicolon and other special characters

reserved words cannot be used as variable names:

if, else, repeat, while, function, for, in, next, break
TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_,
..., ..1, ..2

Note: R is case sensitive, x and X, value and Value are different.
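
The rules above can be checked from within R. The following sketch relies on the base function make.names(), which turns any string into a syntactically valid name; a string that comes back unchanged was already valid. The helper is_valid_name is a hypothetical name introduced only for this example.

```r
# hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(x) make.names(x) == x

candidates <- c("x1", ".hidden", "do_something", "1x", ".1x", "if", "TRUE")
valid <- is_valid_name(candidates)
names(valid) <- candidates
valid

# non-syntactic names can still be used when quoted with backticks,
# although this is rarely a good idea
`1x` <- 5
`1x` + 1
```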

Operators

[Figure: operators]

Functions

Pre-defined functions:

have a return value or a side effect
examples with return value: sin(x), log(x)
examples with side effect: plot(x), print(x)
both return value and side effect: hist(x)

Arguments: mandatory or optional, un-named or named

plot(1:4, c(3,4,3,6), type="l", col="red")
if argument names are used (with the “=” sign), then argument order does not matter

User-defined functions:

can be used to extend R
will be discussed later

Functions have a name that is followed by arguments in round parentheses.
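
A small sketch of both points, argument matching and return value versus side effect, using only base R functions. The input values are arbitrary examples.

```r
# positional (un-named) and named arguments give the same result
s1 <- seq(1, 10, 3)
s2 <- seq(from = 1, to = 10, by = 3)
s3 <- seq(by = 3, to = 10, from = 1)   # order is irrelevant with names

identical(s1, s2)
identical(s2, s3)

# hist() has a side effect (the plot) and a return value;
# plot = FALSE suppresses the plot and keeps only the return value
h <- hist(c(1, 2, 2, 3, 3, 3), breaks = 0:3, plot = FALSE)
h$counts
```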

Parentheses

[Figure: parentheses]

Data objects

R supports different classes of data objects.

Data objects can contain single values, vectors, matrices, tables, numbers, text and even maps, sound, images or video sequences.

We start with vectors, matrices and arrays, and data frames.

Vectors, matrices and arrays

vectors = 1D, matrices = 2D and arrays = n-D
data are arranged into rows, columns, layers, etc.
data filled in column-wise, but structure can always be changed

x <- 1:20
x

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

y <- matrix(x, nrow=5, ncol=4)
y

     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

as.vector(y)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
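
That the structure can always be changed is easiest to see by setting the dim attribute directly. The following is only a sketch; the object names m and a are arbitrary.

```r
x <- 1:20

# assigning a dim attribute turns the vector into a 4 x 5 matrix (column-wise)
m <- x
dim(m) <- c(4, 5)
m

# the same 20 values as a 2 x 5 x 2 array
a <- array(x, dim = c(2, 5, 2))

# removing the structure returns the original vector
identical(as.vector(m), x)
identical(as.vector(a), x)
```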

Vectors, matrices and arrays II

the recycling rule applies if fewer elements are supplied than the matrix needs

x <- matrix(0, nrow=5, ncol=4)
x

     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0
[4,]    0    0    0    0
[5,]    0    0    0    0

x <- matrix(1:4, nrow=5, ncol=4)
x

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    3    4    1
[3,]    3    4    1    2
[4,]    4    1    2    3
[5,]    1    2    3    4

row-wise creation of a matrix

x <- matrix(1:20, nrow=5, ncol=4, byrow=TRUE)
x

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

transpose of a matrix

x <- t(x)
x

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

Accessing array elements

a three dimensional array
row, column, layer/page
sub-matrices (slices)

x <- array(1:24, dim=c(3,4,2))
x

, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

, , 2

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

elements of a matrix or array

x[1, 3, 1] # single element

[1] 7

x[ , 3, 1] # 3rd column of 1st layer

[1] 7 8 9

x[ ,  , 2] # second layer

     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24

x[1,  ,  ] # another slice

     [,1] [,2]
[1,]    1   13
[2,]    4   16
[3,]    7   19
[4,]   10   22
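
Note that the slices above have fewer dimensions than the original array, because R drops dimensions of length one by default. A short sketch of how the argument drop = FALSE keeps them:

```r
x <- array(1:24, dim = c(3, 4, 2))

# default: dimensions of length one are dropped
dim(x[1, , ])                  # 4 x 2 matrix
dim(x[1, , , drop = FALSE])    # 1 x 4 x 2 array

# a single column of a matrix becomes a plain vector unless drop = FALSE
m <- x[, , 1]
is.matrix(m[, 3])
is.matrix(m[, 3, drop = FALSE])
```

Keeping the dimensions can be helpful in functions that expect a matrix even if only one row or column is selected.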

Reordering and indirect indexing

Original matrix

(x <- matrix(1:20, nrow=4))

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

Inverted row order

x[4:1, ]

     [,1] [,2] [,3] [,4] [,5]
[1,]    4    8   12   16   20
[2,]    3    7   11   15   19
[3,]    2    6   10   14   18
[4,]    1    5    9   13   17

Indirect index

x[c(1,2,1,2), c(1,3,2,5,4)]

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    9    5   17   13
[2,]    2   10    6   18   14
[3,]    1    9    5   17   13
[4,]    2   10    6   18   14

Logical selection

x[c(FALSE, TRUE, FALSE, TRUE), ]

     [,1] [,2] [,3] [,4] [,5]
[1,]    2    6   10   14   18
[2,]    4    8   12   16   20

Surprise?

x[c(0,1,0,1), ]

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    1    5    9   13   17
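
The surprise has a simple explanation: c(0,1,0,1) is a numeric index, not a logical one. A zero selects nothing and each 1 selects the first row. The sketch below contrasts both kinds of index; idx is just an illustrative name.

```r
x <- matrix(1:20, nrow = 4)

idx <- c(0, 1, 0, 1)

# numeric index: zeros are ignored, 1 selects the first row (twice)
x[idx, ]

# logical index: TRUE keeps a row, FALSE drops it
x[as.logical(idx), ]   # rows 2 and 4
x[idx == 1, ]          # the same selection
```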

Matrix algebra

Matrix

(x <- matrix(1:4,   nrow=2))

     [,1] [,2]
[1,]    1    3
[2,]    2    4

Diagonal matrix

(y <- diag(2))

     [,1] [,2]
[1,]    1    0
[2,]    0    1

Element wise addition and multiplication

x * (y + 1)

     [,1] [,2]
[1,]    2    3
[2,]    2    8

Outer product (and sum)

1:4 %o% 1:4

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    2    4    6    8
[3,]    3    6    9   12
[4,]    4    8   12   16

outer(1:4, 1:4, FUN = "+")

     [,1] [,2] [,3] [,4]
[1,]    2    3    4    5
[2,]    3    4    5    6
[3,]    4    5    6    7
[4,]    5    6    7    8

Matrix multiplication

x %*% y

     [,1] [,2]
[1,]    1    3
[2,]    2    4

Matrix multiplication in detail

[Figure: matrix multiplication]
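
The rule behind %*% can be sketched with explicit loops: element [i, j] of the result is the sum of the products of row i of the first matrix and column j of the second. The function matmul is a hypothetical helper written for illustration only; in practice %*% should be used.

```r
# hypothetical helper, for illustration only; use %*% in real code
matmul <- function(A, B) {
  stopifnot(ncol(A) == nrow(B))
  C <- matrix(0, nrow = nrow(A), ncol = ncol(B))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(B))) {
      # row i of A times column j of B, summed
      C[i, j] <- sum(A[i, ] * B[, j])
    }
  }
  C
}

A <- matrix(1:6, nrow = 2)   # 2 x 3
B <- matrix(1:6, nrow = 3)   # 3 x 2
matmul(A, B)
all.equal(matmul(A, B), A %*% B)
```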

Transpose and inverse

Matrix

x <- matrix(c(1,2,3,4,3,2,5,4,6),
            nrow=3)
x

     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    2    3    4
[3,]    3    2    6

Transpose

t(x)

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    3    2
[3,]    5    4    6

Inverse ($ x^{-1} $)

solve(x)

        [,1]    [,2]    [,3]
[1,] -0.6667  0.9333 -0.0667
[2,]  0.0000  0.6000 -0.4000
[3,]  0.3333 -0.6667  0.3333

$ x \cdot x^{-1} $

x %*% solve(x)

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
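
The printed result looks like an exact identity matrix, but the computation is done in floating point, so tiny rounding errors may remain. A sketch of how such a result can be compared safely:

```r
x <- matrix(c(1, 2, 3, 4, 3, 2, 5, 4, 6), nrow = 3)

p <- x %*% solve(x)

# an exact comparison may fail because of rounding errors ...
identical(p, diag(3))
# ... so compare with a tolerance instead
all.equal(p, diag(3))

# rounding is useful for display
round(p, 10)
```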

Linear system of equations

\[ \begin{aligned} 3x && + && 2y && - && z && = && 1 \\ 2x && - && 2y && + && 4z && = && -2 \\ -x && + && \frac{1}{2}y && - && z && = && 0 \end{aligned} \]

A <- matrix(c(3,  2,   -1,
             2,  -2,    4,
            -1,   0.5, -1), nrow=3, byrow=TRUE)
b <- c(1, -2, 0)

\[ \begin{aligned} Ax &= b\\ x &= A^{-1}b \end{aligned} \]

solve(A) %*% b

     [,1]
[1,]    1
[2,]   -2
[3,]   -2
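
As an alternative sketch, solve() also accepts the right-hand side as second argument and then solves the system directly, without computing the inverse first. This is usually preferred for numerical reasons.

```r
A <- matrix(c( 3,  2,   -1,
               2, -2,    4,
              -1,  0.5, -1), nrow = 3, byrow = TRUE)
b <- c(1, -2, 0)

# solve the system directly, without forming the inverse explicitly
x <- solve(A, b)
x

# check: A %*% x should reproduce b (up to rounding)
all.equal(as.vector(A %*% x), b)
```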

Data frames

represents tabular data
similar to matrices, but different columns may contain different types of data
typically imported from a file with read.table or read.csv

cities <- read.csv("data/cities.csv", header=TRUE)
cities

               Name    Country Population Latitude Longitude IsCapital
1  Fürstenfeldbruck    Germany      34033  48.1690   11.2340     FALSE
2             Dhaka Bangladesh   13000000  23.7500   90.3700      TRUE
3       Ulaanbaatar   Mongolia    3010000  47.9170  106.8830      TRUE
4           Shantou      China    5320000  23.3500  116.6700     FALSE
5           Kampala     Uganda    1659000   0.3310   32.5830      TRUE
6           Cottbus    Germany     100000  51.7650   14.3280     FALSE
7           Nairobi      Kenya    3100000   1.2833   36.8167      TRUE
8             Hanoi    Vietnam    1452055  21.0300  105.8400      TRUE
9          Bacgiang    Vietnam      53739  21.2800  106.1900     FALSE
10       Addis Abba   Ethiopia    2823167   9.0300   38.7400      TRUE
11        Hyderabad      India    3632094  17.4000   78.4800     FALSE
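
If the file data/cities.csv is not at hand, a data frame can also be typed in directly. The sketch below uses three rows and four columns from the table above; the name cities_small is chosen only for this example.

```r
# three rows taken from the table above
cities_small <- data.frame(
  Name       = c("Cottbus", "Kampala", "Hanoi"),
  Country    = c("Germany", "Uganda", "Vietnam"),
  Population = c(100000, 1659000, 1452055),
  IsCapital  = c(FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

str(cities_small)                               # one type per column
cities_small$Population                         # a single column as vector
cities_small[cities_small$IsCapital, "Name"]    # names of the capitals
```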

Data import assistant

File -> Import Dataset

Several options are available, depending on RStudio's version.

“From text (base)” uses the classical R functions
“From text (readr)” is more modern and uses an add-on package
“From Excel” can read Excel files if (and only if) they have a clear tabular structure

Note: The examples in this course are best tested with “From text (base)”!

From text (base)

From text (readr)

Save data in an Excel-compatible text format

[Figure: data frame in Excel]

English number format (. as decimal):

write.table(cities, "output.csv", row.names = FALSE, sep=",")

German number format (, as decimal):

write.table(cities, "output.csv", row.names = FALSE, sep=";", dec=",")
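
A round trip is a simple way to check that a file was written as intended. The sketch below writes a small made-up data frame (values taken from the cities table) in German number format to a temporary file and reads it back with read.csv2, which uses ";" and "," as defaults.

```r
small <- data.frame(
  Name     = c("Cottbus", "Dhaka", "Kampala"),
  Latitude = c(51.765, 23.75, 0.331),
  stringsAsFactors = FALSE
)

# temporary file, so nothing in the working directory is overwritten
f <- tempfile(fileext = ".csv")
write.table(small, f, row.names = FALSE, sep = ";", dec = ",")

readLines(f)   # raw text: semicolons as separators, decimal commas

# read.csv2 assumes sep = ";" and dec = "," by default
back <- read.csv2(f, stringsAsFactors = FALSE)
all.equal(back$Latitude, small$Latitude)

unlink(f)
```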

Lists

most flexible data type in R
allows tree-like structure

Creation of lists

L1 <- list(a=1:10, b=c(1,2,3), x="hello")

lists within lists
str shows tree-like structure:

L2 <- list(a=5:7, b=L1)
str(L2)

List of 2
 $ a: int [1:3] 5 6 7
 $ b:List of 3
  ..$ a: int [1:10] 1 2 3 4 5 6 7 8 9 10
  ..$ b: num [1:3] 1 2 3
  ..$ x: chr "hello"

Access to list elements by names

L2$a

[1] 5 6 7

L2$b$a

 [1]  1  2  3  4  5  6  7  8  9 10

or with indices

L2[1]   # a list with 1 element

$a
[1] 5 6 7

L2[[1]] # content of 1st element

[1] 5 6 7
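
The difference between single and double brackets is a common source of confusion, so here is a short sketch that makes the returned types explicit. The element new is added only for illustration.

```r
L1 <- list(a = 1:10, b = c(1, 2, 3), x = "hello")
L2 <- list(a = 5:7, b = L1)

class(L2["a"])      # "list": a sub-list
class(L2[["a"]])    # "integer": the element itself

# [[ ]] also accepts names and can be chained
L2[["b"]][["x"]]

# elements can be added or replaced by assignment
L2$new <- "added"
names(L2)
```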

Lists II

Convert list to vector

unlist(L2)

     a1      a2      a3    b.a1    b.a2    b.a3    b.a4    b.a5    b.a6 
    "5"     "6"     "7"     "1"     "2"     "3"     "4"     "5"     "6" 
   b.a7    b.a8    b.a9   b.a10    b.b1    b.b2    b.b3     b.x 
    "7"     "8"     "9"    "10"     "1"     "2"     "3" "hello"

str(unlist(L2))

 Named chr [1:17] "5" "6" "7" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" ...
 - attr(*, "names")= chr [1:17] "a1" "a2" "a3" "b.a1" ...

Flatten list (remove only top level of list)

str(unlist(L2, recursive = FALSE))

List of 6
 $ a1 : int 5
 $ a2 : int 6
 $ a3 : int 7
 $ b.a: int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ b.b: num [1:3] 1 2 3
 $ b.x: chr "hello"

Lists, vectors and data frames

Convert vector to list

x <- 1:3
str(as.list(x))

List of 3
 $ : int 1
 $ : int 2
 $ : int 3

Convert matrix to data frame

x <- matrix(1:16, nrow=4)
df <- as.data.frame(x)
is.list(df)

[1] TRUE

df

  V1 V2 V3 V4
1  1  5  9 13
2  2  6 10 14
3  3  7 11 15
4  4  8 12 16

Convert data frame to matrix

as.matrix(df)

     V1 V2 V3 V4
[1,]  1  5  9 13
[2,]  2  6 10 14
[3,]  3  7 11 15
[4,]  4  8 12 16

Append column to data frame

df2 <- cbind(df, id=c("first", "second", "third", "fourth"))

Data frame with character column

as.matrix(df2)

     V1  V2  V3   V4   id      
[1,] "1" "5" " 9" "13" "first" 
[2,] "2" "6" "10" "14" "second"
[3,] "3" "7" "11" "15" "third" 
[4,] "4" "8" "12" "16" "fourth"
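
The quotes in the output show that all values were converted to character, because a matrix can hold only one data type. One possible way around this, sketched below, is to select the numeric columns before the conversion; num_cols and m are illustrative names.

```r
df  <- as.data.frame(matrix(1:16, nrow = 4))
df2 <- cbind(df, id = c("first", "second", "third", "fourth"))

# a matrix holds one type only, so everything becomes character
is.character(as.matrix(df2))

# keep only the numeric columns before converting
num_cols <- sapply(df2, is.numeric)
m <- as.matrix(df2[num_cols])
is.numeric(m)
```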

Naming of elements

During creation

x <- c(a=1.2, b=2.3, c=6)
L <- list(a=1:3, b="hello")

With names-function

names(L)

[1] "a" "b"

names(L) <- c("numbers", "text")
names(L)

[1] "numbers" "text"

x <- 1:5
names(x) <- letters[1:5]
x

a b c d e 
1 2 3 4 5

Select and reorder data frame columns

x <- matrix(1:16, nrow=4)
df <- as.data.frame(x)
df

  V1 V2 V3 V4
1  1  5  9 13
2  2  6 10 14
3  3  7 11 15
4  4  8 12 16

names(df) <- c("N", "P", "O2", "C")
df

  N P O2  C
1 1 5  9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16

df2 <- df[c("C", "N", "P")]
df2
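
Columns can be selected in several equivalent ways. The following sketch compares selection by name, by position and in matrix-like notation; the object names on the left-hand side are arbitrary.

```r
df <- as.data.frame(matrix(1:16, nrow = 4))
names(df) <- c("N", "P", "O2", "C")

by_name  <- df[c("C", "N", "P")]       # as in the example above
by_index <- df[c(4, 1, 2)]             # the same columns by position
by_comma <- df[, c("C", "N", "P")]     # matrix-like notation
dropped  <- df[names(df) != "O2"]      # drop one column, keep the order

names(by_name)
identical(by_name, by_index)
names(dropped)
```

Selection by name is usually the most robust choice, because it still works if the column order of the data changes.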

Apply FUN to all elements of a list

df  # data frame of previous slide

  N P O2  C
1 1 5  9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16

lapply(df, mean)  # returns list

$N
[1] 2.5

$P
[1] 6.5

$O2
[1] 10.5

$C
[1] 14.5

sapply(df, mean)  # returns vector

   N    P   O2    C 
 2.5  6.5 10.5 14.5

Row wise apply

apply(df, MARGIN = 1, sum)

[1] 28 32 36 40

Column wise apply

apply(df, MARGIN = 2, sum)

 N  P O2  C 
10 26 42 58

Apply user defined function

se <- function(x)
  sd(x)/sqrt(length(x))

sapply(df, se)

     N      P     O2      C 
0.6455 0.6455 0.6455 0.6455
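
A stricter relative of sapply is vapply, which additionally checks the type and length of each result. The sketch below repeats the standard error example with it.

```r
df <- data.frame(N = 1:4, P = 5:8, O2 = 9:12, C = 13:16)

se <- function(x) sd(x) / sqrt(length(x))

# vapply works like sapply, but checks that every result
# is a single number (numeric(1))
res <- vapply(df, se, FUN.VALUE = numeric(1))
round(res, 4)
```

If the function returned something else, for example a character string, vapply would stop with an error instead of silently changing the type of the result.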

Loops and conditional execution

Loops

for (i in 1:4) {
  cat(i, 2*i, "\n")
}

j <- 1; x <- 0
while (j > 1e-3) {
  j <- 0.1 * j
  x <- x + j
  cat(j, x, "\n")
}

0.1 0.1 
0.01 0.11 
0.001 0.111 
1e-04 0.1111

In many cases, loops can be avoided by using vectors and matrices or apply.

x <- 1
repeat {
 x <- 0.1*x
 cat(x, "\n")
 if (x < 1e-4) break
}

0.1 
0.01 
0.001 
1e-04 
1e-05

for (i in 1:3) {
  for (j in c(1,3,5)) {
    cat(i, i*j, "\n")
  }
}

Avoidable loops

Column means of a data frame

## a data frame
df <- data.frame(
  N=1:4, P=5:8, O2=9:12, C=13:16
)

## loop
m <- numeric(4)
for(i in 1:4) {
 m[i] <- mean(df[,i])
}
m

[1]  2.5  6.5 10.5 14.5

easier without loop

sapply(df, mean)

   N    P   O2    C 
 2.5  6.5 10.5 14.5

… colMeans(df) is also possible
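
A brief sketch of the colMeans alternative and its relatives from base R, using the same data frame:

```r
df <- data.frame(N = 1:4, P = 5:8, O2 = 9:12, C = 13:16)

colMeans(df)   # column means
rowMeans(df)   # row means
colSums(df)    # column sums, compare apply(df, MARGIN = 2, sum)

all.equal(colMeans(df), sapply(df, mean))
```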

An infinite series:

\[ \sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{2k-1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4} \]

x <- 0
for (k in seq(1, 1e5)) {
  enum  <- (-1)^(k-1)
  denom <- 2*k-1
  x <- x + enum/denom
}
4 * x

[1] 3.141583

$ \Rightarrow $ Can you vectorize this?
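
One possible answer, given here only as a sketch: compute all terms at once as a vector and sum them up. The names terms and pi_approx are arbitrary.

```r
k <- 1:1e5                              # all indices at once
terms <- (-1)^(k - 1) / (2 * k - 1)     # vector of all summands
pi_approx <- 4 * sum(terms)
pi_approx

pi_approx - pi                          # remaining error
```

The price for the shorter code is that the vectors k and terms have to be held in memory.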

Necessary loop

The same series:

\[ \sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{2k-1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots = \frac{\pi}{4} \]

x <- 0
k <- 0
repeat {
  k <- k + 1
  enum  <- (-1)^(k-1)
  denom <- 2*k-1
  delta <- enum/denom
  x <- x + delta
  if (abs(delta) < 1e-6) break
}
4 * x

[1] 3.141595

number of iterations not known in advance
convergence criterion: stop when the required precision is reached
no allocation of long vectors, hence less memory than the for loop

Note: there are more efficient methods to calculate $ \pi $.

if-clause

The example before showed already an if-clause. The syntax is as follows:

if (<condition>)
  <statement>
else if (<condition>)
  <statement>
else
  <statement>

Proper indentation improves readability. Suggestion: 2 characters.
Professionals always indent.
Please do!

A statement can, of course, be a compound statement enclosed in curly brackets {}.
To be on the safe side and to avoid common errors, you may always use {}.

Example:

if (x == 0) {
  print("x is zero")
} else if (x < 0) {
  print("x is negative")
} else {
  print("x is positive")
}
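
The if-clause expects a single TRUE or FALSE as condition. As a sketch, the example can be wrapped in a function (describe is a hypothetical name) and then applied to several values one after another:

```r
# hypothetical helper wrapping the if-clause from above
describe <- function(x) {
  if (x == 0) {
    "x is zero"
  } else if (x < 0) {
    "x is negative"
  } else {
    "x is positive"
  }
}

describe(-2)
describe(0)

# if expects a single TRUE or FALSE; for a whole vector,
# apply the function element by element
sapply(c(-2, 0, 3), describe)
```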

Vectorized if

Very often, a vectorized ifelse is more appropriate than an if-clause.

Let's assume we have a data set of chemical measurements x with missing NA values, and “nondetects” that are encoded with -99. First we want to replace the nondetects with half of the detection limit (e.g. 0.5):

x <- c(3, 6, NA, 5, 4, -99, 7, NA,  8, -99, -99, 9)
x2 <- ifelse(x == -99, 0.5, x)
x2

 [1] 3.0 6.0  NA 5.0 4.0 0.5 7.0  NA 8.0 0.5 0.5 9.0

Now let's remove the NAs:

na.omit(x2)

 [1] 3.0 6.0 5.0 4.0 0.5 7.0 8.0 0.5 0.5 9.0
attr(,"na.action")
[1] 3 8
attr(,"class")
[1] "omit"

This returns a special object with additional attributes that can be used like a normal vector.
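
If the additional attributes are not wanted, there are alternatives. The sketch below shows two of them, the na.rm argument that many summary functions offer and logical indexing with is.na; x3 is an illustrative name.

```r
x  <- c(3, 6, NA, 5, 4, -99, 7, NA, 8, -99, -99, 9)
x2 <- ifelse(x == -99, 0.5, x)

# many functions can skip missing values themselves
mean(x2, na.rm = TRUE)

# logical indexing gives a plain vector without extra attributes
x3 <- x2[!is.na(x2)]
x3

# as.vector() strips the attributes added by na.omit()
identical(as.vector(na.omit(x2)), x3)
```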
