Thomas Petzoldt
2017-11-01
Expression
1 - pi + exp(1.7)
[1] 3.332355
Assignment
a <- 1 - pi + exp(1.7)
Assignment of a constant and a variable
x <- 1.3
y <- "hello"
a <- x
x -> b
x <- a <- b
The equal sign = can be used in place of <- but is less powerful.
x = a
x <<- 2
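The operator <<- is the super assignment: inside a function it assigns to a variable in an enclosing environment (usually the global workspace) instead of creating a local one. A minimal sketch; the names counter, count_local and count_global are made up for illustration and do not come from the slides:

```r
counter <- 0

count_local <- function() {
  counter <- counter + 1   # creates a local variable; the outer counter is unchanged
  counter
}

count_global <- function() {
  counter <<- counter + 1  # modifies counter in the enclosing environment
  counter
}

count_local()   # returns 1
counter         # still 0
count_global()  # returns 1
counter         # now 1
```

Because it changes variables outside the function, <<- is usually best reserved for special cases.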
A short classification of R's language elements:
Everything stored in R's memory is an object. Objects are specialized data structures that can be simple or very complex.
Objects can be constant or variable.
constants: 1, 123, 5.6, 5e7, "hello"
variables: can change their value and are referenced by variable names (symbols)
x <- 2 # x is a variable, 2 a constant
Valid variable names: A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.
Special characters, except _ and . (underscore and dot) are not allowed.
International characters (e.g. German umlauts ä, ö, ü, …) are possible, but not recommended.
correct:
forbidden:
reserved words cannot be used as variable names:
Note: R is case sensitive, x and X, value and Value are different.
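The naming rules can be tried out with make.names, which returns its input unchanged only if it is a syntactically valid, non-reserved name. The helper is_valid_name and all example names below are assumptions for illustration, not the examples of the original slides:

```r
# hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(n) make.names(n) == n

is_valid_name(c("x", "y1", "my_value", ".hidden", "Value"))  # all TRUE
is_valid_name(c("1x", "my-value", ".2nd", "a b"))            # all FALSE
is_valid_name(c("if", "else", "for", "TRUE", "NULL"))        # reserved words: all FALSE

# case sensitivity: value and Value are two different variables
value <- 1
Value <- 2
value == Value   # FALSE
```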
Pre-defined functions:
Arguments: mandatory or optional, un-named or named
User-defined functions:
Functions have a name that is followed by arguments in round parentheses.
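A short sketch of both kinds of functions. The pre-defined functions are from base R; the user-defined function cv (coefficient of variation) and its argument percent are hypothetical examples:

```r
# pre-defined functions
sqrt(16)                 # one mandatory argument, un-named
log(100, base = 10)      # optional argument 'base', passed by name
round(pi, digits = 2)    # 'digits' is optional and defaults to 0

# user-defined function (hypothetical example): coefficient of variation
cv <- function(x, percent = FALSE) {
  res <- sd(x) / mean(x)
  if (percent) res <- 100 * res
  res
}

cv(c(2, 4, 6))                  # 0.5
cv(c(2, 4, 6), percent = TRUE)  # 50
```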
R supports different classes of data objects.
Data objects can contain single values, vectors, matrices, tables, numbers, text and even maps, sound, images or video sequences.
We start with vectors, matrices and arrays, and data frames.
x <- 1:20
x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
y <- matrix(x, nrow=5, ncol=4)
y
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
as.vector(y)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
x <- matrix(0, nrow=5, ncol=4)
x
[,1] [,2] [,3] [,4]
[1,] 0 0 0 0
[2,] 0 0 0 0
[3,] 0 0 0 0
[4,] 0 0 0 0
[5,] 0 0 0 0
x <- matrix(1:4, nrow=5, ncol=4)
x
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 1
[3,] 3 4 1 2
[4,] 4 1 2 3
[5,] 1 2 3 4
x <- matrix(1:20, nrow=5, ncol=4, byrow=TRUE)
x
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
[5,] 17 18 19 20
x <- t(x)
x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
x <- array(1:24, dim=c(3,4,2))
x
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
, , 2
[,1] [,2] [,3] [,4]
[1,] 13 16 19 22
[2,] 14 17 20 23
[3,] 15 18 21 24
x[1, 3, 1] # single element
[1] 7
x[ , 3, 1] # 3rd column of 1st layer
[1] 7 8 9
x[ , , 2] # second layer
[,1] [,2] [,3] [,4]
[1,] 13 16 19 22
[2,] 14 17 20 23
[3,] 15 18 21 24
x[1, , ] # another slice
[,1] [,2]
[1,] 1 13
[2,] 4 16
[3,] 7 19
[4,] 10 22
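The slices above silently drop dimensions of length one. If the dimensions should be kept, drop = FALSE can be added to the index; a small sketch using the same array:

```r
x <- array(1:24, dim = c(3, 4, 2))

dim(x[ , 3, 1])                # NULL: result is a plain vector
dim(x[ , 3, 1, drop = FALSE])  # 3 1 1: still a three-dimensional array
dim(x[1, , ])                  # 4 2: one dimension dropped
```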
Original matrix
(x <- matrix(1:20, nrow=4))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
Inverted row order
x[4:1, ]
[,1] [,2] [,3] [,4] [,5]
[1,] 4 8 12 16 20
[2,] 3 7 11 15 19
[3,] 2 6 10 14 18
[4,] 1 5 9 13 17
Indirect index
x[c(1,2,1,2), c(1,3,2,5,4)]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 9 5 17 13
[2,] 2 10 6 18 14
[3,] 1 9 5 17 13
[4,] 2 10 6 18 14
Logical selection
x[c(FALSE, TRUE, FALSE, TRUE), ]
[,1] [,2] [,3] [,4] [,5]
[1,] 2 6 10 14 18
[2,] 4 8 12 16 20
Surprise?
x[c(0,1,0,1), ]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 1 5 9 13 17
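The reason for the surprise: 0 and 1 are numeric indices here, not logical values. Zeros are ignored and the index 1 selects the first row twice. A sketch of how such a 0/1 vector (called idx here, a made-up name) can be turned into a logical selection:

```r
x <- matrix(1:20, nrow = 4)
idx <- c(0, 1, 0, 1)

x[idx, ]              # numeric index: zeros are dropped, row 1 is returned twice
x[as.logical(idx), ]  # logical index: rows 2 and 4
x[idx == 1, ]         # same result via a comparison
```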
Matrix
(x <- matrix(1:4, nrow=2))
[,1] [,2]
[1,] 1 3
[2,] 2 4
Diagonal matrix
(y <- diag(2))
[,1] [,2]
[1,] 1 0
[2,] 0 1
Element wise addition and multiplication
x * (y + 1)
[,1] [,2]
[1,] 2 3
[2,] 2 8
Outer product (and sum)
1:4 %o% 1:4
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 4 6 8
[3,] 3 6 9 12
[4,] 4 8 12 16
outer(1:4, 1:4, FUN = "+")
[,1] [,2] [,3] [,4]
[1,] 2 3 4 5
[2,] 3 4 5 6
[3,] 4 5 6 7
[4,] 5 6 7 8
Matrix multiplication
x %*% y
[,1] [,2]
[1,] 1 3
[2,] 2 4
Matrix
x <- matrix(c(1,2,3,4,3,2,5,4,6),
nrow=3)
x
[,1] [,2] [,3]
[1,] 1 4 5
[2,] 2 3 4
[3,] 3 2 6
Transpose
t(x)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 3 2
[3,] 5 4 6
Inverse (\( x^{-1} \))
solve(x)
[,1] [,2] [,3]
[1,] -0.6667 0.9333 -0.0667
[2,] 0.0000 0.6000 -0.4000
[3,] 0.3333 -0.6667 0.3333
\( x \cdot x^{-1} \)
x %*% solve(x)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
\[ \begin{aligned} 3x && + && 2y && - && z && = && 1 \\ 2x && - && 2y && + && 4z && = && -2 \\ -x && + && \frac{1}{2}y && - && z && = && 0 \end{aligned} \]
A <- matrix(c(3, 2, -1,
2, -2, 4,
-1, 0.5, -1), nrow=3, byrow=TRUE)
b <- c(1, -2, 0)
\[ \begin{aligned} Ax &= b\\ x &= A^{-1}b \end{aligned} \]
solve(A) %*% b
[,1]
[1,] 1
[2,] -2
[3,] -2
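As an alternative sketch, solve accepts the right-hand side as a second argument and then solves the system directly, without computing the inverse explicitly. The variable name sol is chosen here for illustration:

```r
A <- matrix(c( 3,  2,   -1,
               2, -2,    4,
              -1,  0.5, -1), nrow = 3, byrow = TRUE)
b <- c(1, -2, 0)

sol <- solve(A, b)   # solves A x = b without forming the inverse
sol                  # 1 -2 -2 (up to rounding)

# check: A %*% sol should reproduce b
all.equal(as.vector(A %*% sol), b)
```

For larger systems this form is usually faster and numerically more stable than solve(A) %*% b.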
Reading data with read.table or read.csv
cities <- read.csv("data/cities.csv", header=TRUE)
cities
Name Country Population Latitude Longitude IsCapital
1 Fürstenfeldbruck Germany 34033 48.1690 11.2340 FALSE
2 Dhaka Bangladesh 13000000 23.7500 90.3700 TRUE
3 Ulaanbaatar Mongolia 3010000 47.9170 106.8830 TRUE
4 Shantou China 5320000 23.3500 116.6700 FALSE
5 Kampala Uganda 1659000 0.3310 32.5830 TRUE
6 Cottbus Germany 100000 51.7650 14.3280 FALSE
7 Nairobi Kenya 3100000 1.2833 36.8167 TRUE
8 Hanoi Vietnam 1452055 21.0300 105.8400 TRUE
9 Bacgiang Vietnam 53739 21.2800 106.1900 FALSE
10 Addis Abba Ethiopia 2823167 9.0300 38.7400 TRUE
11 Hyderabad India 3632094 17.4000 78.4800 FALSE
File –> Import Dataset
Several options are available, depending on RStudio's version.
Note: The examples in this course work best with "From text (base)".
English number format (. as decimal):
write.table(cities, "output.csv", row.names = FALSE, sep=",")
German number format (, as decimal):
write.table(cities, "output.csv", row.names = FALSE, sep=";", dec=",")
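Base R also provides the shortcuts write.csv / read.csv (comma separated, decimal point) and write.csv2 / read.csv2 (semicolon separated, decimal comma). A round-trip sketch with a small made-up data frame d and temporary files, not the cities data set:

```r
# small hypothetical data frame, not the cities data set
d <- data.frame(name = c("A", "B"), value = c(1.5, 2.25))

f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")

write.csv(d,  f1, row.names = FALSE)   # sep = ",", dec = "."
write.csv2(d, f2, row.names = FALSE)   # sep = ";", dec = ","

readLines(f2)         # shows the German-style file content
d1 <- read.csv(f1)    # matching reader for write.csv
d2 <- read.csv2(f2)   # matching reader for write.csv2
all.equal(d1, d2)
```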
Creation of lists
L1 <- list(a=1:10, b=c(1,2,3), x="hello")
str shows tree-like structure:
L2 <- list(a=5:7, b=L1)
str(L2)
List of 2
$ a: int [1:3] 5 6 7
$ b:List of 3
..$ a: int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ b: num [1:3] 1 2 3
..$ x: chr "hello"
Access to list elements by names
L2$a
[1] 5 6 7
L2$b$a
[1] 1 2 3 4 5 6 7 8 9 10
or with indices
L2[1] # a list with 1 element
$a
[1] 5 6 7
L2[[1]] # content of 1st element
[1] 5 6 7
Convert list to vector
unlist(L2)
a1 a2 a3 b.a1 b.a2 b.a3 b.a4 b.a5 b.a6
"5" "6" "7" "1" "2" "3" "4" "5" "6"
b.a7 b.a8 b.a9 b.a10 b.b1 b.b2 b.b3 b.x
"7" "8" "9" "10" "1" "2" "3" "hello"
str(unlist(L2))
Named chr [1:17] "5" "6" "7" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" ...
- attr(*, "names")= chr [1:17] "a1" "a2" "a3" "b.a1" ...
Flatten list (remove only top level of list)
str(unlist(L2, recursive = FALSE))
List of 6
$ a1 : int 5
$ a2 : int 6
$ a3 : int 7
$ b.a: int [1:10] 1 2 3 4 5 6 7 8 9 10
$ b.b: num [1:3] 1 2 3
$ b.x: chr "hello"
Convert vector to list
x <- 1:3
str(as.list(x))
List of 3
$ : int 1
$ : int 2
$ : int 3
Convert matrix to data frame
x <- matrix(1:16, nrow=4)
df <- as.data.frame(x)
is.list(df)
[1] TRUE
df
V1 V2 V3 V4
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16
Convert data frame to matrix
as.matrix(df)
V1 V2 V3 V4
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
Append column to data frame
df2 <- cbind(df, id=c("first", "second", "third", "fourth"))
Data frame with character column
as.matrix(df2)
V1 V2 V3 V4 id
[1,] "1" "5" " 9" "13" "first"
[2,] "2" "6" "10" "14" "second"
[3,] "3" "7" "11" "15" "third"
[4,] "4" "8" "12" "16" "fourth"
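A matrix can hold only one data type, so a single character column turns the whole matrix into text. If a numeric matrix is needed, one possible approach is to select the numeric columns first; the name num_cols is an assumption for this sketch:

```r
df  <- as.data.frame(matrix(1:16, nrow = 4))
df2 <- cbind(df, id = c("first", "second", "third", "fourth"))

is.character(as.matrix(df2))   # TRUE: everything was coerced to text

num_cols <- sapply(df2, is.numeric)   # TRUE for V1..V4, FALSE for id
m <- as.matrix(df2[num_cols])
is.numeric(m)                  # TRUE
```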
During creation
x <- c(a=1.2, b=2.3, c=6)
L <- list(a=1:3, b="hello")
With the names function
names(L)
[1] "a" "b"
names(L) <- c("numbers", "text")
names(L)
[1] "numbers" "text"
x <- 1:5
names(x) <- letters[1:5]
x
a b c d e
1 2 3 4 5
x <- matrix(1:16, nrow=4)
df <- as.data.frame(x)
df
V1 V2 V3 V4
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16
names(df) <- c("N", "P", "O2", "C")
df
N P O2 C
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16
df2 <- df[c("C", "N", "P")]
df2
C N P
1 13 1 5
2 14 2 6
3 15 3 7
4 16 4 8
df # data frame of previous slide
N P O2 C
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
4 4 8 12 16
lapply(df, mean) # returns list
$N
[1] 2.5
$P
[1] 6.5
$O2
[1] 10.5
$C
[1] 14.5
sapply(df, mean) # returns vector
N P O2 C
2.5 6.5 10.5 14.5
Row wise apply
apply(df, MARGIN = 1, sum)
[1] 28 32 36 40
Column wise apply
apply(df, MARGIN = 2, sum)
N P O2 C
10 26 42 58
Apply user defined function
se <- function(x)
sd(x)/sqrt(length(x))
sapply(df, se)
N P O2 C
0.6455 0.6455 0.6455 0.6455
for (i in 1:4) {
cat(i, 2*i, "\n")
}
1 2
2 4
3 6
4 8
j <- 1; x <- 0
while (j > 1e-3) {
j <- 0.1 * j
x <- x + j
cat(j, x, "\n")
}
0.1 0.1
0.01 0.11
0.001 0.111
1e-04 0.1111
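The loop above runs a fourth time although j has nominally reached 0.001. A likely explanation is binary floating point arithmetic: the product is stored as a value slightly larger than 1e-3. A sketch that makes this visible, assuming the usual IEEE 754 double precision:

```r
j <- 0.1 * 0.1 * 0.1
j > 1e-3                    # TRUE on typical IEEE 754 platforms
print(j, digits = 17)       # shows the tiny excess over 0.001
isTRUE(all.equal(j, 1e-3))  # TRUE: equal within a tolerance
```

Comparisons of floating point numbers close to a threshold should therefore not rely on exact equality.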
In many cases, loops can be avoided by using vectors and matrices or apply.
x <- 1
repeat {
x <- 0.1*x
cat(x, "\n")
if (x < 1e-4) break
}
0.1
0.01
0.001
1e-04
1e-05
for (i in 1:3) {
for (j in c(1,3,5)) {
cat(i, i*j, "\n")
}
}
1 1
1 3
1 5
2 2
2 6
2 10
3 3
3 9
3 15
Column means of a data frame
## a data frame
df <- data.frame(
N=1:4, P=5:8, O2=9:12, C=13:16
)
## loop
m <- numeric(4)
for(i in 1:4) {
m[i] <- mean(df[,i])
}
m
[1] 2.5 6.5 10.5 14.5
sapply(df, mean)
N P O2 C
2.5 6.5 10.5 14.5
… colMeans is also possible
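A short sketch of the specialized base R functions for this task, using the same data frame:

```r
df <- data.frame(N = 1:4, P = 5:8, O2 = 9:12, C = 13:16)

colMeans(df)   # same values as sapply(df, mean)
rowSums(df)    # row-wise counterpart of apply(df, 1, sum)
```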
An infinite series:
\[ \sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{2k-1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots = \frac{\pi}{4} \]
x <- 0
for (k in seq(1, 1e5)) {
enum <- (-1)^(k-1)
denom <- 2*k-1
x <- x + enum/denom
}
4 * x
[1] 3.141583
\( \Rightarrow \) Can you vectorize this?
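One possible vectorized version, as a sketch: all terms are computed at once for a vector k and then added up with sum.

```r
k <- 1:1e5
x <- sum((-1)^(k - 1) / (2 * k - 1))
4 * x   # approximately 3.141583, as in the loop version
```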
The same series:
\[ \sum_{k=1}^{\infty}\frac{(-1)^{k-1}}{2k-1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots = \frac{\pi}{4} \]
x <- 0
k <- 0
repeat {
k <- k + 1
enum <- (-1)^(k-1)
denom <- 2*k-1
delta <- enum/denom
x <- x + delta
if (abs(delta) < 1e-6) break
}
4 * x
[1] 3.141595
Note: there are more efficient methods to calculate \( \pi \).
The previous example already showed an if clause. The syntax is as follows:
if (<condition>)
<statement>
else if (<condition>)
<statement>
else
<statement>
The statement can of course be a compound statement in curly brackets {}.
Example:
if (x == 0) {
print("x is zero")
} else if (x < 0) {
print("x is negative")
} else {
print("x is positive")
}
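Note that if expects a single TRUE or FALSE as condition. To reuse the clause for several values, it can be wrapped in a function and called once per element; the function name describe is a hypothetical choice for this sketch:

```r
# hypothetical helper wrapping the if / else if / else chain
describe <- function(x) {
  if (x == 0) {
    "x is zero"
  } else if (x < 0) {
    "x is negative"
  } else {
    "x is positive"
  }
}

describe(-3)                    # "x is negative"
sapply(c(-3, 0, 2), describe)   # one call per element
```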
Very often, a vectorized ifelse is more appropriate than an if clause.
Let's assume we have a data set of chemical measurements x with missing values (NA) and "nondetects" that are encoded with -99. First we want to replace the nondetects with half of the detection limit (e.g. 0.5):
x <- c(3, 6, NA, 5, 4, -99, 7, NA, 8, -99, -99, 9)
x2 <- ifelse(x == -99, 0.5, x)
x2
[1] 3.0 6.0 NA 5.0 4.0 0.5 7.0 NA 8.0 0.5 0.5 9.0
Now let's remove the NAs:
na.omit(x2)
[1] 3.0 6.0 5.0 4.0 0.5 7.0 8.0 0.5 0.5 9.0
attr(,"na.action")
[1] 3 8
attr(,"class")
[1] "omit"
This returns a special object that can be used like a normal vector.
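A brief sketch of how the result can be used, and of the na.rm argument that many summary functions offer as an alternative. The vector x2 is re-created here so that the example is self-contained:

```r
x2 <- c(3, 6, NA, 5, 4, 0.5, 7, NA, 8, 0.5, 0.5, 9)

mean(x2)                # NA: missing values propagate
mean(na.omit(x2))       # 4.35
mean(x2, na.rm = TRUE)  # same result without an intermediate object
sum(is.na(x2))          # number of missing values: 2
```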
More about this can be found in the official R manuals, especially in An Introduction to R.
This tutorial was made with R-Presentations of RStudio