# Introduction to R

Martin Morgan (mtmorgan@fhcrc.org), Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
24 August 2014

### Outline

Part I

• Vectors (data)
• Functions
• Help!

Part II

• Classes (objects)
• Generics & methods
• Help!

Part III

• Packages
• Help!

### Part I: Vectors (data)

1                # vector of length 1

[1] 1

c(1, 1, 2, 3, 5) # vector of length 5

[1] 1 1 2 3 5


### Part I: Vectors (data)

• logical: c(TRUE, FALSE); integer; numeric; complex; character: c("A", "beta")
• list: list(c(TRUE, FALSE), c("A", "beta"))
• Statistical concepts: factor, NA
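The vector types listed above can be inspected with class(). The following is a minimal sketch; the variable names `sex` and `ages` are made up for illustration and do not appear in the original slides.

```r
# Atomic vector types, checked with class()
class(c(TRUE, FALSE))     # "logical"
class(1:3)                # "integer"
class(c(1, 2.5))          # "numeric"
class(1i)                 # "complex"
class(c("A", "beta"))     # "character"

# A factor stores categorical data as integer codes plus labels ('levels')
sex <- factor(c("Female", "Male", "Female"))   # hypothetical data
levels(sex)               # "Female" "Male"
table(sex)                # counts per level

# NA marks a missing value; summaries propagate it unless told otherwise
ages <- c(23, NA, 31)     # hypothetical data
mean(ages)                # NA
mean(ages, na.rm = TRUE)  # 27
```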

Assignment and names

x <- c(1, 1, 2, 3, 5)
y = c(5, 5, 3, 2, 1)
z <- c(Female=12, Male=3)

• = and <- are equivalent for assignment at the top level; inside a function call, = names an argument instead
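As a small illustration of names and assignment, the sketch below reuses `z` from above; the variables `v` and `w` are introduced here only for the comparison.

```r
z <- c(Female = 12, Male = 3)
names(z)        # "Female" "Male"
z["Female"]     # subset by name; the result keeps its name
z[["Male"]]     # extract one element without its name: 3

# Both operators create the same object when used at the top level
v = c(1, 2)
w <- c(1, 2)
identical(v, w) # TRUE
```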

### Part I: Vectors (data)

Operations

x + y        # vectorized

[1] 6 6 5 5 6

x / 5        # ...recycling

[1] 0.2 0.2 0.4 0.6 1.0

x[c(3, 1)]   # subset

[1] 2 1
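A few more operations in the same spirit may help; this is a sketch using the `x` and `y` defined earlier, with the result names (`prod_xy`, `recycled`, and so on) chosen here for illustration.

```r
x <- c(1, 1, 2, 3, 5)
y <- c(5, 5, 3, 2, 1)

prod_xy  <- x * y                      # element-wise: 5 5 6 6 5
recycled <- c(1, 2, 3, 4) + c(10, 20)  # shorter vector reused: 11 22 13 24

big     <- x[x > 1]   # logical subset: 2 3 5
no_head <- x[-1]      # negative index drops an element: 1 2 3 5
middle  <- x[2:3]     # a range of positions: 1 2
```

Recycling is silent when the longer length is a multiple of the shorter one; otherwise R still recycles but emits a warning.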


### Part I: Functions

Examples: c(), concatenate values; rnorm(), generate random normal deviates; plot()

x <- rnorm(1000)    # 1000 normal deviates
y <- x + rnorm(1000, sd = 0.5)

• Optional, named arguments; positional matching
args(rnorm)

function (n, mean = 0, sd = 1)
NULL
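To see how positional and named matching interact, one can compare calls that should be equivalent. This is a sketch; set.seed() is used only so the three draws can be compared, and the names `a`, `b`, `d` are illustrative.

```r
set.seed(123)                         # make the random draws reproducible
a <- rnorm(5, 10, 2)                  # positional: n = 5, mean = 10, sd = 2

set.seed(123)
b <- rnorm(sd = 2, n = 5, mean = 10)  # all named: order does not matter

set.seed(123)
d <- rnorm(5, sd = 2, mean = 10)      # mixed: 5 fills the first unnamed slot, n

identical(a, b)                       # TRUE
identical(a, d)                       # TRUE
```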


### Part I: Functions

plot(x, y)


• Formula interface, another way to specify the same plot: plot(y ~ x)

### Part I: Help!

Within R

?rnorm


RStudio

• “Help” tab, search for “rnorm”

Main sections

### Part II: Classes (objects)

Motivation: manipulate complicated data

• e.g., x and y from previous example are related to one another – same length, element i of y is a transformation of element i of x

Solution: a “data frame” to coordinate access

df <- data.frame(X=x, Y=y)
head(df, 3)

        X       Y
1 -1.3692 -1.2625
2  1.9072  2.6103
3 -0.5395 -0.5987
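A quick way to check what a data frame holds is sketched below. The seed is an assumption added for reproducibility, so the values will differ from those printed on the slide.

```r
set.seed(1)                      # assumed seed; the slides do not set one
x <- rnorm(1000)
y <- x + rnorm(1000, sd = 0.5)
df <- data.frame(X = x, Y = y)

nrow(df)      # 1000 rows, one per element of x and y
ncol(df)      # 2 columns
names(df)     # "X" "Y"
str(df)       # compact description: column types and first few values
summary(df)   # per-column summary statistics
```

Because both columns live in one object, row i of `df` always pairs element i of `x` with element i of `y`.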


### Part II: Generics & methods

class(df) # plain function

[1] "data.frame"

dim(df)   # generic & method for data.frame

[1] 1000    2

head(df$X, 4) # column access

[1] -1.3692  1.9072 -0.5395 -1.3264


### Part II: Generics & methods

## create or update 'Z'
df$Z <- sqrt(abs(df$Y))
## subset rows and / or columns
head(df[df$X > 0, c("X", "Z")])

         X      Z
2  1.90720 1.6156
5  0.02705 0.5804
6  0.18376 0.5624
8  0.04149 0.2101
9  0.96177 0.3850
14 0.48720 1.0353
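The same row and column subsetting is easier to follow on a tiny data frame. The sketch below uses made-up values and a result name `pos` that are not from the slides.

```r
# Hypothetical four-row data frame
df <- data.frame(X = c(-1, 2, 0.5, -3), Y = c(-4, 9, 1, -16))

df$Z <- sqrt(abs(df$Y))            # new column: 2 3 1 4

pos <- df[df$X > 0, c("X", "Z")]   # rows where X > 0, columns X and Z
pos                                # keeps original row names "2" and "3"

df[2, ]                            # one row, all columns
df[["Y"]]                          # one column as a vector, same as df$Y
```

Note that the row names in the result refer to positions in the original data frame, which is why the slide output shows rows 2, 5, 6, 8, 9, 14.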


### Part II: Generics & methods

plot(Y ~ X, df) # Y ~ X, values from 'df'
## lm(): linear model, returns class 'lm'
fit <- lm(Y ~ X, df)
abline(fit)  # plot regression line
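The fitted object can be queried with further generics. The following sketch regenerates the data with an assumed seed, so the numbers will not match the slides exactly.

```r
set.seed(42)                     # assumed seed for reproducibility
x <- rnorm(1000)
y <- x + rnorm(1000, sd = 0.5)
df <- data.frame(X = x, Y = y)

fit <- lm(Y ~ X, df)
class(fit)                # "lm"
coef(fit)                 # intercept near 0, slope near 1, as simulated
head(residuals(fit))      # observed minus fitted values
summary(fit)$r.squared    # proportion of variance explained
```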


### Part II: Generics & methods

anova(fit)

Analysis of Variance Table

Response: Y
           Df Sum Sq Mean Sq F value Pr(>F)
X           1   1042    1042    4417 <2e-16 ***
Residuals 998    235       0
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


### Part II: Generics & methods

• fit: object of class lm
• anova(): generic, with a method for class lm
methods(anova)

[1] anova.glm*     anova.glmlist*
[3] anova.lm*      anova.lmlist*
[5] anova.loess*   anova.mlm*
[7] anova.nls*

Non-visible functions are asterisked
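How a generic finds its method can be shown with a toy S3 example. Everything here (`area`, the classes `circle` and `rectangle`, the objects `circ` and `box`) is hypothetical and only mirrors the anova() / anova.lm() relationship.

```r
# A generic dispatches on the class of its first argument
area <- function(shape, ...) UseMethod("area")

# Methods follow the naming pattern <generic>.<class>
area.circle    <- function(shape, ...) pi * shape$r^2
area.rectangle <- function(shape, ...) shape$w * shape$h
area.default   <- function(shape, ...) stop("no area() method for this class")

circ <- structure(list(r = 1), class = "circle")
box  <- structure(list(w = 2, h = 3), class = "rectangle")

area(circ)   # calls area.circle(): pi
area(box)    # calls area.rectangle(): 6
```

Calling anova(fit) works the same way: fit has class lm, so the call is dispatched to anova.lm().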


### Part II: Help!

## class of object
class(fit)

## method discovery
methods(class=class(fit))
methods(anova)

## help on generic, and specific method
?anova
?anova.lm
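Methods marked with an asterisk are not exported, but they can still be retrieved for inspection. A minimal sketch, assuming the built-in `cars` data set as a stand-in for the data used earlier:

```r
fit <- lm(dist ~ speed, cars)     # 'cars' ships with R (datasets package)

class(fit)                        # "lm"
inherits(fit, "lm")               # TRUE

m <- methods(class = "lm")        # all methods defined for class 'lm'
length(m)                         # how many there are

f <- getS3method("anova", "lm")   # fetch the non-visible anova.lm()
is.function(f)                    # TRUE
```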


### Part III: Packages

Installed

• Base & recommended
length(rownames(installed.packages()))

[1] 227


Available

### Part III: Packages

'Attached' (installed and available for use):

search()            # attached packages
ls("package:stats") # functions in 'stats'
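The results of search() and ls() are ordinary character vectors, so they can be tested programmatically. A sketch, with `attached` and `stats_names` as illustrative names:

```r
attached <- search()
"package:stats" %in% attached    # TRUE: 'stats' is attached at startup

stats_names <- ls("package:stats")
"rnorm" %in% stats_names         # TRUE: rnorm() comes from 'stats'

# '::' calls a function from an installed package without attaching it
stats::median(c(1, 3, 10))       # 3
```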


Attaching (make installed package available for use)

library(ggplot2)
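library() stops with an error if the package is not installed. One defensive pattern, sketched here with the illustrative variables `pkg` and `has_pkg`, is to check first with requireNamespace():

```r
pkg <- "ggplot2"

# TRUE if the package is installed and loadable, FALSE otherwise
has_pkg <- requireNamespace(pkg, quietly = TRUE)

if (has_pkg) {
    library(pkg, character.only = TRUE)   # attach by name held in a variable
    message(pkg, " attached")
} else {
    message(pkg, " is not installed")
}
```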


Installing CRAN or Bioconductor packages

source("http://bioconductor.org/biocLite.R")
biocLite("GenomicRanges")


Packages

### Part III: Help!

Best bet

• Other R users you know!

R

Bioconductor

### Acknowledgements

Funding

• US NIH / NHGRI 2U41HG004059; NSF 1247813

People

• Seattle Bioconductor team: Sonali Arora, Marc Carlson, Nate Hayden, Valerie Obenchain, Hervé Pagès, Dan Tenenbaum
• Vincent Carey, Robert Gentleman, Rafael Irizarry, Sean Davis, Kasper Hansen, Michael Lawrence, Levi Waldron