====== R Tips ======
===== General Tutorials =====
* Quick-R: [[http://www.statmethods.net/index.html]]
===== Read Data =====
# assumes the input_file is a tab-delim text file with header row
> dataset <- read.table("input_file", sep="\t", header=TRUE)
# examine the dataset
> summary(dataset)
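A self-contained sketch of the same steps, assuming a small made-up file: the temporary file and its column names (''sample'', ''value'') are for illustration only and are not part of any real dataset.

```r
# Write a small tab-delimited file with a header row (illustrative data only)
input_file <- tempfile(fileext = ".txt")
writeLines(c("sample\tvalue", "a\t1.5", "b\t2.5", "c\t4.0"), input_file)

# Read it back; stringsAsFactors = FALSE keeps text columns as character
dataset <- read.table(input_file, sep = "\t", header = TRUE,
                      stringsAsFactors = FALSE)

# Inspect the structure and the first rows before running any analysis
str(dataset)
head(dataset)
summary(dataset)
```

''str()'' is a quick way to confirm that each column was read with the expected type.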
===== Regression Analysis =====
==== Correlation ====
> x <- c(1, 2, 3, 4, 5)
> y <- c(2, 3, 5, 8, 9)
> cor.test(x, y, method = c("pearson"))
Pearson's product-moment correlation
data: x and y
t = 9.9224, df = 3, p-value = 0.002178
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7857660 0.9990617
sample estimates:
cor
0.9851041
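To see where the numbers in this output come from, the correlation coefficient, the t statistic, and the p-value can be recomputed by hand. This is only a sketch for the two-sided Pearson test on the same ''x'' and ''y''; the variable names ''r'', ''t_stat'', and ''p_value'' are chosen here for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 3, 5, 8, 9)

# Pearson's r: sum of cross-products divided by the square root
# of the product of the two sums of squares
r <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# t statistic with n - 2 degrees of freedom, and its two-sided p-value
n <- length(x)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# The same values can be extracted from the cor.test() result
result <- cor.test(x, y, method = "pearson")
result$estimate   # correlation coefficient
result$p.value    # p-value
```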
==== Linear Regression ====
To perform a simple linear regression in R, use the command:
> fit_1 <- lm (dep_var ~ ind_var, data=dataset)
The model used in this case is: ''dep_var = a + b * ind_var + error'', where ''a'' (the intercept) and ''b'' (the slope) are the coefficients estimated by the fit.
To exclude the coefficient ''a'' (i.e., to force the regression line through the origin (0, 0)), use:
> fit_2 <- lm (dep_var ~ 0 + ind_var, data=dataset)
To fix the coefficient ''b'' at 1 instead of estimating it (i.e., to force a slope of 1, so that only ''a'' is fitted), use:
> fit_3 <- lm (dep_var ~ offset(ind_var), data=dataset)
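The three model forms can be compared on a toy data frame. The values below are made up for illustration; only the formulas come from the text above.

```r
# Toy data, made up for illustration
dataset <- data.frame(ind_var = c(1, 2, 3, 4, 5),
                      dep_var = c(2.1, 2.9, 4.2, 4.8, 6.1))

fit_1 <- lm(dep_var ~ ind_var, data = dataset)          # intercept and slope
fit_2 <- lm(dep_var ~ 0 + ind_var, data = dataset)      # slope only
fit_3 <- lm(dep_var ~ offset(ind_var), data = dataset)  # intercept only, slope fixed at 1

coef(fit_1)   # "(Intercept)" is a, "ind_var" is b
coef(fit_2)   # only "ind_var"
coef(fit_3)   # only "(Intercept)"

# Coefficient table with standard errors and p-values
summary(fit_1)$coefficients
```

With the slope fixed at 1, the fitted intercept of ''fit_3'' is simply the mean of ''dep_var - ind_var''.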
==== Non-linear Regression ====
A good tutorial is available at: [[http://mercury.bio.uaf.edu/mercury/R/R.html]]
===== ANOVA =====
Analysis of Variance.
# first draw a boxplot for visualization
> boxplot(dep_var ~ ind_var, data = dataset)
# perform the ANOVA, save the result to anova_result
> anova_result <- aov(dep_var ~ ind_var, data = dataset)
# to look at the result, including the P-value
> summary(anova_result)
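For a one-way ANOVA, ''aov'' expects the grouping variable to be a factor; a numeric ''ind_var'' would be treated as a continuous predictor instead. The sketch below uses made-up measurements for three groups and shows one way to pull the P-value out of the summary table.

```r
# Toy data, made up for illustration: three groups of four measurements
dataset <- data.frame(
  ind_var = factor(rep(c("A", "B", "C"), each = 4)),
  dep_var = c(4.1, 3.9, 4.3, 4.0,
              5.2, 5.0, 5.4, 5.1,
              6.8, 7.1, 6.9, 7.2)
)

anova_result <- aov(dep_var ~ ind_var, data = dataset)
anova_table <- summary(anova_result)[[1]]

# P-value for the grouping factor (first row of the ANOVA table)
p_value <- anova_table[["Pr(>F)"]][1]

# Pairwise comparisons between groups, useful when the overall test is significant
TukeyHSD(anova_result)
```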
===== Install Optional Packages =====
==== Installation ====
# choose the CRAN mirror site to use
> chooseCRANmirror()
# some useful packages as examples
# gplots contains the heatmap.2 function
> install.packages(c("gplots"))
also installing the dependencies ‘gtools’, ‘gdata’
trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/gtools_2.5.0.tgz'
Content type 'application/x-gzip' length 85423 bytes (83 Kb)
opened URL
==================================================
downloaded 83 Kb
trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/gdata_2.4.2.tgz'
Content type 'application/x-gzip' length 539269 bytes (526 Kb)
opened URL
==================================================
downloaded 526 Kb
trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/gplots_2.6.0.tgz'
Content type 'application/x-gzip' length 339358 bytes (331 Kb)
opened URL
==================================================
downloaded 331 Kb
The downloaded packages are in
/var/folders/F7/F7SZ5h-+GG0z6BlZFMBIH++++TM/-Tmp-//Rtmp0Vfq3y/downloaded_packages
# RColorBrewer contains additional color schemes
> install.packages(c("RColorBrewer"))
trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/RColorBrewer_1.0-2.tgz'
Content type 'application/x-gzip' length 21060 bytes (20 Kb)
opened URL
==================================================
downloaded 20 Kb
The downloaded packages are in
/var/folders/F7/F7SZ5h-+GG0z6BlZFMBIH++++TM/-Tmp-//Rtmp0Vfq3y/downloaded_packages
# install HH package for exporting figure to eps
> install.packages(c("HH"))
also installing the dependencies ‘multcomp’, ‘mvtnorm’
trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/multcomp_1.0-3.tgz'
Content type 'application/x-gzip' length 484591 bytes (473 Kb)
opened URL
==================================================
downloaded 473 Kb
trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/mvtnorm_0.9-2.tgz'
Content type 'application/x-gzip' length 231364 bytes (225 Kb)
opened URL
==================================================
downloaded 225 Kb
trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/HH_2.1-15.tgz'
Content type 'application/x-gzip' length 544085 bytes (531 Kb)
opened URL
==================================================
downloaded 531 Kb
The downloaded packages are in
/var/folders/F7/F7SZ5h-+GG0z6BlZFMBIH++++TM/-Tmp-//Rtmp0Vfq3y/downloaded_packages
# vcd: Visualizing Categorical Data
> install.packages(c("vcd"))
also installing the dependency ‘colorspace’
trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/colorspace_0.97.tgz'
Content type 'application/x-gzip' length 289822 bytes (283 Kb)
opened URL
==================================================
downloaded 283 Kb
trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/vcd_1.2-0.tgz'
Content type 'application/x-gzip' length 1184534 bytes (1.1 Mb)
opened URL
==================================================
downloaded 1.1 Mb
The downloaded packages are in
/var/folders/F7/F7SZ5h-+GG0z6BlZFMBIH++++TM/-Tmp-//Rtmp2T0MfG/downloaded_packages
==== Installing R on Ubuntu ====
Add the following line to the file ''/etc/apt/sources.list'':
deb http://CRAN_MIRROR/bin/linux/ubuntu precise/
Here ''CRAN_MIRROR'' is a placeholder; replace it with the host name of [[http://cran.r-project.org/mirrors.html|your favorite CRAN site]].\\
$ sudo apt-get update # this printed some error messages, which I ignored
$ sudo apt-get install r-base
$ sudo apt-get install r-base-dev
R is then ready to use.
Alternative method for installing a package, from a downloaded source tarball: \\
$ sudo R CMD INSTALL package.tar.gz
==== Load Packages ====
Optional packages must be loaded in each R session before they can be used, even after they have been installed.
> library(gplots)
> library(RColorBrewer)
> library(HH)
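Before calling ''library()'', it can help to check which packages are actually installed. The helper ''is_installed'' below is a hypothetical name introduced for this sketch; it relies only on the base function ''requireNamespace''.

```r
# Hypothetical helper: TRUE if the named package is installed
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

packages <- c("gplots", "RColorBrewer", "HH")
missing_packages <- packages[!vapply(packages, is_installed, logical(1))]

# Report anything that still needs install.packages() before library() is called
if (length(missing_packages) > 0) {
  message("Not installed: ", paste(missing_packages, collapse = ", "))
}
```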
===== Color Palettes =====
==== List of Color Palettes ====
A list of useful color palettes:
* rich.colors: this is what I consider the true "rainbow"; it runs from dark blue (indigo) to red.
* rainbow: not as good as the ''rich.colors'' palette because the two extremes look similar (both are red)
* greenred: green-black-red, often used in microarray-type data
* heat.colors: red-orange-yellow-white
* terrain.colors: green-yellow-orange
* topo.colors: blue-green-yellow
* cm.colors: cyan-white-magenta
* gray: black-white
==== Custom Color Palettes ====
To create custom color palettes, use the ''colorpanel'' function (usage: ''colorpanel(n, low, mid, high)'').
* n: Desired number of color elements in the panel.
* low, mid, high: Colors to use for the lowest, middle, and highest values. The value for ''mid'' may be omitted. These values can be given as color names ('red') or HTML-style RGB ("#FF0000").
Example: to create a blue-grey-yellow color palette, use ''col = colorpanel(256, 'blue', 'grey', 'yellow')'' in the ''heatmap.2'' function call.
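When ''gplots'' is not available, a similar three-color gradient can be built with ''colorRampPalette'' from base R. This is an alternative technique, not the ''colorpanel'' implementation, and the interpolated intermediate colors may differ slightly between the two.

```r
# Base-R alternative (grDevices), not the gplots colorpanel implementation
blue_grey_yellow <- colorRampPalette(c("blue", "grey", "yellow"))
palette_256 <- blue_grey_yellow(256)

length(palette_256)   # 256 colors
palette_256[1]        # "#0000FF" (blue)
palette_256[256]      # "#FFFF00" (yellow)
```

The resulting vector can be passed as ''col = palette_256'' wherever a color vector is expected.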
==== Quick Visualization ====
> # load library
> library(gplots)
> # define the number of colors to show
> num <- 10
> # call barplot function for visualization
> barplot(rep(1,num), yaxt = "n", col = rich.colors(num))
==== Color Name Conversion ====
To convert a built-in color name to hexadecimal format, first get its RGB components and then pass them to ''rgb'':
# call the col2rgb function
> col2rgb("darkorange1")
[,1]
red 255
green 127
blue 0
> rgb(255,127,0, maxColorValue=255)
[1] "#FF7F00"
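The two calls can be combined so that the RGB values do not have to be retyped. The helper name ''name_to_hex'' is hypothetical and introduced only for this sketch.

```r
# Hypothetical helper: convert R color names to "#RRGGBB" strings
name_to_hex <- function(color) {
  # col2rgb() returns one column per color; rgb() expects one row per color
  rgb(t(col2rgb(color)), maxColorValue = 255)
}

name_to_hex("darkorange1")      # "#FF7F00"
name_to_hex(c("red", "blue"))   # "#FF0000" "#0000FF"
```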
===== Heatmap =====
''heatmap.2()'' is included in the optional ''gplots'' package and provides a number of extensions to the standard ''heatmap()'' function. Most notably, it draws a color key when "''key = TRUE''" is specified in the function call.
==== Tutorials ====
* Microarray data: [[http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/r/heatmap/|By Peter Cock]]
* A simple guide to using heatmap.2: [[http://enotacoes.wordpress.com/2007/11/16/easy-guide-to-drawing-heat-maps-to-pdf-with-r-with-color-key/|@ E-notações]]
==== Heatmap Example ====
# load the packages
> library(gplots)
> library(RColorBrewer)
> library(HH)
# initiate the display device
> trellis.device()
# load data
> dataset <- read.table("input_file", sep="\t", header=TRUE)
> dataset_matrix <- data.matrix(dataset)
# generate heatmap
> heatmap.2(dataset_matrix,
# dendrogram control
Rowv = TRUE,
Colv = TRUE,
distfun = dist,
hclustfun = hclust,
# dendrogram = c("both","row","column","none"),
dendrogram = c("both"),
symm = FALSE,
# data scaling
# scale = c("none","row", "column"),
scale = c("row"),
# colors
col = rich.colors(256),
# level trace
# trace=c("column","row","both","none"),
trace=c("none"),
# Row/Column Labeling
margins = c(20, 20),
# color key + density info
key = TRUE,
keysize = 1.0,
# density.info=c("histogram","density","none"),
density.info=c("none"),
# plot labels
main = NULL,
xlab = NULL,
ylab = NULL
)
# export to file
> export.eps("output_file.eps")
If the ''scale'' option is turned on (by specifying "''scale = c("row")''" or "''scale = c("column")''"), the color key shows the mapping from colors to Z-scores. Each Z-score is calculated by subtracting the row (or column) mean from the cell value and then dividing the result by the row (or column) standard deviation (see [[http://www.r-help.com/list/85/429617.html]] for details).
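The row-wise Z-score calculation described above can be reproduced by hand with ''scale()''. The matrix below is made up for illustration, and this is a sketch of the calculation rather than the code that ''heatmap.2'' runs internally.

```r
# Toy matrix, made up for illustration
m <- matrix(c(1, 2, 3,
              10, 20, 30),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3")))

# scale() works on columns, so transpose, scale, and transpose back
row_z <- t(scale(t(m)))

rowMeans(row_z)       # 0 for every row
apply(row_z, 1, sd)   # 1 for every row
```

After scaling, both rows hold the same Z-scores even though their raw values differ tenfold, which is why row scaling makes rows with different magnitudes comparable in one heatmap.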
===== Hierarchical Clustering =====
Hierarchical clustering in ''R'' can be done using the package ''pvclust''. See more details here: [[http://www.is.titech.ac.jp/~shimo/prog/pvclust/]]
To install:
# install
> install.packages("pvclust")
# load the package
> library(pvclust)
# run example
> example(pvclust)
To run:
# load data
> dataset <- read.table("input_file", sep="\t", header=TRUE)
> attach(dataset)
# execute
> result <- pvclust( dataset,
method.hclust = "average",
method.dist = "correlation",
use.cor = "pairwise.complete.obs",
# set the number of bootstrap resampling
nboot = 1000
)
# plot result
> plot(result)
# highlight the grouping with high confidence
> pvrect(result, alpha=0.95)
# export to eps file (needs the HH library)
> export.eps("output_file.eps")
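For a quick look at the tree without the bootstrap step, the same kind of clustering (columns, correlation distance, average linkage) can be sketched with base R only. The data below are simulated for illustration, and this sketch does not compute the confidence values that ''pvclust'' adds.

```r
# Toy data, simulated for illustration: 6 observations (rows) of 4 variables (columns)
set.seed(1)
base <- rnorm(6)
dataset <- data.frame(a1 = base + rnorm(6, sd = 0.1),
                      a2 = base + rnorm(6, sd = 0.1),
                      b1 = -base + rnorm(6, sd = 0.1),
                      b2 = -base + rnorm(6, sd = 0.1))

# Correlation distance between columns: 1 - Pearson correlation
d <- as.dist(1 - cor(dataset, use = "pairwise.complete.obs"))

# Average-linkage clustering of the columns
hc <- hclust(d, method = "average")

# Cut the tree into two groups
groups <- cutree(hc, k = 2)
groups
```

Calling ''plot(hc)'' draws the dendrogram.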
===== Edit R Figures Using Illustrator =====
Copy ''AdobePiStd.otf'' from
/Library/Application Support/Adobe/PDFL/10.9/Fonts/AdobePiStd.otf
to
/Library/Fonts/