====== R Tips ======

===== General Tutorials =====

  * Quick-R: [[http://www.statmethods.net/index.html]]

===== Read Data =====

  # assumes the input_file is a tab-delimited text file with a header row
  > dataset <- read.table("input_file", sep="\t", header=TRUE)
  # examine the dataset
  > summary(dataset)

===== Regression Analysis =====

==== Correlation ====

  > x <- c(1, 2, 3, 4, 5)
  > y <- c(2, 3, 5, 8, 9)
  > cor.test(x, y, method = c("pearson"))
          Pearson's product-moment correlation
  data:  x and y
  t = 9.9224, df = 3, p-value = 0.002178
  alternative hypothesis: true correlation is not equal to 0
  95 percent confidence interval:
   0.7857660 0.9990617
  sample estimates:
        cor
  0.9851041

==== Linear Regression ====

To perform a simple linear regression in R, use the command:

  > fit_1 <- lm(dep_var ~ ind_var, data=dataset)

The model fitted in this case is: ''dep_var = a + b * ind_var + error'', where the intercept ''a'' and the slope ''b'' are the coefficients estimated by the fit.

To exclude the coefficient ''a'' (i.e., to force the regression line through the origin (0, 0)), use:

  > fit_2 <- lm(dep_var ~ 0 + ind_var, data=dataset)

To fix the coefficient ''b'' at 1 instead of estimating it (i.e., to force a slope of 1), use:

  > fit_3 <- lm(dep_var ~ offset(ind_var), data=dataset)

==== Non-linear Regression ====

A good tutorial is at: [[http://mercury.bio.uaf.edu/mercury/R/R.html]]

===== ANOVA =====

Analysis of Variance.
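A minimal, self-contained sketch of a one-way ANOVA on made-up data may help here. The data frame ''toy'' and its values are hypothetical; only the column names ''dep_var'' and ''ind_var'' are reused from this page. The sketch assumes the grouping variable is a factor, because ''aov'' treats a numeric predictor as a regression term rather than as a set of groups.

```r
# Made-up data: three groups (A, B, C) with four measurements each.
# These numbers are for illustration only.
toy <- data.frame(
  dep_var = c(4.1, 3.9, 4.3, 4.0,
              5.2, 5.0, 5.4, 5.1,
              6.3, 6.1, 6.0, 6.4),
  ind_var = factor(rep(c("A", "B", "C"), each = 4))
)

# Fit the one-way ANOVA model.
toy_result <- aov(dep_var ~ ind_var, data = toy)

# summary() returns a list; its first element is the ANOVA table.
toy_table <- summary(toy_result)[[1]]

# The first row belongs to the grouping term, the second to the residuals.
p_value <- toy_table[["Pr(>F)"]][1]

print(toy_table)
print(p_value)
```

The generic form of the same steps, for a dataset read from a file, is: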
  # first draw a boxplot for visualization
  > boxplot(dep_var ~ ind_var, data = dataset)
  # perform the ANOVA, save the result to anova_result
  > anova_result <- aov(dep_var ~ ind_var, data = dataset)
  # to look at the result, including the P-value
  > summary(anova_result)

===== Install Optional Packages =====

==== Installation ====

  # choose the CRAN mirror site to use
  > chooseCRANmirror()
  # some useful packages as examples
  # gplots contains the heatmap.2 function
  > install.packages(c("gplots"))
  also installing the dependencies ‘gtools’, ‘gdata’
  trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/gtools_2.5.0.tgz'
  Content type 'application/x-gzip' length 85423 bytes (83 Kb)
  opened URL
  ==================================================
  downloaded 83 Kb
  trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/gdata_2.4.2.tgz'
  Content type 'application/x-gzip' length 539269 bytes (526 Kb)
  opened URL
  ==================================================
  downloaded 526 Kb
  trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/gplots_2.6.0.tgz'
  Content type 'application/x-gzip' length 339358 bytes (331 Kb)
  opened URL
  ==================================================
  downloaded 331 Kb
  The downloaded packages are in
  /var/folders/F7/F7SZ5h-+GG0z6BlZFMBIH++++TM/-Tmp-//Rtmp0Vfq3y/downloaded_packages
  # RColorBrewer contains additional color schemes
  > install.packages(c("RColorBrewer"))
  trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/RColorBrewer_1.0-2.tgz'
  Content type 'application/x-gzip' length 21060 bytes (20 Kb)
  opened URL
  ==================================================
  downloaded 20 Kb
  The downloaded packages are in
  /var/folders/F7/F7SZ5h-+GG0z6BlZFMBIH++++TM/-Tmp-//Rtmp0Vfq3y/downloaded_packages
  # install the HH package for exporting figures to eps
  > install.packages(c("HH"))
  also installing the dependencies ‘multcomp’, ‘mvtnorm’
  trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/multcomp_1.0-3.tgz'
  Content type 'application/x-gzip' length 484591 bytes (473 Kb)
  opened URL
  ==================================================
  downloaded 473 Kb
  trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/mvtnorm_0.9-2.tgz'
  Content type 'application/x-gzip' length 231364 bytes (225 Kb)
  opened URL
  ==================================================
  downloaded 225 Kb
  trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/HH_2.1-15.tgz'
  Content type 'application/x-gzip' length 544085 bytes (531 Kb)
  opened URL
  ==================================================
  downloaded 531 Kb
  The downloaded packages are in
  /var/folders/F7/F7SZ5h-+GG0z6BlZFMBIH++++TM/-Tmp-//Rtmp0Vfq3y/downloaded_packages
  # vcd: Visualizing Categorical Data
  > install.packages(c("vcd"))
  also installing the dependency ‘colorspace’
  trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/colorspace_0.97.tgz'
  Content type 'application/x-gzip' length 289822 bytes (283 Kb)
  opened URL
  ==================================================
  downloaded 283 Kb
  trying URL 'http://cran.opensourceresources.org/bin/macosx/universal/contrib/2.7/vcd_1.2-0.tgz'
  Content type 'application/x-gzip' length 1184534 bytes (1.1 Mb)
  opened URL
  ==================================================
  downloaded 1.1 Mb
  The downloaded packages are in
  /var/folders/F7/F7SZ5h-+GG0z6BlZFMBIH++++TM/-Tmp-//Rtmp2T0MfG/downloaded_packages

==== Installing R on Ubuntu ====

Add the following line to the file ''/etc/apt/sources.list'':

  deb http://CRAN_MIRROR/bin/linux/ubuntu precise/

Replace ''CRAN_MIRROR'' with [[http://cran.r-project.org/mirrors.html|your favorite CRAN site]].

  $ sudo apt-get update
  # this step printed error messages, which I ignored
  $ sudo apt-get install r-base
  $ sudo apt-get install r-base-dev

R is then ready to use.

Alternative method for installing a package, from a downloaded source tarball:

  $ sudo R CMD INSTALL package.tar.gz

==== Load Packages ====

To use optional
packages, they need to be loaded after the installation.

  > library(gplots)
  > library(RColorBrewer)
  > library(HH)

===== Color Palettes =====

==== List of Color Palettes ====

A list of useful color palettes:

  * rich.colors: what I consider the true "rainbow"; goes from red to indigo.
  * rainbow: not as good as the ''rich.colors'' palette because the two extremes look similar (both red)
  * greenred: green-black-red, often used for microarray-type data
  * heat.colors: red-orange-yellow-white
  * terrain.colors: green-yellow-orange
  * topo.colors: blue-green-yellow
  * cm.colors: cyan-white-magenta
  * gray: black-white

==== Custom Color Palettes ====

To create custom color palettes, use the ''colorpanel'' function from the ''gplots'' package (usage: ''colorpanel(n, low, mid, high)'').

  * n: Desired number of color elements in the panel.
  * low, mid, high: Colors to use for the lowest, middle, and highest values. The value for ''mid'' may be omitted.

These values can be given as color names ('red') or HTML-style RGB ("#FF0000").

Example: to create a blue-grey-yellow color palette, use ''col = colorpanel(256, 'blue', 'grey', 'yellow')'' in the ''heatmap.2'' function call.

==== Quick Visualization ====

  > # load library
  > library(gplots)
  > # define the number of colors to show
  > num <- 10
  > # call barplot function for visualization
  > barplot(rep(1,num), yaxt = "n", col = rich.colors(num))

==== Color Name Conversion ====

To convert a built-in color name to hexadecimal format:

  # call the col2rgb function to get the RGB components
  > col2rgb("darkorange1")
        [,1]
  red    255
  green  127
  blue     0
  # pass the components to rgb to get the hexadecimal string
  > rgb(255,127,0, maxColorValue=255)
  [1] "#FF7F00"

===== Heatmap =====

''heatmap.2()'' is included in the optional ''gplots'' package and provides a number of extensions to the standard ''heatmap()'' function. Most notably, it can generate a color key when "''key = TRUE''" is specified in the function call.
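For comparison, here is a minimal sketch of the standard ''heatmap()'' function that needs no optional packages. The matrix ''m'' is made-up data, and the palette is built with base R's ''colorRampPalette'', used here as a stand-in for ''colorpanel'' (it is a different function and may interpolate colors slightly differently). Unlike ''heatmap.2()'', this call draws no color key.

```r
# Made-up 5 x 4 matrix with row and column names (illustration only).
set.seed(1)
m <- matrix(rnorm(20), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# Base-R palette with 256 steps running blue -> grey -> yellow.
# colorRampPalette() returns a function; calling it with 256 gives the colors.
pal <- colorRampPalette(c("blue", "grey", "yellow"))(256)

# Standard heatmap(): rows are scaled before plotting, and both
# dendrograms are drawn by default. The return value (invisible)
# holds the row and column orderings used in the plot.
hm <- heatmap(m, scale = "row", col = pal, margins = c(6, 6))

# Row order as plotted.
print(rownames(m)[hm$rowInd])
```

The same ''col'' vector can be passed to ''heatmap.2()'' in place of ''rich.colors(256)''.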
==== Tutorials ====

  * Microarray data: [[http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/r/heatmap/|By Peter Cock]]
  * A simple guide to using heatmap.2: [[http://enotacoes.wordpress.com/2007/11/16/easy-guide-to-drawing-heat-maps-to-pdf-with-r-with-color-key/|@ E-notações]]

==== Heatmap Example ====

  # load the packages
  > library(gplots)
  > library(RColorBrewer)
  > library(HH)
  # initiate the display device
  > trellis.device()
  # load data
  > dataset <- read.table("input_file", sep="\t", header=TRUE)
  > dataset_matrix <- data.matrix(dataset)
  # generate heatmap
  > heatmap.2(dataset_matrix,
      # dendrogram control
      Rowv = TRUE,
      Colv = TRUE,
      distfun = dist,
      hclustfun = hclust,
      # dendrogram = c("both","row","column","none"),
      dendrogram = c("both"),
      symm = FALSE,
      # data scaling
      # scale = c("none","row", "column"),
      scale = c("row"),
      # colors
      col = rich.colors(256),
      # level trace
      # trace=c("column","row","both","none"),
      trace=c("none"),
      # Row/Column Labeling
      margins = c(20, 20),
      # color key + density info
      key = TRUE,
      keysize = 1.0,
      # density.info=c("histogram","density","none"),
      density.info=c("none"),
      # plot labels
      main = NULL,
      xlab = NULL,
      ylab = NULL
    )
  # export to file
  > export.eps("output_file.eps")

If the ''scale'' option is turned on (by specifying "''scale = c("row")''" or "''scale = c("column")''"), the color key displays the color mapping to Z-scores. A Z-score is calculated by subtracting the row (or column) mean from each cell and then dividing the result by the row (or column) standard deviation (see [[http://www.r-help.com/list/85/429617.html]] for details).

===== Hierarchical Clustering =====

Hierarchical clustering in ''R'' can be done using the package ''pvclust''.
See more details here: [[http://www.is.titech.ac.jp/~shimo/prog/pvclust/]]

To install:

  # install
  > install.packages("pvclust")
  # load the package
  > library(pvclust)
  # run example
  > example(pvclust)

To run:

  # load data
  > dataset <- read.table("input_file", sep="\t", header=TRUE)
  > attach(dataset)
  # execute
  > result <- pvclust(
      dataset,
      method.hclust = "average",
      method.dist = "correlation",
      use.cor = "pairwise.complete.obs",
      # set the number of bootstrap resamplings
      nboot = 1000
    )
  # plot result
  > plot(result)
  # highlight the groupings with high confidence
  > pvrect(result, alpha=0.95)
  # export to eps file (needs the HH library)
  > export.eps("output_file.eps")

===== Edit R using Illustrator =====

Copy ''AdobePiStd.otf'' from ''/Library/Application Support/Adobe/PDFL/10.9/Fonts/AdobePiStd.otf'' to ''/Library/Fonts/''.