20 January

From Stats756

graphics defaults

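Deciding on appropriate graphical parameters, or on appropriate defaults, is often a matter of judgment. The notes below compare the defaults of base, car, lattice, ggplot and GGally for small multiples (here, scatterplot matrices of the first three columns of the iris data) and discuss the reasoning behind them.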

base plots

log(pplot_base.png)

  • open points (said to be better for detecting overlapping points in exploratory contexts)
  • y-axis tick labels are parallel to the axis (vertical), which is useful if you don't know how long the labels are going to be and can't adjust the distance of the y-axis label from the axis -- but hurts readability
  • would have to add a legend by hand (?legend, probably also setting margins by hand)
  • the gaps between plots are considered bad by Tufte, because they generate a strong, potentially distracting visual pattern; they also use space that could be used for data
  • the alternating tick marks (i.e. x-axis ticks on the bottom in columns 1 and 3, on the top in column 2, similar pattern for y-axis) are a way to try to reduce clutter on the graph
  • as mentioned in class, I find the default colors (including red and green) problematic
  • the dots in the variable names (Sepal.Length etc.) occur because R doesn't like spaces in variable names
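
Most of the base defaults listed above can be overridden in the call to pairs(). A minimal sketch, assuming three Okabe-Ito colours as a red/green-safe palette (an arbitrary choice, not something from class); the oma value and the legend position are guesses that may need adjusting for a given device size:

```r
data(iris)
## hypothetical palette: three Okabe-Ito colours, one per species
cols <- c("#E69F00", "#56B4E9", "#009E73")
pairs(iris[, 1:3],
      col = cols[iris$Species],  ## index the palette by species
      pch = 1,                   ## open points (also the default)
      gap = 0,                   ## remove the gaps between panels
      las = 1,                   ## horizontal tick labels
      oma = c(3, 3, 3, 12))      ## leave room on the right for a legend
par(xpd = NA)                    ## allow drawing outside the plot region
legend("right", legend = levels(iris$Species),
       col = cols, pch = 1, bty = "n")
```

Setting las=1 trades the "unknown label length" safety of the default for readability, which is reasonable here because the tick labels are short numbers.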

car plot

log(pplot_car.png)

  • an enhanced scatterplot matrix, with density and rug plots on the diagonal, overall loess trend lines (black) and linear regression lines (red) on the off-diagonals (nice if you want it, perhaps cluttered as a default)
  • automatic legend
  • graphical defaults otherwise the same

lattice plot

log(pplot_lattice.png)

  • between-plot gaps eliminated
  • vertical y-axis labels
  • differences in tick mark spacing and position (using the space within the diagonal boxes, for compactness)
  • labels on bottom-left-to-top-right diagonal rather than top-left-to-bottom-right (don't know why)
  • (not shown) strip labels for sub-plots would be in a neutral/pastel shade (#FFE5CC)
  • a key can be added, but the simplest attempt (auto.key=TRUE, which works for some other lattice plots) fails, and I haven't worked out the right incantation to get round points (pch=1) in the key; the default there is pch=8 (even though that's not what is used in the plot)
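
One possible way to get a key whose symbols match the plot is to pass the grouping through groups= (rather than col=) and to set the symbol through par.settings, which both the panels and the key read. This is a sketch under those assumptions, not checked against the lattice version used in class; the palette is again an arbitrary choice:

```r
library(lattice)
data(iris)
## hypothetical palette: three Okabe-Ito colours, one per species
cols <- c("#E69F00", "#56B4E9", "#009E73")
p <- splom(~iris[1:3], groups = Species, data = iris,
           ## simpleTheme() sets the symbol and colours used for grouped
           ## points, so the key and the panels stay in sync
           par.settings = simpleTheme(pch = 1, col = cols),
           auto.key = list(columns = 3))
print(p)
```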

ggplot

log(pplot_ggplot.png)

  • gray background, with gray grid. Tufte would consider this wasted ('non-data') ink: Wickham (ggplot's author) argues that this is to keep the 'optical density' of the plot similar to that of surrounding text. (?theme_bw to change to a white background with gray grid lines)
  • the between-plot gaps are back
  • filled points
  • no color distinguishing species (not easy to do in this context)
  • labels at top/right, on darker gray background
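
The background and point defaults are easy to change on an ordinary ggplot. The sketch below uses a single scatterplot rather than a plot matrix (so it does not address the species-colour limitation noted above) to show theme_bw() and open points:

```r
library(ggplot2)
data(iris)
g <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
  geom_point(shape = 1) +  ## shape 1 = open circle, as in base graphics
  theme_bw()               ## white background, gray grid lines
print(g)
```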

GGally

  • this is a set of enhancements to ggplot2. Basic format is similar to previous, but adds density plots separated by group
  • red/blue/green colors (better, I think, in not giving black to one group [makes groups more similar] but still has red/green distinction problem)
  • correlation summary statistics plotted in upper triangle
  • problem lining up the bottom-left plot (its y-axis labels are a different length)

log(pplot_GGally.png)

R script

pairplots.R
printfun <- function(expr,pkgname="",basename="pplot",ext="png",devfun=png,...) {
  fn <- paste(basename,"_",pkgname,".",ext,sep="")
  devfun(file=fn,...)
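  ## R magic: arguments are evaluated lazily, so the plotting call
  ##  passed as expr only runs here, after the device has been opened
  print(eval(expr))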
  invisible(dev.off())
}
 
data(iris)
## base graphics
printfun(pairs(iris[,1:3],col=iris$Species),"base")
 
## car (enhanced, using base graphics)
library(car)
printfun(scatterplotMatrix(iris[1:3],groups=iris$Species),"car")
 
## lattice
library(lattice)
printfun(splom(~iris[1:3],col=iris$Species),"lattice")
 
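## ggplot (hard to get colours from standard package)
## note: plotmatrix() was deprecated and later removed from ggplot2,
##  so this line needs an old ggplot2 release
library(ggplot2)
printfun(plotmatrix(iris[1:3]),"ggplot")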
 
## GGally (enhanced, using ggplot2)
if (!require(GGally)) {
  install.packages("GGally") ## repos="http://probability.ca/cran" ?
  library(GGally)
}
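## recent GGally releases take the grouping via mapping=aes(...);
##  older ones used colour="Species"
printfun(ggpairs(iris,columns=1:3,mapping=aes(colour=Species)),"GGally")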
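
printfun works because R does not evaluate a function argument until it is first used, so the plotting call runs only after the device has been opened. A small stand-alone illustration, where show_order is a hypothetical helper written for this note and cat() stands in for the device and plotting calls:

```r
## hypothetical helper: prints messages around the point where expr is forced
show_order <- function(expr) {
  cat("device opened\n")
  force(expr)              ## the argument is evaluated here, not at call time
  cat("device closed\n")
  invisible(NULL)
}
show_order(cat("plot drawn\n"))
## prints, in this order:
## device opened
## plot drawn
## device closed
```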


Makefile

googleVis example

(Link to Hans Rosling etc.)

gvex.R
library(emdbook) ## for SeedPred
library(Hmisc) ## for smean.sd
library(reshape) ## for melt, cast
data(SeedPred)
 
## measurements are taken on each transect on different dates;
##  if we try to plot them all together against time, we get
##  a weird jitter effect as we alternate between data from
##  different transects.  We could either separate each
##  species by transect, or compute some interesting summary statistic
##  (such as diversity) by transect, or just select a single transect
ss <- subset(SeedPred,select=c(date,species,available),
             dist=="10")
ss_a <- aggregate(available~date+species,data=ss,FUN=smean.sd)
## not quite the format we wanted! str(ss_a) to see what happened
ss_a <- with(ss_a,data.frame(date,species,Mean=available[,1],SD=available[,2]))
 
## melt/cast don't work easily with dates
## make sure that we can transform both ways
identical(ss$date,as.Date(as.numeric(ss$date),origin="1970-1-1"))
ss$date <- as.numeric(ss$date)
ss_a <- cast(ss,date+species~.,fun.aggregate=smean.sd)
ss_a$date <- as.Date(ss_a$date,origin="1970-1-1")
 
ss_am <- melt(ss_a,id.var=1:2)
library(ggplot2)
g1 <- qplot(date,value,data=ss_am,
      geom="line",colour=species)+facet_grid(.~variable)+
  xlim(as.Date("1999-03-01"),as.Date("2000-01-01"))
 
library(directlabels)
direct.label(g1)
## would like to further play with/adjust x-axis ticks ...
 
library("googleVis")
M1 <- gvisMotionChart(ss_a,idvar="species",timevar="date")
plot(M1)
## doesn't work without web access?
 
## diversity/mean example:
ss <- subset(SeedPred,select=c(date,dist,species,available))
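divmean <- function(x) {
  ## proportions; na.rm=TRUE so that a single NA doesn't make every p NA
  p <- x/sum(x,na.rm=TRUE)
  ## Shannon diversity; na.rm=TRUE also drops the NaN from 0*log(0)
  c(diversity=-sum(p*log(p),na.rm=TRUE),
    mean=mean(x,na.rm=TRUE))
}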
 
ss_a <- aggregate(available~date+dist,data=ss,
                  FUN=divmean)
ss_a <- with(ss_a,data.frame(date,dist,diversity=available[,1],
                             mean=available[,2]))
 
ss_am <- melt(ss_a,id.var=1:2)
qplot(date,value,data=ss_am,linetype=dist,colour=variable,
      geom="line")
## actually, this isn't that interesting.
## and we probably should be putting these in different
## facets (facet_grid(.~variable)) rather than distinguishing
## them via colour
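
The faceting idea in the closing comments could look roughly like the sketch below. It rebuilds the summary with base R only (no melt), and it stacks the facets vertically with free y scales, which is a choice made here because diversity and mean are on different scales; the comment above suggests facet_grid(.~variable) instead. The name ss_long is introduced only for this example.

```r
library(emdbook)  ## for SeedPred
library(ggplot2)
data(SeedPred)

divmean <- function(x) {
  p <- x / sum(x, na.rm = TRUE)
  c(diversity = -sum(p * log(p), na.rm = TRUE),
    mean = mean(x, na.rm = TRUE))
}

ss <- subset(SeedPred, select = c(date, dist, species, available))
ss_a <- aggregate(available ~ date + dist, data = ss, FUN = divmean)

## long format: one row per date/dist/summary statistic
ss_long <- rbind(
  data.frame(date = ss_a$date, dist = factor(ss_a$dist),
             variable = "diversity",
             value = ss_a$available[, "diversity"]),
  data.frame(date = ss_a$date, dist = factor(ss_a$dist),
             variable = "mean",
             value = ss_a$available[, "mean"]))

## one facet per statistic, each with its own y scale
g2 <- ggplot(ss_long, aes(x = date, y = value, linetype = dist)) +
  geom_line() +
  facet_grid(variable ~ ., scales = "free_y")
print(g2)
```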