Friday, April 5, 2013

Multiple pairwise comparisons for categorical predictors

Dale Barr (@datacmdr) recently had a nice blog post about coding categorical predictors, which reminded me to share my thoughts about multiple pairwise comparisons for categorical predictors in growth curve analysis. As Dale pointed out in his post, the R default is to treat the reference level of a factor as a baseline and to estimate parameters for each of the remaining levels. This gives a comparison of each of the other levels with the baseline, but no comparisons among those other levels. Here's a simple example using the ChickWeight data set (part of the datasets package). As a reminder, this data set is from an experiment on the effect of diet on early growth of chicks. There were 50 chicks, each fed one of 4 diets, and their weights were measured up to 12 times over the first 3 weeks after they were born.
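
To make the default coding concrete, here is a minimal sketch in base R. It assumes an ordinary linear model fit with lm(), which ignores the repeated measurements on each chick, so it is a simplified illustration of the coding issue and not a growth curve analysis. The object names (cw, m1, cw2, m2) are just for illustration.

```r
# ChickWeight ships with base R (datasets package)
cw <- as.data.frame(ChickWeight)

# Default treatment coding: the first level (Diet 1) is the baseline
levels(cw$Diet)
contrasts(cw$Diet)

# Simplified model (assumption: plain lm, no random effects)
m1 <- lm(weight ~ Time * Diet, data = cw)

# Coefficients compare Diets 2, 3, and 4 with Diet 1 only;
# there is no term for, say, Diet 2 vs. Diet 3
names(coef(m1))

# One way to get other comparisons: change the baseline and refit
cw2 <- cw
cw2$Diet <- relevel(cw2$Diet, ref = "2")
m2 <- lm(weight ~ Time * Diet, data = cw2)

# Now the coefficients compare Diets 1, 3, and 4 with Diet 2
names(coef(m2))
```

The two fits are the same model with different parameterizations, so the fitted values are identical; only the comparisons that the coefficients express change. Refitting once per baseline gets tedious with more levels, which is why a way to test all pairwise comparisons at once is useful.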

Thursday, April 4, 2013

R 3.0 released; ggplot2 stat_summary bug fixed!

The new version of R was released yesterday. As I understand it, the numbering change to 3.0 represents the recognition that R had evolved enough to justify a new number rather than the addition of many new features. There are some important new features, but I am not sure they will affect me very much. 

For me, the much bigger change was the update of the ggplot2 package to a new version, which actually happened about a month ago, but I somehow missed it. This update is a big deal for me because it fixes a very unfortunate bug from version 0.9.3 that broke one of my favorite features: stat_summary(). As I mentioned in my previous post, one of the great features of ggplot is that it allows you to compute summary statistics "on the fly". The bug had broken this feature for certain kinds of summary statistics computed using stat_summary(). A workaround was developed relatively quickly, which I think is a nice example of open-source software development working well, but it's great to have it fixed in the packaged version.
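
For readers who have not used it, here is a minimal sketch of what computing summaries "on the fly" looks like, using the ChickWeight data from base R. It assumes a recent ggplot2 (3.3.0 or later), where the argument is called fun; the versions discussed in this post used fun.y instead. The plot object name p is just for illustration.

```r
library(ggplot2)

# Raw data go in; stat_summary() computes the per-Diet, per-Time
# summaries at plot time, so no pre-aggregated data frame is needed
p <- ggplot(ChickWeight, aes(x = Time, y = weight, colour = Diet)) +
  # mean weight at each time point, drawn as a line
  stat_summary(fun = mean, geom = "line") +
  # mean plus/minus one standard error, drawn as point ranges
  stat_summary(fun.data = mean_se, geom = "pointrange")

print(p)
```

Swapping in a different summary is a matter of changing the function passed to fun or fun.data, which is what makes the feature so convenient.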