Opiniomics

bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

Plotting cool graphs in R

I have to admit to being a bit of a snob when it comes to graphs and charts in scientific papers and presentations.  It’s not like I think I am particularly good at it – I’m OK – it’s just that I know what’s bad.  I’ve seen folk screenshot multiple Excel graphs so they can paste them into a powerpoint table to create multi-panel plots… and it kind of makes me want to scream.   I’m sorry, I really am, but when I see Excel plots in papers I judge the authors, and I don’t mean in a good way.  I can’t help it.  Plotting good graphs is an art, and sticking with the metaphor, Excel is paint-by-numbers and R is a blank canvas, waiting for something beautiful to be created; Excel is limiting, whereas R sets you free.

Readers of this blog will know that I like to take plots that I find which are fabulous and recreate them.  Well let’s do that again 🙂

I saw this Tweet by Trevor Branch on Twitter and found it intriguing:

It shows two plots of the same data.  The Excel plot:

excel

And the multi plot:

multi

You’re clearly supposed to think the latter is better, and I do; however perhaps disappointingly, the top graph would be easy to plot in Excel but I’m guessing most people would find it impossible to create the bottom one (in Excel or otherwise).

Well, I’m going to show you how to create both, in R. All code now in Github!

The Excel Graph

Now, I’ve shown you how to create Excel-like graphs in R before, and we’ll use some of the same tricks again.

First we set up the data:


# set up the data
df <- data.frame(Circulatory=c(32,26,19,16,14,13,11,11),
		 Mental=c(11,11,18,24,23,24,26,23),
		 Musculoskeletal=c(17,18,13,16,12,18,20,26),
		 Cancer=c(10,15,15,14,16,16,14,14))

rownames(df) <- seq(1975,2010,by=5)

df

Now let's plot the graph


# set up colours and points
cols <- c("darkolivegreen3","darkcyan","mediumpurple2","coral3")
pch <- c(17,18,8,15)

# we have one point on X axis for each row of df (nrow(df))
# we then add 2.5 to make room for the legend
xmax <- nrow(df) + 2.5

# make the borders smaller
par(mar=c(3,3,0,0))

# plot an empty graph
plot(1:nrow(df), 1:nrow(df), pch="", 
		xlab=NA, ylab=NA, xaxt="n", yaxt="n", 
		ylim=c(0,35), bty="n", xlim=c(1,xmax))

# add horizontal lines
for (i in seq(0,35,by=5)) {
	lines(1:nrow(df), rep(i,nrow(df)), col="grey")
}

# add points and lines 
# for each dataset
for (i in 1:ncol(df)) {

	points(1:nrow(df), df[,i], pch=pch[i], 
		col=cols[i], cex=1.5)

	lines(1:nrow(df), df[,i], col=cols[i], 
		lwd=4)


}

# add bottom axes
axis(side=1, at=1:nrow(df), tick=FALSE, 
		labels=rownames(df))

axis(side=1, at=seq(-0.5,8.5,by=1), 
		tick=TRUE, labels=NA)

# add left axis
axis(side=2, at=seq(0,35,by=5), tick=TRUE, 
		las=TRUE, labels=paste(seq(0,35,by=5),"%",sep=""))

# add legend
legend(8.5,25,legend=colnames(df), pch=pch, 
		col=cols, cex=1.5, bty="n",  lwd=3, lty=1)

And here is the result:

excel_plot

Not bad eh?  Actually, yes, very bad; but also very Excel!

The multi-plot

Plotting multi-panel figures in R is sooooooo easy!  Here we go for the alternate multi-plot.  We use the same data.


# split into 2 rows and 2 cols
split.screen(c(2,2))

# keep track of which screen we are
# plotting to
scr <- 1

# iterate over columns
for (i in 1:ncol(df)) {

	# select screen
	screen(scr)

	# reduce margins
	par(mar=c(3,2,1,1))

	# empty plot
	plot(1:nrow(df), 1:nrow(df), pch="", xlab=NA, 
		ylab=NA, xaxt="n", yaxt="n", ylim=c(0,35), 
		bty="n")

	# plot all data in grey
	for (j in 1:ncol(df)) {
		lines(1:nrow(df), df[,j], 
		col="grey", lwd=3)

	}	

	# plot selected in blue
	lines(1:nrow(df), df[,i], col="blue4", lwd=4)

	# add blobs
	points(c(1,nrow(df)), c(df[1,i], df[nrow(df),i]), 
		pch=16, cex=2, col="blue4")

	# add numbers
	mtext(df[1,i], side=2, at=df[1,i], las=2)
	mtext(df[nrow(df),i], side=4, at=df[nrow(df),i], 
		las=2)	

	# add title
	title(colnames(df)[i])

	# add axes if we are one of
	# the bottom two plots
	if (scr >= 3) {
		axis(side=1, at=1:nrow(df), tick=FALSE, 
			labels=rownames(df))
	}

	# next screen
	scr <- scr + 1
}

# close multi-panel image
close.screen(all=TRUE)

And here is the result:

multi_plot

 


And there we have it.

So which do you prefer?

5 Comments

  1. I have made just for fun a version with ggplot: https://gist.github.com/Artjom-Metro/0d2861f98aec7ab7544de2ae2b3d1e44

    Maybe I should consider using R base plotting more often in the future…

  2. Guillaume Devailly

    25th August 2016 at 10:16 pm

    This is soooooo static.
    https://plot.ly/ggplot2/
    🙂

  3. I have to confess, this particular example leaves me very conflicted. The faceted plot is clearly more aesthetically pleasing and more in keeping with good practice. But if the aim is to compare quickly the values from the 4 categories, I think many people would find that the original does so quite well, despite the ugliness. I can say straight off, for example, that “circulatory starts highest and ends lowest.”

  4. Marek Gierlinski

    26th August 2016 at 1:57 pm

    Ehm. Error bars?

    • biomickwatson

      30th August 2016 at 4:39 pm

      Yeah, so point of post was to recreate graphs… they don’t have error bars so I didn’t re-create them!

1 Pingback

  1. Page not found |

Leave a Reply

Your email address will not be published.

*

*

code

© 2017 Opiniomics

Theme by Anders NorenUp ↑