Opiniomics

bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

Reading data from google sheets into R

Reading data from google sheets into R is something you imagine should be really simple, but often is anything but. However, package googlesheets goes a long way to solving this problem.

Let’s crack on with an example.

First, install the software:

install.packages("googlesheets")

We then need an example sheet to work with, and I’m going to use one from Britain Elects:

So go here and add this to your own collection (yes you’ll need a google account!) using the little “Add to my drive” icon to the right of the title, just next to the star.

Right, let’s play around a bit!

# load package
library(googlesheets)

# which google sheets do you have access to?
# may ask you to authenticate in a browser!
gs_ls()

# get the Britain Elects google sheet
be <- gs_title("Britain Elects / Public Opinion")

This should show you all of the sheets you have access to, and also select the one we want.

Now we can see which worksheets exist within the sheet:

# list worksheets
gs_ws_ls(be)

We can "download" one of the sheets using gs_read()

# get Westminster voting
west <- gs_read(ss=be, ws = "Westminster voting intentions", skip=1)

# convert to data.frame
wdf <- as.data.frame(west)

And hey presto, we have the data. Simples!

Now let's plot it:

# reverse so that data are forward-sorted by time
wdf <- wdf[2769:1,]

# treat all dates as a single point on x-axis
dates <- 1:2769

# smooth parameter
fs <- 0.04

# plot conservative
plot(lowess(dates, wdf$Con, f=fs), type="l", col="blue", lwd=5, ylim=c(0,65), xaxt="n", xlab="", ylab="%", main="Polls: Westminster voting intention")

# add labels
axis(side=1, at=dates[seq(1, 2769, by=40)], labels=paste(wdf$"Fieldwork end date", wdf$Y)[seq(1, 2769, by=40)], las=2, cex.axis=0.8)

# plot labour and libdem
lines(lowess(dates, wdf$Lab, f=fs), col="red", lwd=5)
lines(lowess(dates, wdf$LDem, f=fs), col="orange", lwd=5)

# add UKIP, and we treat absent values as -50 for plotting purposes
ukip <- wdf$UKIP
ukip[is.na(ukip)] <- -50
lines(lowess(dates, ukip, f=fs), col="purple", lwd=5)

# add legend
legend(1, 65, legend=c("Con","Lab","LibDem","UKIP"), fill=c("blue","red","orange","purple"))


westminster

6 Comments

  1. I got an error in:

    > west<-gs_read(ss=be, ws="Westminster voting intentions", skip=1)
    Accessing worksheet titled 'Westminster voting intentions'.
    Downloading: 100 kB No encoding supplied: defaulting to UTF-8.
    Warning: 3 parsing failures.
    row col expected actual
    1514 CON a double Con
    1514 LAB a double Lab
    1514 LDEM a double LDem
    Error: Each variable must have a unique name.
    Problem variables: 'UKIP', 'C', 'L', 'LD', 'U', 'G', 'YouGovOnline', 'TNSOnline', 'ICMOnline', 'OpiniumOnline', 'Ipsos MoriPhone', 'SurvationPhone', 'SurvationOnline', 'ComResOnline', 'ICMPhone', 'BMGOnline', 'ComResPhone', 'PanelbaseOnline', 'Angus ReidOnline', 'BPIXOnline', 'HarrisOnline', 'YouGovOnline', 'TNSOnline', 'ICMOnline', 'OpiniumOnline', 'Ipsos MoriPhone', 'SurvationPhone', 'SurvationOnline', 'ComResOnline', 'ICMPhone', 'BMGOnline', 'ComResPhone', 'PanelbaseOnline', 'Angus ReidOnline', 'BPIXOnline', 'HarrisOnline', 'YouGovOnline', 'TNSOnline', 'ICMOnline', 'OpiniumOnline', 'Ipsos MoriPhone', 'SurvationPhone', 'SurvationOnline', 'ComResOnline', 'ICMPhone', 'BMGOnline', 'ComResPhone', 'PanelbaseOnline', 'Angus ReidOnline', 'BPIXOnline', 'HarrisOnline', 'YouGovOnline', 'TNSOnline', 'ICMOnline', 'OpiniumOnline', 'Ipsos MoriPhone', 'SurvationPhone', 'SurvationOnline', 'ComResOnline', 'ICMPhone', 'BMGOnline', 'ComResPhone', 'Panelbas

    • biomickwatson

      23rd November 2016 at 9:53 am

      Hmmm! I cannot reproduce this. Which version of R are you using?

      • > $version.string
        [1] “R version 3.3.1 (2016-06-21)”

        I think is a problem with the sheet, about the sames names of columns, and about the format of some data.

        I tested with a sheet of mine and it works.

  2. Or you could download the Excel file to access locally! No google account required, no bugs to trace and waste time.

    • biomickwatson

      23rd November 2016 at 1:59 pm

      Well that’s great if you like working with Excel; less great if you like scripting in R. Plus the download will become out of date…

      • One must not like working with Excel. Just using the file format is the point. After all it allows one to avoid having google track people some more and also avoid the bugs that come with dealing with its file format, as evident above.

        R is perfectly happy to read excel files. And since– to be up to date– one will have to download the googlesheet data every time analysis is to be done, then what is the problem of downloading the excel file just as often?

Leave a Reply

Your email address will not be published.

*

*

code

© 2017 Opiniomics

Theme by Anders NorenUp ↑