Being part of a large, monolithic university means that things take a really long time to change.  For example, many of the Linux systems I have access to are at least one major version out of date, often two.  The attitude is often not “let’s have the latest stuff!”, more “if it ain’t broke, don’t fix it”.

What this means is that I am often frustrated by not being able to install certain tools, as dependencies are out of date – this happened quite recently, when I tried to use Aaron Quinlan’s Piledriver; you can see the issue here.

Rather than engage in a quite lengthy and often fruitless attempt to bring our systems up to date, I often turn to Amazon EC2, where I can spin up a more recent Linux server, do what I need to do, close it down and move on.

This is how you do that with Piledriver.

Piledriver in the cloud

The first part is to set up a cloud server.  I’m not going to cover this here as it’s covered elsewhere (e.g. here and here).  Note that these links tell you how to set up a server using a cloud image created specifically for a training course.  If you follow those instructions, instead of searching for the WageningenEU image, just select Ubuntu 12.04 LTS.

If you just want to have a play without spending any money, note that setting up a t1.micro instance is free with Amazon for the first year, so give it a go.  I set up a server running Ubuntu 12.04 LTS and chose an m1.large instance type.

Once the server is set up and you are logged in, the first step is to get Piledriver.  I’m just going to get the zip file of the master branch directly from GitHub:

wget https://github.com/arq5x/piledriver/archive/master.zip

Next we need to unzip it.  Unfortunately, this version of Ubuntu doesn’t come with unzip, so we need to install it.  Before we do, it is always a good idea to update the package lists:

sudo apt-get update
sudo apt-get install unzip

(answer Y to any questions)

So let’s get into it:

unzip master.zip
cd piledriver-master

Now, it just so happens that I know that Piledriver builds with cmake (which has a dependency on make) and, as it’s written in C/C++, we will also need C and C++ compilers – so let’s make sure we have those:

sudo apt-get install make cmake gcc g++

(answer Y to all questions)
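If you want to confirm that the build tools really are on your PATH before moving on, a small check along these lines should do it.  This is only a sketch: missing_tools is a name I made up for the example, not part of any package.

```shell
# missing_tools: hypothetical helper that prints the names of any commands
# from its argument list that cannot be found on the PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  # Strip the leading space left by the loop above.
  echo "${missing# }"
}

# Check the four tools installed in the previous step.
need=$(missing_tools make cmake gcc g++)
if [ -n "$need" ]; then
  echo "Still missing: $need"
else
  echo "All build tools found"
fi
```

If anything is reported as missing, re-run the apt-get install line for that package.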

Now, if you try to compile Piledriver at this point, it will fail, complaining about an absence of zlib.h.  If you don’t know which package provides zlib.h, you can use a utility called apt-file, but we need to install it first:

sudo apt-get install apt-file
sudo apt-file update
sudo apt-file search zlib.h

(answer Y to all questions)
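Because apt-file search matches any path containing zlib.h, one way to cut down the output is to keep only the lines where the file is exactly zlib.h inside an include directory.  The filter below is a sketch under that assumption; header_packages is a made-up name, and it may still list more than one package on some systems.

```shell
# header_packages: hypothetical filter for "package: /path" lines on stdin.
# Keeps paths ending in /include/zlib.h and prints the unique package names.
header_packages() {
  grep '/include/zlib\.h$' | cut -d: -f1 | sort -u
}

# Only run the search if apt-file is actually installed.
if command -v apt-file >/dev/null 2>&1; then
  apt-file search zlib.h | header_packages
fi
```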

This produces a lot of output, but the one at the bottom is the one we want – “zlib1g-dev”:

sudo apt-get install zlib1g-dev

(answer Y to all questions)

After all of that (which only takes a few minutes) we can build Piledriver:

mkdir build
cd build
cmake ..
make

And that should build perfectly!  Now just change back to your home directory and run Piledriver:

cd 
./piledriver-master/bin/bamtools piledriver -h

This will produce:

Description: converts BAM to a number of other formats.

Usage: bamtools piledriver -format <FORMAT> [-in <filename> -in <filename> ... | -list <filelist>] [-out <filename>] [-region <REGION>] [format-specific options]

Input & Output:
  -in <BAM filename>                the input BAM file(s) [stdin]
  -list <filename>                  the input BAM file list, one
                                    line per file
  -out <BAM filename>               the output BAM file [stdout]
  -region <REGION>                  genomic region. Index file is
                                    recommended for better performance, and is
                                    used automatically if it exists. Regions
                                    specified using the following format:
                                    -region chr1:START..END

Pileup Options:
  -fasta <FASTA filename>           FASTA reference file

Help:
  --help, -h                        shows this help text
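For next time, the whole walkthrough can be collected into one script.  What follows is a sketch rather than a tested installer: the run helper and the DRY_RUN variable are my own inventions for the example, and the script assumes it is started from the directory you want to download into.  By default it only prints the commands; set DRY_RUN=0 to execute them.  The -y flag tells apt-get to answer yes to its prompts.

```shell
#!/bin/sh
# Recap of the steps in this post as one script (a sketch, not a tested installer).
# By default commands are only printed; set DRY_RUN=0 to really run them.
set -e
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper that prints a command in dry-run mode,
# and executes it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Refresh the package lists, then install everything used above.
run sudo apt-get update
run sudo apt-get install -y unzip make cmake gcc g++ zlib1g-dev

# Fetch and unpack the master branch.
run wget https://github.com/arq5x/piledriver/archive/master.zip
run unzip master.zip

# Build in a separate build directory with cmake and make.
run cd piledriver-master
run mkdir build
run cd build
run cmake ..
run make

# Go back to the starting directory and show the help text.
run cd ../..
run ./piledriver-master/bin/bamtools piledriver -h
```

Saved as, say, setup.sh (the filename is just an example), running sh setup.sh prints the plan and DRY_RUN=0 sh setup.sh carries it out.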

And that’s it – enjoy!  Don’t forget to close down your EC2 instance afterwards.