bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"


Online behavior, and an apology

When someone accuses you of bullying and personal attacks, it’s only right to take stock, think about your behavior and analyse what you’ve done wrong.

Recently on Twitter, I’ve obviously caused some upset and I need(ed) time to reflect and digest. As you will know if you read this blog or follow me on Twitter, I am opinionated and have a confrontational style of discussion. Clearly sometimes this comes across badly and obviously my recent interactions with @DrMel_T and @Julie_B92 represent one of those cases. Please understand it wasn’t and never will be a conscious attempt to bully. I myself have been bullied in the past and I didn’t enjoy it one bit. I believe in these instances the feelings of the person feeling bullied are way more important than the feelings, opinions or intentions of the accused bully, and I clearly need to respond to that. I realise that I have benefited from white male privilege and try to mitigate that wherever I can. However, seeing the reaction of Melanie and Julie, I have failed in this case and I apologise unreservedly for that. I made a mistake and I regret it.

I hope now we can move on and I will try to moderate my responses in future.

Easy and powerful graph editing using R and PowerPoint

A recent conversation on Twitter reminded me of a powerful way to edit graphs created in R inside PowerPoint.

I realise that R is incredibly powerful anyway, and that much of this can be done within R, but I am also painfully well aware that R is hard, and that many users prefer “point-and-click”.

This example uses Windows.

1. In R, let’s create a graph:

plot(iris$Sepal.Length, iris$Sepal.Width, main="My Graph", xlab="Length", ylab="Width")

2. You should see something like the graph below. In the top left hand corner of the graphics window, choose File -> Save As -> Metafile, and save it somewhere convenient.

3. Now, fire up PowerPoint and start with a blank slide. Choose insert picture.

Navigate to the .emf file you just saved and choose it. You should see:

4. Now, right click the image, choose group -> ungroup:

You will get a message asking if you want to convert the image to a Microsoft Drawing object. Choose yes:


5. Now repeat step 4. Right click the image, choose group -> ungroup:

You should now see something like this:

Every single part of the graph is now selectable, moveable and editable! Click outside of the chart area to de-select everything, and then click on individual components to edit them. In the graph below I have changed the title, and given one of the data points a red background:



(I realise there are other ways to do this, perhaps better ways e.g. export as PDF then edit in Illustrator (thanks !  This post is really aimed at those familiar and comfortable with PowerPoint :-))
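
For those who would rather script step 2 than click through the menu, here is a minimal sketch using the same iris plot, with the axis labels matching the plotted columns. It is not part of the recipe above: the function name save_graph and the file name "mygraph" are made up for illustration. On Windows it uses win.metafile() to write the .emf directly. On other platforms, where that device does not exist, it falls back to pdf(), which suits the PDF-then-Illustrator route.

```r
# Sketch: write the scatterplot to a vector file without using the GUI menu.
# save_graph() and the "mygraph" file stem are hypothetical names.
save_graph <- function(stem = "mygraph") {
  if (.Platform$OS.type == "windows") {
    # win.metafile() is only available in R on Windows
    out <- paste0(stem, ".emf")
    win.metafile(out, width = 7, height = 7)
  } else {
    # Fallback for other platforms: a PDF is also a vector format
    out <- paste0(stem, ".pdf")
    pdf(out, width = 7, height = 7)
  }
  on.exit(dev.off())  # close the device even if plot() fails
  plot(iris$Sepal.Length, iris$Sepal.Width,
       main = "My Graph", xlab = "Length", ylab = "Width")
  out
}

# Writes into the current working directory and returns the file name
graph_file <- save_graph("mygraph")
```

The file this produces can then be inserted into PowerPoint exactly as in step 3. I have only sketched this, so check the result on your own machine before relying on it.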

2014 in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 160,000 times in 2014. If it were an exhibit at the Louvre Museum, it would take about 7 days for that many people to see it.

Click here to see the complete report.

You’re not allowed bioinformatics anymore

“Ah welcome! Come in, come in!” said the institute director as Professor Smith appeared for their scheduled 2pm meeting. “I want to talk to you about your latest proposal”, the director continued.

“Oh?” replied Smith.

“Yes. Now, let’s see. It’s an amazing, visionary proposal, a great collaboration, and congratulations on pulling it together. I just have one question” said the director “This proposal will generate a huge amount of data – how do you plan to deal with it all?”

“Oh that’s easy!” answered Smith. “It’s all on page 6. We’ve requested funds to employ a bioinformatician for the lifetime of the project. They’ll deal with all of the data” he stated, triumphantly.

The director frowned.

“I see. Do you yourself have any experience of bioinformatics?”

Smith seemed uncertain.

“Well, no…..”

“Then how will you be able to guide the bioinformatician, to ensure they are using appropriate tools? How will you train them?” the director pressed.

Smith appeared perplexed by the question.

“We’ll employ someone who has already been trained, with at least a Masters in bioinformatics! They should already know what they’re doing…” Smith trailed off.

The director sighed.

“And what papers will the bioinformatician publish?”

Smith regained some confidence.

“They’ll get co-authorship on all of the papers coming out of the project. The post-docs who do the work will be first author, I will be last author and the bioinformatician will be in the middle”

The director drummed his fingers on his desk.

“What about a data management plan?”

“A what?”

“A data management plan. A plan, to manage the data. Where will it be stored? How will it be backed up? When will it be released?” the director asked.

“Same as always, I guess” said Smith. “We’ll release supporting data as supplementary PDFs, and we’ll make sure we get every last publication we possibly can before releasing the full data set”

The director shifted uneasily in his seat. “And data storage?”

“Don’t IT deal with that kind of stuff?” Smith answered.

An awkward silence settled over the office. The director stared at Professor Smith. Finally he broke the silence.

“OK, so you have this bioinformatician, you give them the data, and they analyse it and they give you the results. How will you ensure that they’ve carried out reproducible science?”

“Reproducible what? What the hell are you talking about?” Smith answered angrily.

The director slammed his hand down on the desk.

“At least tell me you have a plan for dealing with the sequence data!”

“Of course!” said Smith “We’ve been doing this for years. We’ll keep the sequences in Word documents….”

an amber light started flashing on the director’s desk

“… annotate genes by highlighting the sequence in blue…”

the flashing light turned red

“… annotate promoters by highlighting the sequence in orange…”

Smith’s sentence was interrupted by a noisy klaxon suddenly going off, accompanied by a bright blue flashing light that had popped up behind the director’s chair.  Smith looked wide-eyed, terrified.

The director pressed a few buttons on his desk and the noisy alarm ceased, the blue light disappeared.

Smith, removing his hands from his ears, asked “What the hell was that?”

The director stood, walked over to the window and sighed heavily. “I’m sorry, Smith. I had a feeling this might happen. Look… this may appear harsh, but… you’re not allowed bioinformatics anymore”


“As I said. You’ve crossed the threshold. You’re not allowed bioinformatics anymore”

Smith’s mouth flapped open and shut as he tried to take in the news.

“You mean no-one will analyse my data?”

The director turned to face Smith.

“Quite the contrary, Smith. Good data will always be welcome, and yours will be treated no differently. It’s just that you won’t be in charge of the storage and analysis of it anymore. You can generate the data, but that will be the end of your involvement. The data will be passed to a bioinformatics group who know what to do with it.”

Smith was furious.

“Are you insane? That’s my data! I can do whatever I like with it! Bioinformaticians won’t know what to do with it anyway!”

“On the contrary” replied the director “It’s not your data. Your research is funded by the government, which is in turn funded by the taxpayer. The data belong in the public domain. As for bioinformaticians, they’re scientists too and they’ll be able to analyse your data just as well as you can, probably better”

“I’ve never heard anything so ridiculous! Who decided that I’m not allowed bioinformatics anymore?”

“The Universe.”

“The Universe? Why should the Universe say I’m not allowed bioinformatics anymore?”

“Because you haven’t paid bioinformatics enough attention. It’s not a support service, at your beck and call. It’s a science. Bioinformaticians are scientists too. Young bioinformaticians need support, guidance and training; something you’re clearly not qualified to provide. They also need first-author papers to advance their careers”

“I don’t understand. What do you mean, they’re not support?!” spluttered Smith.

The director continued regardless of the interruption.

“You’ve had the opportunity to learn about bioinformatics. We’ve had a bioinformatics research group at the institute for over ten years, yet you only ever speak to them at the end of a project when you’ve already generated the data and need their help!”

“The bioinformatics group?! They’re just a bunch of computer junkies!”

The director was beginning to get angry.

“Quite the opposite. They publish multiple research papers every year, and consistently bring in funding. More than your group, actually”.

Smith looked stunned.

“But, but, but… how can this be possible? You’ll never get away with this!”

“I’m afraid I can and I will” said the director. “Science has changed, Smith. It’s a brave new world out there. Bioinformatics is key to the success of many major research programmes and bioinformaticians are now driving those programmes. Those researchers who embrace bioinformatics as a new and exciting science will be successful and those that don’t will be left behind.”

The director stared pointedly at Professor Smith. Smith was defeated, but still defiant.

“It doesn’t matter. We have tons of data we haven’t published yet. I’ll be able to work on that for decades! I don’t need new data, I have plenty of existing data”.

A smile flittered at the corners of the director’s mouth.

“Here’s the thing, Smith. As soon as that alarm went off, all of your data were zipped into a .tar.gz archive and uploaded to the cloud. It’s no longer in your possession”.

Smith looked horrified.

“What’s the cloud? How do I access it? What is a .tar.gz file and how do I open it?”

“You know” said the director “keep asking questions like that, and you might get bioinformatics back”

If you are leading a project that creates huge amounts of data, instead of employing a bioinformatician in your own group, why not collaborate with an existing bioinformatics group and fund a post there? The bioinformatician will benefit hugely from being around more knowledgeable computational biologists, and will still be dedicated to your project.

The above was hugely inspired by “Ballantyne T (2012) If only … Nature 489(7414):170-170”. I hope Tony doesn’t mind.


How not to make your papers replicable

Titus Brown has written a somewhat idealistic post on replicable bioinformatics papers, so I thought I would write some of my ideas down too 🙂

1. Create a new folder to hold all of the results/analysis.  Probably the best thing to do is name it after yourself, so try “Dave” or “Kelly”.  If you already have folders with those names, just add an index number e.g. “Dave378” or “Kelly5142”

2. Put all of your analysis in a Perl script.  Call this analysis.pl.  If you need to update this script, don’t add a version number, simply call these new scripts “newanalysis.pl”, “latestanalysis.pl”, “newnewanalysis.pl”, “newestanalysis.pl” etc etc

3. Place a README in the directory.  Don’t put anything in the README.  Your PI will simply check that the README is there and be satisfied you are doing reproducible research.  They won’t ever read the README.

4. Write the paper in Word, called “paper.docx”.  Send it round to all co-authors, asking them to turn on track changes.  Watch in horror as 500 different versions come back to you, called things like “paper_edited.docx”, “paper_mw.docx”, “paper_new.docx” etc etc.  Open each one to see that it now looks like Salvador Dalí had an epileptic fit in a paint factory.

5. When reviewer comments come back 6 months later asking for some small detail to be changed, have a massive panic attack as you realise you have no idea how you did any of it.  Start the whole analysis again, in a new folder (“Dave379” or “Kelly5143”) and pray to God that you somehow miraculously come up with the same results and figures.

6. After the paper has been accepted, and the copy editor insists that all figures are 1200 dpi, first look up dpi so you know what it means, and then wrestle with R’s png() and jpeg() functions.  Watch as your PC grinds away for 300 hours to produce a scatterplot that, in area, is roughly the size of Russia and comes in at 30TB.  Attempts to open it in an image viewer crash your entire network.

7. Weep silently with joy when someone tells you about ImageMagick, or that the journal will accept PDF images.

8. Upon publication, forget any of this ever happened.
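
In all seriousness, for point 6: a minimal sketch of a 1200 dpi PNG that stays a sensible size, assuming the journal wants a figure about 3 inches square. The function name save_png_1200, the file name and the dimensions are my own illustrative choices. The trick is to give png() the width and height in inches along with res, so that R works out the pixel count (3 inches x 1200 dpi = 3600 pixels per side) and scales the text to match.

```r
# Sketch: a 1200 dpi PNG at a fixed physical size.
# save_png_1200() and its defaults are hypothetical, for illustration only.
save_png_1200 <- function(file, width_in = 3, height_in = 3, dpi = 1200) {
  # units = "in" plus res means the image is width_in * dpi pixels wide
  png(file, width = width_in, height = height_in, units = "in", res = dpi)
  on.exit(dev.off())  # close the device even if plot() fails
  plot(iris$Sepal.Length, iris$Sepal.Width,
       xlab = "Sepal length", ylab = "Sepal width")
  file
}

# Written to a temporary directory here; use your own path in practice
figure_file <- save_png_1200(file.path(tempdir(), "figure1.png"))
```

jpeg() takes the same width, height, units and res arguments, so the same approach should carry over.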

On cheer-leading and pom-poms

Those of you whose libraries pay the over-priced subscription to Genome Biology may have read my good friend Neil Hall’s piece entitled “It’s only human”, relating to Illumina’s recent release of the HiSeq X Ten system.

In this rather droll paper, I am accused of being a pom-pom spinning Illumina cheerleader!  My goodness:


Of course, I think what Neil actually means is that I backed the winning horse, and continue to do so 😉 There are certain people, who shall remain unnamed, who spent a lot of money on those expensive door-stops otherwise known as ABI SOLiD machines; those same people have now almost completely traded those door-stops for Illumina HiSeqs.

To those people, I offer you my pom-poms.  Enjoy! 😀

We’re still recruiting!

The deadline to apply for most of our posts has passed, but we still have one vacancy left open.  To apply please go to https://www.vacancies.ed.ac.uk/ and enter the vacancy ref below:


  • Bioinformatician (Data Analyst) – working with the latest Illumina next-gen sequencing data on a diverse range of collaborative projects (Vacancy Ref: 022549)

© 2022 Opiniomics
