## Opiniomics

### bioinformatics, genomes, biology etc. "I don't mean to sound angry and cynical, but I am, so that's how it comes across"

Yesterday, David Moyes, manager of Manchester United, complained that the fixture list for the new season might have been manipulated to give his team a difficult start.

Immediately, I wanted to know – how wrong is he?

Moyes is complaining that, in the first 5 games, Man Utd have to face Liverpool, Chelsea and Man City – 3 very difficult games.  So what is the probability of that happening?  We can look at it with some simple simulations in R:

```
# read a list of current UK premier league clubs
# (assumption: the 2013-14 clubs are defined inline here, with spaces removed
# from the names so they match the identifiers used below)
clubs <- c("Arsenal", "AstonVilla", "Cardiff", "Chelsea", "CrystalPalace",
           "Everton", "Fulham", "Hull", "Liverpool", "ManCity", "ManUtd",
           "Newcastle", "Norwich", "Southampton", "Stoke", "Sunderland",
           "Swansea", "Tottenham", "WestBrom", "WestHam")

# there is ManUtd and then "others"
manutd <- "ManUtd"
others <- clubs[-grep("ManUtd", clubs)]

# create data.frames of all possible home and away fixtures
home <- data.frame(home=rep(manutd,19), away=others, stringsAsFactors=FALSE)
away <- data.frame(home=others, away=rep(manutd,19), stringsAsFactors=FALSE)

# create an empty data.frame to hold the fixture list
fixtures <- data.frame(home=rep("",38), away=rep("",38), stringsAsFactors=FALSE)

# home and away games generally alternate so we create home and away index
# vectors to reflect this
home.idx <- seq(1, 38, by=2)
away.idx <- seq(2, 38, by=2)

# these are the clubs Moyes wants to avoid
avoid <- c("Liverpool","ManCity","Chelsea")

# the number of simulations
nsim <- 100

# a vector to hold the results
results <- numeric(nsim)

# run the simulations
for (i in 1:nsim) {

  # randomly assign the home and away fixtures to the home and away indices
  # of the fixture list
  fixtures[home.idx,] <- home[sample(1:19,19),]
  fixtures[away.idx,] <- away[sample(1:19,19),]

  # only look at the first five games
  opponents <- fixtures[1:5,]

  # a variable to record how many "bad" games Moyes gets
  bad <- 0

  # iterate over the teams Moyes wants to avoid and count them if they
  # occur as either a home or away fixture (a team met both home and
  # away in the first five games counts twice)
  for (a in avoid) {
    bad <- bad + sum(opponents$home == a) + sum(opponents$away == a)
  }

  # store the number of "bad" games for this simulation
  results[i] <- bad
}

# the probability is
length(results[results>=3]) / nsim
```

When I run this in R, I get a probability of 0.03, or 3%.
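An estimate from only 100 simulations is noisy. As a rough sketch (not part of the original analysis), base R's `binom.test` gives an exact binomial confidence interval for a simulated proportion. The count of 3 hits below is simply 0.03 × 100 from the run above; the variable names are illustrative.

```r
# 3 of the 100 simulated fixture lists gave at least 3 "bad" games (0.03 * 100)
hits <- 3
nsim <- 100

# exact (Clopper-Pearson) 95% confidence interval for the true probability
ci <- binom.test(hits, nsim)$conf.int
ci
```

The interval is wide relative to the point estimate, which is the usual argument for increasing `nsim`.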

So has Moyes been unlucky?  Well, perhaps not.  The teams Liverpool, Man City and Chelsea are just a few of the difficult teams Moyes could have faced, and any 3 difficult fixtures in the first 5 games would surely have had him moaning too.  For example, if he had to face Arsenal, Tottenham and Everton (who finished above Liverpool last season), he might also moan about the fixtures.

So let’s run the simulation again, adding in the new teams:

```
# these are the clubs Moyes wants to avoid
avoid <- c("Liverpool","ManCity","Chelsea","Arsenal","Everton","Tottenham")

# the number of simulations
nsim <- 100

# a vector to hold the results
results <- numeric(nsim)

# run the simulations
for (i in 1:nsim) {

  # randomly assign the home and away fixtures to the home and away indices
  # of the fixture list
  fixtures[home.idx,] <- home[sample(1:19,19),]
  fixtures[away.idx,] <- away[sample(1:19,19),]

  # only look at the first five games
  opponents <- fixtures[1:5,]

  # a variable to record how many "bad" games Moyes gets
  bad <- 0

  # iterate over the teams Moyes wants to avoid and count them if they
  # occur as either a home or away fixture (a team met both home and
  # away in the first five games counts twice)
  for (a in avoid) {
    bad <- bad + sum(opponents$home == a) + sum(opponents$away == a)
  }

  # store the number of "bad" games for this simulation
  results[i] <- bad
}

# the probability is
length(results[results>=3]) / nsim
```

This time I get a much higher probability of 0.22, or 22% – a greater than 1 in 5 chance! So not that unlikely after all.
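Because the home and away draws are independent, the first five games can also be sampled directly: three home opponents and two away opponents. The sketch below wraps that in a function so the estimate can be re-run with more simulations. The name `sim.bad.start` and the inline `others` vector (the 19 other 2013-14 clubs, names without spaces) are assumptions made here for illustration, not part of the original code.

```r
# assumption: the 19 clubs other than ManUtd, defined inline for illustration
others <- c("Arsenal", "AstonVilla", "Cardiff", "Chelsea", "CrystalPalace",
            "Everton", "Fulham", "Hull", "Liverpool", "ManCity",
            "Newcastle", "Norwich", "Southampton", "Stoke", "Sunderland",
            "Swansea", "Tottenham", "WestBrom", "WestHam")
avoid <- c("Liverpool", "ManCity", "Chelsea", "Arsenal", "Everton", "Tottenham")

# hypothetical helper: estimate the probability that at least min.bad of the
# first five games are against a club in `avoid`
sim.bad.start <- function(others, avoid, nsim = 10000, min.bad = 3) {
  bad <- replicate(nsim, {
    # games 1, 3 and 5 are at home; games 2 and 4 are away
    first.five <- c(sample(others, 3), sample(others, 2))
    # count games (not distinct clubs) against the teams to avoid
    sum(first.five %in% avoid)
  })
  mean(bad >= min.bad)
}

set.seed(1)
p.hat <- sim.bad.start(others, avoid)
p.hat
```

Sampling only the five games that matter avoids filling a 38-row data.frame on every iteration, so a larger `nsim` stays quick.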

Perhaps you should have paid attention in Maths class, David? 🙂

1. But why simulate? With nsim=100, your estimate isn’t particularly accurate. With larger nsim, it gets rather slow (on my laptop).

An exact solution is available. To get at least 3 bad fixtures in the first 5 games, assuming for convenience they start with a home game, Man U need to get any of the following:
(1) first 3 home fixtures from the 6 teams they want to avoid: number of possibilities = choose(6,3)
(2) two home and one away, or two home and two away, or one home and two away from those 6: choose(6,c(2,2,1)) * choose(13,c(1,1,2)) * choose(6,c(1,2,2)) * choose(13,c(1,0,0))

There are choose(19,3) possible first 3 home fixtures and choose(19,3)*choose(19,2) possible first 5 fixtures, home and away.

So the solution can be found by:
> choose(6,3)/choose(19,3) +
+ sum(choose(6,c(2,2,1)) * choose(13,c(1,1,2)) * choose(6,c(1,2,2)) * choose(13,c(1,0,0)))/prod(choose(19,c(2,3)))
[1] 0.1724513
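The same count can be written with the hypergeometric distribution, which may be easier to adapt to a different number of teams. This is a sketch under the same assumptions as the calculation above (three home and two away games, each drawn independently from the 19 other clubs); `dhyper` is base R, and the variable names are illustrative.

```r
n.bad <- 6    # clubs to avoid
n.ok  <- 13   # the remaining opponents

# probability of 0..3 bad clubs among the 3 home games,
# and of 0..2 bad clubs among the 2 away games
p.home <- dhyper(0:3, n.bad, n.ok, 3)
p.away <- dhyper(0:2, n.bad, n.ok, 2)

# joint probabilities (home and away draws are independent)
joint <- outer(p.home, p.away)

# total number of bad games for each home/away combination
total <- outer(0:3, 0:2, "+")

# probability of at least 3 bad games in the first 5
p.exact <- sum(joint[total >= 3])
p.exact
```

This agrees with the 0.1724513 obtained above.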

• Because simulating is far more fun and takes into account stochasticity 🙂 10k simulations takes just a few minutes on my desktop PC.

• And perhaps here lies the difference between bioinformatics and statistics ;o)

2. Thanks for sharing, I like David Moyes.