Yesterday, David Moyes, manager of Manchester United, complained that the fixture list for the new season might have been manipulated to give them a difficult start to the season.

Immediately, I wanted to know – how wrong is he?

Moyes is complaining that, in the first 5 games, Man Utd have to face Liverpool, Chelsea and Man City – 3 very difficult games. So what is the probability of that happening? We can look at it with some simple simulations in R:

# read a list of current UK premier league clubs
clubs <- read.table("http://www.opiniomics.org/wp-content/uploads/2013/08/clubs.doc", stringsAsFactors=FALSE, header=FALSE)[,1]

# there is ManUtd and then "others"
manutd <- "ManUtd"
others <- clubs[-grep("ManUtd", clubs)]

# create data.frames of all possible home and away fixtures
home <- data.frame(home=rep(manutd,19), away=others, stringsAsFactors=FALSE)
away <- data.frame(home=others, away=rep(manutd,19), stringsAsFactors=FALSE)

# create an empty data.frame to hold the fixture list
fixtures <- data.frame(home=rep("",38), away=rep("",38), stringsAsFactors=FALSE)

# home and away games generally alternate so we create home and away index
# vectors to reflect this
home.idx <- seq(1, 38, by=2)
away.idx <- seq(2, 38, by=2)

# these are the clubs Moyes wants to avoid
avoid <- c("Liverpool","ManCity","Chelsea")

# the number of simulations
nsim <- 100

# a vector to hold the results
results <- vector(length=nsim)

# run the simulations
for(i in 1:nsim) {

  # randomly assign the home and away fixtures to the home and away indices
  # of the fixture list
  fixtures[home.idx,] <- home[sample(1:19,19),]
  fixtures[away.idx,] <- away[sample(1:19,19),]

  # only look at the first five games
  opponents <- fixtures[1:5,]

  # a variable to record how many "bad" games Moyes gets
  bad <- 0

  # iterate over the teams Moyes wants to avoid and count them if they
  # occur as either a home or away fixture
  for (a in avoid) {
    bad <- bad + nrow(opponents[opponents$home==a,])
    bad <- bad + nrow(opponents[opponents$away==a,])
  }

  results[i] <- bad
}

# the probability is
length(results[results>=3]) / nsim

When I run this in R, I get a probability of 0.03, or 3%.

So has Moyes been unlucky? Well, perhaps not. The teams Liverpool, Man City and Chelsea are just a few of the difficult teams Moyes could have faced, and any 3 difficult fixtures in the first 5 games would surely have had him moaning too. For example, if he had to face Arsenal, Tottenham and Everton (who finished above Liverpool last season), he might also moan about the fixtures.

So let’s run the simulation again, adding in the new teams:

# these are the clubs Moyes wants to avoid
avoid <- c("Liverpool","ManCity","Chelsea","Arsenal","Everton","Tottenham")

# the number of simulations
nsim <- 100

# a vector to hold the results
results <- vector(length=nsim)

# run the simulations
for(i in 1:nsim) {

  # randomly assign the home and away fixtures to the home and away indices
  # of the fixture list
  fixtures[home.idx,] <- home[sample(1:19,19),]
  fixtures[away.idx,] <- away[sample(1:19,19),]

  # only look at the first five games
  opponents <- fixtures[1:5,]

  # a variable to record how many "bad" games Moyes gets
  bad <- 0

  # iterate over the teams Moyes wants to avoid and count them if they
  # occur as either a home or away fixture
  for (a in avoid) {
    bad <- bad + nrow(opponents[opponents$home==a,])
    bad <- bad + nrow(opponents[opponents$away==a,])
  }

  results[i] <- bad
}

# the probability is
length(results[results>=3]) / nsim

This time I get a much higher probability of 0.22, or 22%: a greater than 1 in 5 chance! So not that unlikely after all…

Perhaps you should have paid attention in Maths class, David? 🙂

1st September 2013 at 5:45 pm

But why simulate? With nsim=100, your estimate isn’t particularly accurate. With larger nsim, it gets rather slow (on my laptop).
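To put a rough number on "isn't particularly accurate": a simulated proportion behaves like a binomial estimate, so its standard error is approximately sqrt(p(1-p)/nsim). The sketch below uses the 0.22 reported in the post; the helper name `mc_se` is made up for illustration.

```r
# Approximate standard error of a proportion p_hat estimated from nsim
# independent simulations (binomial approximation).
# mc_se is a hypothetical helper name, not part of the original post.
mc_se <- function(p_hat, nsim) {
  sqrt(p_hat * (1 - p_hat) / nsim)
}

se_100   <- mc_se(0.22, 100)    # roughly 0.04
se_10000 <- mc_se(0.22, 10000)  # roughly 0.004

print(c(nsim_100 = se_100, nsim_10000 = se_10000))
```

So with 100 runs, an estimate of 0.22 could plausibly be off by several percentage points in either direction, while 10,000 runs narrows that about tenfold.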

An exact solution is available. Assume for convenience that they start with a home game, so the first 5 games are 3 home and 2 away. To get at least 3 bad fixtures in those 5 games, Man U need to get any of the following:

(1) first 3 home fixtures from the 6 teams they want to avoid: number of possibilities = choose(6,3)

(2) two home and one away, or two home and two away, or one home and two away from those 6, with the remaining slots filled from the other 13 teams: choose(6,c(2,2,1)) * choose(13,c(1,1,2)) * choose(6,c(1,2,2)) * choose(13,c(1,0,0))

There are choose(19,3) possible first 3 home fixtures and choose(19,3)*choose(19,2) possible first 5 fixtures, home and away.

So the solution can be found by:

> choose(6,3)/choose(19,3) +
+ sum(choose(6,c(2,2,1)) * choose(13,c(1,1,2)) * choose(6,c(1,2,2)) * choose(13,c(1,0,0)))/prod(choose(19,c(2,3)))
[1] 0.1724513
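The same calculation can be written more generally with R's hypergeometric density, `dhyper`, which is a swapped-in route to the same answer rather than the method used above. This is a sketch under the same assumptions (3 home and 2 away games, home and away opponents drawn independently from 19 clubs); the function name `exact_prob` and its arguments are made up for illustration.

```r
# Exact probability of at least `threshold` "bad" opponents in the first
# 5 games, assuming 3 home and 2 away fixtures drawn independently.
# exact_prob and its parameter names are hypothetical.
exact_prob <- function(n_avoid, n_teams = 19, n_home = 3, n_away = 2,
                       threshold = 3) {
  h <- 0:n_home
  a <- 0:n_away
  # P(h bad opponents among the home games), P(a bad among the away games)
  p_home <- dhyper(h, n_avoid, n_teams - n_avoid, n_home)
  p_away <- dhyper(a, n_avoid, n_teams - n_avoid, n_away)
  joint <- outer(p_home, p_away)  # independence of home and away draws
  total <- outer(h, a, "+")       # total bad games for each (h, a) pair
  sum(joint[total >= threshold])
}

p6 <- exact_prob(6)  # the six-club list; agrees with 0.1724513 above
p3 <- exact_prob(3)  # the original three-club list; about 0.022
print(c(six_clubs = p6, three_clubs = p3))
```

Under these assumptions the three-club case comes out at about 2.2%, so the 3% from 100 simulations in the post is within the noise you would expect.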

1st September 2013 at 6:13 pm

Because simulating is far more fun and takes into account stochasticity 🙂 10k simulations takes just a few minutes on my desktop PC.
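For what it's worth, the simulation can be made much faster by skipping the data.frame bookkeeping and only drawing the opponents for the first five games. This is a minimal sketch of that idea, not the code from the post: it assumes the same model (3 home and 2 away opponents sampled independently from 19 clubs), numbers the clubs 1 to 19 with the clubs to avoid first, and uses the made-up name `simulate_bad`.

```r
set.seed(1)  # fixed seed so the run is reproducible

# Count "bad" opponents in the first 5 games for each simulated season.
# Clubs are numbered 1..n_teams; clubs 1..n_avoid are the ones to avoid.
# simulate_bad is a hypothetical helper, not part of the original post.
simulate_bad <- function(n_avoid, nsim = 10000, n_teams = 19) {
  replicate(nsim, {
    home_bad <- sum(sample(n_teams, 3) <= n_avoid)  # first 3 home games
    away_bad <- sum(sample(n_teams, 2) <= n_avoid)  # first 2 away games
    home_bad + away_bad
  })
}

res <- simulate_bad(6)
p_sim <- mean(res >= 3)  # estimated probability of 3 or more bad games
print(p_sim)
```

With 10,000 runs the estimate should land close to the exact figure quoted in the comment above.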

1st September 2013 at 8:01 pm

And perhaps here lies the difference between bioinformatics and statistics ;o)

21st January 2014 at 8:11 am

Thanks for sharing, I like David Moyes.