Yesterday, David Moyes, manager of Manchester United, complained that the fixture list for the new season might have been manipulated to give his team a difficult start.

Immediately, I wanted to know – how wrong is he?

Moyes is complaining that, in the first 5 games, Man Utd have to face Liverpool, Chelsea and Man City – 3 very difficult games. So what is the probability of that happening? We can look at it with some simple simulations in R:

# read a list of current UK premier league clubs
clubs <- read.table("http://www.opiniomics.org/wp-content/uploads/2013/08/clubs.doc", stringsAsFactors=FALSE, header=FALSE)[,1]

# there is ManUtd and then "others"
manutd <- "ManUtd"
others <- clubs[-grep("ManUtd", clubs)]

# create data.frames of all possible home and away fixtures
home <- data.frame(home=rep(manutd,19), away=others, stringsAsFactors=FALSE)
away <- data.frame(home=others, away=rep(manutd,19), stringsAsFactors=FALSE)

# create an empty data.frame to hold the fixture list
fixtures <- data.frame(home=rep("",38), away=rep("",38), stringsAsFactors=FALSE)

# home and away games generally alternate so we create home and away index
# vectors to reflect this
home.idx <- seq(1, 38, by=2)
away.idx <- seq(2, 38, by=2)

# these are the clubs Moyes wants to avoid
avoid <- c("Liverpool","ManCity","Chelsea")

# the number of simulations
nsim <- 100

# a vector to hold the results
results <- vector(length=nsim)

# run the simulations
for(i in 1:nsim) {

  # randomly assign the home and away fixtures to the home and away indices
  # of the fixture list
  fixtures[home.idx,] <- home[sample(1:19,19),]
  fixtures[away.idx,] <- away[sample(1:19,19),]

  # only look at the first five games
  opponents <- fixtures[1:5,]

  # a variable to record how many "bad" games Moyes gets
  bad <- 0

  # iterate over the teams Moyes wants to avoid and count them if they
  # occur as either a home or away fixture
  for (a in avoid) {
    bad <- bad + nrow(opponents[opponents$home==a,])
    bad <- bad + nrow(opponents[opponents$away==a,])
  }

  results[i] <- bad
}

# the probability is
length(results[results>=3]) / nsim

When I run this in R, I get a probability of 0.03, or 3%. With only 100 simulations, the exact figure will vary a little from run to run.
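A hundred simulations is not many, so an estimate like this is fairly noisy. As a rough sketch (not the code I ran above), the same model can be wrapped in a function so that the number of simulations is easy to raise. The function name `simulate_bad_start`, its arguments and the numeric opponent ids are placeholders made up for illustration. It assumes, like the code above, that home and away games strictly alternate starting with a home game, and that home and away opponents are drawn independently of each other.

```r
# Sketch only: hypothetical helper mirroring the simulation above,
# using numeric opponent ids instead of the downloaded club list.
simulate_bad_start <- function(n_avoid, nsim = 10000, n_opponents = 19,
                               n_games = 5, threshold = 3) {
  opponents <- seq_len(n_opponents)   # placeholder ids for the 19 other clubs
  avoid <- seq_len(n_avoid)           # treat the first n_avoid ids as "difficult"
  n_home <- ceiling(n_games / 2)      # games 1, 3, 5 are at home
  n_away <- floor(n_games / 2)        # games 2, 4 are away

  bad <- replicate(nsim, {
    # first few entries of a random ordering == a sample without replacement
    home_opp <- sample(opponents, n_home)
    away_opp <- sample(opponents, n_away)
    sum(home_opp %in% avoid) + sum(away_opp %in% avoid)
  })

  # proportion of simulated seasons with at least `threshold` bad games
  mean(bad >= threshold)
}

set.seed(1)                 # fix the seed so the run is repeatable
simulate_bad_start(3)       # three difficult teams, 10,000 simulations
```

Changing `n_avoid` lets you try a longer or shorter list of clubs to avoid, and raising `nsim` should make the estimate steadier.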

So has Moyes been unlucky? Well, perhaps not. The teams Liverpool, Man City and Chelsea are just a few of the difficult teams Moyes could have faced, and any 3 difficult fixtures in the first 5 games would surely have had him moaning too. For example, if he had to face Arsenal, Tottenham and Everton (who finished above Liverpool last season), he might also moan about the fixtures.

So let’s run the simulation again, adding in the new teams:

# these are the clubs Moyes wants to avoid
avoid <- c("Liverpool","ManCity","Chelsea","Arsenal","Everton","Tottenham")

# the number of simulations
nsim <- 100

# a vector to hold the results
results <- vector(length=nsim)

# run the simulations
for(i in 1:nsim) {

  # randomly assign the home and away fixtures to the home and away indices
  # of the fixture list
  fixtures[home.idx,] <- home[sample(1:19,19),]
  fixtures[away.idx,] <- away[sample(1:19,19),]

  # only look at the first five games
  opponents <- fixtures[1:5,]

  # a variable to record how many "bad" games Moyes gets
  bad <- 0

  # iterate over the teams Moyes wants to avoid and count them if they
  # occur as either a home or away fixture
  for (a in avoid) {
    bad <- bad + nrow(opponents[opponents$home==a,])
    bad <- bad + nrow(opponents[opponents$away==a,])
  }

  results[i] <- bad
}

# the probability is
length(results[results>=3]) / nsim

This time I get a much higher probability of 0.22, or 22% – a greater than 1 in 5 chance! So not that unlikely after all...
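As a cross-check on the simulations, the same model can also be worked out exactly. This is a sketch under the simulation's own assumptions: the first five games are three home and two away, and the home and away opponents are drawn independently (so the same club could, unrealistically, turn up both home and away in that stretch). The number of difficult opponents in each draw then follows a hypergeometric distribution. The function name `exact_bad_start` and its arguments are made up for illustration.

```r
# Sketch only: exact probability of at least `threshold` difficult games
# in the first five, under the same model as the simulation.
exact_bad_start <- function(n_avoid, n_opponents = 19,
                            n_home = 3, n_away = 2, threshold = 3) {
  h <- 0:n_home   # possible counts of difficult home opponents
  a <- 0:n_away   # possible counts of difficult away opponents

  # dhyper(x, m, n, k): x "difficult" clubs among k drawn,
  # from m difficult and n other clubs
  p_home <- dhyper(h, n_avoid, n_opponents - n_avoid, n_home)
  p_away <- dhyper(a, n_avoid, n_opponents - n_avoid, n_away)

  joint <- outer(p_home, p_away)   # independent draws, so multiply
  total <- outer(h, a, "+")        # total difficult games for each combination

  sum(joint[total >= threshold])
}

exact_bad_start(3)   # roughly 0.022
exact_bad_start(6)   # roughly 0.172
```

So under this model the exact figures are about 2% for three difficult teams and about 17% for six. Estimates from only 100 runs can easily land a few percentage points either side of those, but the conclusion is the same: widen the list of difficult teams and a tough start stops looking unusual.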

Perhaps you should have paid attention in Maths class, David? 🙂