Yesterday, David Moyes, manager of Manchester United, complained that the fixture list for the new season might have been manipulated to give his team a difficult start.

Immediately, I wanted to know – how wrong is he?

Moyes is complaining that, in the first 5 games, Man Utd have to face Liverpool, Chelsea and Man City – 3 very difficult games.  So what is the probability of that happening?  We can look at it with some simple simulations in R:

# read a list of current English Premier League clubs
clubs <- read.table("http://www.opiniomics.org/wp-content/uploads/2013/08/clubs.doc", 
		stringsAsFactors=FALSE, header=FALSE)[,1]

# there is ManUtd and then "others"
manutd <- "ManUtd"
others <- clubs[!grepl(manutd, clubs, fixed=TRUE)]

# create data.frames of all possible home and away fixtures
home <- data.frame(home=rep(manutd,19), away=others, stringsAsFactors=FALSE)
away <- data.frame(home=others, away=rep(manutd,19), stringsAsFactors=FALSE)

# create an empty data.frame to hold the fixture list
fixtures <- data.frame(home=rep("",38), away=rep("",38), stringsAsFactors=FALSE)

# home and away games generally alternate so we create home and away index
# vectors to reflect this (this assumes the season opens with a home game)
home.idx <- seq(1, 38, by=2)
away.idx <- seq(2, 38, by=2)

# these are the clubs Moyes wants to avoid
avoid <- c("Liverpool","ManCity","Chelsea")

# the number of simulations
nsim <- 100

# a vector to hold the results
results <- numeric(nsim)

# run the simulations
for(i in 1:nsim) {

	# randomly assign the home and away fixtures to the home and away indices
	# of the fixture list
	fixtures[home.idx,] <- home[sample(1:19,19),]
	fixtures[away.idx,] <- away[sample(1:19,19),]

	# only look at the first five games
	opponents <- fixtures[1:5,]

	# count how many of the first five games are against a club Moyes wants
	# to avoid, whether as a home or an away fixture
	bad <- sum(opponents$home %in% avoid) + sum(opponents$away %in% avoid)
	results[i] <- bad
}

# the probability is
mean(results >= 3)

When I run this in R, I get a probability of 0.03, or 3%.
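A hundred simulations gives a fairly noisy estimate, so here is a rough sketch of an exact check under the same assumptions as the simulation: three home games and two away games in the first five, with home and away opponents drawn independently from the 19 other clubs. Under those assumptions the number of "bad" home games and the number of "bad" away games are each hypergeometric, so base R's dhyper() can do the work. The function name exact_prob and its arguments are my own invention for illustration.

```r
# exact probability of at least `threshold` bad games in the first five,
# under the simulation's assumptions (hypothetical helper, not from the post)
exact_prob <- function(n_bad, n_others = 19, n_home = 3, n_away = 2, threshold = 3) {
	h <- 0:n_home
	a <- 0:n_away

	# probability of h (or a) bad opponents among the home (or away) games
	p_home <- dhyper(h, n_bad, n_others - n_bad, n_home)
	p_away <- dhyper(a, n_bad, n_others - n_bad, n_away)

	# home and away draws are independent, so the joint table is an outer product
	joint <- outer(p_home, p_away)
	total <- outer(h, a, "+")

	sum(joint[total >= threshold])
}

# three clubs to avoid: roughly 0.022
exact_prob(3)
```

That comes out at a little over 2%, which is in the same ballpark as the simulated 3% once you allow for the noise in 100 runs.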

So has Moyes been unlucky?  Well, perhaps not.  Liverpool, Man City and Chelsea are just three of the difficult teams Moyes could have faced, and any 3 difficult fixtures in the first 5 games would surely have had him moaning too.  For example, if he had to face Arsenal, Tottenham and Everton (who finished above Liverpool last season), he might also moan about the fixtures.

So let’s run the simulation again, adding in the new teams:

# these are the clubs Moyes wants to avoid
avoid <- c("Liverpool","ManCity","Chelsea","Arsenal","Everton","Tottenham")

# the number of simulations
nsim <- 100

# a vector to hold the results
results <- numeric(nsim)

# run the simulations
for(i in 1:nsim) {

	# randomly assign the home and away fixtures to the home and away indices
	# of the fixture list
	fixtures[home.idx,] <- home[sample(1:19,19),]
	fixtures[away.idx,] <- away[sample(1:19,19),]

	# only look at the first five games
	opponents <- fixtures[1:5,]

	# count how many of the first five games are against a club Moyes wants
	# to avoid, whether as a home or an away fixture
	bad <- sum(opponents$home %in% avoid) + sum(opponents$away %in% avoid)
	results[i] <- bad
}

# the probability is
mean(results >= 3)

This time I get a much higher probability of 0.22, or 22% – a greater than 1 in 5 chance!  So not that unlikely after all.
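With only 100 runs, an estimate like this can easily be off by a few percentage points. Below is a minimal sketch of the same simulation wrapped in a function and run many more times, with a standard error attached. It needs no club list: opponents are just labelled 1 to 19, and labels 1 to n_avoid stand in for the clubs to avoid. The name simulate_start, its arguments and the seed are all my own choices for illustration, and it keeps the same assumptions as the code above (home, away, home, away, home, with home and away opponents drawn independently).

```r
# re-run the simulation with many more iterations (hypothetical helper)
simulate_start <- function(n_avoid, nsim = 100000, n_first = 5) {
	home.slots <- seq(1, n_first, by = 2)   # games 1, 3, 5 are at home
	away.slots <- seq(2, n_first, by = 2)   # games 2, 4 are away

	bad <- replicate(nsim, {
		# draw distinct home opponents and, independently, distinct away opponents
		home.opp <- sample(19, length(home.slots))
		away.opp <- sample(19, length(away.slots))
		# labels 1..n_avoid are the clubs Moyes wants to avoid
		sum(home.opp <= n_avoid) + sum(away.opp <= n_avoid)
	})

	p <- mean(bad >= 3)
	c(estimate = p, std.error = sqrt(p * (1 - p) / nsim))
}

set.seed(1)          # arbitrary seed, only for reproducibility
simulate_start(3)    # Liverpool, Man City, Chelsea
simulate_start(6)    # plus Arsenal, Everton, Tottenham
```

Under these assumptions the six-club figure should settle at around 17%, so closer to 1 in 6 than 1 in 5, but the conclusion is the same: hardly a freak occurrence.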

Perhaps you should have paid attention in Maths class, David? 🙂