Can an AI algorithm win fantasy football?

If you’ve heard the term moneyball, then you’ll know that in 2002 the Oakland ‘A’s Major League Baseball team began to use statistical analysis to identify and sign undervalued players, in order to compete against their richer competitors. The approach is credited with getting them to the playoffs in both 2002 and 2003 and has since been adopted more widely.

In the football world, Brentford FC are reportedly embarking on a similar journey using the data-driven approach proven at their Danish sister club.

So could an AI algorithm win fantasy football?

The short answer is probably yes (and maybe already has!), but here is how a less successful attempt went this season!


Fantasy Football 101

The English Premier League’s free fantasy football competition lets you pick a squad of 15 players from the various Premier League teams, with the aim of seeing how your squad performs compared to other peoples. There are restrictions on the make up of your squad, including a maximum cost that prevents you loading up with too many top players.

Each week you select a team of 11 players from your squad to “play”, and then score points depending on how well those players perform in actual matches that week. If a player is not performing then you can transfer them out and another player in. There are limits on the number of free transfers that you can make and player costs change depending on demand (making popular players less affordable). There are two opportunities to make unlimited transfers during the season by playing a transfer wildcard.


The goal

My goal was to have a fully automated team entered into the competition. The algorithm would select a squad at the start of the season, collect data each week, make any transfers and select the best team to field based on that data. I didn’t achieve full automation this season, and so manually applied aspects of the algorithm instead.


The algorithm

Firstly, this algorithm isn’t really AI. It’s more of a first attempt to work with some of the data involved!

Initial squad selection: Step 1 was to select the best combination of players that met the cost and team structure rules. The query I developed attempted to ”brute force” valid combinations of players and then order those by overall score last season. There were some obvious downsides to this approach!

  • The query ignored players that were not involved last season (new signings and those from promoted clubs)
  • The highest scoring combination of players wouldn’t necessarily have won last season (transfers and captain choices play a large part in overall score).
  • The number of combinations considered had to be limited in order for the query to run in a reasonable time
  • The best scoring combination I could identify was some way off both the actual top scores from last season and the theoretical maximum for a squad with no cost restrictions (the best 2 keepers, plus the best 5 defenders, etc.)

On a weekly basis: Gather data, make transfers and pick team.

  • Get latest player performance data
  • Get performance data for my squad
  • Get betting odds for forthcoming games (which teams are most likely to win)
  • Transfer out the player with the worst form and/or least likely to play
  • Transfer in a player with better form from a team with good odds (and within cost rules, etc.)
  • Select a team based on home advantage and odds for the forthcoming games (e.g. favor Chelsea at home)

An initial issue with the transfer approach was that budget bench players would be transferred out until the point where the algorithm could no longer make a useful transfer and expensive players were sat on the bench tying up funds. Consequently I adapted the rule after the first half of the season to generally leave budget bench players alone and only transfer the regular players.

Periodic squad reselection: At two points during the season the squad was reset using the transfer wildcards. I tried different approaches to the selection on each occasion. The variations tried involved:

  • Pre-selecting best value budget bench players (playing regularly and picking up some points)
  • Consider form and value as well as overall points when selecting main players


The results

The squad was entered from week two.

It finished third in a small private league of eight (second if you discard the points from the first week).

It had an overall rank of 455,222 out of 3,502,998 (I.e. in the top 10-15%)

And was ranked 5,412 out of those entering in game week two.


What would I do differently next time?

The fantasy premier league form calculation wasn’t wholly reliable. Their form calculation is an average over a thirty day period, consequently a player who scored very well 3 or 4 games ago but has since been poor, can still have a relatively high form. Instead it might be better to recalculate form, factoring in how consistently the player is scoring points and/or whether their performance is trending up or down.

Too many selection factors were retrospective – just because a player has done well doesn’t mean that they will continue to do so.  Instead it would be interesting to gather more predictive data (e.g. fan tips) and try to identify the factors that are most linked to future performance.

It would be nice to more fully automate the processing and team management. The tool-set used should allow this to be achieved.


Tool-set used

Data was gathered using Python and HTQL and stored in a Microsoft Azure SQL Server database.

Analysis was performed using SQL queries.

Add column to SSRS Excel export

A situation arose today where the key fields for an SSRS report just fitted into an A4 landscape page, but additional fields would be useful when exporting to Excel or CSV. Others seem to have had the same requirement and suggest setting the column visibility depending upon the render format. The suggestion works nicely once tweaked to cater for newer versions of Excel!

Right click on the column and change the column visibility to “Show or hide based on an expression”:


And then set the expression to the following:

=NOT(UCASE(LEFT(Globals!RenderFormat.Name,5))="EXCEL" OR (Globals!RenderFormat.Name)="CSV")

The expression above is for the case where you only want the column to only show when exported to Excel or a CSV. The tweak from other solutions is to check that the render format starts with, instead of just equals, “EXCEL”.

Twelfth Day of Christmas

A festive post for the twelfth day of Christmas and one way to avoid taking the decorations down for a while longer!

If you have each gift from the “Twelve days of Christmas” song as an individual record in a SQL Server database table, can you write a query that returns one row for each verse of the song?

Yes you can. This problem is similar to other cases where you need to flatten multiple rows into a single string. In this case for each day we want a comma separated list of the gifts for that day and all preceding days.

Another case you may have come across is flattening multiple access rights per user into a single row per user, where each row has a comma separated list of that user’s rights:

User Access Rights
Anne Read, Write
Bob Read
Chad Admin, Read, Write

Back to the twelve days of Christmas, first create a table to contain the gifts:

CREATE TABLE #christmas_gifts (day_num INT, gift VARCHAR(30))

And insert the data:

INSERT INTO #christmas_gifts VALUES (1, 'a Partridge in a Pear Tree')
INSERT INTO #christmas_gifts VALUES (2, '2 Turtle Doves')
INSERT INTO #christmas_gifts VALUES (3, '3 French Hens')
INSERT INTO #christmas_gifts VALUES (4, '4 Calling Birds')
INSERT INTO #christmas_gifts VALUES (5, '5 Gold Rings')
INSERT INTO #christmas_gifts VALUES (6, '6 Geese-a-Laying')
INSERT INTO #christmas_gifts VALUES (7, '7 Swans-a-Swimming')
INSERT INTO #christmas_gifts VALUES (8, '8 Maids-a-Milking')
INSERT INTO #christmas_gifts VALUES (9, '9 Ladies Dancing')
INSERT INTO #christmas_gifts VALUES (10, '10 Lords-a-Leaping')
INSERT INTO #christmas_gifts VALUES (11, '11 Pipers Piping')
INSERT INTO #christmas_gifts VALUES (12, '12 Drummers Drumming')

And here is the query to return each verse of the song as a single row:

 'On the '
 + CAST(theday.day_num AS VARCHAR)
 + CASE 
   WHEN theday.day_num%10=1 AND theday.day_num%100<>11 THEN 'st'
   WHEN theday.day_num%10=2 AND theday.day_num%100<>12 THEN 'nd'
   WHEN theday.day_num%10=3 AND theday.day_num%100<>13 THEN 'rd'
   ELSE 'th' 
 + ' day of Christmas SQL gave to me '
     FROM #christmas_gifts todate 
     WHERE todate.day_num <= theday.day_num
     ORDER BY todate.day_num DESC
     FOR XML PATH('')),
    '</gift><gift>a ', ' and a '),
    '</gift><gift>', ', '),
    '</gift>', ''),
    '<gift>', '') AS verse
FROM #christmas_gifts theday

The approach used above has been around for a while and a quick Google will turn up plenty of background material. The inner query selects the list of gifts for the day and its previous days as an XML structure (SELECT … WHERE todate.day_num <= theday.day_num … FOR XML). We then replace the XML tags with commas or “and”.

If you want to get a feel for the XML that is returned by the inner select then you can run that part of the query and take a look, e.g. if you run the inner select for day 2:

FROM #christmas_gifts todate 
WHERE todate.day_num <= 2
ORDER BY todate.day_num DESC

The XML you should get back is:

<gift>2 Turtle Doves</gift><gift>a Partridge in a Pear Tree</gift>

You may spot that this isn’t well formed XML, because there is no containing element (such as <gifts>…</gifts>). The absence of a containing element was specified with the empty string in the PATH(”) part of the FOR XML clause. Leaving out the containing element made the subsequent string replacements a bit simpler (arguably).