
Football Data Analytics: Translating Commentary into Statistics

I spent my week of Innovation Time developing a football data analytics platform, exploring machine learning and artificial intelligence to build a system that translates commentary into stats, facts and figures. The aim is to open the doors for local clubs to use data analysis that can compete with the likes of teams in the Premier League.



What inspired me to explore football data analytics?

I’ve been obsessed with football since I was a youngster.

To this day, the way that professional teams, managers and coaches use football data analytics to turn raw numbers into tangible player and game stats really fascinates me.

However, current approaches (the best-known being Opta sports data) can be a little convoluted. They require large-scale operations, along with a level of hand-eye coordination and attention to detail that can only be achieved by a full team of people.

As a developer, I wanted to challenge myself:

Can recent trends in machine learning be leveraged to make this level of insight financially viable for the average consumer, as well as naturally ergonomic for football data analysts?

I spent my week of Innovation Time exploring these trends in machine learning to see if I could develop an AI that translates football commentary into statistics.


I was lucky enough to bump into the AFC Bournemouth team recently. As a Cherries fan myself, it’s no surprise I was inspired to take on this project!

Introducing StatChat: The football data analytics platform

Analysing football commentary:

An audio interface lets users stream the spoken input they want to analyse. The system converts speech to text, then translates sentences, phrases and keywords into a form that the computer can understand, define and record.

Translating raw data into statistics:

The system is configured to understand pre-defined “micro-events” within a match. This should cover every conceivable occurrence, including substitutions, throw-ins, offsides, fouls, free-kicks, yellow and red cards, even additional time at the end of each half.
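As a rough illustration of how logged micro-events might become statistics, the sketch below tallies events per team. The `(team, event_type)` tuple shape, the sample events and the `tally_events` name are all assumptions made for this example, not StatChat's actual data model.

```python
from collections import Counter, defaultdict


def tally_events(events):
    """Count how often each micro-event type was logged for each team.

    `events` is an iterable of (team, event_type) pairs (a hypothetical shape).
    """
    totals = defaultdict(Counter)
    for team, event_type in events:
        totals[team][event_type] += 1
    # Convert to plain dicts so the result is easy to serialise for a dashboard.
    return {team: dict(counts) for team, counts in totals.items()}


# Made-up sample data, purely for illustration.
sample_events = [
    ("home", "pass"),
    ("home", "pass"),
    ("away", "foul"),
    ("home", "throw-in"),
    ("away", "pass"),
]
stats = tally_events(sample_events)
```

Keeping the tally separate from the logging step means the same event list could feed several different views later on.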

Visualising football data on a dashboard:

StatChat then feeds this data from the API to its dashboard, allowing several different systems to present the same data in various ways. This could be as simple as including images for each stat, or as complex as presenting patterns and trends over time.


Teaching the system to understand football commentary:

The aim is to build a dictionary flexible enough that the end-user can describe the game as naturally as possible when logging micro-events within the match. To borrow from the much-missed world of retro mid-90s game shows… they can literally “say what they see”!

Recognising actions as micro-events:

The system understands a series of keywords which translate to actions within a game, such as “pass”, “shot”, “tackle”, “interception” and “goal” (we’re no strangers to conversational design).

An additional series of keywords serves as action “modifiers”, which describe the action in more detail so that the system can understand the nature of a micro-event. A pass can be long or short; a tackle can be standing or sliding; a shot can be on-target or off-target as well as long or short.
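To make the action and modifier idea concrete, here is a minimal sketch of how a short command could be split into its parts. The keyword sets come from the examples above, while `ParsedCommand` and `parse_command` are hypothetical names for this illustration and say nothing about how StatChat is actually built.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Keyword sets taken from the examples in the text; a real dictionary would be larger.
ACTIONS = {"pass", "shot", "tackle", "interception", "goal"}
MODIFIERS = {"long", "short", "standing", "sliding", "on-target", "off-target"}


@dataclass
class ParsedCommand:
    action: Optional[str] = None
    modifiers: List[str] = field(default_factory=list)
    player: Optional[int] = None


def parse_command(text: str) -> ParsedCommand:
    """Split a short command into an action, its modifiers and a shirt number."""
    parsed = ParsedCommand()
    for word in text.lower().split():
        if word in ACTIONS and parsed.action is None:
            parsed.action = word
        elif word in MODIFIERS:
            parsed.modifiers.append(word)
        elif word.isdigit():
            parsed.player = int(word)
    return parsed


long_pass = parse_command("long pass 10")
```

Here `long_pass` holds the action `"pass"`, the modifier `"long"` and player `10`. Words outside both keyword sets are simply ignored, which leaves room for natural filler in the commentary.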


Understanding human conversations:

Wit.ai is a Natural Language Processing (NLP) tool that uses machine learning to let developers teach a system how to understand conversations, translating natural speech into a language that computers can better understand.

The software also provides the concept of synonyms, meaning that several different utterances (such as “passes to”, “plays to”, “gives the ball to”, “gives it to”, “lays it off for”, “threads it through towards” etc.) can all map to a single normalised action that Wit has been trained with (in this case, “pass”).
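The synonym idea can be sketched locally with a plain lookup table. This is only a stand-in to show the mapping: it does not call Wit or reproduce how its trained model matches utterances, and `SYNONYMS` and `normalise_action` are names invented for the example.

```python
from typing import Optional

# Phrases from the text, all mapping to the single normalised action "pass".
SYNONYMS = {
    "pass": [
        "passes to",
        "plays to",
        "gives the ball to",
        "gives it to",
        "lays it off for",
        "threads it through towards",
    ],
}


def normalise_action(utterance: str) -> Optional[str]:
    """Return the normalised action whose synonym appears in the utterance, if any."""
    text = utterance.lower()
    for action, phrases in SYNONYMS.items():
        if any(phrase in text for phrase in phrases):
            return action
    return None


action = normalise_action("Nine lays it off for seven")
```

A substring match like this is far cruder than a trained NLP model, but it shows why several different utterances can end up as one recorded action.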

The video below provides a great example of how Wit processes natural language and understands the intent behind what we’re saying:

Interpreting subjects and objects:

Once the API receives an initial analysis from Wit, further processing is performed to determine the subjects and objects of the micro-event. These concepts are relatively self-explanatory: a subject is the team (and optionally player) performing the action, and an object is the team (and optionally player) having the action performed on them.

This can be explicitly defined in the initial utterance that is sent to Wit (in the form of “home team” or “away team”). Otherwise, the subject/object data is extrapolated from the team currently in possession.

Most actions (such as “pass” and “shot”) dictate that the team currently in possession is the micro-event’s subject. However, other actions, such as a “tackle”, invert this principle. It’s unlikely that a player would make a tackle if their team were already in possession, so the team NOT currently in possession becomes the inferred subject of the micro-event.
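The possession rule can be expressed as a small function. This is a sketch under assumptions: teams are represented as the strings `"home"` and `"away"`, and the function and constant names are hypothetical. Only “tackle” is named in the text as an inverting action, so any further entries would be an extension of it.

```python
from typing import Optional

# Actions performed by the team that is NOT in possession (only "tackle" is named in the text).
INVERTING_ACTIONS = {"tackle"}


def other_team(team: str) -> str:
    """Return the opposing side for a "home"/"away" team label."""
    return "away" if team == "home" else "home"


def infer_subject(action: str, possession: str, explicit_team: Optional[str] = None) -> str:
    """Work out which team performs the action."""
    if explicit_team is not None:
        # A team named in the utterance always wins over inference.
        return explicit_team
    if action in INVERTING_ACTIONS:
        return other_team(possession)
    return possession


passer = infer_subject("pass", possession="home")
tackler = infer_subject("tackle", possession="home")
```

With the home team in possession, `passer` is `"home"` and `tackler` is `"away"`.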

Football data analytics in action:

For example, a voice command sequence such as “kick-off home 9, pass 7, pass 5, long pass 10, interception 4, pass 7” implies the following:

  • The game begins with Home team player 9 assuming possession
  • Home team player 9 passes to Home team player 7
  • Home team player 7 passes to Home team player 5
  • Home team player 5 performs a long pass to Home team player 10
  • Away team player 4 intercepts this pass and assumes possession
  • Away team player 4 passes to Away team player 7
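The walkthrough above can be replayed in code by tracking which team and player currently hold the ball. This is a simplified sketch that only handles the three command types in the example (kick-off, pass and interception); the `replay` function and the format of its output are assumptions for illustration.

```python
from typing import List, Optional


def replay(sequence: str) -> List[str]:
    """Turn a comma-separated command sequence into a log of who did what."""
    team: Optional[str] = None    # team currently in possession
    holder: Optional[int] = None  # shirt number of the player on the ball
    log: List[str] = []
    for command in sequence.split(","):
        words = command.strip().lower().split()
        number = int(words[-1])          # the last word is always a shirt number
        action = " ".join(words[:-1])    # e.g. "kick-off home", "long pass"
        if action.startswith("kick-off"):
            team = words[1]
            log.append(f"{team} {number} takes possession")
        elif action.endswith("pass"):
            # Passes stay within the team in possession.
            log.append(f"{team} {holder} {action} to {team} {number}")
        elif action == "interception":
            # Interceptions are made by the other team, which then takes possession.
            team = "away" if team == "home" else "home"
            log.append(f"{team} {number} interception")
        else:
            raise ValueError(f"Unrecognised command: {command!r}")
        holder = number
    return log


log = replay("kick-off home 9, pass 7, pass 5, long pass 10, interception 4, pass 7")
```

The resulting `log` has six entries, starting with `"home 9 takes possession"` and ending with `"away 4 pass to away 7"`, matching the six bullet points.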

Giving local clubs tech that can compete in the big leagues

For years coaches and managers have sat around computer screens counting different statistics, from possession and passes, to the number of steps Ronaldo took in his last appearance. But what if you don’t play at Wembley with millions to invest in data analysis or the resources available to record every stat for every player in every game?

That’s where StatChat comes into play.

StatChat offers considerable cost savings for clubs looking to record stats for their games. It only requires an audio recording, rather than a number of videos, so clubs can avoid spending hours analysing footage and manually recording statistics. This means that the StatChat app can be used by anyone with a passion to improve their team, not just those who have the budget and resources to do so.

Following on from the ideas around local clubs using our system, once I’ve had the time to work on the project further I hope to develop a data dashboard to visualise these statistics in real time. This would provide a more engaging way of presenting information on clubs, leagues and players, and could be displayed in clubhouses or at end-of-season awards.


Cube FC: Our team of developers and designers at the Meetball beach football tournament in Bournemouth last year.

The future of football data analytics?

Artificial intelligence & machine learning.

Both artificial intelligence (AI) and machine learning (ML) are trends that have changed the way brands and organisations reach and engage with their audiences, from apps recognising objects to chatbots interacting with humans.

However, it’s important to note that the value in these tech trends arises when they’re used together, machine learning being an application of artificial intelligence.

In fact, any potential application of artificial intelligence will need some degree of machine learning. The best use cases for this technology do exactly that, adding value for brands that use AI and ML to support their processes and user experience rather than replace them entirely.


I plan on moving forward with the project by training the system to handle data analytics in sports other than football. If you have any ideas on how I can improve or implement the system, get in touch!

Published on December 6, 2018, last updated on February 24, 2020
