Monthly Archives: September, 2012

First pass at ball possession

Ball possession is key in a soccer game. I would guess that ball possession correlates positively with results. Of course, as discussed (notably) here, things are not as simple as that:


But beyond the overall possession percentage, I am interested in how ball possession evolves during a game.

Specifically, based on the sample file provided by MCFCAnalytics and Opta Sports, I’m interested in knowing more about the ball possession of our two teams. For example, I want to see whether a team’s possession increases or decreases as the game progresses. I also want to see whether the team protecting its 3-2 lead manages to increase its ball possession in order to take control of the game. Conversely, I want to see whether the team desperately trying to score the equalizer managed to create good sequences of possession in order to build solid attacks. I’d also like to see whether fatigue, lack of timing and so on show up in the evolution of ball possession.

Finally, as ball possession more or less sets the rhythm of the game, I want to see how the two teams managed their possessions in offensive zones throughout the game. This last point will help us move from quantitative to spatial observations, leading us eventually to patterns of play.

And of course, the goal is to build something that can lead us to real-time analytics.

So, our first step is to represent possession of the ball by the two teams. To do that, I’ve written a little Python script (see previous post for the basics) that does this:
– distributes each team’s events into separate arrays
– associates each event with a timestamp in seconds (min * 60 + sec)
– sorts the arrays
– writes the output to two files (one for team 30 and one for team 43)
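
The steps above can be sketched roughly as follows. This is not the original script: it uses the standard library’s xml.etree.ElementTree rather than lxml, a tiny made-up inline sample rather than the real F24 file, and hypothetical helper names (events_by_team, write_ticks). Only the Event element and its team_id, min and sec attributes come from the posts.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny made-up sample in the shape described in the posts; not real match data.
SAMPLE = """<Games><Game>
  <Event type_id="1" team_id="30" min="0" sec="12"/>
  <Event type_id="1" team_id="43" min="1" sec="5"/>
  <Event type_id="1" team_id="30" min="0" sec="3"/>
  <Event type_id="1" team_id="43" min="0" sec="40"/>
</Game></Games>"""


def events_by_team(root):
    """Return {team_id: sorted list of event timestamps in seconds}."""
    ticks = defaultdict(list)
    for event in root.iter("Event"):
        # Timestamp in seconds, as in the post: min * 60 + sec
        tick = int(event.attrib["min"]) * 60 + int(event.attrib["sec"])
        ticks[event.attrib["team_id"]].append(tick)
    return {team: sorted(values) for team, values in ticks.items()}


def write_ticks(ticks, team, path):
    """Write one timestamp per line for the given team (hypothetical helper)."""
    with open(path, "w") as out:
        for tick in ticks.get(team, []):
            out.write("%d\n" % tick)


teams = events_by_team(ET.fromstring(SAMPLE))
print(teams)
```

Calling write_ticks(teams, "30", "team30.txt") and write_ticks(teams, "43", "team43.txt") would then produce the two per-team files; the file names here are placeholders.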

To show the data, I used a really cool JavaScript jQuery plugin called Flot. Flot displays data nicely without too much effort.

First pass is rough but it looks promising:

Now if we focus on the approximate times when goals were scored (see previous post), we get this (ticks in seconds):
tick    score   scored by
1514    1-0     43
2219    2-0     43
2334    2-1     30
2799    3-1     43
3726    3-2     30

So let’s add another data series for the goals (value 20 on the Y axis) so that we can see them on the chart. Then, focusing on the time of the goals, we have for goal number one:

In this case, it is easy to notice a clear domination of ball possession by club 43 in the moments preceding the goal.
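
One rough way to put a number on that visual impression is to count each team’s events in a window just before the goal. The sketch below uses made-up timestamps, a hypothetical function name (events_before) and an arbitrary 120-second window. Note that it counts team events, not true possession time.

```python
from bisect import bisect_left


def events_before(ticks, goal_tick, window=120):
    """Count timestamps t with goal_tick - window <= t < goal_tick.

    `ticks` must be sorted in ascending order.
    """
    start = bisect_left(ticks, goal_tick - window)
    end = bisect_left(ticks, goal_tick)
    return end - start


# Made-up sorted event timestamps (seconds), not taken from the real file.
team_30 = [1380, 1402, 1450, 1490]
team_43 = [1395, 1410, 1425, 1440, 1470, 1485, 1500, 1510]

goal_tick = 25 * 60 + 14  # first goal at 25:14, i.e. 1514 seconds
counts = {
    "30": events_before(team_30, goal_tick),
    "43": events_before(team_43, goal_tick),
}
print(counts)
```

Because the per-team arrays are already sorted, the two binary searches avoid rescanning the whole game for every goal.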

Good, so that’s a first pass.

Next, we will have to focus more precisely on ball possession itself (rather than team-related events over time) and on measuring possession time in order to study its evolution over the course of the game. We will also introduce spatial criteria.
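
As a very rough idea of what measuring possession time could look like, here is one possible sketch. It rests on a strong simplifying assumption that is mine and not something the data guarantees: the team behind an event is treated as holding the ball until the next event. The function name possession_seconds and the sample events are made up.

```python
from collections import defaultdict


def possession_seconds(events):
    """Estimate seconds of possession per team.

    `events` is a list of (tick, team_id) pairs sorted by tick. Each gap
    between consecutive events is credited to the team of the earlier event.
    """
    totals = defaultdict(int)
    for (tick, team), (next_tick, _) in zip(events, events[1:]):
        totals[team] += next_tick - tick
    return dict(totals)


# Made-up events, not taken from the real file.
events = [(0, "30"), (8, "30"), (15, "43"), (40, "43"), (52, "30"), (60, "43")]
totals = possession_seconds(events)
print(totals)
```

Running the same function on successive slices of the game (say, five-minute buckets) would give a first view of how possession evolves.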


Let’s warm up: goal scan!!!

A pretty simple exercise to play with and get familiar with the MCFC Bolton_ManCityF24.xml file.

We’re going to parse the file and simply count goals (Event elements with type_id="16"). We will print each goal event plus the team code (30 or 43) and the min:sec of the event.

The first conclusion you reach when you look at the F24 file is that it is not deeply nested. Everything is organized around “Event”. Every event carries a shared set of attributes. Quite simple, quite straightforward.
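
Because the structure is that flat, a first exploration can be as simple as tallying the type_id values. The sketch below uses the standard library’s xml.etree.ElementTree and a made-up inline sample. Only the meaning of type_id 16 (goal) comes from this post; the other value is a placeholder.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample; only the meaning of type_id "16" (goal) comes from this post.
SAMPLE = """<Games><Game>
  <Event type_id="1" team_id="30" min="0" sec="3"/>
  <Event type_id="1" team_id="43" min="0" sec="9"/>
  <Event type_id="16" team_id="43" min="25" sec="14"/>
</Game></Games>"""

root = ET.fromstring(SAMPLE)
# Every Event carries the same attributes, so one pass is enough to tally them.
type_counts = Counter(event.attrib["type_id"] for event in root.iter("Event"))
print(type_counts.most_common())
```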

That being said, more complex operations will require a tool like Redis, the high-performance in-memory data store. We’ll see.

So, the snippet of code looks like this:

# Little warm-up with the file:
# parse the entire file, look for Event elements with type_id = '16' (goal),
# and print goal + team + timestamp.

from lxml import etree

inFile = "./Downloads/Bolton_ManCityF24.xml"
xmlData = etree.parse(inFile)  # etree.parse() opens and parses the file

# Loop over all Event elements, wherever they sit in the tree
events = xmlData.findall(".//Event")
for event in events:
    eventType = event.attrib['type_id']
    team = event.attrib['team_id']
    tick = event.attrib['min'] + ":" + event.attrib['sec']
    if eventType == '16':
        # A parenthesized single-argument print runs on Python 2 and 3
        print("GOAL! for team " + team + " at " + tick)

And the output is simply:

GOAL! for team 43 at 25:14
GOAL! for team 43 at 36:59
GOAL! for team 30 at 38:54
GOAL! for team 43 at 46:39
GOAL! for team 30 at 62:6

So team 43 won 3 to 2. Yeah!

For the record, the code was executed on an Ubuntu 11.04 LST equipped box. Current version of Python is 2.7.3

Setting the stage for realtime analytic tools

Hi all,

Following the release of extensive game data by MCFC and Opta, I decided to explore ways to push soccer analytics into the real-time domain. The goal is to provide real-time tools that complement live assessments in order to support coaches in their tactical and strategic decisions during a game.

Nothing replaces deep knowledge and vast experience. Those real-time tools won’t ever replace those skills, but they will bring a quantitative dimension that complements what coaches observe and analyze on the field.

Based on live data, those tools will bring live intelligence about things coaches obviously see, like shots, deep penetration into the offensive zone, turnovers, etc., but those events would now be seen from a continuous quantitative perspective.

Those tools would also highlight things that are not seen at first glance, like:

  • fluctuation of the rate of ball takeovers in midfield that are converted into deep penetration in opponent’s territory
  • fluctuation of possession time, of offensive zone penetration
  • fluctuation of time to get the ball out of the defensive zone
  • and so on

Those tools would also enable alerting linked to preset thresholds.

So, this is the area I intend to explore hoping to have fun and to bring something that can be useful someday.

