QOA Quality of attack – a real time metric
Our goal is to develop a real-time tool that will support soccer coaches’ decisions DURING a game. The first version of this tool – I guess we can call it a mockup – will be based on the F24 data graciously shared by MCFC Analytics and Opta.
At some point I will provide technical information, but one key aspect is that the tool will simulate the clock while reading the events in the file in “real time”.
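To give an idea of what simulating the clock could look like, here is a minimal sketch. It assumes the events have already been reduced to (tick in seconds, payload) pairs; the replay function, its speed parameter and the sample data are all hypothetical names of mine, not part of the actual tool.

```python
import time

def replay(events, speed=60.0, sleep=time.sleep):
    """Yield (tick, payload) pairs in time order, pausing between them.

    events: iterable of (tick_in_seconds, payload) tuples (assumed shape).
    speed:  game seconds that pass per real second (hypothetical knob).
    sleep:  injectable so the loop can be exercised without waiting.
    """
    clock = 0
    for tick, payload in sorted(events, key=lambda e: e[0]):
        wait = tick - clock
        if wait > 0:
            sleep(wait / float(speed))  # let the simulated clock catch up
        clock = max(clock, tick)
        yield tick, payload

if __name__ == "__main__":
    sample = [(1514, "goal 43"), (2219, "goal 43"), (2334, "goal 30")]
    for tick, what in replay(sample, speed=10000.0):
        print("%02d:%02d %s" % (tick // 60, tick % 60, what))
```

With speed=1.0 the replay would run at match pace; a higher value just fast-forwards the same sequence.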
Another key aspect is the metrics that will be shown (progressing in time) and that will ultimately be used to support in-game decisions. For the moment, we will stick to 3 metrics:
Possession of the ball
- Cumulative possession of the ball (inspired by the great work done by Ravi Ramineni at analysefootball.com)
- Number of possessions
- Mean time of possession
Passes
- Success versus attempts ratio
- Number of passes (Did you see the stats of yesterday’s Liverpool-Udinese game, where Udinese won 3-2 with 4-5 times fewer successful passes? This is a lesson in humility for us soccer analytics fans 🙂)
Quality of attack (let’s call that QOA)
QOA is something I will try to develop to see where it can lead us. The idea is simple: every possession will be rated. A higher rating means that the “quality” of a given possession sequence is higher. The goal for a team is to build a succession of high-quality possession sequences or, conversely, from a defensive perspective, to reduce the opponent’s QOAs.
The recipe is not completely done yet, but the QOA rating (for one possession sequence) would be calculated based on this:
- number of passes
- length in time
- physical penetration of offensive zone
- total perimeter (total space covered)
- conclusion of possession sequence (goal, shot and so on)
Once again, the idea is to rate in real time a sequence of actions in order to quantify what a team is building (or not).
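Since the recipe is not settled, the following is only a sketch of how the five ingredients could be combined into one number. Everything specific here is an assumption: the weights and outcome bonuses are placeholders, the coordinates are assumed to run from 0 to 100 with 100 being the opponent’s goal line, and “total perimeter” is read as the length of the path travelled by the ball.

```python
import math

# Placeholder weights: the recipe is not fixed, so these are pure assumptions.
WEIGHTS = {"passes": 1.0, "duration": 0.1, "penetration": 0.05, "distance": 0.02}
OUTCOME_BONUS = {"goal": 10.0, "shot": 3.0}  # any other conclusion scores 0

def path_length(points):
    """Total distance covered by the ball along consecutive (x, y) points."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def qoa(passes, duration, points, outcome=None):
    """Rate one possession sequence; a higher value means a better sequence."""
    deepest_x = max(x for x, _ in points) if points else 0.0
    return (WEIGHTS["passes"] * passes
            + WEIGHTS["duration"] * duration
            + WEIGHTS["penetration"] * deepest_x
            + WEIGHTS["distance"] * path_length(points)
            + OUTCOME_BONUS.get(outcome, 0.0))

# Example: 6 passes over 25 seconds, ending with a shot
print(qoa(6, 25, [(40, 50), (62, 30), (85, 45)], "shot"))
```

A linear sum is the simplest possible choice; it makes each ingredient’s contribution easy to read off, which helps while the weights are still being tuned by hand.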
More to come on this!
First pass at ball possession
Ball possession is key in a soccer game. I guess there is a positive correlation between ball possession and game result. Of course, as discussed (notably) here, things are not as simple as that:
1. http://www.zonalmarking.net/2012/05/04/the-relationship-between-possession-and-shots/
2. http://www.mlssoccer.com/numerology/news/article/2012/06/21/central-winger-getting-ball-and-get-defensive
3. http://www.soccerbythenumbers.com/2012/07/pass-accuracy-and-possession-supremacy.html
But, beyond the overall percentage of possession of the ball, I am interested in knowing more about the evolution of ball possession during a game.
Specifically, based on the sample file provided by MCFCAnalytics and Opta sports, I’m interested in knowing more about the ball possession of our two teams. For example, I want to see if there is an increase or a decrease in one team’s possession as the game progresses. I also want to see if the team protecting its 3-2 advantage manages to increase ball possession in order to take control of the game. Conversely, I want to see if the team desperately trying to score the equalizer managed to create good possession sequences in order to build solid attacks. I’d also like to see if fatigue, lack of timing and so on show up in the evolution of ball possession.
Finally, as ball possession kinda sets the rhythm of the game, I want to see how possessions of the ball in offensive zones were managed by the two teams throughout the game. This last point will help us go from quantitative to spatial observations, leading us eventually to patterns of play.
And of course, the goal is to build something that can lead us to real-time analytics.
So, our first step is to represent possession of the ball by the two teams. To do that, I’ve written a little Python script (see previous post for the basics) that does this:
– distribute each team’s events in respective arrays
– each event is associated with a timestamp (min * 60 + sec)
– sort arrays
– pipe out the output into two files (one for team 30 and one for team 43)
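The steps above can be sketched roughly as follows. This is not the original script: it uses the standard library’s xml.etree.ElementTree instead of lxml so that it runs anywhere, the sample XML is made up (only the team_id, min and sec attribute names come from the F24 file), and the function and file names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def ticks_by_team(root):
    """Group event timestamps (min * 60 + sec) by team_id, sorted ascending."""
    ticks = defaultdict(list)
    for event in root.iter("Event"):
        tick = int(event.get("min")) * 60 + int(event.get("sec"))
        ticks[event.get("team_id")].append(tick)
    return {team: sorted(values) for team, values in ticks.items()}

def write_series(ticks, prefix="team_"):
    """Write one text file per team, one tick per line; return the file names."""
    names = []
    for team, values in sorted(ticks.items()):
        name = "%s%s.txt" % (prefix, team)
        with open(name, "w") as handle:
            handle.write("\n".join(str(v) for v in values) + "\n")
        names.append(name)
    return names

# Tiny made-up sample shaped like the F24 events
SAMPLE = """<Game>
  <Event type_id="1" team_id="43" min="25" sec="14"/>
  <Event type_id="1" team_id="30" min="0" sec="7"/>
  <Event type_id="1" team_id="43" min="0" sec="3"/>
</Game>"""

series = ticks_by_team(ET.fromstring(SAMPLE))
print(series)
```

Calling write_series(series) would then produce team_30.txt and team_43.txt in the current directory, ready to be loaded by the plotting side.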
To show the data, I used a really cool JavaScript jQuery plugin called Flot. Flot displays data nicely without too much effort.
First pass is rough but it looks promising:
Now if we focus on the approximate times when goals were scored (see previous post), we have this (ticks in seconds):
tick (s)   score   scored by team
1514       1-0     43
2219       2-0     43
2334       2-1     30
2799       3-1     43
3726       3-2     30
So let’s add another data series corresponding to goals (value 20 on the Y axis) so we can see them. Then, focusing on the time of each goal, we have for goal number one:
In this case, it is easy to notice a clear domination of ball possession by club 43 in the moments preceding the goal.
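The same observation can be put into numbers instead of read off the chart. Here is a small sketch that counts each team’s events in the minutes leading up to a goal; the 300-second window and the function names are my own assumptions, and it counts events, which is only a rough proxy for possession.

```python
from bisect import bisect_left

def events_before(ticks, goal_tick, window=300):
    """Count events in the `window` seconds leading up to goal_tick.

    ticks must be sorted ascending (seconds since kickoff).
    """
    lo = bisect_left(ticks, goal_tick - window)
    hi = bisect_left(ticks, goal_tick)
    return hi - lo

def share_before(ticks_a, ticks_b, goal_tick, window=300):
    """Fraction of pre-goal events that belong to team A (None if no events)."""
    a = events_before(ticks_a, goal_tick, window)
    b = events_before(ticks_b, goal_tick, window)
    total = a + b
    return float(a) / total if total else None
```

Fed with the two sorted arrays of ticks and the goal time 1514, share_before would return the share of events belonging to the first team in the five minutes before goal number one.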
Good, so that is the first pass.
Next, we will have to focus more precisely on ball possession (and not just on team-related events in time) and on measuring possession time in order to study its evolution over the game. We will also introduce spatial criteria.
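As a starting point for measuring possession time, here is a rough sketch that turns a time-ordered event stream into possession spells. It rests on a strong simplifying assumption of mine: a team is in possession from its first event until the other team’s next event. Real F24 data has events that do not imply possession, so this is only an approximation; the function names are hypothetical.

```python
def possessions(events):
    """Split a time-ordered list of (tick, team) into possession spells.

    Returns a list of (team, start_tick, end_tick). A spell ends when the
    other team records its next event (a simplifying assumption).
    """
    spells = []
    current, start, last = None, None, None
    for tick, team in events:
        if team != current:
            if current is not None:
                spells.append((current, start, tick))
            current, start = team, tick
        last = tick
    if current is not None:
        spells.append((current, start, last))  # close the final spell
    return spells

def summary(spells, team):
    """Number of possessions, cumulative time and mean time for one team."""
    durations = [end - start for t, start, end in spells if t == team]
    total = sum(durations)
    mean = float(total) / len(durations) if durations else 0.0
    return {"count": len(durations), "total": total, "mean": mean}
```

The three values returned by summary correspond to the three possession metrics I want to track: number of possessions, cumulative possession and mean time of possession.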
Let’s warm up: goal scan!!!
A pretty simple exercise to play with and get familiar with the MCFC Bolton_ManCityF24.xml file.
We’re going to parse the file and simply count goals (Event/type_id="16"). We will print each goal event plus the team code (30 or 43) and the min:sec of the event.
The first conclusion you reach when you look at the F24 file is that it is not deep or heavily nested. Everything is organized around “Event”. Every event is associated with a shared set of attributes. Quite simple, quite straightforward.
That being said, more complex operations will probably require a tool like Redis, the high-performance in-memory datastore. We’ll see.
So, the snippet of code looks like this:
# Little warmup with the file
# We'll parse the entire file and look for Event/type_id = '16' (goal)
# And we'll output goal + team + timestamp
from lxml import etree

inFile = "./Downloads/Bolton_ManCityF24.xml"
xmlData = etree.parse(inFile)  # etree.parse() opens and parses the data

# proceed to loop on all Events, wherever they sit in the tree
events = xmlData.findall(".//Event")
for event in events:
    eventType = event.attrib['type_id']
    team = event.attrib['team_id']
    tick = event.attrib['min'] + ":" + event.attrib['sec']
    if eventType == '16':
        print("GOAL!" + " for team " + team + " at " + tick)
And the output is simply:
GOAL! for team 43 at 25:14
GOAL! for team 43 at 36:59
GOAL! for team 30 at 38:54
GOAL! for team 43 at 46:39
GOAL! for team 30 at 62:6
So team 43 won 3 to 2. Yeah!
For the record, the code was executed on a box running Ubuntu 11.04 LTS. The current version of Python is 2.7.3.
Setting the stage for realtime analytic tools
Hi all,
Following the release by MCFC and Opta of extensive game data, I decided to explore the possibilities of pushing soccer analytics into the real-time domain. The goal is to provide real-time tools that will complement live assessments in order to support coaches in their tactical and strategic decisions during a game.
Nothing replaces deep knowledge and vast experience. These real-time tools won’t ever replace those skills, but they will bring a quantitative dimension that complements what coaches observe and analyze on the field.
Based on live data, these tools will bring in live intelligence about things coaches obviously see, like shots, deep penetration in the offensive zone, turnovers, etc., but those events would now be seen from a continuous quantitative perspective.
Those tools would also highlight things that are not seen at first glance, like:
- fluctuation of the rate of ball takeovers in midfield that are converted into deep penetration in opponent’s territory
- fluctuation of possession time and of offensive zone penetration
- fluctuation of time to get the ball out of the defensive zone
- and so on
Those tools would also enable alerting linked to preset thresholds.
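To make the alerting idea a bit more concrete, here is a minimal sketch of a threshold check. The metric names, the (low, high) band format and the check_alerts function are all hypothetical; the point is only to show how current values could be compared against preset limits each time the metrics are refreshed.

```python
def check_alerts(metrics, thresholds):
    """Return messages for every metric outside its preset (low, high) band.

    metrics:    {"name": current_value}
    thresholds: {"name": (low, high)}; use None for an open bound.
    """
    alerts = []
    for name, value in sorted(metrics.items()):
        if name not in thresholds:
            continue  # no preset threshold for this metric
        low, high = thresholds[name]
        if low is not None and value < low:
            alerts.append("%s below %s: %s" % (name, low, value))
        elif high is not None and value > high:
            alerts.append("%s above %s: %s" % (name, high, value))
    return alerts

# Example with made-up values
current = {"mean_possession_sec": 6.0, "pass_success": 0.82}
limits = {"mean_possession_sec": (8.0, None), "pass_success": (0.7, None)}
for message in check_alerts(current, limits):
    print(message)
```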
So, this is the area I intend to explore hoping to have fun and to bring something that can be useful someday.