With Bartłomiej Drągowski on his way out at Fiorentina, one area the Viola are likely to upgrade is between the sticks. Pietro Terracciano may be the World’s Funnest Dad, but the consensus is that he’s better suited to a backup role than a starting one. While we’ve heard some names (step forward, Guglielmo Vicario), Daniele Pradè and company are likely doing their homework to figure out who they want to grab. And I figured that, even though they neither want nor need my help, I’d offer it anyways in the form of this article.
As always, there are a few caveats here. The first is that I’ve only got data for 5 leagues; fbref doesn’t have the stats for smaller ones, so there are doubtless a lot of folks I’m missing here. If anyone wants to buy me a full StatsBomb subscription, hit me in the comments (Joe Barone, if you’re reading this, you know where to find me).
The second is that fbref, which uses StatsBomb data, relies on StatsBomb’s xG model. I think it’s a pretty dang good model. I’m also not an expert in this field, or even an experienced amateur, so I’m sure there are criticisms of this (or any) system for quantifying this stuff. If you’re Team Opta, well, I guess it’s pistols at dawn.
The third is that goalkeeping is really hard to quantify, even more so than the outfield positions. For example, think of a goalkeeper claiming a cross. It’s one thing to casually outjump, say, Dries Mertens to reel in a floated ball from the byline. It’s a very different proposition to go up and cleanly take a whipped corner with Andrea Petagna and Kalidou Koulibaly bearing down on you. Statistically, though, those actions are identical; it’s like saying that playing with a corgi puppy is the same as going full Liam Neeson in The Grey.
Data, methods, and other boring stuff
I created a dataset by combining the basic goalkeeping stats, advanced goalkeeping stats, and passing stats for Serie A, the Premier League, the Bundesliga, La Liga, and Ligue 1 on fbref.com. I copied and pasted fbref’s tables into a series of Google Sheets, then loaded them into RStudio and combined them into a Frankenset that I could write my code for.
I ignored any goalkeeper who made fewer than 5 league appearances in an effort to avoid any really weird outliers; I’m looking at you, Rúben Vezo (spent 3 minutes between the posts after Aitor Fernández got sent off in the second matchweek and pretty well nuked all the numbers as a result). Anyways, that left me with 153 eligible dudes, which seemed like way too many to analyze individually.
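For the curious, here is a minimal Python sketch of that combine-then-filter step. It is not the RStudio code behind this article: the rows are made up, and the field names (`matches`, `shots_faced`, `crosses_faced`, `medium_att`) are hypothetical stand-ins rather than fbref's actual column headers.

```python
# Made-up rows standing in for the three fbref tables; the field names are
# illustrative assumptions, not fbref's real column headers.
basic = [
    {"player": "Keeper A", "squad": "Club X", "matches": 30, "shots_faced": 120},
    {"player": "Keeper B", "squad": "Club Y", "matches": 2, "shots_faced": 9},
]
advanced = [
    {"player": "Keeper A", "squad": "Club X", "crosses_faced": 300},
    {"player": "Keeper B", "squad": "Club Y", "crosses_faced": 20},
]
passing = [
    {"player": "Keeper A", "squad": "Club X", "medium_att": 400},
    {"player": "Keeper B", "squad": "Club Y", "medium_att": 25},
]

MIN_APPEARANCES = 5  # the appearance cutoff described above


def combine(*tables):
    """Join tables on (player, squad); assumes one row per key in each table."""
    merged = {}
    seen = {}
    for table in tables:
        for row in table:
            key = (row["player"], row["squad"])
            merged.setdefault(key, {}).update(row)
            seen[key] = seen.get(key, 0) + 1
    # Keep only goalkeepers who appear in every table.
    return [row for key, row in merged.items() if seen[key] == len(tables)]


def eligible(rows, min_apps=MIN_APPEARANCES):
    """Drop anyone below the appearance cutoff."""
    return [row for row in rows if row["matches"] >= min_apps]


keepers = eligible(combine(basic, advanced, passing))
```

Joining on player and squad together, rather than on name alone, is a design choice meant to keep a goalkeeper who switched clubs mid-season from being blended into a single row.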
My solution was to look at a few key categories that I think represent what Vincenzo Italiano wants out of a goalkeeper. While the traditional duties of shot-stopping and commanding the penalty area are, of course, necessary in any goalkeeper, it seems clear to me that Cousin Vinnie wants a goalkeeper who’s good in possession and comfortable sweeping up behind a high line; after all, it was failures in these duties that saw poor Bart sent to the bench despite being the best 1-v-1 goalkeeper I’ve ever seen for the club.
To best capture these attributes from the statistics, I narrowed my focus down to ten numbers, all of which paired very neatly. That left me with five categories that I could use to create scatter plots. And because plots are fun (Pazzi Conspiracy, anyone?), I went ahead and plotted them to give myself a better idea of where each goalkeeper sits in each category relative to the others.
I know that these are kind of tough to read, so you can mouse over each data point and it should tell you which goalkeeper it represents. If the plots don’t appear, it’s probably because you found this page via an aggregator like Apple News, which strips out a lot of images. Try navigating to the site on your own, or just take my word for it that these plots are magnificent and beautiful and you’re really missing out on them.
As much as we all love Ederson essentially playing as a midfielder, a goalie’s first job is to keep the ball out of the net. The first thing I wanted to look at, then, was how good each goalkeeper was at that deceptively simple task. Post-shot expected goals against per 90 minutes (PSxGA) is a pretty good measure. The short version is that it measures how likely a goalkeeper is to save any given shot. A positive number indicates that a goalkeeper is saving more shots than the model predicts, which means he’s either lucky or good.
I was also interested in seeing which goalkeepers didn’t just save shots well but did it consistently, so I plotted PSxGA against total shots faced per 90 minutes; basically, I prefer a goalkeeper who makes a lot of good saves over one who only needs to make a few because their defense is really good. To sum up: goalkeepers who are higher up save more goals, and goalkeepers who are farther right face more shots. Guys in the top right quadrant, then, are probably the safest bets.
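To make the two axes concrete, here is a small Python sketch with made-up numbers. It assumes the stat on the y-axis is post-shot xG minus goals allowed, scaled per 90 minutes, so that positive means saving more than the model expects. It also assumes the quadrants are split at the median of each axis, which is my guess at a sensible dividing line and not something taken from the plots. The field names are hypothetical.

```python
from statistics import median

# Made-up season totals; field names are illustrative assumptions.
keepers = [
    {"player": "Keeper A", "minutes": 2700, "psxg": 35.0, "goals_against": 30, "shots_faced": 120},
    {"player": "Keeper B", "minutes": 1800, "psxg": 20.0, "goals_against": 24, "shots_faced": 60},
    {"player": "Keeper C", "minutes": 900, "psxg": 12.0, "goals_against": 11, "shots_faced": 50},
]


def per90(total, minutes):
    """Scale a season total to a per-90-minutes rate."""
    return total / minutes * 90


def shot_stopping_point(keeper):
    """Return (shots faced per 90, PSxG minus goals allowed per 90)."""
    x = per90(keeper["shots_faced"], keeper["minutes"])
    y = per90(keeper["psxg"] - keeper["goals_against"], keeper["minutes"])
    return x, y


points = {k["player"]: shot_stopping_point(k) for k in keepers}

# Assumed quadrant boundaries: the median of each axis.
x_mid = median(x for x, _ in points.values())
y_mid = median(y for _, y in points.values())

# Busy and above expectation: the top right quadrant.
top_right = [name for name, (x, y) in points.items() if x >= x_mid and y >= y_mid]
```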
As previously mentioned, claiming crosses is a slightly tough one to quantify, given the nature of the available data, although I’m sure that Ted Knutson and company are figuring something out. Even so, a goalkeeper who can command the area is really valuable, especially for a team like Fiorentina that plays a very high line and is thus prone to making tactical fouls in dangerous areas.
I did pretty much the same thing as I did for shot stopping here, plotting percentage of crosses claimed and total crosses faced per 90 minutes. The guys who claim more crosses are represented by dots farther up, and guys who face more crosses are farther right. Again, being farther up the y-axis is way more important than being farther along the x-axis, but I figured that it’s useful to see who’s consistently good at handling balls lumped into the box too. Also, shed a tear for Bart, who is, by percentage, the second-best goalkeeper at defending crosses in the big five leagues.
Drągowski’s lack of comfort with a sweeper-keeper role is well-documented; his two red cards are the main reason he was demoted to backup. As Terracciano is much quicker off his line, it seemed pretty clear to me that a goalkeeper who can clean up behind a high defensive line is necessary for Italiano’s system.
I therefore plotted the number of defensive interventions (i.e. tackles, interceptions, and clearances) per 90 minutes along with the average distance of those actions from the goal. As you’ll notice by Bart’s position way out to the right, this isn’t a perfect metric: an intervention can also include a foul, so this model doesn’t penalize recklessness.
Finally, it’s pretty obvious that Italiano needs a goalkeeper who’s comfortable in possession. The two Serie A goalkeepers who averaged the most passes per 90 minutes last year were Drągowski and Terracciano, and that passes the eyeball test. The Viola constantly tried to build up play from the back, using the goalkeeper as an out ball at times to reset when a move fizzled out.
The two things I looked at here were medium-length passing and long passing. I ignored short passing (0-15 yards) because nearly every single goalkeeper completed at least 97% of those passes; it’s such a basic skill that it’s essentially worth ignoring. Medium-length passes (15-30 yards) struck me as a better measure of a player’s technical ability on the ball. I looked at the number of medium-length passes a goalkeeper played per 90 and plotted that against the completion percentage.
I also looked at long passing, as having a goalkeeper who can launch a ball over the top of an aggressive press to find a forward and start a quick counterattack is a very valuable skill for a team like Fiorentina, whose strategy is to take the lead and then force the opposition to pressure them high up. I pretty much repeated the code for medium-length passing but replaced it with long passing, which fbref labels a “launch.” This one can get tricky because it’s a lot easier to go long when you’re aiming at a unit like Dušan Vlahović than it is when you’re aiming at, say, José Callejón.
Meet the candidates
Because my original purpose was to find a new goalkeeper who’d fit Fiorentina’s style of play, I used these data visualizations to establish minimum thresholds and filtered for guys who met each one. Specifically, I looked for players who:
- Possessed a negative PSxGA/90,
- Defended at least 7% of the crosses they faced,
- Performed at least 0.75 defensive actions outside their own penalty areas per 90,
- Completed at least 97% of their medium length passes, and
- Completed at least 38% of their long passes.
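A minimal Python sketch of that filter, not the RStudio code used for the article. The thresholds are copied from the list exactly as written, including the sign test on PSxGA; if your column is signed so that positive means above-expectation shot stopping, flip that one comparison. The field names and the two sample goalkeepers are hypothetical.

```python
# Thresholds copied from the list above; field names are illustrative assumptions.
CRITERIA = {
    "psxga_per90": lambda v: v < 0,                    # sign test as written in the list
    "cross_stop_pct": lambda v: v >= 7.0,
    "actions_outside_box_per90": lambda v: v >= 0.75,
    "medium_pass_pct": lambda v: v >= 97.0,
    "long_pass_pct": lambda v: v >= 38.0,
}


def meets_all(keeper):
    """True only if the goalkeeper clears every threshold."""
    return all(test(keeper[field]) for field, test in CRITERIA.items())


def failed(keeper):
    """List the thresholds a goalkeeper misses, handy for near-misses."""
    return [field for field, test in CRITERIA.items() if not test(keeper[field])]


# Made-up goalkeepers for illustration only.
keepers = [
    {"player": "Keeper A", "psxga_per90": -0.10, "cross_stop_pct": 8.2,
     "actions_outside_box_per90": 0.90, "medium_pass_pct": 98.1, "long_pass_pct": 41.0},
    {"player": "Keeper B", "psxga_per90": -0.05, "cross_stop_pct": 5.5,
     "actions_outside_box_per90": 1.10, "medium_pass_pct": 97.4, "long_pass_pct": 33.0},
]

shortlist = [k["player"] for k in keepers if meets_all(k)]
```

Keeping a `failed` helper around makes it easy to see which single threshold knocks out an otherwise interesting goalkeeper, which matters when five hard cutoffs are stacked on a pool of 153.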
Only eight goalkeepers met these criteria: Ederson of Manchester City, Steve Mandanda of Marseille, Alexandre Oukidja of Metz, Sven Ulreich of Bayern München, Juan Musso of Atalanta, Giorgi Mamardashvili of Valencia, and David Soria of Getafe.
Of these players, only the latter three look like viable targets for Fiorentina. Ederson isn’t leaving Man City for an inferior club. Ditto for Ulreich at Bayern München. Mandanda and Oukidja are over 33, which makes them poor options for investment. Even Musso may view a move away from Atalanta as a step down, so it’s down to Mamardashvili or Soria.
If you’ve been reading closely, though, you’ll notice that I only listed seven players out of the eight who met the criteria. That’s because the final goalkeeper is none other than Pietro Terracciano. If you need to let out a pantomime gasp, now is the moment.
I won’t lie to you: I was pretty ticked off that the Fun Dad is, according to my model, the best option for Fiorentina. Once I’d emerged from my abject frustration, though, I at least took a few conclusions from this exercise that I think are pretty useful when evaluating players in general and goalkeepers in particular.
The first is that the robots haven’t won yet. I’ve made no secret of my love for statistics, fbref, and data in general over the past several years here. To repeat what I said at the top of this piece, though, there’s no substitute for watching a player over several games to determine how good they actually are. Soccer statistics are, in a lot of ways, tougher to analyze than those of, say, baseball or basketball because the game has so many more moving pieces and defies anyone who wants to break it into discrete sequences.
I think we’ll get better at analyzing this stuff (there’s a reason that a StatsBomb subscription will run you thousands of dollars, and it’s that the numbers they provide have a massive real-world value), but looking at data is not and probably never will be the best way to identify talent. What it can do is create a model that flags players who are worth watching, allowing recruitment teams to cast a much wider net than they otherwise would.
Second, my model is probably a long way from perfect. Part of that is sample size, of course: with just 153 guys to look at, I wasn’t expecting too many happy returns. Still, it seems clear that I need to look more carefully at what I think Italiano wants from his goalkeeper. My first instinct is to ignore the crossing numbers. I’m also willing to overlook some of the xG against numbers, especially since they take penalties into account: if a goalkeeper is playing behind a defense that concedes a lot of PKs, he’s going to look a lot worse as a result.
Third, I think I may have been too hard on Terracciano in my evaluation of him this year. While he’s nowhere near the shot stopper Bart is, he’s competent enough, and his occasional, hm, adventurous pass probably doesn’t outweigh the number of good ones he makes.
As a result, I’m not as fussed about finding Fiorentina a new goalkeeper this summer. If one comes available at a good price, Pradè and company should pounce. However, I don’t think upgrading the World’s Funnest Dad needs to be the main priority (fullbacks, strikers, and registe, step this way, please). The Viola can win with the guy they’ve got between the sticks right now.