How Does Soccer Work?

When Moneyball, Michael Lewis’s account of how statisticians and data analysts were transforming baseball, was published in 2003, people across the soccer world realized just how far behind the curve they were, as other professional sports franchises, seeking a competitive edge, went all in on analytics. ESPN staff writer Ryan O’Hanlon’s Net Gains tracks the revolution that followed—and the assortment of coaches, bloggers, and tech nerds who were bent on changing soccer forever. But Net Gains doesn’t merely track the numerous ways the sport has changed in recent years; it’s also an extraordinary account of how soccer actually works—and how much there is still left to learn about the world’s most popular sport. I spoke to O’Hanlon about soccer analytics, the Messi versus Ronaldo debate, and the World Cup.

Soccer has been relatively late to analytics, particularly when compared to baseball. Why do you think that is?

It definitely lagged behind. A bunch of people in the book and a bunch of other people that I spoke to read Moneyball [when it came out] and they were like, “Whoa, you can do this? I could look at soccer this way?” That was a not uncommon response across other sports as well. It definitely lagged behind for structural reasons. It’s really fucking hard to measure soccer. It’s probably impossible to create a “wins above replacement” metric. I think you could maybe create it for your own team—if, that is, you’re committed to playing in a particular style. But that’s useless for the Moneyball idea of finding undervalued players!

That’s one big reason. But there are also all of the cultural reasons. The first baseball and soccer leagues are a little more than a hundred years old at this point. To me, that’s really not that old. The National League was immediately a closed system—a cartel system. They had like eight teams and knew they were getting all the money for themselves. The big question was, How do we make more money? In England, it was always an open system. Anyone could create a team; anyone could go up and down the ladder. The relegation and promotion system, that’s a competitive lever. But it’s more of a symbol. These teams are a community trust. You grow up rooting for them. There might have been a former butcher running the team in the 1920s, whereas the baseball teams were immediately businesses. When you’re a business you’re naturally searching for ways to cut costs and find value. It’s unsurprising that the first closed sports league in the world was the first one to be taken over by data valuations.

In soccer, these teams weren’t created to be these efficient, moneymaking, win-seeking vehicles. That only happened once the Premier League was officially founded in 1992. That’s the biggest reason. The other thing is that soccer’s brain center is in Europe and there’s just less osmosis with ideas than there [was] in American sports. In England, soccer has mainly been a working-class game. The people who run the teams have until recently mostly been guys who barely went to high school because of the academy system. All those things come together to cause this lag.

Scoring happens so rarely in soccer and seems, generally, to be much harder to quantify than in sports like baseball, which pioneered analytics, and basketball, which has its own various efficiency metrics. How have the various people, teams, and institutions you profiled gotten around that? How much is left to learn? Similarly, how do you quantify the success of players who don’t do very much statistically—I’m thinking of someone like former Liverpool midfielder Gini Wijnaldum?

In basketball, there’s a way of sort of quantifying this. Michael Lewis wrote a piece about this called “The No Stats All-Star” about Shane Battier. In basketball, each game is a series of experiments: Different players play together and then you see how they perform; you can’t get objective truth on a player’s impact, but you can see things like “when this player isn’t on the floor, our team is worse.” All the kinds of soft factors that people on TV want to talk about—he’s just a winning player, he makes the right play—these things are accounted for in the team’s performance. You can at least put some concrete evidence on these ideas. That was shown to be true with guys like Shane Battier.

With soccer, it’s basically impossible to do that. You can try! But it’s really hard because you had three subs a game—now five—but coaches never make subs, even though they have them. The same players typically play together all of the time. Is Mohamed Salah playing well because Trent Alexander-Arnold and Andy Robertson are also on the field? It’s hard to say. But there are things you can quantify. You can take the geography of the field and figure out that when a possession reaches a certain area on the field, it’s x percent likely to lead to a goal. Then you can figure out that if a player wins a tackle in that area, he’s probably increased his team’s chances of scoring a goal and probably taken away a certain value from the other team.

The issue, though, is that this basically shows that nothing matters between either penalty area: It’s all meaningless in between. The best player and the worst player in those areas are the same. One analyst referred to that zone as the “Valley of Meh.” So it raises an interesting question: This is probably not the fully accurate representation of how soccer works. But it might be a more accurate representation of how soccer works than everyone thinks. We just don’t know for sure. But what you learn is that the stuff near the goals is way more valuable. That’s an obvious thing, but then again, all the major breakthroughs in other sports are insanely obvious too: Go for it on fourth down, take more three-pointers, don’t sacrifice bunt.

It’s not to say that midfield doesn’t matter: Wijnaldum, who you brought up earlier, probably did a lot of things that allowed his teammates to play better on the field. But then how do you determine how valuable he is? Could there have been 90 other players who also could have done that? Perhaps. In a lot of ways, this is why I wanted to write the book: There isn’t actually a good answer for any of this.
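The zone-based valuation described in this answer can be sketched in a few lines of Python. This is only a toy illustration, not any analyst's actual model: the three zone names, the possession log, and the helper functions are all invented for the example, and real models divide the field far more finely and use far more data.

```python
from collections import defaultdict

# Hypothetical possession log: (zones the possession reached, did it end in a goal).
# Every value here is invented for illustration.
POSSESSIONS = [
    (["own_box", "midfield"], False),
    (["midfield"], False),
    (["midfield", "opp_box"], True),
    (["midfield", "opp_box"], False),
    (["own_box", "midfield", "opp_box"], False),
    (["midfield", "opp_box"], True),
    (["own_box"], False),
    (["midfield"], False),
]

# The same patch of grass seen from the other team's point of view.
MIRROR = {"own_box": "opp_box", "midfield": "midfield", "opp_box": "own_box"}


def zone_goal_rates(possessions):
    """Share of possessions reaching each zone that ended in a goal."""
    reached = defaultdict(int)
    scored = defaultdict(int)
    for zones, goal in possessions:
        for zone in set(zones):  # count each zone once per possession
            reached[zone] += 1
            if goal:
                scored[zone] += 1
    return {zone: scored[zone] / reached[zone] for zone in reached}


def tackle_value(rates, zone):
    """Value of winning the ball in `zone` (named from the tackler's side):
    the scoring chance the tackler's team gains plus the chance the opponent loses."""
    return rates[zone] + rates[MIRROR[zone]]


rates = zone_goal_rates(POSSESSIONS)
for zone in ("own_box", "midfield", "opp_box"):
    print(zone, round(rates[zone], 3), round(tackle_value(rates, zone), 3))
```

With only eight made-up possessions the numbers mean nothing in themselves; the point is the bookkeeping, where every action is scored by how much it shifts each team's chance of a goal.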

Do you think there are more advances coming?

Soccer is complex, sure. But weather is also complex, and predicting weather has gotten a shit ton better in the last hundred years. Some of the characters in the book have studied stuff that is a lot more complex than a soccer game. That’s important to note. It is still incredibly complex compared to other sports, however. In baseball, a guy is just throwing the ball at you. Pitchers are now throwing the ball faster, but the problem is the same: You just have to hit it. In the NBA, there’s a hoop and there’s 24 seconds. In the NFL this year, there has been a shift in the way that teams are defending: Offenses are having a harder time throwing deep.

With soccer, half of what you’re doing is determined by what the other players are doing. Yes, it seems like pressing is a high-value way to play. It fits very nicely into this idea that teams are just too conservative on the whole: That’s just true in every sport. The best ways to play are ways where, when it goes bad, you look like a complete idiot: Like going for it on fourth down, or shooting and missing a ton of threes like the Rockets did [in Game 7 of the 2018 Western Conference Finals], or Aaron Judge striking out four times in a playoff game. You look like an idiot when these things don’t work! It looks bad! But in the long run, it’s a better way to optimize your chances. Part of the reason pressing works so well is because other teams play conservatively. So when everyone starts pressing, will pressing still be the best way to up expected value? Probably not! The issue isn’t just how hard it is to quantify everything. There’s also this game-within-a-game that’s always shifting. That makes soccer immune in some ways to a lot of the new truths we’ve learned about other sports.

There is a funny thing with this too where you just see the old ways constantly returning. This was a World Cup where big strikers and hoofing the ball over defenses mattered. Liverpool and Manchester City just brought in big tall strikers, albeit not really of the lumbering old-school English First Division type, and love hitting it over the top. You’re seeing Klopp and Guardiola mess around with 4–4–2.

How did Liverpool score against Manchester City [when they played in October]? Their keeper punted the ball so hard that he fell down and their center forward won the ball and won a breakaway. City’s keeper is an OK shot-stopper, but he’s on the team because he can bomb the ball long.

Look at set pieces. That’s one of the areas where analytics has found that you can create more goals—it’s the one area where you can control and practice your players’ movements. Liverpool scored a ton of set pieces when they won the Premier League and the Champions League. City wasn’t very good at scoring at set pieces, and now they’re amazing at them. You’re seeing two of these savvy continental managers adopt and adapt what are old-school British methods, and they’re working so well that everyone is copying them.

How important has the concept of expected goals been in understanding soccer?

I have a character in my book, Michael Caley, who helped popularize expected goals. He worked at SB Nation for a while and has been blogging and tweeting about xG for a long time. He came from baseball. He was a huge Red Sox fan and was on message boards when [baseball analytics blog] Fire Joe Morgan was at its heyday, just arguing with people about stats. And then, watching the World Cup, he started to wonder if the kinds of things that were being applied to baseball could be applied to soccer. In baseball all the initial work is you have runs: You see that runs are being scored. So then you work back from that. You want to find what creates runs. That’s where all the work in early baseball analytics comes from. Famously, when you’re looking at batting average, you’re not seeing the whole picture because you’re not including walks, a big part of how people actually get on base.

With soccer it’s similar. You start with goals. You look to see if goals are predictive of future goals—and they’re not. Like, not at all. But then you turn to shots. You see that shots are more predictive than goals themselves. You run into some problems. Caley found that there were some Tottenham players that took a ton of shots from outside the box. They skewed against the idea that the teams that are better should take more shots. That team also didn’t give up a lot of shots but the ones they did were often pretty good chances. So you get to, OK, maybe the types of shots matter. From there you get into a way of predicting goals. That’s what you want to know: What team created the better chances to score in the game? Managers have been talking about that for a hundred years. That’s how Caley came about it. And so did a handful of other people at the same time. I think the fact that all these people came to it from different directions shows that if you have that analytical mindset, you’re going to approach it in that same way: breaking down goals to their component parts and then eventually figuring it out.
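The step from counting shots to weighing types of shots can be made concrete with a small Python sketch. It assumes a made-up lookup table of scoring probabilities per shot type; those numbers, the shot categories, and the two teams are hypothetical and are not taken from Caley's model or any other real one.

```python
# Hypothetical chance of scoring for each shot type.
# Invented for illustration; real expected-goals models use many more inputs.
SHOT_VALUE = {
    "long_range": 0.03,
    "inside_box": 0.12,
    "close_range": 0.35,
}


def expected_goals(shots):
    """Add up the scoring probability of every shot a team took."""
    return sum(SHOT_VALUE[shot] for shot in shots)


# Team A shoots a lot, mostly from distance; Team B shoots less but from better spots.
team_a = ["long_range"] * 15 + ["inside_box"] * 2
team_b = ["inside_box"] * 4 + ["close_range"] * 2

print(len(team_a), "shots ->", round(expected_goals(team_a), 2), "xG")
print(len(team_b), "shots ->", round(expected_goals(team_b), 2), "xG")
```

Under these assumed values, Team A takes 17 shots to Team B's six yet ends up with the lower total, which is the pattern the shots-only view misses.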

It’s important to say that expected goals isn’t reality. It’s not like you are your expected goals. This does not mean that a team like Leeds will just be the twelfth-best team automatically—it’s just better at predicting quality than anything else there is. It’s almost like if a baseball season was 38 games long—that’s how I view the soccer season. We know that the playoffs are random as hell, and that’s what makes them fun. Let’s take the Champions League final, for example: 24 shots [from Liverpool] to five [from Real Madrid]. Depending on the model, the expected goals were basically 2-to-0.6 in favor of Liverpool. Obviously, Liverpool didn’t win. Real Madrid converted the one good chance they had, and their keeper played out of his mind. That’s why the game ended the way it did. All the other reasons are very tiny things that happened: Luka Modric made one nice pass. That’s the story of the game. But it goes against stories that a lot of people want to read. It doesn’t give you a reason to be angry, it doesn’t tell you that Liverpool are a team in crisis.
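To see why a 2-to-0.6 edge in expected goals is not a guarantee, here is a rough Python sketch using the shot counts and totals quoted above. It rests on two simplifying assumptions that are not in the interview: every shot by a team is given the same scoring chance (the total divided by the number of shots), and shots are treated as independent. Real chances vary a lot in quality, so read the output as an illustration only.

```python
from math import comb


def goal_distribution(shots, xg):
    """Probability of scoring 0..shots goals, assuming (as a simplification)
    that each shot has the same chance xg / shots and shots are independent."""
    p = xg / shots
    return [comb(shots, k) * p**k * (1 - p) ** (shots - k) for k in range(shots + 1)]


def outcome_probabilities(dist_a, dist_b):
    """Return (A wins, draw, B wins) from two independent goal distributions."""
    win_a = draw = win_b = 0.0
    for goals_a, prob_a in enumerate(dist_a):
        for goals_b, prob_b in enumerate(dist_b):
            joint = prob_a * prob_b
            if goals_a > goals_b:
                win_a += joint
            elif goals_a == goals_b:
                draw += joint
            else:
                win_b += joint
    return win_a, draw, win_b


# Shot counts and xG totals as quoted in the interview.
liverpool = goal_distribution(shots=24, xg=2.0)
madrid = goal_distribution(shots=5, xg=0.6)

win_liverpool, level, win_madrid = outcome_probabilities(liverpool, madrid)
print(f"Liverpool ahead: {win_liverpool:.2f}")
print(f"Level after 90 minutes: {level:.2f}")
print(f"Real Madrid ahead: {win_madrid:.2f}")
```

Even under these crude assumptions, the side with the far lower total still comes out ahead a small but far from negligible share of the time, and a fair number of matches finish level.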

I would be remiss if I didn’t ask you about the most important question in soccer: Who is better, Lionel Messi or Cristiano Ronaldo?

It’s not a debate at all. Messi is by far the best soccer player in the modern era. He and Ronaldo have scored roughly the same number of goals. But Messi is basically the best chance creator of his generation. He’s the best dribbler of his generation. He’s also the best at playing the pass before the assist. And one study I use in the book shows that he’s also the best at occupying space. He’s the best passer, the best dribbler, the best shooter, the best finisher—he actually finishes higher above xG than any player—and the best facilitator. Ronaldo is arguably as good a goal scorer as Messi. It’s very simple. It’s just not a debate.

What did you think about the World Cup?

I feel like my biggest takeaway—and I’m not sure if this is only because I just wrote a book about this or if it’s because I’ve been working in soccer for longer or because the tournament is in the middle of the season—but I feel like I’m just so keenly aware of how random everything is. I did a piece about all of the big teams that got knocked out: England lost, but England played really well for the entire tournament and then Harry Kane missed a penalty and that’s why they went out. That’s the purest answer for why they went out of the World Cup. Brazil dominates against Croatia, gives up one chance which deflects off Marquinhos’s knee into the goal with a minute left; then they lose in penalties. Otherwise, they were great—that’s why they were out. Germany had the best expected goal differential of any team in the group stages by far, and they were eliminated ... in the group stages. Even the U.S.—they played up to par and probably exceeded expectations. If Christian Pulisic scores the best chance of the game right away, it’s a completely different game. Instead, the Dutch finished their chances and that’s it. So it’s amplified how stupid the World Cup is—in a good way!

What about Argentina?

They had a pretty easy draw. Australia, then the Netherlands. They dominated the Netherlands, but they were maybe the eighth- or ninth-best team in the World Cup. Croatia has been to the semifinals in the last two tournaments but also hasn’t won a knockout game this century—the bottom was going to fall out eventually. With all that said, they had an incredible defense. It seemed like that wasn’t true because Saudi Arabia scored two insanely low-probability shots, probably the shots of their lives for the two goal scorers. But they just didn’t give up chances, not high-quality ones, and not a lot of them either. And then you have Messi. Messi is the best soccer player ever, if not the best athlete ever. Even if you’re not an Argentina fan or not a soccer fan and you’re watching him, it’s obvious how good Messi is. This is the first World Cup that he’s been in where if you’ve never watched soccer before you’d be like, “Oh, this is why everyone freaks out about this guy!”