Hi all,
First-time poster here. I thought I'd share a large-scale statistical exercise I just ran (props to BigGreenShrek for giving me the idea).
Any thoughts on the protocol or on the results would be much appreciated!
Background:
Some endgame books (e.g., de la Villa) recommend focusing on rook endgames because they show up often in top-level games. That is interesting, but it may matter less to amateur players, whose games often play out very differently from GM games. Other endgame books (e.g., Silman) present the Philidor and Lucena positions but label most other rook endgame material as "expert-level content". Unfortunately, Silman doesn't really offer a justification for that choice, and I'm not sure the endgames he presents first are really the most "useful"/frequent ones.
Questions:
Which endgames do amateur players encounter most frequently? Which endgames should they study?
Data:
Lichess.org publishes all the games played on its server as PGN files. For instance, the September 2017 file includes 12,564,109 games. My computer is still running, because it takes forever to process that many games. So far, I've identified and analyzed nearly 150,000 "relevant" games that include "proper" endgame positions. I think this is close to the point of diminishing returns, since the results don't seem to change much as I add new games.
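For anyone who wants to reproduce the extraction step, here is a minimal stdlib-only Python sketch that streams a PGN dump one line at a time (so a monthly file with millions of games never has to fit in memory), keeps classical games, and counts half-moves. The function names are hypothetical, and it assumes Lichess-style exports: the time-control category appears in the `Event` tag (e.g. "Rated Classical game"), every game has a movetext section, and there are no variations in parentheses.

```python
import re
from typing import Dict, Iterator, List, TextIO, Tuple

TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]$')   # [Event "Rated Classical game"]
COMMENT_RE = re.compile(r"\{[^}]*\}")          # { [%clk 0:10:00] } and similar
MOVE_NUMBER_RE = re.compile(r"^\d+\.+$")       # "12." or "12..."
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def iter_pgn_games(handle: TextIO) -> Iterator[Tuple[Dict[str, str], str]]:
    """Yield (headers, movetext) per game, reading the input line by line."""
    headers: Dict[str, str] = {}
    moves: List[str] = []
    for raw in handle:
        line = raw.strip()
        match = TAG_RE.match(line)
        if match:
            if moves:  # a tag line after movetext starts a new game
                yield headers, " ".join(moves)
                headers, moves = {}, []
            headers[match.group(1)] = match.group(2)
        elif line:
            moves.append(line)
    if headers or moves:
        yield headers, " ".join(moves)


def is_classical(headers: Dict[str, str]) -> bool:
    # Assumption: the time-control category is part of the Event tag.
    return "Classical" in headers.get("Event", "")


def count_plies(movetext: str) -> int:
    """Count half-moves, skipping comments, move numbers, NAGs and the result."""
    tokens = COMMENT_RE.sub(" ", movetext).split()
    return sum(
        1
        for tok in tokens
        if not MOVE_NUMBER_RE.match(tok) and tok not in RESULTS and not tok.startswith("$")
    )
```

Typical use would be a loop like `for headers, moves in iter_pgn_games(f): if is_classical(headers) and count_plies(moves) >= 80: ...` over an open file `f`. Getting the material after each half-move still requires replaying the moves (for example with a chess library), which this sketch does not do.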
Protocol:
Here are the criteria I used to identify and extract endgame positions (do you have any suggestions to improve this?):
* Classical games only
* The game needs to include at least 40 moves by each player
* The position needs to have stayed on the board for at least 4 half-moves
* Maximum material per player: 13 points (Q=10, R=5, B=3, N=3, P=1)
* Maximum of 3 pawns per side
* No player has more than 2 pieces on the board (excluding king and pawns)
* No overwhelming material advantage (max difference: 4)
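The rules above can be written as small predicates. This is a minimal sketch that reads the list literally, so it is not necessarily the exact filter behind the numbers below. It assumes the material after each half-move has already been extracted, and it interprets "stayed on the board for at least 4 half-moves" as the same material configuration lasting 4 consecutive half-moves. `Side`, `is_endgame_material`, `qualifying_configs` and `game_qualifies` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Set, Tuple

# Point values as listed in the post (queen counted as 10).
VALUES = {"q": 10, "r": 5, "b": 3, "n": 3, "p": 1}
MIN_PLIES = 80  # "at least 40 moves by each player" = 80 half-moves


@dataclass(frozen=True)
class Side:
    """Material of one side, king excluded (hypothetical representation)."""
    q: int = 0
    r: int = 0
    b: int = 0
    n: int = 0
    p: int = 0

    def material(self) -> int:
        return sum(value * getattr(self, letter) for letter, value in VALUES.items())

    def pieces(self) -> int:
        return self.q + self.r + self.b + self.n


def is_endgame_material(white: Side, black: Side) -> bool:
    """Material-based criteria from the list, checked for one position."""
    for side in (white, black):
        if side.material() > 13 or side.p > 3 or side.pieces() > 2:
            return False
    return abs(white.material() - black.material()) <= 4


Config = Tuple[Side, Side]


def qualifying_configs(timeline: Sequence[Config], min_run: int = 4) -> Set[Config]:
    """Configurations that pass the filter and persist >= min_run consecutive half-moves.

    timeline[i] is the (white, black) material after half-move i.
    """
    found: Set[Config] = set()
    run_start = 0
    for i in range(1, len(timeline) + 1):
        # A run ends at the end of the game or when the material changes.
        if i == len(timeline) or timeline[i] != timeline[run_start]:
            config = timeline[run_start]
            if i - run_start >= min_run and is_endgame_material(*config):
                found.add(config)
            run_start = i
    return found


def game_qualifies(classical: bool, plies: int) -> bool:
    """Game-level criteria: classical time control and at least 80 half-moves."""
    return classical and plies >= MIN_PLIES
```

Keying the persistence rule on material rather than on exact piece placement means a run is broken only by captures and promotions, not by ordinary piece moves.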
Results:
In amateur games, rook endgames are absolutely dominant! If my statistical analysis is correct, amateurs should spend most of their endgame study "budget" looking at rooks, and it's not even close.
* 67% of games include an endgame position with rook(s)
* 38% of games include an endgame position with bishop(s) (with or without rook(s))
* 18% of games include an endgame position with bishop(s) (without rooks)
* 31% of games include an endgame position with knight(s) (with or without rook(s))
* 15% of games include an endgame position with knight(s) (without rooks)
Only 37% of all endgame positions in my database do not include a rook.
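The percentages above are shares of games, so a game counts once per category no matter how many qualifying positions it contains, while the 37% figure is a share of positions. Here is a minimal sketch of both, assuming each game has been reduced to the set of labels (in the table's notation) that passed the filter, and counting each distinct label once per game for the position share (an assumption). The function names are hypothetical. It also returns the binomial standard error, which gives a rough check on the diminishing-returns observation.

```python
import math
from typing import Callable, List, Set, Tuple

# Hypothetical representation: one set of endgame labels per game,
# e.g. {"rp+ vs. rp+", "r vs. rp"}.
Games = List[Set[str]]


def game_share(games: Games, predicate: Callable[[str], bool]) -> Tuple[float, float]:
    """Share of games with at least one matching position, and its standard error."""
    n = len(games)
    if n == 0:
        raise ValueError("no games")
    hits = sum(1 for labels in games if any(predicate(lab) for lab in labels))
    p = hits / n
    return p, math.sqrt(p * (1 - p) / n)


def position_share(games: Games, predicate: Callable[[str], bool]) -> float:
    """Share of positions (each distinct label counted once per game) that match."""
    labels = [lab for labels_in_game in games for lab in labels_in_game]
    if not labels:
        raise ValueError("no positions")
    return sum(1 for lab in labels if predicate(lab)) / len(labels)


# Label-based predicates; the separator "vs." contains none of the piece letters.
def has_rook(label: str) -> bool:
    return "r" in label


def has_bishop_without_rook(label: str) -> bool:
    return "b" in label and "r" not in label
```

For p = 0.67 and n = 150,000 the standard error is about 0.0012, i.e. roughly ±0.24 percentage points at 95% confidence, which is consistent with the results barely moving as more games are added. This treats games as independent draws, which is only approximately true since many games share the same players.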
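Here are the 50 most frequent endgame positions, with the percentage of games in which each is found (q, r, b, n and p stand for queen, rook, bishop, knight and pawn; p+ means 2 or more pawns; a leading 2 means two such pieces, as in 2rp+).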
pieces   % of games
rp+ vs. rp+ 14.8
rp vs. rp+ 14.2
p vs. p+ 11.3
p+ vs. p+ 10.9
r vs. rp+ 7.2
r vs. rp 6.2
brp+ vs. rp+ 5.3
bp+ vs. p+ 5.1
p vs. p 4.9
rp vs. rp 4.8
p+ vs. rp+ 4.4
nrp+ vs. rp+ 4.3
np+ vs. p+ 4.0
bp+ vs. rp+ 3.6
p+ vs. qp+ 3.2
np+ vs. rp+ 3.0
p vs. rp+ 3.0
bp+ vs. bp+ 2.8
bp+ vs. np+ 2.7
p+ vs. rp 2.5
r vs. r 2.4
p+ vs. r 2.3
qp+ vs. qp+ 2.3
p vs. qp+ 2.2
bp vs. p+ 2.1
brp+ vs. nrp+ 2.1
bp vs. bp+ 2.0
bp vs. p 2.0
p vs. qp 2.0
brp+ vs. brp+ 1.9
p vs. rp 1.9
brp vs. rp+ 1.8
np vs. p+ 1.8
2rp+ vs. 2rp+ 1.7
bp+ vs. p 1.7
p vs. q 1.7
qp vs. qp+ 1.7
2rp+ vs. rp+ 1.6
np vs. p 1.6
np+ vs. np+ 1.6
p vs. r 1.6
p+ vs. qp 1.6
2rp+ vs. brp+ 1.5
brp+ vs. rp 1.5
nrp vs. rp+ 1.5
bp vs. rp+ 1.4
bp+ vs. rp 1.4
qrp+ vs. rp+ 1.4
np vs. np+ 1.3
p+ vs. qrp+ 1.3
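The labels in the table appear to follow two conventions: within a side, piece letters in alphabetical order (b, n, q, r) followed by p or p+, and the two sides listed in plain character order (so a digit prefix such as "2r" sorts first). This is inferred from the rows above rather than stated, and the function names, as well as "k" for a bare king, are hypothetical.

```python
from typing import Dict

# Piece letters in the order they appear inside a label ("brp+", "qrp+").
PIECE_ORDER = "bnqr"


def side_label(counts: Dict[str, int]) -> str:
    """Label one side, e.g. {"b": 1, "r": 1, "p": 3} -> "brp+"."""
    parts = []
    for letter in PIECE_ORDER:
        k = counts.get(letter, 0)
        if k:
            parts.append(letter if k == 1 else f"{k}{letter}")
    pawns = counts.get("p", 0)
    if pawns == 1:
        parts.append("p")
    elif pawns >= 2:
        parts.append("p+")
    return "".join(parts) or "k"  # "k" for a bare king is a hypothetical choice


def endgame_label(white: Dict[str, int], black: Dict[str, int]) -> str:
    """Order-independent label: both side labels sorted as plain strings."""
    first, second = sorted([side_label(white), side_label(black)])
    return f"{first} vs. {second}"
```

Sorting the two sides makes the label independent of colour, so "rp vs. rp+" collects games where either White or Black has the extra pawns.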