Friday, December 6, 2013

Estimating Number of Chess Positions

The exact number of unique chess positions is known for plies zero through ten from the start of the game, here.

Namely:


 0            1
 1           20
 2          400
 3         5362
 4        72078
 5       822518
 6      9417681
 7     96400068
 8    988187354
 9   9183421888
10  85375278064

And there is a serious upper bound calculated for the total possible number of unique chess positions, here.

Namely:

45193640626062205213735739171550309047984050718

Or approximately:
10^46.655

A plot of the log of the series:
 

This plot is curved downward, which makes sense, since the total number of unique positions is finite, meaning a plot of the number of positions must be asymptotic to approximately n=10^46.655, and the plot of the log10 of that number must be asymptotic to approximately 46.655.

I have hypothesized a simple asymptotic function and least-squares fitted it to the 11 data points using Excel.

I have used the parametrized function log10(n) = A(C-1/(p+B)^D), where p is the ply.  The asymptote is the limit as p-->infinity, =AC, which I have fixed at 46.655 (i.e. C=46.655/A).

Running the fit gives these approximate values (to 10 digits):
A=22215.23274
C=0.002100135549
B=53.67773548
D=1.547873206

Here is the plot with the fit overlayed:
   

And here is a table of original data against the log-fitted data, with a fourth column showing the deviation of the fit value (as a fraction of the original).


 0            1            0.939876583  -0.060123417
 1           20           19.40668546   -0.029665727
 2          400          349.1076095    -0.127230976
 3         5362         5518.540861      0.029194491
 4        72078        77265.05783       0.071964508
 5       822518       965178.7022        0.173443867
 6      9417681     10830020.03          0.149966752
 7     96400068    109839927.3           0.139417529
 8    988187354   1012777542             0.024884136
 9   9183421888   8535374347            -0.07056711
10  85375278064  66077223933            -0.226037965

But of course the interesting thing now is to extrapolate and predict the subsequent values, to compare against exact values, as they become available.

Here is an extrapolation through ply 26:

11  4.72081E+11
12  3.12601E+12
13  1.92630E+13
14  1.10879E+14
15  5.98252E+14
16  3.03569E+15
17  1.45312E+16
18  6.58063E+16
19  2.82698E+17
20  1.15496E+18
21  4.49817E+18
22  1.67377E+19
23  5.96297E+19
24  2.03796E+20
25  6.69427E+20
26  2.11715E+21


And here is the log plot with the extrapolated values included:
 

Of course there are possible problems with such a simple asymptotic model. Should the two initial points have been excluded, for instance, since in the first 2 plies the pieces have not started interacting? How about the model's prediction that the final ply-10 data point is a +23% outlier? It should be interesting to compare against the ply-11 actual data when available...