Welcome to the Ask Steven Archive, a blog intended to complement the Ask Steven column on ESPN Cricinfo by collating the questions which have been asked and answered (by Steven and others) on Ask Steven's Facebook page. Please note that this blog is entirely unofficial and is not endorsed by Steven Lynch or ESPN Cricinfo.

Many thanks to all those who regularly answer questions on the Facebook page, in particular Charles Davis, Muhammad Asim, Aslam Siddiqui, Sreeram, Martin Briggs, Mike Leach, Pete Church, Manish Yadav, Arnold D'Souza, Hemant Brar, Sujoy Ghosh and of course Steven himself.
Using information from this blog

The answers and statistical tables posted on the Ask Steven page, and collected on this blog, are supplied by cricket enthusiasts who give freely of their own knowledge and expertise to help satisfy the queries of others. They do not generally mind anyone else using this information for their own purposes, but are likely to object strongly if this is done without crediting the original author, thus potentially giving the misleading impression that the research was done by someone who in fact only copied it. The name(s) of the person or people who gave each answer are noted at the end of it; if you wish to reproduce the answer, whether in full or in part, online or in print, quoted exactly or rephrased, please ensure that you cite this blog (Ask Steven Archive) and include the name(s) of the author(s). Failure to identify the author(s) of any work used constitutes plagiarism.

Saturday 17 January 2015

Spanning Test history

Although the question of how many players it takes to ensure that every Test match in history featured at least one of them ostensibly concerns cricket, it soon became clear that finding the answer, and proving that that was indeed the minimum number required, would require rather a lot of mathematics. This was how we eventually reached a conclusion...

The simplest case, considering only the 43 Tests played in 2010, can be solved by trial and error. England and India played 14 Tests each during the year and never played each other, so if each of them had at least one ever-present player, we're most of the way there already. Fortunately they both did: England had Cook, Pietersen, Prior, Swann and Trott, and India had Sehwag and Tendulkar.  Of the remaining 15 matches, all featured at least one of Pakistan, West Indies or New Zealand, so if each of those had an ever-present player (in the Tests not already covered - not necessarily all those which their team played over the year) then we have a set of five spanning all 43 Tests. Pakistan had one (Umar Gul), West Indies four (Bravo, Chanderpaul, Gayle, Nash) and New Zealand eight (Guptill, B McCullum, McIntosh, Martin, Southee, Taylor, Vettori, Watling), so any set containing one player from each of those lists (of which there are 320 possibilities) is a set of five players, at least one of whom played in every Test in 2010.

To prove that five players are necessary, consider a spanning set of teams rather than of players - since no-one played Tests for more than one team in 2010, if it requires at least five teams to span every Test during the year then it certainly requires at least five players. There were four series during the year which had no teams in common (Australia vs Pakistan, South Africa vs England, Bangladesh vs India and Sri Lanka vs West Indies), so if any set of four teams spans all Tests in 2010, it must consist of one team from each of these pairs and no others. Since New Zealand aren't included, Bangladesh must be since the two played each other during the year; therefore the set can't also include India. It can't have both Australia and South Africa because that would miss the England vs Pakistan series, nor can it contain Pakistan and South Africa because that would miss the Ashes. Therefore it can't contain South Africa at all - but if it doesn't contain either India or South Africa then it would miss the two series they played against each other. This is a contradiction, from which we conclude that the conjectured set of four teams does not exist - so five is the minimum numbers of teams required such that at least one of them was involved in every Test in 2010, and hence also the minimum number of players.

Clearly such an approach is not feasible when considering a larger set; some algorithm must be found which can produce a list of players from a set of any number of matches without trial and error. One candidate for such an algorithm is:

1. Let S be the set of all Test matches under consideration.
2. Let X be the player who has played most matches in S (if two or more players have played an equal highest number, pick any of them).
3. Add X to the list.
4. Remove from S all the Test matches played by X.
5. If S is empty, output the list and stop. If S is not empty, go to step 2.

In non-mathematical terms: start by picking the player with most Tests (for the period since 1st January 2000 only, Ricky Ponting; for the entirety of Test history, Sachin Tendulkar), then at each subsequent stage pick the player who has played most Tests which did not also feature any of the players already on the list. Stop when you run out of matches, ie when the list of players you've already picked span all the Tests in the set. After causing Statsguru to crash numerous times, this algorithm produced a list of 54 players to span all of Test history. The exact lists depend on the choices made at the points where two or more remaining players had an equally high number of matches; for example, a few steps from the end one finds that Michael Clarke is the only player to feature in the recent Australia vs India series who is already in the set, meaning that one player who appeared in the three matches which Clarke missed is required to complete the set. 14 players featured in all three matches, so any of these can be chosen in order to include those three matches in the set. 51 players are enough to span all but three Tests, but unfortunately the remaining three matches are spread across Test history (South Africa vs England in 1892, New Zealand vs South Africa in 1932 and India vs Pakistan in 2007) that three further players are required to include them; clearly any of the 22 players can be chosen for each match. Ishant Sharma played in the 2007 match, so until the end of 2014 only 53 players would have been required; his dropping from the final Test of the series necessitated a 54th.

Here is one example of a set of 54 players who between them have appeared in all Tests played to date, listed in the order the algorithm adds them to the set:

SR Tendulkar, SR Waugh, JH Kallis, SM Gavaskar, DPMD Jayawardene, MC Cowdrey, S Chanderpaul, IVA Richards, WR Hammond, Javed Miandad, GS Sobers, IR Bell, SE Gregory, RW Marsh, SP Fleming, TG Evans, FE Woolley, JG Wright, PR Umrigar, Inzamam-ul-Haq, JHB Waite, BE Congdon, JM Blackham, WM Lawry, AJ Stewart, BB McCullum, A Ranatunga, MJ Clarke, RR Lindwall, JH Sinclair, Mohammad Ashraful, DI Gower, WAS Oldfield, Intikhab Alam, Azhar Ali, W Rhodes, KF Barrington, A Flower, CL Walcott, J Briggs, H Masakadza, Hanif Mohammad, AI Kallicharran, BC Lara, CK Nayudu, AD Nourse, V Kohli, RG Nadkarni, HW Taylor, SK Warne, MA Butcher, WL Murdoch, B Mitchell, R Dravid

The above algorithm is guaranteed to find a set of players who between them have played in every Test in the set in question, but not necessarily the smallest set. 

A proof that the algorithm runs in polynomial (more specifically, quadratic) time relative to the number of Tests in the set - returning to the initial algorithm:

1. Let S be the set of all Test matches played to date.
2. Let X be the player who has played most matches in S (if two or more players have played an equal highest number, pick any of them).
3. Add X to the list.
4. Remove from S all the Test matches played by X.
5. If S is empty, output the list and stop. If S is not empty, go to step 2.

The key factors here are how long steps 2-4 take, and how many times you have to do them. Let's break it down a bit:

2. Let P be the set of all Test players, and denote a player in P by Pi. Let n(Pi) = 0 for every i.
a) For each Test match Sj in S, add 1 to n(Pi) iff Pi played in Sj.
b) Let X be the player Pi for whom n(Pi) is greatest.

Step 2.a) involves |S| operations (one check for each match in S) and thus is carried out in |S| time. Step 2. b) involves |P| operations (checking the value of n(Pi) for each Pi in P); the number of players is at most 22 times the number of matches, so |P|=<22|S| and step 2.b) is therefore also carried out in |S| time. So the whole of step 2 is also carried out in |S| time. Step 3 only involves one operation, while step 4 involves at most |S| (since the number of matches being removed from S is at most |S|). Therefore each iteration of steps 2-4 is carried out in |S| time.

Since each iteration of steps 2-4 adds one player to the list, and at most |S| players can be added to it (obviously |S| players are sufficient to span |S| matches), the maximum number of iterations required is |S|. Therefore the algorithm is guaranteed to give the optimum solution in at most |S|^2 time (that is, quadratic time with respect to |S|)


Michael Jones, Manish Yadav, Arnold D'Souza

No comments:

Post a Comment