The tragedy of Mihaela Sandu

Just imagine a chess player competing against 6 opponents, each of them rated approx. 150-200 points higher than our player. The player wins 4 games consecutively, loses the fifth game, then wins one more. As it turns out, the live transmission of the 5th game had been interrupted. Coincidence or not? The performance of a lifetime, or a clear-cut case of cheating?

Cum hoc ergo propter hoc – Latin: With this, therefore because of this.

This year’s European Individual Women’s Chess Championship (EIWCC) in Chakvi, Georgia, was marred by cheating allegations against one of the participants, WGM Mihaela Sandu (2300) of Romania. Halfway through the event no fewer than two petitions were signed, by 15 and 32 participants respectively (out of a total of 98), urging the organisers to implement anti-cheating measures. One of the petitions specifically accused Sandu of foul play.

The petitions do not offer any arguments in support of the petitioners’ claim, nor has any evidence against Sandu been presented to the public through other channels. On the contrary, Grandmasters who have studied Sandu’s games from the EIWCC seem unanimous in concluding that her games show no indication of computer assistance whatsoever.

Accusing a player of cheating is a very serious matter. Mihaela Sandu makes a living teaching chess, participating in the occasional tournament. A cheating accusation like the present one can ruin her reputation as a teacher and player, regardless of the validity of the claim.

In this blog post I will demonstrate that the mere statistical facts – five wins and one loss against six significantly stronger players, with a transmission failure coinciding with the lost game – cannot stand alone as incriminating proof. A technique known as Bayesian inference is applied to estimate the probability of such a scoring/transmission pattern emerging by chance. The analysis relies on educated guesses with regard to, for instance, the general incidence of cheating through live transmissions. The resulting uncertainty is addressed by calculating a range of scenarios. These scenarios show that the probability of Mihaela Sandu being not guilty remains substantial under a range of plausible assumptions.

Those readers uninterested in the technical details of the statistical analysis can scroll down to find a different kind of analysis: one of Mihaela Sandu’s games from the tournament.

The facts

Let’s take a closer look at what happened. Mihaela Sandu started her tournament by beating an 1800-rated player in the first round. In the second through seventh rounds, Sandu produced the following series:

Game no.  Result  Opponent’s rating  Expected score1)  Live transmission  Board number
1.        Win     2452               0.30              OK                 12
2.        Win     2474               0.27              OK                 4
3.        Win     2479               0.27              OK                 1
4.        Win     2472               0.27              OK                 1
5.        Loss    2473               0.27              Fail               1
6.        Win     2512               0.23              OK                 2

After round 7 (the 6th game in Sandu’s remarkable series) the aforementioned petitions were presented. Sandu, understandably disturbed by the stance taken against her, lost all remaining games2). In the rest of this blog post, I will refer to these 6 games by their numbers in the above table, i.e. game 1 through 6.

Coincidence vs. causality

One gets the impression that suspicion against Sandu was based on a perceived causality between two improbable events:

  1. Winning 5 out of 6 games against opposition of approx. +170 rating points on average.
  2. Interruption of the live transmission of a game due to technical failure.

The second event in itself would not have caused any turmoil at all. Everyone who has watched live games on the Internet has seen the transmission of a single game, or even an entire round, stall. We all accept this type of incident as something that “just happens” from time to time, even though technical breakdowns are rare considering the number of games being broadcast live these days.

The first event, however, is the eye-catcher. Beating an opponent rated much higher than oneself is, once again, something most chess players have achieved at least once, but doing it 4 times in a row is sensational, and obtaining 5 points out of 6 even more so. We may expect such an extraordinary feat from a youngster on the rise, but Sandu is 38 years old and presumably past her peak. Her rating has been quite stable, fluctuating between 2200 and 2300 for many years3). She has never been rated as high as the players she defeated in those 5 games.

Still, I don’t think Sandu’s performance would have led conspiracy theories to flourish, had it not been for the transmission failure during the only game that she lost. In recent times live transmissions have been abused for cheating, for instance by a player accessing engine analysis of the live transmission through a hidden mobile device4). It is very important to remember, however, that an observed correlation, in this case between a string of wins and losses and the functioning of the live transmission of a player’s games, does not imply causality.

The human brain is prone to perceive plausible, but sometimes non-existent, causal relationships between events. The ability to detect causality is present in humans from a very early age and is generally useful, as it helps us make sense of our constantly changing surroundings and learn about the world5). The same tendency, however, also leads us to assume causality when the reality is one of coincidence. This logical fallacy is known as questionable cause or by the Latin phrase quoted at the beginning of this post – cum hoc ergo propter hoc. Under particular circumstances the consequences can be disastrous. One has to be especially cautious in criminal investigations. The rarity of a match combined with the urge to clarify the situation has more than once resulted in the conviction of an innocent person. An example of this will be given below.

Mihaela Sandu was singled out as a suspected cheater by the petitioners at the EIWCC. Cheating is considered a capital crime in chess, similar to doping in physical sports, plagiarism in literature or data fabrication in scientific research. The evidence against her consists of a perfect correlation between two patterns, of the kind where the perception of causality is unavoidable. It is right here, right now, that we should remember that our perception could be wrong.

The prosecutor’s fallacy

A common mistake in statistical reasoning is to equate the probability of an accused being innocent with the probability of the incriminating event happening by chance. This belief ignores the fact that unlikely events will inevitably occur if only the number of occasions is large enough, known as the law of truly large numbers6). The typically low prevalence of the alleged crime is also ignored (a type of mistake referred to as base rate neglect).

An example: Sudden Infant Death Syndrome or Homicide?

This oversight, also dubbed the prosecutor’s fallacy, is well known from the courtroom, where it has resulted in tragic criminal convictions. Wikipedia’s entry mentions as an example the conviction of mothers whose babies had died of Sudden Infant Death Syndrome (SIDS, also called cot death or crib death). A leading British paediatrician, Roy Meadow, had successfully argued that due to the rarity of SIDS, it should be considered suspicious if 2 children within the same family died from it, 3 children being a clear case of murder. This rule of thumb became known as Meadow’s Law and was used in the 1990s in several rulings against unfortunate parents who had experienced multiple infant deaths.

Dr. Meadow served as an expert witness in court, stating that the prevalence of SIDS in non-smoking families was approximately 1 in 8,543. Assuming independence between the individual SIDS cases, the probability of 2 cases of SIDS within one family reduces to a stunning 1 in 73 million (8,543 × 8,543). Therefore the occurrence of 2 or more infant deaths within one family was, according to Meadow, unlikely to be caused by SIDS, the alternative cause being homicide.

Meadow did not comment on the fact that homicide is a very rare event as well, not to mention two consecutive homicides within one family. Also, millions of babies are born around the globe every year, generating plenty of opportunity for a pattern of 2 SIDS instances within one family to arise – just like 2 instances of infant homicide, by the way.

For the sake of simplicity, let’s assume that infants die of homicide and SIDS only, leaving accidents and illness aside. Then there are 4 different patterns possible for 2 dead infants within one family:

  1. Two cases of SIDS (“double SIDS”)
  2. First baby dies from SIDS, second baby is a victim of homicide
  3. First baby is a victim of homicide, second baby dies from SIDS
  4. Two cases of homicide (“double homicide”)

What are the odds for a family with 2 or more live-born children to suffer from pattern 1? Dr. Meadow claimed they are 1 in 73 million. What about pattern 4? According to an article by statistician Ray Hill (2004, referenced in footnote 7 below) the prevalence of infant homicide is approximately 1 in 21,700. Using the same reasoning as Dr. Meadow we arrive at odds of 1 in 471 million for pattern 4. This means that for each instance of double infant homicide there are about 471/73 ≈ 6 cases of double SIDS. Patterns 2 and 3 each have a probability of 1 in 185 million. Combining patterns 2 and 3 into “One SIDS, one homicide” yields odds of 1 in 93 million, which is about 5 times as likely as a double homicide.

Obviously there is quite a lot of uncertainty involved in these numbers. But it is not unreasonable to conclude that in the absence of any evidence except 2 dead infants, the odds of at least one of the infants being a victim of homicide are about fifty-fifty. This also means there is a 50% chance7) of the parents being completely innocent. Dr. Meadow’s failure to take the probability of double homicide into account had enormous consequences indeed.
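
The arithmetic behind the four patterns can be checked with a few lines of Python. This is only a sketch of the simplified model described above: it assumes the two prevalences quoted in the text, independence between the two deaths, and no causes other than SIDS and homicide. The variable names are my own.

```python
# Sketch of the simplified two-cause model from the text.
# Prevalences are the figures quoted above; independence is assumed.
P_SIDS = 1 / 8543        # SIDS prevalence stated by Dr. Meadow
P_HOMICIDE = 1 / 21700   # infant homicide prevalence quoted from Hill (2004)

patterns = {
    "double SIDS": P_SIDS * P_SIDS,
    "SIDS, then homicide": P_SIDS * P_HOMICIDE,
    "homicide, then SIDS": P_HOMICIDE * P_SIDS,
    "double homicide": P_HOMICIDE * P_HOMICIDE,
}

# Condition on the evidence "two dead infants": renormalise over the 4 patterns.
total = sum(patterns.values())
posterior = {name: p / total for name, p in patterns.items()}
p_innocent = posterior["double SIDS"]

for name, p in patterns.items():
    print(f"{name:20s} 1 in {1 / p:,.0f}   share: {posterior[name]:.1%}")
print(f"P(no homicide | two deaths) = {p_innocent:.3f}")
```

Under these assumptions the share of "double SIDS" among all two-death families comes out at roughly one half, which is where the fifty-fifty figure comes from.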

Implications for the Sandu case

In considering the evidence against Mihaela Sandu it is important not to commit the prosecutor’s fallacy. We have to remember that instances like Sandu’s performance at the EIWCC can occur by pure chance simply because there is plenty of opportunity for them. Hundreds of tournaments are arranged worldwide every year, giving thousands of players the opportunity to produce a 6-game series of 4 wins, 1 loss and 1 win against a rating gap of around +170. (Opportunity includes meeting an opponent of the required strength and losing, as a win would typically result in being paired with one more opponent of at least that strength.) Also, a great many games are transmitted live these days, with a small probability of transmission failure attached to each of these games. By this plenty-of-opportunity argument we ought not to be all that surprised to see a series of 4 wins, 1 loss and 1 win against players superior by 150-200 rating points coincide with 4 successful transmissions, 1 failure and 1 successful transmission – to be referred to as the scoring/transmission pattern in the remainder of this blog post.

Particularities of individual cases like the one produced by Sandu can imply an elevated probability of occurrence. Mihaela Sandu played 3 of the 6 games under investigation on board 1, including the game that she lost. The overall probability of transmission failure may be low, but it is possible that the live equipment for board 1 was defective or that something was wrong with the wiring of this particular board. Details like this can increase the probability of a rare event many times over. If a live board was malfunctioning, the odds of a failing live transmission may have been 1 in 5 or 1 in 10 instead of a more general estimate of, say, 1 in 100.

No less important, the odds of Sandu’s scoring/transmission pattern occurring at random may be low, but so is the proportion of cheating chess players – at least that’s what I would like to believe8). The SIDS example above clearly shows that in our case, too, the overall probability of foul play has to be weighed into the equation.

Now that we have seen that unlikely scoring/transmission patterns happen to both honest players and cheaters, the question remains: what are the odds of the player being honest in the presence of an improbable pattern? Or in courtroom lingo: what is the probability of innocence given the evidence? I will derive this probability working along the same lines as Ray Hill (2004).

What are the odds?

For starters, let’s consider the probability of winning and losing against a player of +170 in rating. In the Elo rating system expected scores are calculated directly from the difference between the players’ individual ratings. According to table 8.1b of the FIDE Rating Regulations effective from 1 July 2014 the expected score of the lower rated player is 0.27. The table above shows Mihaela Sandu’s expected score for each game in the 6-game series. These expected scores do not take into account that cheating might take place9). In other words, the expected scores are conditional on both players being innocent of cheating.
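
For readers without the FIDE table at hand, the expected scores can be approximated with the widely used logistic Elo formula. Note that this is an approximation I am substituting for illustration, not the conversion table used in this post, so the values need not match the table exactly. The function name and the list of ratings (copied from the table of games above) are my own.

```python
def expected_score(own_rating: float, opponent_rating: float) -> float:
    """Logistic Elo approximation of the expected score.

    This is NOT the FIDE conversion table, only a common approximation.
    """
    return 1.0 / (1.0 + 10 ** ((opponent_rating - own_rating) / 400.0))


SANDU_RATING = 2300
opponent_ratings = [2452, 2474, 2479, 2472, 2473, 2512]  # games 1 to 6

approx_scores = [expected_score(SANDU_RATING, r) for r in opponent_ratings]
for game, (rating, score) in enumerate(zip(opponent_ratings, approx_scores), start=1):
    print(f"game {game}: opponent {rating}, expected score about {score:.2f}")
```

For these six games the approximation stays within 0.01 of the expected scores listed in the table.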

Before proceeding with the actual calculations, it is important to note that the actual odds of a failing live transmission are in themselves irrelevant. Since the functioning of the live transmission is not affected by the play conducted on the board (the only possible direction of cause and effect goes the other way around: the functioning of the transmission affects the level of play if one of the players is cheating), we can take the transmission functioning as a given. The odds of the particular transmission pattern (functioning 4 times, failing once, functioning once) are the same for innocent players and cheaters. Hence, when referring to the probability of the scoring/transmission pattern, I actually refer to the probability of the scoring pattern of 4 wins, 1 loss, 1 win given that transmission worked in the first 4 games, then failed, and worked again in the 6th game.

Odds under the assumption of innocence

The conditional expected score of 0.27 when competing with a player rated 170 points higher covers wins as well as draws. The Elo rating system does not in itself provide estimates of the probability of a draw, and unlike the expected score, the drawing probability could well be dependent not only on the rating difference between the players, but also on the individual rating levels. It seems reasonable to assume that the drawing probability for a game increases as a function of the lowest rating involved. At the EIWCC 28% of the games were drawn, regardless of the rating level. Mihaela Sandu was ranked no. 45 out of 98 players, and therefore I have assumed a drawing probability of 0.30. Since the expected score equals the winning probability plus half the drawing probability, the probability of a win by the weaker player equals 0.27 − 0.30 × ½ = 0.12. With the probabilities of winning, drawing and losing having to sum to 1, the losing probability is 0.58. Note that the losing probability is higher than the drawing probability, which in turn is higher than the winning probability, as one would expect. Once again, these probabilities are conditional on the innocence of the players.
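
The split of the expected score into winning, drawing and losing probabilities can be written down as a small helper. This is a minimal sketch of the arithmetic in the paragraph above; the function name is my own, and the drawing probability of 0.30 remains the assumption made in the text.

```python
def outcome_probabilities(expected_score: float, draw_prob: float):
    """Split an expected score into (win, draw, loss) probabilities.

    Relies on: expected score = P(win) + 0.5 * P(draw).
    """
    win = expected_score - 0.5 * draw_prob
    loss = 1.0 - win - draw_prob
    if min(win, draw_prob, loss) < 0:
        raise ValueError("inputs do not yield valid probabilities")
    return win, draw_prob, loss


# Values from the text: expected score 0.27, assumed drawing probability 0.30.
win, draw, loss = outcome_probabilities(expected_score=0.27, draw_prob=0.30)
print(f"win={win:.2f} draw={draw:.2f} loss={loss:.2f}")
```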

Based on these conditional probability estimates for the outcome of a single game, the overall conditional probability of the scoring pattern of Mihaela Sandu can be calculated. For the sake of simplicity it is assumed that the expected score and the outcome probabilities are equal for all 6 games, but the curious reader can make his or her own calculations based on the expected scores listed in the table.

Another key assumption is independence between the outcomes of the games. This assumption may be incorrect. Winning against a stronger player is likely to give a boost of confidence which in turn could be hypothesised to elevate performance in subsequent games. In the Swiss pairing system this effect is especially prominent at the early stages of a tournament. In game 2 in the series (round 3 in the tournament), Mihaela Sandu possibly had an elevated performance level due to her win over a stronger player in the first game, whereas her opponent in the second game could look back on beating two players she was expected to beat (possibly boosting confidence also, but probably not as much). I will demonstrate below, that the proposed performance elevating mechanism only increases the likelihood that Mihaela Sandu is innocent.

By the independence of the game outcomes the probability estimate of Mihaela Sandu’s scoring pattern under the assumption of her innocence can be calculated as the simple product of the game outcomes, resulting in 0.12×0.12×0.12×0.12×0.58×0.12 = 0.000014 or approximately 1 in 69,000. A very low probability indeed. But let’s not jump to conclusions. The prosecutor’s fallacy teaches us that this probability should not be taken to equal the probability of innocence.
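
The product above is easy to reproduce. The sketch below assumes, as the text does, independent games with identical per-game probabilities; the variable names are my own.

```python
from math import prod

P_WIN, P_LOSS = 0.12, 0.58                 # per-game probabilities from the text
pattern = ["W", "W", "W", "W", "L", "W"]   # games 1 to 6

# Independence assumption: multiply the per-game outcome probabilities.
per_game = [P_WIN if result == "W" else P_LOSS for result in pattern]
p_pattern_innocent = prod(per_game)

print(f"P(pattern | innocent) = {p_pattern_innocent:.6f}")
print(f"roughly 1 in {1 / p_pattern_innocent:,.0f}")
```

Swapping in the per-game expected scores from the table only requires replacing `per_game` with a list of six individually derived probabilities.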

Odds under the assumption of cheating

A similar calculation can and has to be made under the assumption that Sandu was cheating. Cheating by the lower-rated player effectively reduces (or even reverses) the rating gap, dramatically increasing the likelihood of a win. So under the assumption of the lower-rated player cheating, a series of 4 wins, 1 loss and 1 win is much more likely to happen. To my knowledge no investigation exists of how much a player’s performance level is improved by consulting engine analysis under circumstances where the player is interested in masking engine involvement. Therefore the expected score has to be based on an educated guess.

It would be wrong to assume an expected score of 1.00, since even the best engines run into positions they don’t “understand”. Also, it would be unwise for the cheating party to pick the engine’s best move every single time, so a cheater would probably employ a more sophisticated strategy, indirectly allowing some space for drawing and losing. Setting the winning probability to 0.90 and the drawing probability to 0.08 (implying a losing probability of 0.02) amounts to an expected score of 0.94. This corresponds to a rating gap in favour of the cheater of 433-456 rating points.

Suppose there had been no failure in the live transmissions. Then by the same reasoning as above the odds of a series of 4 wins, 1 loss, 1 win under the assumption of cheating would be 0.90×0.90×0.90×0.90×0.02×0.90 = 0.01181, approximately 1 in 85. It is not surprising that the scoring pattern is much more likely to occur in case of cheating than in case of honest play.

But how does the functioning of the live-transmissions mix into these odds? Under the assumption that Mihaela Sandu was innocent there is no reason to expect that a failing live transmission would affect the probabilities for winning, drawing and losing. Under the assumption of cheating however, the situation is different. For a cheater, failing of the live transmission means at best10) a return to the expected score and outcome probabilities based on the rating difference, calculated under the innocence assumption. Indeed there is a causal relationship between the transmission working and the probabilities for winning, drawing and losing given that the player is cheating.

Assuming that the conditional probabilities of winning, drawing and losing given a failing live transmission correspond to the probabilities of winning, drawing and losing under the assumption of innocence, the odds of the scoring pattern given the transmission pattern become 0.90×0.90×0.90×0.90×0.58×0.90 = 0.34248 for a cheating player, which is more than 1 in 3. (Using a winning probability of 0.95 instead of 0.90 results in odds just below 1 in 2.)
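
The three figures for the cheating scenario can be reproduced as follows. This is a sketch under the guesses made above (winning probability 0.90 or 0.95 with the feed, losing probability 0.02 with the feed, 0.58 without it); the function and variable names are my own.

```python
from math import prod


def pattern_prob_cheater(p_win_cheat: float, p_loss_no_feed: float) -> float:
    """P(W, W, W, W, L, W | cheating) with the live feed down in game 5 only.

    p_win_cheat    -- guessed winning probability with engine help
    p_loss_no_feed -- losing probability without the feed, taken to equal
                      the losing probability of an honest player
    """
    return prod([p_win_cheat] * 4 + [p_loss_no_feed] + [p_win_cheat])


baseline = pattern_prob_cheater(0.90, 0.58)    # the scenario in the text
stronger = pattern_prob_cheater(0.95, 0.58)    # alternative guess from the text
no_failure = prod([0.90] * 4 + [0.02, 0.90])   # loss in game 5 with the feed working

print(f"feed fails in game 5:    {baseline:.5f}")
print(f"same, with p_win = 0.95: {stronger:.5f}")
print(f"no transmission failure: {no_failure:.5f}")
```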

Once again it is important not to get too excited. Analogous to the prosecutor’s fallacy, the probability of the scoring/transmission pattern under the assumption of cheating is not to be confused with the probability of cheating having taken place! The only finding established thus far is that the observed scoring/transmission pattern is more likely to occur when the weaker player is cheating than when the weaker player is not.

Weighing in the base rate

The final element required to arrive at an estimated probability of Mihaela Sandu’s innocence despite the observed scoring/transmission pattern is the prevalence of cheating – in fact the prevalence of cheating by means of the live transmissions (other ways of cheating are possible as well, for instance through messages from an accomplice in the audience or a note with opening variations in the player’s pocket). Once again, I have no knowledge of attempts to quantify this figure in any other way than guessing. In the following it is supposed that 1 in 10,000 players cheat by means of the live transmissions, but other values are considered in the subsection Uncertainties further down.

Recall that the probability of observing a scoring pattern of 4 wins, 1 loss and 1 win under the innocence assumption was approximately 1 in 69,000. This probability was argued to be unaffected by the functioning of the live transmission, so given a failure of the transmission coinciding with the only loss, the probability is still 1 in 69,000. Having knowledge of a transmission failure in the 5th out of 6 games, the probability of a player being innocent and an observed scoring pattern of 4 wins, 1 loss and 1 win being achieved is then given by (9,999/10,000) × 0.000014. Multiplication by the prior probability of innocence has a negligible effect, the first factor being practically equal to 1.

The probability of the player cheating and achieving a scoring pattern of 4 wins, 1 loss and 1 win, given a live transmission failure in the fifth game, is a different story. The low prevalence of cheating causes the probability to plummet to (1/10,000) × 0.34248 = 0.000034 or approximately 1 in 29,000.

The probability of observing the scoring/transmission pattern regardless of cheating or innocence is the sum of these probabilities: (9,999/10,000) × 0.000014 + (1/10,000) × 0.34248 = 0.000049 or approximately 1 in 21,000.

Finally, the probability of a player being innocent of cheating, given a perfectly correlating pattern of 4 wins with transmission working, 1 loss with transmission failing and 1 win with transmission working, is given by (1/69,000)/(1/29,000 + 1/69,000) ≈ 0.296, or somewhere between 1 in 4 and 1 in 3.
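
The whole chain of reasoning is an application of Bayes’ rule and fits in a few lines. The sketch below uses the numbers from this post (the guessed base rate of 1 in 10,000 and the two pattern probabilities derived above); the function and variable names are my own.

```python
def posterior_innocence(prior_cheat: float, like_innocent: float, like_cheat: float) -> float:
    """P(innocent | pattern) by Bayes' rule.

    prior_cheat   -- guessed share of players cheating via the live feed
    like_innocent -- P(pattern | innocent)
    like_cheat    -- P(pattern | cheating)
    """
    joint_innocent = (1.0 - prior_cheat) * like_innocent
    joint_cheat = prior_cheat * like_cheat
    return joint_innocent / (joint_innocent + joint_cheat)


like_innocent = 0.12 ** 5 * 0.58   # about 1 in 69,000
like_cheat = 0.90 ** 5 * 0.58      # about 0.34
p_innocent = posterior_innocence(1 / 10_000, like_innocent, like_cheat)

print(f"P(innocent | pattern) = {p_innocent:.3f}")
```

Working with unrounded intermediate values gives a slightly different third decimal than dividing the rounded "1 in 69,000" and "1 in 29,000" figures, but both round to about 0.296.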

Uncertainties

In the calculations above, several educated guesses had to be made in the absence of tangible data. The probability of drawing was a guess (the probabilities of winning and losing were derived from this guess and the expected score), and so was the prevalence of cheating. Also, simplifications were made for clarity’s sake. Assuming an expected score of 0.27 for all 6 games (instead of using the expected scores related to the individual games as listed in the table) is an example. Ignoring the possibility of a boosting effect due to winning is another.

I have experimented a little to get a feeling for the impact of choosing other values, changing one parameter at a time. Increasing the prior probability of drawing from 0.30 to 0.35 causes the probability of innocence given the score/transmission pattern to drop from 29.6% to 11.6%. On the other hand, assuming a prior drawing probability of 0.25 corresponds to a probability of innocence given the score/transmission pattern of 52.0%.

Figure 1: The probability of innocence given the evidence – P(I | E) – as a function of the drawing probability under the assumption of innocence
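
The sensitivity to the drawing probability can be explored with a short script. This is a sketch that keeps every other guess from this post fixed (expected score 0.27, cheater’s winning probability 0.90, base rate 1 in 10,000) and assumes that a cheater without the live feed loses with the same probability as an honest player; the function name is my own.

```python
def posterior_innocence(draw_prob: float,
                        expected_score: float = 0.27,
                        p_win_cheat: float = 0.90,
                        prior_cheat: float = 1 / 10_000) -> float:
    """P(innocent | pattern) as a function of the assumed drawing probability."""
    win = expected_score - 0.5 * draw_prob
    loss = 1.0 - win - draw_prob
    like_innocent = win ** 5 * loss        # 5 honest wins, 1 loss
    like_cheat = p_win_cheat ** 5 * loss   # 5 assisted wins, 1 unassisted loss
    joint_innocent = (1.0 - prior_cheat) * like_innocent
    joint_cheat = prior_cheat * like_cheat
    return joint_innocent / (joint_innocent + joint_cheat)


for draw_prob in (0.25, 0.30, 0.35):
    print(f"draw probability {draw_prob:.2f}: "
          f"P(innocent | pattern) = {posterior_innocence(draw_prob):.1%}")
```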

The probability of innocence given the scoring/transmission pattern is quite sensitive to the base rate of cheating. Assuming 1 in 20,000 players cheat (instead of 1 in 10,000) results in a probability of innocence given the observed scoring/transmission pattern of 46% or almost 1 in 2. On the other hand, setting the prevalence of cheating to 1 in 1,000 would result in a probability of only 4% or 1 in 25. The probability drops below 1% for cheating rates of 1 in 240 or higher.

Figure 2: The probability of innocence given the evidence – P(I | E) – as a function of the proportion of cheaters in the entire population.
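
The dependence on the base rate can be checked in the same way. The sketch below fixes the two pattern probabilities derived earlier in this post and varies only the guessed share of cheaters; the names are my own.

```python
LIKE_INNOCENT = 0.12 ** 5 * 0.58   # P(pattern | innocent), from the text
LIKE_CHEAT = 0.90 ** 5 * 0.58      # P(pattern | cheating), from the text


def posterior_innocence(prior_cheat: float) -> float:
    """P(innocent | pattern) for a given base rate of cheating."""
    joint_innocent = (1.0 - prior_cheat) * LIKE_INNOCENT
    joint_cheat = prior_cheat * LIKE_CHEAT
    return joint_innocent / (joint_innocent + joint_cheat)


for one_in in (20_000, 10_000, 1_000, 240):
    p = posterior_innocence(1 / one_in)
    print(f"1 cheater in {one_in:>6,}: P(innocent | pattern) = {p:.1%}")
```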

Introducing a boosting effect was also tried. For each consecutive win the probability of winning the next game was increased somewhat. After the loss in game 5 the probabilities for winning and losing in game 6 were set to their initial values. If a boosting effect exists under the assumption of innocence, we should also expect it to exist under the assumption of cheating. However, the winning probability under the assumption of cheating is already very high in the first game, leaving practically no room for improvement. Thus the boosting effect increases the odds of the scoring/transmission pattern for an innocent player, but barely for a cheating player. As a consequence, the probability of innocence given the score/transmission pattern increases quite dramatically when mixing a boosting effect into the model.
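
To illustrate the mechanism, here is one possible way to model a boosting effect. The size of the boost (0.02 per consecutive win, shifted from losing to winning) and the cap of 0.92 on the cheater’s winning probability are purely hypothetical values chosen for this sketch; they are not the values used for the experiment described above.

```python
from math import prod

PRIOR_CHEAT = 1 / 10_000   # guessed base rate from the text


def posterior_innocence(boost: float) -> float:
    """P(innocent | pattern) with a hypothetical confidence boost per win."""
    # Honest player: each earlier win shifts `boost` from losing to winning.
    honest_wins = [0.12 + k * boost for k in range(4)]      # games 1 to 4
    loss_game5 = 0.58 - 4 * boost                           # after four wins
    like_innocent = prod(honest_wins) * loss_game5 * 0.12   # game 6: reset

    # Cheater: winning probability capped (hypothetically) at 0.92, i.e.
    # 1 minus the assumed drawing probability of 0.08. Game 5 is played
    # without the feed, so the honest losing probability applies.
    cheat_wins = [min(0.90 + k * boost, 0.92) for k in range(4)]
    like_cheat = prod(cheat_wins) * loss_game5 * 0.90

    joint_innocent = (1.0 - PRIOR_CHEAT) * like_innocent
    joint_cheat = PRIOR_CHEAT * like_cheat
    return joint_innocent / (joint_innocent + joint_cheat)


without_boost = posterior_innocence(0.0)
with_boost = posterior_innocence(0.02)
print(f"without boost: {without_boost:.1%}, with boost: {with_boost:.1%}")
```

With these hypothetical numbers the probability of innocence rises from just under 30% to roughly 48%, in line with the qualitative argument in the paragraph above.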

Conclusion: Mihaela Sandu is innocent

Despite the unlikely event of 4 wins, 1 loss and 1 win coinciding with 4 well-functioning live transmissions, 1 failure and 1 more well-functioning transmission, the probability of Mihaela Sandu being innocent of cheating in the presence of this pattern was estimated to be just below 30%. The statistical evidence in itself is not even close to meeting the burden of proof.

Of course it can be discussed if the incidence of cheating is higher than the author of this blog post is inclined to believe. This would lower the probability of Mihaela Sandu’s innocence. On the other hand, there are other factors in favour of a higher probability of innocence given the evidence, for instance the confidence-boosting effect.

Had the probability of innocence in the presence of the scoring/transmission pattern been closer to 0 but not close enough for conviction11), the right decision would have been to acquit Mihaela Sandu – meaning that the evidence against the accused is serious, but insufficient for a conviction. In the case of Mihaela Sandu, however, the probability of innocence given the evidence is rather large, and therefore Mihaela Sandu should not only be acquitted, but also be cleared of all suspicion.

In the absence of any further evidence of foul play we must therefore conclude that Mihaela Sandu’s only crime was an exceptionally good start to a tournament.

Reflections on the statistical approach

Before moving on to one of Mihaela Sandu’s games from the EIWCC I would like to say a few words about the similarity between the analysis of the statistical evidence against Mihaela Sandu and the scientific process, which was the topic of my previous post. Accusations of foul play should be treated in much the same way as a scientific hypothesis, the null hypothesis being that the accused is in fact innocent. Just like in the scientific method, where the null hypothesis cannot be rejected unless sufficient proof against the null is presented, the defendant should be considered innocent unless proven otherwise.

In the absence of decisive proof (for instance by direct observation or a confession) it becomes necessary to conduct an analysis of the evidence, be it recorded data as in the case of Mihaela Sandu (scoring/transmission pattern), witness testimonies or both. The statistical approach used in courtroom applications differs from the approach prevailing in science. In the scientific arena we would proceed by deriving the probability of the data under the assumption that the null hypothesis is correct, then deciding if this probability is low enough to discard the null hypothesis in favour of the alternative hypothesis.

The scientific method accepts a certain level of error in the decision process, typically allowing a 5% probability of erroneously rejecting the null hypothesis. In the courtroom it is unacceptable to convict an innocent person, a mistake known as an error of the first kind or a type I error. Therefore, instead of calculating the probability of obtaining the observed (or more extreme) data and comparing it to the significance level, the probability of convicting an innocent person is derived from the data12).

But why couldn’t one just use the statistical approach typically used in scientific research (also known as frequentist inference), adjusting the significance level to minimise the probability of an error of the first kind? Scientific research is typically focussed on population parameters as opposed to particular members of a population. Typical population parameters to be investigated are proportions, correlations, averages and measures of variability, the values of which are estimated from data on the population studied, or on a subsample. The values of the individual data points are of limited interest in hypothesis testing, acknowledging that the probability of observing a particular data vector is as tiny as the probability of any other. The distribution of the population parameters emerging from the collection of all possible data vectors typically shows that estimates within certain parameter intervals are more likely to occur than others.

Within the scientific framework, we would not be studying the scoring/transmission pattern of one particular player in one particular tournament, but rather the scoring/transmission patterns of a large group of players, possibly at a larger number of tournaments even. The hypothesis could be that cheating takes place through the live transmissions, leading to a prediction that the probability of a loss conditional on the failure of the live transmission was higher than expected if “losing” and “live transmission failing” were independent events. Probabilities for “losing”, “failing” and the events coinciding would be estimated from the proportion of lost games, failing live transmissions and incidences of both from the data. A decision to reject the null hypothesis (of there being no cheating through the live transmissions) or not would then be taken based on the likelihood of those estimates. The result – rejection or corroboration of the null hypothesis – only reflects on the entire population of chess players investigated. It does not carry any information about individual members of the population, nor about any particular sequences like the score/transmission pattern in the Sandu case.

A criminal investigation typically focusses on one particular person and the evidence against this person. The aim is not to make inferences about a property of an entire population. Inferences are to be made with regard to the probability of the null hypothesis conditional on the evidence. A statistical approach suitable for this type of problem is Bayesian inference. In this approach, population parameters are only invoked to enable estimation of this probability. Uncertainty about the true values of these population parameters can affect the probability estimate, and therefore the exercise often involves special care to understand the implications of different choices of population parameters.

Time to see some chess!

I have analysed Mihaela Sandu’s game from the 3rd round, where she was White against Aleksandra Goryachkina. The game starts with a long theoretical discussion in a main line of the Sveshnikov Sicilian. The middlegame primarily revolves around the advance of White’s b-pawn. In the end White succeeds in achieving the thematic b5-break, but only after both players have committed inaccuracies. Black, however, has maintained adequate coordination for approximate equality. It is only after Black declines the white d-pawn that the balance tips in favour of Mihaela Sandu. A final blunder by Black allows White to finish the game in style.

A nice detail related to this game is that Mihaela Sandu’s opponent did not sign either petition. It is good to see a young and promising player like Goryachkina have an independent mind and let the chess do the talking, thus being a good ambassador for the game we all love.


Footnotes

1)Expected scores are based on conversion table 8.1b of the FIDE Rating Regulations effective from 1 July 2014. Note also comment 12.2 in the regulations: “Tables 8.1a/b are used precisely as shown, no extrapolations are made to establish a third significant figure.”
2)For detailed information on Mihaela Sandu’s individual results at the EIWCC see chess-results.com.
3)For detailed information on Mihaela Sandu’s rating development see her FIDE Chess Profile.
4)In April this year no fewer than two remarkable cases of cheating through live transmission were reported by chess and mainstream media. For starters the Georgian GM Gaioz Nigalidze was disqualified from the Dubai Open for consulting engine analysis on a mobile phone hidden in a lavatory booth at the tournament site. See chess.com for a detailed report. Later in the same month at a tournament in New Delhi the Indian amateur Dhruv Kakkar was caught in a cheating scheme using two mobile phones attached to his body, a micro-speaker in his ear and an accomplice at a different location analysing with a chess engine. See chessbase.com for a detailed report.
5)Readers interested in reading more about the psychology of causal perception and causal reasoning could start with this overview.
6)The law of truly large numbers is one of the mechanisms governing the improbability principle. The Improbability Principle is also the title of a book by statistics professor David J. Hand. I have not read this book, but the related website is definitely worth a visit, also (perhaps even especially) for those who do not have any deeper understanding of statistics.
7)The correction of other shortcomings in the argumentation of Dr. Meadow implies a probability of innocence as high as 90%. Ray Hill demonstrated in his article that Dr. Meadow’s estimate of 1 in 8,543 was biased. Meadow should also have taken into account that the odds of a second case of SIDS are considerably higher for families who have already suffered one case of SIDS, due to genetic and environmental factors. The failure to weigh in the prevalence of homicide is however the most serious gap in Meadow’s argument. Hill, R. (2004). “Multiple sudden infant deaths – coincidence or beyond coincidence?” (PDF). Paediatric and Perinatal Epidemiology 18: 322–323. doi:10.1111/j.1365-3016.2004.00560.x
8)The recent cases of proven cheating demonstrate that utilising live transmissions in foul play is not that easy. Dhruv Kakkar’s attempt involved a low-cost yet well-functioning technical construction, but it takes more than clever engineering to pull off a cheating scam and get away with it. Gaioz Nigalidze did not make things more complicated than absolutely necessary, at the cost of having to hide a mobile device at a spot outside his immediate control. Not many of those who are tempted will have the nerve to proceed in this manner. True enough, Nigalidze had his trick going for a while before being put on the spot, but signs of something being wrong had been picked up by his opponents nonetheless.
9)Cheating is of all times, in chess as elsewhere. However, table 8.1b in the FIDE Regulations existed long before cheating through live transmission became a possibility. Therefore the underlying probabilities of the expected scores from the rating tables correspond to proper prior probabilities, uncontaminated by live transmission related cheating.
10)If the player has cheated systematically in the past, the player’s rating is too high compared to actual playing strength. Also, an unexpected malfunction of the cheating mechanism is likely to have an unsettling effect on the cheating player’s natural performance level.
11)Relying on statistical evidence alone, the highest probability considered low enough for conviction should indeed be very low. This threshold probability can be compared to the significance level in scientific research. Depending on the subject matter, the significance level is typically set to 5%, meaning that the probability of rejecting the null hypothesis by mistake is 5% at most. In courtroom application, the null hypothesis corresponds to the accused being innocent. Rejecting the null hypothesis in this case leads to the verdict that the accused is guilty. It is of foremost importance that the probability of wrongly convicting an innocent person is kept at a minimum. Allowing an error margin of 5% is clearly unacceptable. The significance level applied should rather be 1% or even lower.
12)For further reading about hypothesis testing in the courtroom I recommend this article on Type I and Type II Errors – Making Mistakes in the Justice System.

8 thoughts on “The tragedy of Mihaela Sandu”

  1. That is quite a post! Impressive stuff. Concerning Meadow’s law, aren’t you drawing a slightly suspicious :) conclusion? If we disregard your footnote, the probability of at least one of the parents being a murderer is as you say about 1/2 and Meadow apparently called the two sudden deaths cause for suspicion, which is in fact in line with the calculated probability. Of course we don’t know which parent is guilty (or both) or in the case of one murder, whether the first or second child was murdered. But I will certainly agree with Meadow’s word suspicious.

    Anyway, I think it would be easier to understand if one does not mention the probability of infant homicide. After all, that is also a conditional probability, namely the probability of homicide being the reason for an infant’s death. Instead, simply presenting the compound probability of infant death would put Meadow’s 1/8500 in immediate perspective (especially as we assume there are only two possible causes of the death). In fact, if one did the same for the case of two infant deaths (instead of simply squaring the one death figure, which is of course dubious), one would account for the genetic and environmental issues of the footnote. I’m not saying your reasoning is wrong, just a little unnecessarily complex.

    • Thank you for your feedback. I definitely agree with you on the way my conclusion about the Meadow case turned out. Had the probability of at least one parent being a murderer really been 50% then it would be natural to be suspicious. My conclusion relates to the details in the footnote – which I originally had as an integral part of the text with more detail. Of course I should have either kept the details in the main text, or revised my conclusion on Meadow.

  2. Excellent article, Sandra. Some remarks:

    1) I think the effect of self-confidence could have been put even higher taking into account that she lost all games after the open letters came out. Thus, Sandu’s play is likely to be affected by external factors — perhaps more likely than average. We’re side-stepping to psychology here, so it might be tricky.

    2) The actual difference between cheating by using the online transmission and cheating “on your own”, and their different effects on a person’s strength, is highly unclear. In some cases the cheating becomes completely impossible, sometimes it only makes it slightly more difficult, and in other cases it doesn’t matter. This is mentioned when you speak about the difference between the scientific approach and the courtroom approach. Since it is highly unclear to what extent the absence of live transmission drops the strength of a cheater, in my opinion it’s hardly possible to say anything here (but I understand that you had to make a choice because you were making a point).

    3) In my news report I have refrained from analyzing games to make a point, as I felt that would approve shifting the burden of proof. Your article seems to support my decision!

    4) If anyone should read this piece, it’s the ladies who signed the open letters.

    • Thank you Peter, glad you liked it!

      ad 1)Most of us would be severely disturbed in our performance under a cheating accusation, just like other external factors (our health, bad news from home etc.) would affect our play. But I am not sure what we can conclude from this with regard to a confidence-boosting effect and the size of it. This is tricky territory indeed.

      ad 2)If cheating is taking place and all of a sudden the cheating mechanism is not functioning, I would doubt the cheating player (under the pressure of an ongoing game) would apply a different cheating mechanism right away. It would be more practical for the cheater to accept being on his/her own and play their best under the circumstances – i.e. at their “honest” performance levels. In this case I assumed cheating has a big effect, as this is the assumption putting the innocence hypothesis under maximal pressure. With a lesser effect of cheating on performance level, the probability of innocence given evidence is even higher.

      ad 3)The burden of proof should definitely be on the shoulders of the petitioners, not anywhere else!

      ad 4)I hope they will!

  3. Pingback: Actualité des échecs du 12 juin - Echecs Actualité

  4. Nice try, Sandra.:)

    Being a chess player (an IM) and an academic with 15+ years experience (Economics, Econometrics) allows me to look at this analysis differently. Unfortunately, this is a back of the envelope calculation and most likely, it is an incorrect one.

    1) Econometricians are using notions of confidence intervals. So the goal is to come up with the probability that you can reject the null hypothesis with. Calculating a specific number has very little meaning in the scientific world.
    2) True scientists are not meant to be emotional. For example, the claim “Conclusion: Mihaela Sandu is innocent” is non-scientific. Instead one could say – The claim that Sandu was cheating can be rejected with probability X.
    3) You are not supposed to make assumptions like “It seems reasonable to assume … and therefore I have assumed a drawing probability of 0.30.” Instead, you are supposed to derive a distribution of drawing chances and incorporate it in the analysis.
    4) Finally, the derivation of 0.3 probability makes no sense to me. Your earlier calculations were produced under different hypotheses. One can’t just combine all these numbers and plug them all into one formula.

    I do not want to sound negative. However, it is extremely sad to observe somebody making such strong claims with no real scientific backing.

    • Thank you for your feedback Vladimir. I appreciate your effort to read my post attentively and point out areas of improvement. As soon as time permits I will address the issues as best as I can.

      For those readers who can’t wait for my reply to Vladimir: Vladimir’s comments are currently being discussed on Chess Chat.

  5. Pingback: At spille udenom | Skakkerlakkens strik og skak
