Cum hoc ergo propter hoc –

Latin: With this, therefore because of this.

This year’s **European Individual Women’s Chess Championship (EIWCC)** in Chakvi, Georgia, was marred by **cheating allegations** against one of the participants, **WGM Mihaela Sandu (2300)** of Romania. Halfway through the event, **two petitions** were signed by 15 and 32 participants respectively (out of a total of 98), urging the organisers to implement **anti-cheating measures**. One of the petitions specifically accused Sandu of foul play.

The petitions do not mention any **arguments** in support of the petitioners’ claim, nor has any **evidence** against Sandu been presented to the public through other channels. On the contrary, Grandmasters who have studied Sandu’s games from the EIWCC seem unanimous in concluding that her games show **no indication** of computer assistance whatsoever.

Accusing a player of cheating is a very **serious matter**. Mihaela Sandu makes a living teaching chess, participating in the occasional tournament. An accusation like the present one can **ruin her reputation** as a teacher and player, regardless of the **validity** of the claim.

In this blog post I will demonstrate that **the mere statistical facts** – five wins and one loss against six significantly stronger players, with a transmission failure coinciding with the lost game – **cannot stand alone as incriminating proof**. A technique known as **Bayesian inference** was applied to estimate the probability of such a scoring/transmission pattern emerging by chance. The analysis relies on **educated guesses** with regard to, for instance, the general incidence of cheating through live transmissions. The ensuing uncertainty is effectively controlled for by calculating different **scenarios**. These scenarios show that **the probability of Mihaela Sandu being not guilty remains substantial**, regardless of the assumptions made.

Those readers uninterested in the technical details of the statistical analysis can scroll down to find a different kind of analysis: **one of Mihaela Sandu’s games from the tournament**.

Let’s take a closer look at what happened. Mihaela Sandu started her tournament by beating an 1800-player in the first round. In rounds two through seven, Sandu produced the following series:

| Game no. | Result | Opponent’s rating | Expected score^{1)} | Live transmission | Board number |
|---|---|---|---|---|---|
| 1 | Win | 2452 | 0.30 | OK | 12 |
| 2 | Win | 2474 | 0.27 | OK | 4 |
| 3 | Win | 2479 | 0.27 | OK | 1 |
| 4 | Win | 2472 | 0.27 | OK | 1 |
| 5 | Loss | 2473 | 0.27 | Fail | 1 |
| 6 | Win | 2512 | 0.23 | OK | 2 |

After round 7 (the 6th game in Sandu’s remarkable series) the aforementioned petitions were presented. Sandu, understandably **disturbed** by the stance taken against her, lost all remaining games^{2)}. In the rest of this blog post, I will refer to these 6 games by their numbers in the above table, i.e. games 1 through 6.

One gets the impression that **suspicion** against Sandu was based on a **perceived causality** between two **improbable events**:

- Winning 5 out of 6 games against opposition of approx. +170 rating points on average.
- Interruption of the live transmission of a game due to technical failure.

The second event in itself would not have caused any turmoil at all. Everyone who has watched live transmitted games on the Internet has seen the transmission of a single game, or even of an entire round, stall. We all accept this type of incident as something that “just happens” from time to time, even though technical breakdowns are rare considering the number of games being broadcast live these days.

The first event, however, is the **eye-catcher**. Beating an opponent much stronger than oneself by rating is, once again, something most chess players have achieved at least once, but doing it 4 times in a row is **sensational**, and obtaining 5 points out of 6 even more so. We might expect such an extraordinary feat from a **youngster on the rise**, but Sandu is 38 years old and presumably **past her peak**. Her rating has been quite stable, fluctuating between 2200 and 2300 for many years^{3)}. She has never been as high-rated as the players she defeated in those 5 games.

Still, I don’t think Sandu’s performance would have led to flourishing conspiracy theories, had it not been for the transmission failure during the only game that she lost. In recent times **live transmissions have been abused for cheating**, for instance by one of the players accessing engine analysis of the live transmission through a hidden mobile device^{4)}. It is very important to remember, however, that **an observed correlation** – in this case of a string of wins and losses with the functioning of the live transmission of a player’s games – **does not imply causality**.

The human brain is prone to perceiving **plausible, but sometimes non-existent, causal relationships between events**. The ability to detect causality is present in humans from a very early age and is generally useful, as it helps us make sense of our constantly changing surroundings and learn about the world^{5)}. But this tendency to see causal relationships between concurring events also leads to assuming causality where reality is mere coincidence. This logical fallacy is known as **questionable cause**, or by the Latin phrase quoted at the beginning of this post – *cum hoc ergo propter hoc*. Under particular circumstances the consequences can be disastrous. One has to be especially cautious in **criminal investigations**: the rarity of a match, combined with the urge to clarify the situation, has more than once resulted in the conviction of an innocent person. An example of this will be given below.

Mihaela Sandu is singled out as a **suspect** of cheating by the petitioners at the EIWCC. Cheating is considered a **capital crime** in chess, similar to **doping** in physical sports, **plagiarism** in literature or **data fabrication** in scientific research. The **evidence** against her consists of a **perfect correlation between two patterns** of the kind where the **perception of causality** is unavoidable. It is right here, right now, that we should remember that **our perception could be wrong**.

A common mistake in statistical reasoning is to equate **the probability of an accused being innocent** with **the probability of the indicting event happening by chance**. This belief ignores the fact that unlikely events will inevitably occur if only the number of occasions is large enough, known as the *law of truly large numbers*^{6)}. The typically low prevalence of the crime in question is also ignored (a type of mistake referred to as *base rate neglect*).

This oversight, also dubbed *the prosecutor’s fallacy*, is well known from the courtroom, where it has resulted in tragic criminal convictions. Wikipedia’s entry mentions as an example the conviction of mothers whose babies had died of **Sudden Infant Death Syndrome** (**SIDS**, also called *cot death* or *crib death*). A leading British paediatrician, **Roy Meadow**, had successfully argued that, due to the rarity of SIDS, it should be considered **suspicious** if 2 children within the same family died from it, 3 children being a clear case of **murder**. This rule of thumb became known as Meadow’s Law and was used in the 1990s in several rulings against unfortunate parents who had experienced multiple infant deaths.

Dr. Meadow served as an **expert witness** in court, stating that the prevalence of SIDS in non-smoking families was approximately 1 in 8,543. Assuming **independence** between the single SIDS cases, the probability of 2 cases of SIDS within one family comes to a stunning 1 in 73 million. Therefore the occurrence of 2 or more infant deaths within one family was, according to Meadow, unlikely to be caused by SIDS, the alternative cause being **homicide**.

Meadow did not comment on the fact that **homicide is a very rare event as well**, not to mention two consecutive homicides within one family. Also, millions of babies are born around the globe every year, generating **plenty of opportunity** for a pattern of 2 SIDS instances within one family to arise – just like 2 instances of infant homicide, by the way.

For the sake of **simplicity**, let’s assume that infants die of homicide and SIDS only, leaving accidents and illness aside. Then there are 4 different patterns possible for 2 dead infants within one family:

- Two cases of SIDS (“double SIDS”)
- First baby dies from SIDS, second baby is a victim of homicide
- First baby is a victim of homicide, second baby dies from SIDS
- Two cases of homicide (“double homicide”)

What are the odds for a family with 2 or more live-born children to suffer pattern 1? Dr. Meadow claimed it is 1 in 73 million. What about pattern 4? According to an article by statistician **Ray Hill** (2004, referenced in footnote 7 below) the prevalence of infant homicide is approximately 1 in 21,700. Using the same reasoning as Dr. Meadow we arrive at odds of 1 in 471 million for pattern 4. This means that for each instance of double infant homicide there are no fewer than 471/73 ≈ 6 cases of double SIDS. Patterns 2 and 3 each have the same probability of 1 in 185 million. Combining patterns 2 and 3 into “one SIDS, one homicide” yields odds of 1 in 93 million, which is about 5 times as likely as a double homicide.
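The arithmetic above is easy to reproduce. A minimal Python sketch, using Meadow’s SIDS rate and Hill’s infant-homicide rate and assuming, as Meadow did, independence between the two deaths:

```python
# Pattern odds for two infant deaths in one family, from
# Meadow's SIDS rate (1 in 8,543) and Hill's infant-homicide
# rate (1 in 21,700), assuming the two deaths are independent.
p_sids = 1 / 8_543
p_homicide = 1 / 21_700

p_double_sids = p_sids**2            # pattern 1: ~1 in 73 million
p_mixed = 2 * p_sids * p_homicide    # patterns 2+3: ~1 in 93 million
p_double_homicide = p_homicide**2    # pattern 4: ~1 in 471 million

print(f"double SIDS:     1 in {1/p_double_sids:,.0f}")
print(f"one of each:     1 in {1/p_mixed:,.0f}")
print(f"double homicide: 1 in {1/p_double_homicide:,.0f}")
print(f"double-SIDS cases per double homicide: {p_double_sids/p_double_homicide:.1f}")
```

The last line makes the key point numerically: given two unexplained infant deaths and nothing else, double SIDS is several times more likely than double homicide.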

Obviously there is quite a lot of **uncertainty** involved in these numbers. But it is not unreasonable to conclude that **in the absence of any evidence** except 2 dead infants, the odds of at least one of the infants being a victim of homicide are about fifty-fifty. This also means there is a 50% chance^{7)} of the parents being **completely innocent**. Dr. Meadow’s failure to take the probability of double homicide into account had enormous consequences indeed.

In considering the evidence against Mihaela Sandu it is important not to commit the prosecutor’s fallacy. We have to remember that instances like Sandu’s performance at the EIWCC occur by **pure chance** because there is **plenty of opportunity** for them. Hundreds of tournaments are arranged worldwide every year, providing thousands of players with the opportunity to produce a 6-game series of 4 wins, 1 loss and 1 win against a rating gap of around +170. (Opportunity includes meeting an opponent of the required strength and losing, as a win would typically result in being paired with one more opponent of at least that strength.) Also, a great many games are transmitted live these days, with a small probability of transmission failure attached to each of these games. By this plenty-of-opportunity argument **we ought not be all that surprised** to see a series of 4 wins, 1 loss and 1 win against players superior by 150-200 rating points coincide with 4 successful transmissions, 1 failure and 1 successful transmission – referred to as the **scoring/transmission pattern** in the remainder of this blog post.

Particularities in single cases of scoring/transmission patterns like the one produced by Sandu can imply an **elevated probability of occurrence**. Mihaela Sandu played 3 of the 6 games under investigation at board 1, including the game that she lost. The overall probability of transmission failure may be low, but it is possible that the live equipment for board 1 was **defective**, or something may have been wrong with the **wiring** of this particular board. Details like this can increase the probability of a rare event by factors. When it comes to a malfunctioning live board, the odds of a failing live transmission may have been 1 in 5 or 10 instead of a more general estimate of, say, 1 in 100.

No less important, the odds of Sandu’s scoring/transmission pattern occurring at random may be low, **but so is the proportion of cheating chess players** – at least that’s what I would like to believe^{8)}. The SIDS example above clearly shows that in our case, too, the overall probability of foul play has to be weighed into the equation.

Now that we have seen that unlikely scoring/transmission patterns happen to both honest players and cheaters, the question remains: **what are the odds of a player being honest in the presence of an improbable pattern?** Or in courtroom lingo: **what is the probability of innocence given the evidence?** I will derive this probability working along the same lines as Ray Hill (2004).

For starters, let’s consider the probability of winning and losing against a player of +170 in rating. In the Elo rating system **expected scores** are calculated directly from the **difference** between the players’ individual ratings. According to table 8.1b of the FIDE Rating Regulations effective from 1 July 2014 the expected score of the lower rated player is 0.27. The table above shows Mihaela Sandu’s expected score for each game in the 6-game series. These expected scores do not take into account that cheating might take place^{9)}. In other words, the expected scores are **conditional** on both players being **innocent of cheating**.

Before proceeding with the actual calculations, it is important to note that the factual odds of a failing live transmission are in themselves **irrelevant**. Since the functioning of the live transmission is not affected by the play conducted on the board (the only possible direction of cause and effect goes the other way: the functioning of the transmission affects the level of play if one of the players is cheating), we can take the transmission functioning as a given. The odds of the particular transmission pattern (functioning 4 times, failing once, functioning once) are the same for innocent players and cheaters. Hence, when referring to the probability of the scoring/transmission pattern, I actually refer to the probability of the scoring pattern of 4 wins, 1 loss, 1 win *given* that transmission worked in the first 4 games, then failed, and worked again in the 6th game.

The conditional expected score of 0.27 when competing with a player rated +170 covers wins as well as draws. The Elo rating system does not in itself provide estimates of **the probability of a draw**, and unlike the expected score, the drawing probability could well depend not only on the **rating difference** between the players, but also on the individual **rating levels**. It seems reasonable to assume that the drawing probability of a game increases as a function of the lowest rating involved. At the EIWCC **28% of the games were drawn**, regardless of the rating level. Mihaela Sandu was ranked no. 45 out of 98 players, and therefore I have assumed a drawing probability of 0.30. The probability of a win by the weaker player thus equals 0.27 − 0.30×½ = 0.12. With the probabilities of winning, drawing and losing having to sum to 1, the losing probability is 0.58. Note that the losing probability is higher than the drawing probability, which in turn is higher than the winning probability, as one would expect. Once again, these probabilities are conditional on the innocence of the players.
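The derivation of the per-game probabilities can be written out explicitly. A small Python sketch, where the drawing probability of 0.30 is the educated guess just discussed:

```python
# Per-game probabilities for the weaker player, derived from the
# expected score (0.27, FIDE table 8.1b for a ~170-point gap) and
# an assumed drawing probability of 0.30. A draw is worth half a
# point, so: expected_score = p_win + 0.5 * p_draw.
expected_score = 0.27
p_draw = 0.30  # educated guess, as explained in the text

p_win = expected_score - 0.5 * p_draw   # 0.12
p_loss = 1.0 - p_win - p_draw           # 0.58

print(f"win: {p_win:.2f}, draw: {p_draw:.2f}, loss: {p_loss:.2f}")
```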

Based on these conditional probability estimates for the outcome of a single game, the overall conditional probability of the scoring pattern of Mihaela Sandu can be calculated. For the sake of **simplicity** it is assumed that the **expected score** and the** outcome probabilities** are **equal for all 6 games**, but the curious reader can make his or her own calculations based on the expected scores listed in the table.

Another key assumption is **independence** between the outcomes of the games. This assumption may be incorrect. Winning against a stronger player is likely to give a** boost of confidence** which in turn could be hypothesised to **elevate performance** in subsequent games. In the **Swiss pairing system** this effect is especially prominent at the early stages of a tournament. In game 2 in the series (round 3 in the tournament), Mihaela Sandu possibly had an elevated performance level due to her win over a stronger player in the first game, whereas her opponent in the second game could look back on beating two players she was expected to beat (possibly boosting confidence also, but probably not as much). I will demonstrate below, that **the proposed performance elevating mechanism only increases the likelihood that Mihaela Sandu is innocent**.

By the independence of the game outcomes the probability estimate of Mihaela Sandu’s scoring pattern under the assumption of her innocence can be calculated as the simple product of the game outcomes, resulting in 0.12×0.12×0.12×0.12×0.58×0.12 = 0.000014 or approximately **1 in 69,000**. A very low probability indeed. But let’s not jump to conclusions. The prosecutor’s fallacy teaches us that *this probability should not be taken to equal the probability of innocence*.
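As a check on the arithmetic, the product can be computed directly (independent games assumed, as in the text):

```python
# Probability of the scoring pattern (W, W, W, W, L, W) for an
# innocent player, using the per-game probabilities derived in the
# text (p_win = 0.12, p_loss = 0.58) and assuming independent games.
p_win, p_loss = 0.12, 0.58
pattern = [p_win] * 4 + [p_loss, p_win]

p_pattern_innocent = 1.0
for p in pattern:
    p_pattern_innocent *= p

print(f"{p_pattern_innocent:.6f}")                   # 0.000014
print(f"about 1 in {1 / p_pattern_innocent:,.0f}")   # about 1 in 69,289
```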

A similar calculation can and has to be made under the assumption that Sandu *was* cheating. Cheating by the lower-rated player effectively **reduces (or even reverses) the rating gap**, dramatically increasing the likelihood of a win. So under the assumption of the lower-rated player cheating, a series of 4 wins, 1 loss and 1 win is much more likely to happen. To my knowledge no investigation exists of how much a player’s performance level is improved by soliciting engine analysis under circumstances where the player is interested in **masking engine involvement**. Therefore the expected score has to be based on an **educated guess**.

It would be wrong to assume an expected score of 1.00, since even the best engines run into positions they don’t “understand”. Also, it would be unwise for the cheating party to pick the engine’s best move every single time, so a cheater would probably employ a more sophisticated strategy, indirectly **allowing some space for drawing and losing**. Setting the winning probability to 0.90 and the drawing probability to 0.08 (implying a losing probability of 0.02) amounts to an expected score of 0.94. This corresponds to a rating gap in favour of the cheater of 433-456 rating points.

Suppose there had been no failure in the live transmissions. By the same reasoning as above, the odds of a series of 4 wins, 1 loss, 1 win under the assumption of cheating would be 0.90×0.90×0.90×0.90×0.02×0.90 = 0.01181, approximately **1 in 85**. It is not surprising that the scoring pattern is much more likely to occur in case of cheating than in case of honest play.

But how does the functioning of the live transmissions mix into these odds? Under the assumption that Mihaela Sandu was innocent there is no reason to expect that a failing live transmission would affect the probabilities of winning, drawing and losing. Under the assumption of cheating, however, the situation is different. For a cheater, a failure of the live transmission means at best^{10)} a return to the expected score and outcome probabilities based on the rating difference, calculated under the innocence assumption. Indeed, there is a causal relationship between the transmission working and the probabilities of winning, drawing and losing, given that the player is cheating.

Assuming that the **conditional probabilities** of winning, drawing and losing given a failing live transmission correspond to the probabilities of winning, drawing and losing under the assumption of innocence, the odds of the scoring pattern given the transmission pattern become 0.90×0.90×0.90×0.90×0.58×0.90 = 0.34248 for a cheating player, which is more than **1 in 3**. (Using a winning probability of 0.95 instead of 0.90 results in odds just below 1 in 2.)
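The corresponding calculation under the cheating assumption looks like this. The winning probability of 0.90 while the transmission works is the educated guess from the text; during the failure in game 5 the cheater is assumed to fall back to the innocent-play losing probability of 0.58:

```python
# Pattern probability (W, W, W, W, L, W) for a cheating player.
# Games 1-4 and 6: transmission working, win probability 0.90.
# Game 5: transmission failing, so the cheater reverts to the
# innocent-play losing probability of 0.58.
p_win_cheating = 0.90   # educated guess while the feed works
p_loss_no_feed = 0.58   # innocent-play value during the failure

p_pattern_cheating = p_win_cheating**4 * p_loss_no_feed * p_win_cheating
print(f"{p_pattern_cheating:.5f}")  # 0.34248, more than 1 in 3
```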

Once again it is important not to get too excited. Analogous to the prosecutor’s fallacy, the probability of the scoring/transmission pattern under the assumption of cheating *is not to be confused with the probability of cheating having taken place*! The only finding established thus far is that **the observed scoring/transmission pattern is more likely to occur when the weaker player is cheating than when the weaker player is not**.

The final element required to arrive at an estimated probability of Mihaela Sandu’s innocence *despite the observed scoring/transmission pattern* is the **prevalence of cheating** – in fact the prevalence of cheating by means of the live transmissions (other ways of cheating are possible as well, for instance through messages by an accomplice in the audience or a note with opening variations in the player’s pocket). Once again, I have no knowledge of attempts to quantify this figure in any other way than guessing. In the following it is supposed that **1 in 10,000 players cheat** by means of the live transmissions, but other values are considered in the subsection *Uncertainties* further down.

Recall that the probability of observing a scoring pattern of 4 wins, 1 loss and 1 win under the innocence assumption was approximately 1 in 69,000. This probability was argued to be unaffected by the functioning of the live transmission, so given a failure of the transmission coinciding with the only loss, the probability is still 1 in 69,000. Knowing of a transmission failure in the 5th of the 6 games, the probability of a player being innocent *and* achieving a scoring pattern of 4 wins, 1 loss and 1 win is then given by (9,999/10,000)×0.000014. Multiplication by the prior probability of innocence has only a negligible effect, the first factor being practically equal to 1.

The probability of the player cheating *and* achieving a scoring pattern of 4 wins, 1 loss and 1 win, given a live transmission failure in the fifth game, is a different story. The low prevalence of cheating causes the probability to plummet to (1/10,000)×1/3 = 0.000034, or approximately **1 in 29,000**.

The probability of observing the scoring/transmission pattern **regardless of cheating or innocence** is the sum of these probabilities: 0.000014 + 0.000034 = 0.000049, or approximately **1 in 21,000**.

Finally, the odds of a player being innocent of cheating, given a perfectly correlating pattern of 4 wins with the transmission working, 1 loss with the transmission failing and 1 win with the transmission working, are given by (1/69,000)/(1/69,000 + 1/29,000) ≈ 0.296, or somewhere **between 1 in 4 and 1 in 3**.
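The whole chain of reasoning amounts to a single application of Bayes’ theorem. A compact sketch, where every input is one of the educated guesses discussed above:

```python
# Bayes' theorem applied to the scoring/transmission pattern.
# Inputs (all educated guesses from the text): per-game win/loss
# probabilities of 0.12/0.58 for an innocent player, 0.90 for a
# cheater while the feed works (0.58 in the game without a feed),
# and a cheating prevalence of 1 in 10,000.
prior_cheat = 1 / 10_000
prior_innocent = 1 - prior_cheat

p_pattern_given_innocent = 0.12**4 * 0.58 * 0.12  # ~1 in 69,000
p_pattern_given_cheat = 0.90**4 * 0.58 * 0.90     # ~1 in 3

# total probability of observing the pattern at all
p_pattern = (prior_innocent * p_pattern_given_innocent
             + prior_cheat * p_pattern_given_cheat)

posterior_innocent = prior_innocent * p_pattern_given_innocent / p_pattern
print(f"P(innocent | pattern) = {posterior_innocent:.3f}")  # 0.296
```

Working with the exact products rather than the rounded “1 in 69,000” and “1 in 29,000” gives 0.296, in agreement with the 29.6% figure used in the sensitivity analysis below.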

In the calculations above, several **educated guesses** had to be made in the **absence of tangible data**. The probability of drawing was a guess (the probabilities of winning and losing were derived from this guess and the expected score), and so was the prevalence of cheating. Also, **simplifications** were made for **clarity’s sake**. Assuming an expected score of 0.27 for all 6 games (instead of using the expected scores for the individual games as listed in the table) is one example. Ignoring the possibility of a boosting effect from winning is another.

I have experimented a little to get a feeling for the **impact of choosing other values**, changing one parameter at a time. Increasing the prior probability of drawing from 0.30 to 0.35 causes the probability of innocence given the scoring/transmission pattern to drop from 29.6% to 11.6%. On the other hand, assuming a prior drawing probability of 0.25 corresponds to a probability of innocence given the scoring/transmission pattern of 52.0%.

The odds of innocence given the scoring/transmission pattern are quite **sensitive to the base rate of cheating**. Assuming that 1 in 20,000 players cheat (instead of 1 in 10,000) results in a probability of innocence given the observed scoring/transmission pattern of 46%, or almost 1 in 2. On the other hand, setting the prevalence of cheating to 1 in 1,000 would result in a probability of only 4%, or 1 in 25. The probability drops below 1% for cheating rates of 1 in 240 or higher.
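This sensitivity to the base rate is easy to explore programmatically. A small sweep, holding the pattern probabilities fixed at the values used in the main calculation:

```python
# Posterior probability of innocence as a function of the assumed
# cheating base rate, with the pattern probabilities fixed at the
# values from the main calculation.
P_PATTERN_INNOCENT = 0.12**4 * 0.58 * 0.12
P_PATTERN_CHEAT = 0.90**4 * 0.58 * 0.90

def p_innocent_given_pattern(cheat_rate):
    numerator = (1 - cheat_rate) * P_PATTERN_INNOCENT
    return numerator / (numerator + cheat_rate * P_PATTERN_CHEAT)

for rate in (1 / 20_000, 1 / 10_000, 1 / 1_000, 1 / 240):
    print(f"1 in {round(1 / rate):>6,}: "
          f"P(innocent | pattern) = {p_innocent_given_pattern(rate):.3f}")
```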

Introducing a boosting effect was also tried. For each consecutive win the probability of winning the next game was increased somewhat. After the loss in game 5 the probabilities for winning and losing in game 6 were set to their initial values. If a boosting effect exists under the assumption of innocence, we should also expect it to exist under the assumption of cheating. However, the winning probability under the assumption of cheating is already very high in the first game, leaving practically no room for improvement. Thus the boosting effect increases the odds of the scoring/transmission pattern for an innocent player, but barely for a cheating player. As a consequence, the probability of innocence given the score/transmission pattern increases quite dramatically when mixing a boosting effect into the model.

Despite the unlikely event of 4 wins, 1 loss and 1 win coinciding with 4 well-functioning live transmissions, one failure and one well-functioning transmission, the probability of Mihaela Sandu being innocent of cheating in the presence of this pattern was estimated at **just below 30%**. The statistical evidence in itself is not even close to lifting the burden of proof.

Of course it can be debated whether the incidence of cheating is higher than the author of this blog post is inclined to believe. This would lower the probability of Mihaela Sandu’s innocence. On the other hand, there are other factors in favour of a higher probability of innocence given the evidence, for instance the confidence-boosting effect.

Had the probability of innocence in the presence of the scoring/transmission pattern been closer to 0, but not close enough for conviction^{11)}, the right decision would have been to **acquit** Mihaela Sandu – meaning that the indications against the accused are serious, but insufficient for a conviction. In the case of Mihaela Sandu, however, the probability of innocence given the evidence is rather large, and therefore Mihaela Sandu should not only be acquitted, but also **freed of all suspicion**.

In the absence of any further evidence of foul play we must therefore conclude that **Mihaela Sandu’s only crime was an exceptionally good start to a tournament**.

Before moving on to one of Mihaela Sandu’s games from the EIWCC I would like to say a few words about the similarity between the analysis of the statistical evidence against Mihaela Sandu and the **scientific process**, which was the topic of my previous post. Accusations of foul play should be treated in much the same way as a **scientific hypothesis**, the **null hypothesis** being that the accused is in fact innocent. Just as in the scientific method, where the null hypothesis cannot be rejected unless sufficient proof against it is presented, the defendant should be considered **innocent unless proven otherwise**.

In the absence of **decisive proof** (for instance by direct observation or a confession) it becomes necessary to conduct an analysis of the **evidence**, be it recorded data as in the case of Mihaela Sandu (the scoring/transmission pattern), witness testimonies or both. The **statistical approach** used in **courtroom** applications differs from the approach prevailing in **science**. In the scientific arena we would proceed by deriving the probability of the data under the assumption that the null hypothesis is correct, then deciding whether this probability is low enough to discard the null hypothesis in favour of the alternative hypothesis.

The scientific method accepts a certain level of error in the decision process, typically allowing a 5% probability of erroneously rejecting the null hypothesis. In the courtroom **it is unacceptable to convict an innocent person**, a mistake known as an **error of the first kind** or a **type I error**. Therefore, instead of calculating the probability of obtaining the observed (or more extreme) data and comparing it to the significance level, **the probability of convicting an innocent person** is derived from the data instead^{12)}.

But why couldn’t one just use the statistical approach typically used in scientific research (also known as **frequentist inference**), adjusting the significance level to minimise the probability of an error of the first kind? Scientific research is typically focused on **population parameters** as opposed to particular **members** of a population. Typical population parameters to be investigated are **proportions**, **correlations**, **averages** and measures of **variability**, the values of which are estimated from data on the population studied, or on a subsample. The values of the individual data points are of limited interest in hypothesis testing, acknowledging that the probability of observing a particular data vector is as tiny as the probability of any other. The **distribution of the population parameters** emerging from the collection of **all possible data vectors** typically shows that an estimate within certain parameter intervals is more likely to occur than others.

Within the scientific framework, we would not be studying the scoring/transmission pattern of one particular player in one particular tournament, but rather the scoring/transmission patterns of **a large group of players**, possibly even across a number of tournaments. The hypothesis could be that **cheating takes place through the live transmissions**, leading to the prediction that **the probability of a loss conditional on the failure of the live transmission is higher than expected if “losing” and “live transmission failing” were independent events**. Probabilities for “losing”, “failing” and the events coinciding would be estimated from the **proportions** of lost games, failing live transmissions and incidences of both in the data. A decision whether to reject the null hypothesis (of there being no cheating through the live transmissions) would then be taken based on the likelihood of those estimates. The result – rejection or corroboration of the null hypothesis – **only reflects on the entire population** of chess players investigated. It does not carry any information about individual members of the population, nor about any particular sequence like the scoring/transmission pattern in the Sandu case.

A criminal investigation typically focuses on **one particular person** and **the evidence against this person**. The aim is not to make inferences about a property of an entire population. Inferences are to be made with regard to the **probability of the null hypothesis conditional on the evidence**. A statistical approach suitable for this type of problem is **Bayesian inference**. In this approach, population parameters are only invoked to enable estimation of this probability. Uncertainty about the true values of these population parameters can affect the probability estimate, and therefore the exercise often involves special care to understand the implications of different choices of population parameters.

I have analysed Mihaela Sandu’s game from the 3rd round, where she was White against **Aleksandra Goryachkina**. The game starts with a long theoretical discussion in a main line of the Sveshnikov Sicilian. The middlegame primarily revolves around the **advancement of White’s b-pawn**. In the end White succeeds in achieving the **thematic b5-break**, but only after both players commit inaccuracies. Black, however, has maintained **adequate coordination for approximate equality**. It is only after Black declines the white d-pawn that the balance tips in favour of Mihaela Sandu. A final **blunder** by Black allows White to **finish the game in style**.

A nice detail related to this game is that Mihaela Sandu’s opponent **did not sign either petition**. It is good to see a young and promising player like Goryachkina have an **independent mind** and let the chess do the talking, thus being a **good ambassador** for the game we all love.


^{1)}Expected scores are based on conversion table 8.1b of the FIDE Rating Regulations effective from 1 July 2014. Note also comment 12.2 in the regulations: *“Tables 8.1a/b are used precisely as shown, no extrapolations are made to establish a third significant figure.”*

^{2)}For detailed information on Mihaela Sandu’s individual results at the EIWCC see chess-results.com.

^{3)}For detailed information on Mihaela Sandu’s rating development see her FIDE Chess Profile.

^{4)}In April this year no fewer than two remarkable cases of cheating through live transmission were reported by chess *and* mainstream media. For starters, the Georgian GM **Gaioz Nigalidze** was disqualified from the Dubai Open for consulting engine analysis on a mobile phone hidden in a lavatory booth at the tournament site. See chess.com for a detailed report. Later in the same month, at a tournament in New Delhi, the Indian amateur **Dhruv Kakkar** was caught in a cheating scheme using two mobile phones attached to his body, a micro-speaker in his ear and an accomplice at a different location analysing with a chess engine. See chessbase.com for a detailed report.

^{5)}Readers interested in reading more about the psychology of **causal perception** and **causal reasoning** could start with this overview.

^{6)}The law of truly large numbers is one of the mechanisms governing the *improbability principle*. The Improbability Principle is also the title of a book by statistics professor David J. Hand. I have not read this book, but the related website is definitely worth a visit, also (perhaps even *especially*) for those who do not have any deeper understanding of statistics.

^{7)}Correcting the other shortcomings in Dr. Meadow’s argumentation implies a probability of innocence as high as 90%. Ray Hill demonstrated in his article that Dr. Meadow’s estimate of 1 in 8,543 was biased. Meadow should also have taken into account that the odds of a second case of SIDS are considerably higher for families who have already suffered one case of SIDS, due to genetic and environmental factors. The failure to weigh in the homicide prevalence is, however, the most serious lacuna in Meadow’s argument. Hill, R. (2004). “Multiple sudden infant deaths – coincidence or beyond coincidence?” (PDF). *Paediatric and Perinatal Epidemiology* **18**: 322–323. doi:10.1111/j.1365-3016.2004.00560.x

^{8)}The recent cases of proven cheating demonstrate that utilising live transmissions in foul play is not that easy. Dhruv Kakkar’s attempt involved a low-cost yet well-functioning technical construction, but it takes more than clever engineering to pull off a cheating scam and get away with it. Gaioz Nigalidze did not make things more complicated than absolutely necessary, at the cost of having to hide a mobile device at a spot outside his immediate control. Not many of those who are tempted will have the nerve to proceed in this manner. True enough, Nigalidze had his trick going for a while before being put on the spot, but his opponents had nonetheless picked up signs that something was wrong.

^{9)}Cheating has existed at all times, in chess as elsewhere. However, table 8.1b in the FIDE Regulations existed long before cheating through live transmission became a possibility. Therefore the underlying probabilities of the expected scores from the rating tables correspond to proper prior probabilities, uncontaminated by live-transmission-related cheating.

^{10)}If the player has cheated systematically in the past, the player’s rating is too high compared to actual playing strength. Also, an unexpected malfunctioning of the cheating mechanism is likely to have an unsettling effect on the cheating player’s natural performance level.

^{11)}Relying on statistical evidence alone, the highest probability considered low enough for conviction should indeed be very low. This threshold probability can be compared to the **significance level** in scientific research. Depending on the subject matter, the significance level is typically set to 5%, meaning that the probability of rejecting the null hypothesis by mistake is at most 5%. In courtroom application, the null hypothesis corresponds to the accused being innocent. Rejecting the null hypothesis in this case leads to the verdict that the accused is guilty. It is of foremost importance that the probability of wrongly convicting an innocent person is kept at a minimum. Allowing an error margin of 5% is clearly unacceptable. The significance level applied should rather be 1% or even lower.

^{12)}For further reading about **hypothesis testing in the courtroom** I recommend this article on Type I and Type II Errors – Making Mistakes in the Justice System.

I have been reading quite a few scientific articles recently, and it has occurred to me how poorly many of these articles **present** their hypotheses and the predictions tested to their readers. The information *is* there somewhere, hidden between descriptions of the theoretical framework and accounts of previous scientific efforts, but as a reader one often has to start over several times in order to get a clear picture. As a consequence it can be difficult to separate the **actual testing results** from **exploratory investigations**.

After reading this blog post I hope you will find it just a little bit easier to identify the key elements in scientific articles.

The following illustration is presented to many a **university freshman** in one version or another. It depicts the human quest for knowledge as a never ending process of observations leading to questions and investigations, resulting in the development of theory, giving rise to new observations, questions and so forth.

When observing an interesting phenomenon, the researcher will call upon established scientific theories for explanation. In case no satisfactory relationship between theory and observation can be found, the time has come to either **extend** existing theories or **replace** them with new and better ones. The scientific process is set in motion.

One of the core concepts of the scientific method is the **hypothesis**. The hypothesis is a statement that provides an answer to the questions that were probed by observation. As scientific tradition has it, a hypothesis has to be sufficiently concrete to enable the derivation of **testable predictions**. Through the investigation of these predictions, the hypothesis can be **corroborated** or **falsified**. The distinction between the hypothesis and the testable prediction is not always obvious, and in statistical hypothesis testing, see below, the hypothesis and the prediction are often considered to be one and the same thing.

The process of formulating meaningful predictions and testing them in order to corroborate or falsify a hypothesis is labeled **hypothesis testing**. How exactly this testing takes place, depends on the type of predictions at hand, as the following examples serve to illustrate.

- *It is possible to sail from South America to Polynesia on a primitive raft.* Thor Heyerdahl (1914-2002) derived this prediction from the hypothesis that the first people to settle on the islands of the Pacific originated from Peru. This prediction stood the ultimate test in 1947, when Heyerdahl and his crew sailed from Peru to the Tuamotu Islands on a balsa raft in his famous **Kon-Tiki expedition**.
- *Venus has new and crescent phases, but no gibbous and full phases.* This is predicted by the **geocentric**^{1)} model for the movements of the planets (and the sun), which was the official view at the time of Galileo Galilei (1564-1642). He demonstrated this prediction to be false by observing Venus’ appearance in the night sky through a **telescope** (a brand new invention), registering new, crescent, gibbous and full phases, just like the moon. His discovery eventually led to a complete rejection of the geocentric hypothesis and its replacement by **heliocentric**^{2)} models.
- *The incidence of fatal childbed fever can be reduced by washing hands with chlorine.* Ignaz Semmelweis (1818-1865) introduced this practice in 1847 at the obstetric ward to which he was appointed. Semmelweis adhered to the hypothesis that “cadaveric material” on the hands of the medical staff (who performed autopsies as well as assisting women during childbirth) was a cause of disease. His hand-washing policy immediately caused the mortality rate among new mothers admitted to the clinic to drop from 10% to 3%, in accordance with his prediction.

In the last example, the prediction involves a comparison of two **numerical quantities**: the proportion of women contracting fatal childbed fever before and after a change in medical procedure. In order to be able to make a meaningful comparison of these two quantities, Semmelweis had to invoke methods of **statistical inference**: is the difference between the rates before and after large enough to be considered significant?^{3)}

In the vast majority of cases, scientific hypothesis testing involves predictions regarding counts, amounts, or other kinds of numerical quantities. Data in numerical form is collected for evaluation, typically involving **repeated measurements** of some sort. For instance, the data collected by Ignaz Semmelweis in the third example above consisted of monthly counts of the number of births at his clinic and the number of deaths due to childbed fever.

Obviously, the number of women giving birth at the clinic varied from month to month, and so did the number of deaths. In order to test his prediction, Semmelweis had to compare the data from before the introduction of the hand-washing protocol with the data afterwards – two *time series* of fluctuating counts: the number of deaths and the number of births.

This raises the issue of deciding whether the data are in concordance with the predictions. In Semmelweis’ investigation the difference between the data “before” and “after” was huge, but in general, **intuition** is not considered to be a sufficient decision-making tool and the researcher has to invoke **statistical techniques**.
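For readers who want to see what such a comparison looks like in practice, here is a minimal sketch of a two-proportion z-test in Python. The counts are illustrative numbers consistent with the rates quoted above (roughly 10% mortality before hand washing and 3% after), not Semmelweis’ actual monthly data:

```python
from math import sqrt, erfc

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, one-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # One-sided p-value: probability of a z at least this large under H0.
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value

# Illustrative counts matching the rates in the text:
z, p = two_proportion_z(x1=300, n1=3000,   # 300 deaths in 3,000 births before
                        x2=90,  n2=3000)   # 90 deaths in 3,000 births after
print(f"z = {z:.1f}, p = {p:.2g}")
```

With these numbers the z-value comes out well above 10, corresponding to a vanishingly small p-value: a difference of this magnitude is far beyond what chance fluctuation in the monthly counts could plausibly produce.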

It is customary in statistical data analysis that the prediction to be tested is formulated as a *hypothesis pair* (this is where statistical hypothesis testing basically treats the hypothesis and the prediction as one and the same thing). The prediction is labeled the **alternative hypothesis**. Typically the alternative hypothesis postulates that “there is a difference” – in the case of Semmelweis, a difference in observed incidence rates of fatal childbed fever. Its counterpart is the **null hypothesis**, which states that “there is no difference” or “any observed differences are random”.

In fact, hypothesis testing always implies a null hypothesis and an alternative hypothesis, even if statistical testing is not appropriate. In the latter case, the null hypothesis represents the **common understanding** of a phenomenon, whereas the alternative hypothesis is a **competing view**. In the case of Galilei, the contemporary viewpoint was the geocentric model of the universe, so this model was his null hypothesis. The alternative hypothesis was Galilei’s heliocentric model.^{4)}

Having formulated the null hypothesis and the alternative hypothesis, the testing proceeds by planting itself solidly into the soil of the null hypothesis. The statistical analysis of the data results in the **probability of obtaining the data observed under the assumption that the null hypothesis is true**. Only if this probability is very small (the upper limit is typically set to 5%, in some applications even 1%) is the null hypothesis considered to be falsified and the **alternative hypothesis accepted**. Otherwise it has to be concluded that **no evidence against the null hypothesis was found**, resulting in its corroboration.

Falsification is considered to be the stronger result – the absence of evidence against the null hypothesis guarantees very little with regard to future investigations where evidence against the null hypothesis may actually be found.

In most cases the hypothesis can be interpreted as a modification or extension of an existing theory, but in some cases the hypothesis represents an opposing view with far-reaching implications for “the world as we know it”. Especially if the hypothesis cannot be reconciled with the current understanding, the onus is on the scientist to formulate a theoretical framework which explains the **underlying mechanisms** behind the hypothesis.

A lack of theoretical groundwork reduces the hypothesis to a “black box” where the questions preceding the hypothesis are not really answered at all. As a consequence, those to whom the new insights bear relevance will be reluctant to accept the validity of the results.

Semmelweis failed to provide a mechanism for the spread of disease from corpses to women in childbirth. The contemporary view of what caused disease was very different from today’s, and microorganisms such as bacteria were yet to be discovered. The hands of a gentleman were considered to be clean no matter what, and Semmelweis’ colleagues were indignant at the request to wash their hands. Like many other pioneers of science, Ignaz Semmelweis did not live to receive due credit for his discovery.

^{1)}meaning all bodies circle around the Earth.

^{2)}meaning all bodies circle around the Sun.

^{3)}Further reading: Studies of the history of probability and statistics: Semmelweis and childbed fever. A statistical analysis 147 years later.

^{4)}See Explorable.com under “Development of the Null”.

Upon publishing my post of April 12 on the Danish Championship, I announced on Facebook that I was going to write a post on **Danish women’s chess** in general. It was my plan to elaborate on the huge importance of having a **national championship for women**, given the current state of women’s chess in Denmark. While I was writing this post, the British press turned the spotlight on a column written by Kasparov’s former challenger, **GM Nigel Short**, for the latest edition of the quality magazine **New in Chess**. Only a few had noticed the column thus far, but then controversy arose over an article^{1)} in the newspaper The Telegraph under the telling headline:

Girls just don’t have the brain to play chess

The article kickstarted a veritable media storm with TV-coverage on SkyNews, other newspapers (in the United Kingdom as well as abroad) picking up the story, chess bloggers rushing to react and comments flying back and forth on Facebook and Twitter.

In light of this discussion my post turned out more elaborate than would otherwise have been the case, but the conclusions remain unchanged:

- **No biological evidence exists** for women being less able to play chess than men.
- The lack of women asserting themselves at the top (both nationally and internationally) is largely attributable to the **low proportion** of female chess players.
- Furthermore, women’s results are undermined by the widespread **stereotype** of women being less suited to play chess.
- **It is unknown** whether or not other factors should be taken into account with regard to a possible difference between men’s and women’s chess potential.

In order to see more women reach top levels, the most important step is therefore to **increase the number of women** playing competitive chess, and one of the primary measures to do so is arranging **women’s tournaments**.

I will return to all of this along the way. Right now we will start where my post originally started…

Normally, when filling up my chess calendar, I go after those tournaments where I can expect to meet the strongest competition: all-play-all tournaments where I am among the lowest ranked, open tournaments with minimum rating requirements. I make an exception for the team competition – here I play for **Seksløberen**^{2)}, the cosiest club in Denmark, in spite of the opportunity for stronger opponents had I played for a team in the national league.

At the age of 15 I already opted for the open qualification tournament for the national youth championships (I still lived in the Netherlands at that time) instead of the girls’ qualification. Back then the national coach tried to convince me to choose the girls’ qualification, as it was uncertain whether or not I would receive a wildcard to the girls’ national championship in case I didn’t qualify among the boys. I ignored the warnings, did not qualify for the open championship, received a wildcard of course, and won the girls’ championship together with two others.

The sore spot is hidden between the lines. The **strongest competition** is seldom met within **women’s chess**. The level of chess in the girls’ sections pales in comparison to the open sections in the junior category. Women’s championships in general just can’t measure up to the corresponding open championships.

This was certainly the case at the recently held Danish Championships in Svendborg as well. All participants in the top section (**Landsholdsklassen**) were male. Likewise in the challenger section (**Kandidatklassen**) – not a single woman had qualified to play in it. Only three out of 14 participants in the **women’s section** could have participated in **Group 1 of the 7-round Swiss** tournament (**Margarita Baliuniene**, **Miriam Olsen** and **yours truly**). And it’s not just in Denmark that the gap between male and female players seems to be huge. One gets the same picture **worldwide**.

Are women just worse at playing chess than men? I will return to this question in the next subsection. For now it’ll suffice to note that there is **no compelling biological evidence supporting the stance that women are less able** – in contrast to most sporting disciplines, where men have the upper hand due to their physique.

Nonetheless, female chess players are constantly met with claims such as women’s brains not being built to play chess, women lacking the necessary competitive spirit, or women just being no good at chess altogether.

Nigel Short postulates the first of these in his column^{3)}. But although Nigel invokes several scientific research articles to document his claims, he mistakenly draws conclusions that are more than the scientific evidence can bear.

It is a fact, though, that chess seems to **appeal less to women than to men**. In sheer numbers, women disappear into the crowd. Only 18 women participated in the Danish Championships out of a grand total of more than 250 players – only just above 7%. The rating list of the Danish Chess Federation counts 114 women out of 4,250 players – less than 3%^{4)}. And something indicates that **sheer numbers** can explain why the best male players are so much better at playing chess than the best female players.

It’s tempting to conclude that women are worse chess players than men when facing facts like these:

- All world champions from **Wilhelm Steinitz** to **Magnus Carlsen** are male.
- With the exception of **Judit Polgár** (who has retired from competitive chess) and **Hou Yifan**, all chess players in the world’s top-100 are male^{5)}.
- Only one woman (Judit Polgár) has ever reached a top-10 ranking in chess^{6)}.

The problem is that by doing so, one uses **the extreme values of a sample** to infer something about **the sample distribution**. And this doesn’t work! A large sample, for instance, will typically contain a larger number of extreme values (both high and low) than a small sample taken from the same population. The error committed by drawing conclusions at the distributional level from extreme values only is also dubbed **the extreme value fallacy**.

This can be illustrated nicely by men and women in chess. Let’s assume that men and women are biologically equally predisposed to become world champions at chess. We consider two populations, namely **all adult males** (regardless of whether they play chess or not) and **all adult females**. These populations are roughly equal in size, and would therefore contain the same number of potential world champions. Within each of these populations, we consider the subset of **individuals who have learned to play chess during childhood**. Without consulting other sources I dare say that the group of grown-up males who have learned to play chess during childhood **outnumbers** the equivalent group of females **by quite a margin**. In that case the **probability** of the male subpopulation containing potential world champions is **considerably higher** than for the female subpopulation.
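This size effect is easy to demonstrate with a small simulation. The sketch below draws “talent” for two groups from the *same* normal distribution; only the group sizes differ (the 10:1 ratio is an arbitrary illustration, not the actual ratio of male to female chess players):

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

def best_player(pool_size):
    """Top 'talent' in a pool drawn from a standard normal distribution."""
    return max(random.gauss(0, 1) for _ in range(pool_size))

# Two subpopulations with identical talent distributions but very
# different sizes (1,000 vs 100 players, purely illustrative).
trials = 1000
larger_pool_wins = sum(best_player(1000) > best_player(100)
                       for _ in range(trials))
print(f"Larger pool has the strongest player in "
      f"{100 * larger_pool_wins / trials:.0f}% of trials")
```

Despite identical talent distributions, the strongest individual comes from the larger group in roughly ten out of eleven trials – purely because the larger group gets more draws from the upper tail of the distribution.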

Hence it is essential to study characteristics of the distribution instead of extreme values. Researchers have studied the rating list of the German Chess Federation, and despite there being some dispute about the results, the latest state of affairs is that **67% of the difference in level between the top male and top female chess players can be explained solely by the small proportion of female chess players**^{7)}. This leaves space for supplementary explanations, including the possible explanation that males have some kind of biological advantage, but 67% still is a whole lot.

This result indicates that the best thing we can do to get more female chess players to the top^{8)} is to ensure that **more women are introduced to and stick with chess**. I do not know for certain how this can best be achieved, but I would say that a **women’s championship**, such as this year’s women’s section at the Danish Championships, is a good place to start. Moreover, I would recommend that the national women’s coach and the executive committee of the Danish Chess Federation look at other countries with population sizes similar to Denmark’s, such as Norway (5.1 million), Finland (5.5 million), Slovakia (5.4 million), Ireland (4.6 million) and Croatia (4.3 million)^{9)}. What is being done in those countries where the number of female chess players is larger than in Denmark? Is there something we do in Denmark that is not being done in those countries where the number of female players is lower than here?

Some countries have favourable conditions with regard to chess, attracting **men and women alike:**

- **A national role model**. Norway is a special case among the aforementioned countries, as they have **Magnus Carlsen**.
- **Chess tradition**. The Eastern European countries in particular; Slovakia and Croatia are in this category.
- **Chess in schools**. The Danish scholastic chess organisation (**Dansk Skoleskak**) is doing great work, but in some countries chess is on the national curriculum.

Factors, which I presume particularly affect the number of women playing chess:

- **Female role models**. The Slovakian women’s team became European Champions in 1999. I believe the attention raised by this event increased the number of girls playing chess [in Slovakia].
- **Critical mass**. Insofar as girls and women are a visible factor in chess to start with, more girls will take up chess. And as we have seen earlier, a larger number of players increases the probability of having female role models.
- **Women’s tournaments, training sessions etc.** These help increase the visibility of women in chess and strengthen a sense of community in the absence of a critical mass. A good example is the Norwegian girls’ squad (**Jentebrigaden**).
- **Respect for women’s capacities**. We need a small intermezzo on this item.

It is a widespread misconception that women can’t play chess – a prejudice fuelled by the aforementioned extreme value fallacy, which women themselves and their surroundings alike fall prey to. **Bobby Fischer** is known to have said^{10)}:

They’re all weak, all women. They’re stupid compared to men. They shouldn’t play chess, you know. They’re like beginners. They lose every single game against a man. There isn’t a woman player in the world I can’t give knight-odds to and still beat.

One wonders whether Bobby would have been able to beat Judit Polgár under those conditions. The quote is preposterous, so can’t we just leave it at that? Apparently not. It’s not entirely without reason that the organisation Casual Chess reacted to Nigel Short’s column with the following *tweet*:

@nigelshortchess @NewInChess Incredibly damaging/harmful when someone so respected basically endorses sexism in #chess in important magazine

— CasualChess.Org (@CasualChess) 18 April 2015

A recent study has shown that young girls are well aware of the stereotype, and that there is a tendency for girls to perform worse than expected based on their own and their opponent’s rating when playing chess against boys. According to the researchers, the girls fall victim to the so-called **Stereotype Threat**, which is described as **the fear of confirming a negative stereotype**. This fear not only causes **underperformance**, but also makes girls **hesitant to engage in tournament activity following a tournament with disappointing results**. This tendency was not observed for boys^{11)}.

Therefore, it is of utmost importance to fight the stereotype that women can’t play chess, in order to keep girls interested in the game as well as to boost their performance.

The stereotype threat is known from other disciplines where prejudice favours one gender above the other^{12)}, for instance in relation to **science and mathematics**. It’s quite common to meet the attitude, also among **teachers**, that boys are better at these subjects than girls. This influences the girls’ **confidence** as well as their **results** – and when the girls are convinced they *cannot* do it, they discard science and math as a career option. It seems that the key to changing this pattern is to enhance the girls’ **belief in their own skills**. Only when girls experience science and math as something they are good at do they become **interested** as well^{13)}.

I therefore propose testing the following **hypothesis** in chess: Upon introducing girls to chess it does not matter so much whether or not they find chess exciting. **The most important thing is to let them experience that they are good at it!**

Ever since the stereotype threat became apparent in educational settings, attempts have been made to teach boys and girls separately in subjects ranging from science and math to Danish and home economics. This has led to mixed results and opinions going in all directions. Advocates emphasize, among other things, that **gender-segregated schooling** helps girls become more confident in traditionally “masculine” domains. An opposing view is that segregated schooling confirms prejudice, providing fertile ground for the stereotype threat to flourish.

What are the implications for women’s chess – is it a good idea to arrange women-only tournaments such as the women’s section at the Danish Championships, or would it be better not to?

There is a crucial **difference** between education in science and math on the one hand, and chess on the other. Math and science are **compulsory** elements on the national curriculum, causing **all** girls to be exposed. The critical mass, which I have referred to as important for women, is thereby provided for, and thus the debate on partially segregated schooling can focus on whether segregation **increases girls’ confidence** or rather **enhances the stereotype threat**.

At the same time, chess is an **optional** discipline. Not every child learns the rules of the game at all; remember my earlier claim that more boys than girls learn to play chess. Here the issue is to get more girls **through the door**. To this end, girls’ and women’s tournaments, training camps etc. are among the most powerful measures at hand. These events generate **visibility** in the absence of a critical mass. Only when we have succeeded at increasing the number of female players to a certain level can we permit ourselves the luxury of disputing the necessity or benefit of special treatment. Therefore I definitely think that including a **women’s section** in this year’s Danish Championship programme was a much-needed **step in the right direction**.

Couldn’t these 18 female participants in the Danish Championships have registered for one of the great many open sections in the tournament schedule? Yes, they could have, and some of them did. Among the 12 participants in the women’s section there are likely to be several who would have participated in the Danish Championships even if the women’s section had not been an option. But there are also players who would not have come to Svendborg at all. Yours truly is one of them (as I accounted for in my earlier post Danmarksmester!)^{14)}. In other words, the women’s section succeeds in attracting more women to the Danish Championships than would have been the case without it^{15)}.

Apart from the necessary visibility, this year’s women’s section met another key factor for attracting more women to chess, due to the composition of the field: namely the opportunity to meet **role models**. Almost half of the field in the women’s section were junior players. Some of them are on the verge of catching up with the top women and will have to take the leap to “grown-up chess” within the next couple of years. If these girls are to keep an interest in chess among many competing interests, it is paramount that they have role models to look up to. It is not sufficient to hear about chess idols like Judit Polgár or Hou Yifan in the chess media – there have to be some nearby heroes to relate to. Someone to compete with and lean on. It just *isn’t* the same to compete with boys your own age when you are a teenage girl. There need to be some female players who have paved the way just ahead of them. Someone they can fasten their eyes on while saying to themselves: **if she can do it, I can do it too**.

The women’s section at the Danish Championship provided an opportunity for the strongest junior girls to compete directly with some of the **top women** and experience that they are breathing down their necks. **Ellen Fredericia-Nilssen** started out by getting yours truly on the ropes in our first-round encounter – I was lucky and escaped with a draw. Later on in the tournament she came really close to beating **Marie Frank-Nielsen**. In the second round I reeled in the point against **Ellen Kakulidis** after having had a totally lost position. She proceeded to make draws with both Marie and **Miriam Olsen**. The latter also had to concede half a point to **Freja Vangsgård**. These successes are great for the girls to bear in mind later this year, when competing at the **European Youth Championships in Porec, Croatia** (Freja and Ellen F.) and the **World Youth Championships in Halkidiki, Greece** (Ellen K.).

The youngest girls, **Caterina Wul Micalizio** and **Elisabeth Mechlenburg-Møller**, did not get to play against each other (they will have ample occasion to do so in other tournaments!) and demonstrated that they were *best of the rest*. Neither girl is bothered by shyness, and they clearly enjoyed the opportunity to bond with the teenagers and chat with the grown-up stars. I had the pleasure of competing with Elisabeth in the penultimate round, and she already has the aura of a Grandmaster when she plays.

*My original column proceeds to show one of Elisabeth’s games. The curious reader is kindly referred to Kvindeturneringer – en nødvendighed.*

*Only weeks after the Danish Championships, Elisabeth (Elo-rated 1437) held Anna Cramling Bellon (1925) to a draw at the Nordic Championships for Girls 2015.*

Grandmaster and PhD student **David Smerdon** has done a great job of illuminating the issue in no fewer than two blog posts:

- Men, Women and Nigel Short which contains a very nice description of the ideal but hypothetical experimental setup to settle nature-nurture controversies of all kinds.
- Men, Women and Nigel Short 2: An academic response providing a more technical discussion of the claims Nigel actually makes in his column, as well as the scientific evidence or lack thereof.

David seems to agree with me that Nigel draws some conclusions that are rather more extreme than the scientific studies can account for.

Another important thing to remember is, that statistically significant **differences between two populations** are irrelevant with respect to **specific individuals from those populations**. As International Master **Greg Shahade** points out in his blog post Women in Chess: even if convincing scientific evidence should show up that women’s chess abilities are inferior to men’s chess abilities **on average**, this does not rule out that **a woman could become World Champion in chess** some day.

In relation to the subsection on **The Stereotype Threat** earlier in this post, the following statement from Greg’s post should now resonate with my dear readers:

No individual person gains anything from being told that one of their potential heroes thinks their entire group is at a statistical disadvantage.

This brings us to a very interesting blog post by **Daaim Shabazz** at **The Chess Drum**: Women in chess… the long and the Short of it. I only read this article today, although it was in fact published April 23, several days before I published the Danish original of the post translated above on my **Skakkerlakken** blog. Apart from touching upon several of the same scientific sources as I refer to in my post, Daaim points out that the **stereotypes and sexism** women encounter in the chess world are very similar to the **prejudice and latent racism** people of African descent have to cope with.

Apparently, attitude issues among peers and public are not the only challenge women and black players have in common with regard to chess. Daaim does not mention the **participation theory** as a factor explaining the absence of black chess players in the elite, but my guess is that it is relevant here as well. In an earlier post at The Chess Drum, about The Challenges of Black Chess Masters, he refers to the prominent representation of African players among the world’s strongest **draughts** competitors. My guess is that a tradition for playing the game and the presence of role models are beneficial factors that are in place for African draughts players, whereas they are sorely lacking for African chess players.
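
The participation theory itself rests on a simple statistical effect: the best of a large pool tends to be better than the best of a small one, even when both pools draw from the identical talent distribution. A minimal sketch with invented numbers:

```python
import random

random.seed(0)

def expected_top(n, trials=500, mean=1500, sd=300):
    """Average rating of the strongest of n draws from one shared
    talent distribution (all parameters invented for the sketch)."""
    return sum(max(random.gauss(mean, sd) for _ in range(n))
               for _ in range(trials)) / trials

# One pool sixteen times the size of the other, both drawn from the
# identical distribution: the bigger pool's champion is stronger on average.
small, large = expected_top(100), expected_top(1600)
print(f"expected top of  100 players: {small:.0f}")
print(f"expected top of 1600 players: {large:.0f}")
```

The larger pool’s best player comes out well over a hundred points stronger on average, purely as a sample-size effect – no difference in underlying ability is needed to produce a gap at the very top.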

^{1)}In my original post I used the word interview, but the feature in the Telegraph does not really live up to being an interview. The feature quotes several people, among others Nigel Short, but some of these quotes originate from other sources (Twitter) or are construed (for example the phrase “Girls just don’t have the brain to play chess”). Hence I find it more correct to refer to the feature as an article, rather than an interview.

^{2)}The Danish word *seksløber* means revolver, while *løber* means bishop (the chess piece).

^{3)}This footnote was not in the original post. Nigel has repeatedly objected to having either said or written that women’s brains are not built to play chess. In my blog post from April 27 I openly admitted that I had concluded that Nigel actually wrote this in his column, and that I only discovered my mistake after publishing the Danish original of the blog post translated here. The blog post from April 27 also discusses how this incorrect inference came about.

^{4)}Lookup from April 12, 2015.

^{5)}FIDE rating list from April 1, 2015.

^{6)}Source: http://en.wikipedia.org/wiki/Judit_Polgar

^{7)}Michael Knapp (2010). Are participation rates sufficient to explain gender differences in chess performance? Proceedings of the Royal Society B. doi: 10.1098/rspb.2009.2257. This small article comments on the following publication:

Bilalic, Merim; Smallbone, Kieran; McLeod, Peter; Gobet, Fernand. (2009) “Why are (the best) women so good at chess? Participation rates and gender differences in intellectual domains.” Proceedings of the Royal Society B. doi: 10.1098/rspb.2008.1576. Earlier, a similar investigation had been performed on American chess players: Chabris, Christopher F.; Glickman, Mark E. (2006) “Sex Differences in Intellectual Performance: Analysis of a Large Cohort of Competitive Chess Players.” Psychological Science, 17, 1009-1107. The newer investigation mends several, though apparently not all, of the methodological issues with the first – see Michael Knapp’s commentary.

^{8)}A recent analysis of the FIDE rating archives by Robert Howard attaches question marks to this claim. See chessbase.com for a synopsis of Howard’s research on this matter. He shows, among other things, that the gap between men and women is larger in countries where relatively many women play chess (e.g. in the former Soviet republic of Georgia) than in countries where the proportion of female chess players is low. In my opinion his investigation is seriously hampered by the fact that far from all chess players have a FIDE rating. By using FIDE ratings only, instead of a national rating system, one only gets to look at the relative elite. To top it off, Howard restricts his comparison to players who have played 750 or more rated games, another criterion removing the lower echelons. I am planning to write another blog post to explain why we need to take the entire distribution into account if we wish to make claims about the **mode** of the distribution.
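
The problem with looking only at the relative elite can be sketched in a few lines. In this invented example, two populations have exactly the same mode, but different spreads; observing only the players above a rating floor (as a FIDE-only sample effectively does) makes the wider population look stronger, while telling us nothing about the modes.

```python
import random

random.seed(3)

# Two invented populations with the SAME mode (1500) but different spreads.
narrow = [random.gauss(1500, 200) for _ in range(50_000)]
wide = [random.gauss(1500, 300) for _ in range(50_000)]

floor = 2000  # only players above this threshold are visible in the sample
elite_narrow = [x for x in narrow if x > floor]
elite_wide = [x for x in wide if x > floor]

mean_narrow = sum(elite_narrow) / len(elite_narrow)
mean_wide = sum(elite_wide) / len(elite_wide)
print(f"mean above {floor}: narrow spread {mean_narrow:.0f}, "
      f"wide spread {mean_wide:.0f}")
```

The truncated samples differ in both size and mean even though the two full distributions peak at exactly the same place – which is why claims about the mode require the entire distribution, not just the players who clear a rating floor.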

^{9)}Estimate per December 1, 2014. Source: Wikipedia.

^{10)}Source: chessquotes.com

^{11)}Rothgerber, Hank; Wolsiefer, Katie. (2014) “A naturalistic study of stereotype threat in young female chess players.” Group Processes & Intergroup Relations, Vol 17(1) 79–90

^{12)}Instead of gender, the stereotype can also relate to race or ethnicity, religion, social background, age, etc.

^{13)}Source used for my original post: http://videnskab.dk/kultur-samfund/piger-mangler-selvtillid-i-naturfag, September 2008. The equivalent source in English: Tracking The Reasons Many Girls Avoid Science and Math

^{14)}As the blog post I refer to is in Danish, I will give a short explanation in English in this footnote. Being a working mother of two young boys, the time I have available to participate in chess tournaments is limited. My family, however, agreed to spend their Easter vacation in Svendborg with me so that I could support the first women’s section at the Danish Championships in many years through my participation. Had it not been for the women’s section, I would not have had the opportunity to play chess at all this Easter.

^{15)}In any case I cannot imagine any female players staying away *because of* the women’s section!