Category Archives: Psychology

Are bankers really more dishonest?

Nobody likes a merchant banker, and a new report in Nature, Business culture and dishonesty in the banking industry, makes the case that such distaste may have a sound basis: Bankers who took a survey which asked questions about their jobs behaved more dishonestly than bankers who took a survey which addressed mundane, everyday topics, such as how much television they watched per week. It’s a catchy claim. But in contrast to the headlines, the data suggest something else: bankers were more honest overall than other groups, and at worst no more dishonest.

Each group of bankers was asked to toss a coin 10 times and report, online and anonymously, how often it landed on each side. They were told that each time the coin landed on a particular side (heads for some, tails for others), they could win $20.

The group who took the job-related survey reported 58.2% successful coin flips, while the control group reported 51.6% successful coin flips. Thus, the authors argued, priming the bankers with their professional identity made them more likely to dishonestly claim that they had tossed coins more successfully than they actually had.
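To get a feel for how one might test whether rates like these differ from the 50% an honest coin should produce, here is a minimal sketch of an exact two-sided binomial test. The group sizes are not reported in this post, so the counts below assume a hypothetical 100 participants per group, each flipping 10 times (1000 flips per group):

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes no more likely than the observed one."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    obs = pmf[k]
    return sum(q for q in pmf if q <= obs + 1e-12)

# Hypothetical group sizes (not given in the post): 100 people x 10 flips.
n = 1000
primed = round(0.582 * n)   # 582 reported successful flips
control = round(0.516 * n)  # 516 reported successful flips
print(binom_two_sided_p(primed, n))   # far below 0.05 at this sample size
print(binom_two_sided_p(control, n))  # not significant at this sample size
```

Whether a given deviation from 50% is detectable depends heavily on the assumed sample size, which is exactly why the raw percentages alone do not settle the question.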

To follow this up, the authors conducted two more studies with different populations: non-banking professionals and students. For these two groups, priming with professional identity had no effect; control and “treatment” (i.e. primed) groups performed similarly. Hence the headline finding: making bankers think about their professional identity as bankers made them more dishonest. Other groups did not become more dishonest when primed with their professional identity, so – the argument goes – there is something about banking and banking culture that makes an honest person crooked.

But more dishonest than who?

Curiously, something glossed over in the main paper – it appears only in the extended figures and the supplementary information – is that for the non-banking professionals and students, the control groups were as dishonest as the primed groups. In fact, of all the groups, the odd one out is the banking control group. Whereas the banking control group reported 51.6% successful coin flips, the non-banker and student control groups reported 59.8% and 57.9% respectively. The primed banking group reported 58.2% successful flips, while the non-banker and student primed groups reported 55.8% and 56.4% respectively.

If we collapse across the control and primed groups and simply look at the average success rate for each sample population, bankers reported 54.6% successful coin flips, non-banking professionals 57.8%, and students 57.15%. Thus, overall, the bankers were the most honest group.
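The collapsed figures can be checked with a simple unweighted mean of each population's two group rates. This reproduces the non-banker and student figures exactly; the bankers' pooled figure of 54.6% sits slightly below the simple mean of 54.9%, which suggests the paper's pooling weights by unequal group sizes (the sizes are not given in this post):

```python
# Reported success rates (%) for each population's (control, primed) groups.
rates = {
    "bankers":     (51.6, 58.2),
    "non-bankers": (59.8, 55.8),
    "students":    (57.9, 56.4),
}

# Simple unweighted mean across the two groups, assuming equal group sizes.
for name, (control, primed) in rates.items():
    print(f"{name}: {(control + primed) / 2:.2f}%")
```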

So maybe the headline should be that bankers are more honest than other groups, until they’re reminded that they’re bankers. Then they’re as dishonest as everyone else (or at least, non-banking professionals, and students).


Hidden moderators and experimental control

Hidden moderators come up regularly as a possible explanation for failed replications. The argument goes something like this: the original experiment found the effect, but the replication did not. Therefore, some third, unknown variable has changed. Perhaps the attitudes or behaviours which gave rise to the effect are not present in the sampled population, or at least this specific sample:

Doyen et al. apparently did not check to make sure their participants possessed the same stereotype of the elderly as our participants did. —John Bargh

Perhaps the transposition of the experiment across time and space has led to the recruitment of subjects from a qualitatively different population:

Based on the literature on moral judgment, one possibility is that participants in the Michigan samples were on average more politically conservative than the participants in the original studies conducted in the UK. —Simone Schnall

And perhaps, in the case of some social priming effects, societal values have changed so much in the period between the original study and the replication that this specific effect will never be found again: its ecological niche has vanished, or has been occupied by another, more contemporary social more.

These are valid possible explanations for why a replication may have failed [1]. But the implication typically seems to be that since the replicators did not account for these potential hidden moderators, the replication is fatally flawed and should not be published as is. Faced with this critique from a reviewer, replicating authors are left with two alternatives: give up and don’t publish it; or collect more data and attempt to establish experimental control:

My recollection is that we used to talk about experimental control. Perhaps this was in the days of behaviourism. The idea was that the purpose of an experiment was to gain control over the behaviour of interest. A failure to replicate indicates that we don’t have control over the behaviour of interest, and is a sign that we should be doing more work in order to gain control.

Chris Frith

In an ideal world, establishing experimental control is the best alternative. The original effect is genuine, but perhaps the luminance of the stimuli, the lighting in the experimental chamber, or the political leanings of the participants differed across experiments. Running more experiments which account for these variables means we improve our understanding of the effect, establishing the boundary conditions under which it does and does not appear. If the reviewer has correctly identified a hidden moderator, then the understanding of the effect is greater than it was before.

So what’s the catch?

This is all well and good when the effect itself is well established, with strong evidence in its favour. But what if the original evidence was weak? A significant effect does not mean the evidence was strong, and you can’t establish boundary conditions for an effect which doesn’t exist; you can only provide more opportunities for false positives. Demanding that replicators run more experiments to test for potential hidden moderators places an additional experimental burden on them for an effect which, on the evidence they have already provided, may be substantially weaker than originally reported. It also places them in a difficult situation: running more experiments can never provide a definitive answer to the hidden moderator critique.
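The “more opportunities for false positives” point can be made concrete. If the effect is truly null and each follow-up experiment is tested at α = 0.05, the chance that at least one comes out significant by accident is 1 − 0.95^k:

```python
# Probability of at least one false positive across k independent null
# experiments, each tested at significance level alpha.
alpha = 0.05
for k in (1, 3, 5, 10, 20):
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} experiments: P(at least one false positive) = {p_any:.2f}")
```

By ten follow-up experiments the chance of at least one spurious “success” already exceeds 40%, which is why piling on moderator-hunting studies cannot, by itself, rescue a weak original result.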

Catch-22's Yossarian

Damned if you do; and damned if you don’t

Even if the effect re-emerges, this does not mean that it explains the discrepancy between the replicated and replicating experiments: the problem with hidden moderators is that they’re hidden, and by definition, their influence on the results of the original study is unknown [2]. Thus, to a replicating author, the hidden moderator critique can feel somewhat unfair: you are criticized for not controlling something which was not controlled in the original study either. And if the reviewer identifies a potential hidden moderator that turns out to have no effect, then they may demand yet more experiments to account for yet more hidden moderators, or worse, criticize the replicators for failing to identify conditions under which the effect emerges.

How sure are you about the results?

What’s missing is a consideration of the strength of the evidence [3]. It’s all too easy to over-estimate how strong the original evidence was [4]. It shouldn’t always be enough to say that the effect was significant in the original study, and that therefore those wishing to publish a failed replication must also find conditions under which it emerges, or account for as many reasons why it may not emerge as the reviewer can think of. That may be appropriate if the original study provided strong evidence in favour of the effect; if it didn’t, the barrier should be lower for a replication to be viable in its own right. What should matter is that the evidence the replication provides is itself strong. If it is, the replication is a valuable data point in its own right, even without follow-ups aimed at uncovering a putative moderator or mechanism for an effect we should have less confidence is a general one.
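One way to quantify evidence strength, in the spirit of footnote [3], is a Bayes factor. For the coin-flip setting a minimal version compares H0 (an honest coin, p = 0.5) against H1 (the success rate is uniform on [0, 1]). Under the uniform prior the marginal likelihood of k successes in n flips works out to 1/(n + 1), so the Bayes factor in favour of honesty is C(n, k) · 0.5^n · (n + 1). The counts below are hypothetical, since group sizes are not reported in this post:

```python
from math import comb

def bf01(k, n):
    """Bayes factor for H0: p = 0.5 against H1: p ~ Uniform(0, 1).
    Under H1 the marginal likelihood of k successes in n flips is 1/(n+1),
    so BF01 = C(n, k) * 0.5**n * (n + 1). Values > 1 favour the fair coin."""
    return comb(n, k) * 0.5**n * (n + 1)

# Hypothetical counts (assuming 1000 flips per group):
print(bf01(516, 1000))  # control-like rate: favours the fair coin
print(bf01(582, 1000))  # primed-like rate: strongly favours a biased report
```

Unlike a bare p-value, this kind of number distinguishes “strong evidence of an effect” from “weak evidence either way”, which is exactly the distinction the hidden-moderator debate tends to skip.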


[1] And if not, there’s always


[2] Even if the participant’s predilection for wearing outlandish hats moderates their susceptibility to the priming of personality judgements by the colour of the experimenter’s hat, there was no measure of outlandish hat-wearing in the original study.

[3] Here’s a nice example of using the Bayes Factor to do this from Felix Schoenbrodt:

[4] And this does not imply the original researchers did anything wrong, à la QRPs or p-hacking: I’m talking here simply about statistical strength and evidential value, not implying that there is evidence of questionable practice or methodological failure. These things happen. That’s why we do statistics!