The wisdom of deliberative crowds
Pooling answers is good, but collectively agreeing a system is better
A neat set of studies shows how groups can be made smarter than the sum of their parts. The studies are reported in a preprint, “Collective problem decomposition drives the wisdom of deliberative crowds”. Reasonable People has previously covered the work of corresponding author Joaquin Navajas: How to outsmart a crowd of 5000 people in 4 minutes (based on Navajas et al., 2018).
These new studies, like that study, asked people to perform estimation tasks - coming up with answers to questions which most people are unlikely to know the precise answer to.
In situations like this the Wisdom of Crowds effect holds - averaging answers yields a better answer. Navajas’s 2018 study showed that letting people talk - even if only briefly - produced better results than any method of combining individual answers. This latest report gives additional insight into why, and how a simple instruction can be used to improve the performance of groups.
The team from Buenos Aires reasoned that when groups get together to discuss, they can do so in more or less effective ways. One method is to compare their individual answers - the numbers they have come up with - and combine them somehow, averaging, or dropping outliers then averaging, or other more or less sensible methods of combination.
The second method is to find a way of breaking the estimation task down into component parts, and agreeing estimates for the components. So, for example, if the task is to estimate the height of the Eiffel Tower, we could all make guesses - 100, 200, 300 metres! - and average those answers. This is method one. Method two might be something like agreeing that we could compare the Tower to a building with a certain number of storeys, and also estimate how high each storey in a building is, and use that to produce an estimate.
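The contrast between the two methods can be sketched in a few lines of Python. This is only an illustration: the three guesses are the ones from the example above, while the storey count and the height per storey are made-up numbers a group might agree on, not figures from the study.

```python
from statistics import mean

# Method one: pool the individual guesses (the 100, 200, 300 m from the example).
individual_guesses_m = [100, 200, 300]
pooled_estimate_m = mean(individual_guesses_m)

# Method two: agree on components, then combine them.
# Both component values are hypothetical, chosen for illustration only.
agreed_storeys = 80            # "it looks about as tall as an 80-storey building"
agreed_storey_height_m = 3.5   # "a storey is roughly 3.5 m"
fermi_estimate_m = agreed_storeys * agreed_storey_height_m

print(f"Method one (average of guesses): {pooled_estimate_m:.0f} m")
print(f"Method two (storeys x height):   {fermi_estimate_m:.0f} m")
```

The particular numbers don't matter. What matters is that in method two the group argues about the components (how many storeys? how tall is a storey?) rather than about the final figure.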
The physicist Enrico Fermi was famous for his ability to come up with reliable estimates using this kind of problem decomposition. He once successfully estimated the power of the first test nuclear detonation in the US by dropping a piece of paper from his hand as the shockwave from the blast arrived at his observation point. Estimating the weight of the paper, and the distance it travelled, allowed him to estimate the force at the observation point, and combining this with the distance from the observation point to the blast allowed him to estimate the force at the epicentre.
After Fermi, the research team called this second method Collective Fermi Estimation.
Across three studies they showed that how much a group discussion uses Collective Fermi Estimation can be reliably measured (and that measurement automated), that it is associated with better group deliberation, and that groups can be rapidly taught to adopt it to improve their performance.
Here are the questions (and correct answers) used in the research:
Unlike in the 2018 study, the team put participants in chatrooms rather than face to face, which allowed them to keep a detailed record of their deliberations.
Their first study showed that the degree to which groups broke a problem down into component parts to arrive at their estimates, rather than merely sharing the initial individual estimates and combining them, was associated with better results:
This figure shows how they coded group discussions, the left side showing an illustrative group which focused more on individuals’ numerical answers, and the right side showing an illustrative group which discarded initial individual answers in favour of agreeing a method for making the estimate based on component parts.
Here is the association between the two strategies and performance (in terms of distance from the right answer, so lower on the y-axis is better). Groups which relied more on Collective Fermi Estimation did better.
These scores for how the groups worked were produced by human raters, but the team also validated an automated language analysis showing it could produce the same categorisation.
If you are like me, and this discussion of study 1 leaves the correlation-is-not-causation siren ringing in your ears, fear not!
Study 2 shows that instructing groups to use the Fermi Estimation strategy produces better performance. Participants in this study were split into two conditions. Half of the groups watched a short instructional video telling them to share their initial numerical estimates. Half watched a video giving them the Fermi strategy, telling them to divide the problem into smaller estimation problems and perform all relevant computations to reach their final answer.
The graph on the right shows the higher performance (lower error) of the groups instructed to follow the Fermi strategy (standardised effect size of d = 0.26, a small-medium effect).
Study 3 showed that asking individuals to apply the Fermi estimation method on their own did not produce the same benefit. The averaged answers of individuals who applied this method weren’t as good as the answers of interacting groups who applied it together.
This last result is intriguing. It suggests that reasoning together allowed groups to improve their problem decomposition - and the estimates that went into it - in a way that couldn’t be rivalled by merely aggregating over individuals, even individuals who were doing the same problem decomposition on their own.
Recall that in the 2018 study, groups of five outperformed the average of the much larger group of individuals (who provided answers before any group discussion).
Much has been made of negative group dynamics - herding and other conformity effects. It’s great to see a demonstration that discussion can help a group to be greater than the sum of the parts.
Below, references, further reading and other things I’ve been thinking about.
Full reference
Barrera-Lemarchand, F., Charreau, L. V. L., Ruiz, J., Caceres, N., Carrillo, F., Sigman, M., & Navajas, J. (2025, October 6). Collective problem decomposition drives the wisdom of deliberative crowds. https://doi.org/10.31234/osf.io/5xyze_v2
The 2018 study
Navajas, J., Niella, T., Garbulsky, G., Bahrami, B., & Sigman, M. (2018). Aggregated knowledge from a small number of debates outperforms the wisdom of large crowds. Nature Human Behaviour, 2(2), 126-132. https://doi.org/10.1038/s41562-017-0273-4
And my write up
How to outsmart a crowd of 5000 people in 4 minutes. An ingenious experiment shows the secret sauce needed to improve on the wisdom of crowds
Methods check
I like to comment on the indicators a study has that it is reliable. Not only do I have high trust in this lab - having looked at their previous work in detail - this study shared preregistered analysis plans, as well as the data and analysis code for each study. All good signs. The studies recruited a good number of participants, although one of the curses of research on groups of four is that you need four times as many participants to achieve the same statistical power as a study of individuals (so study 2, which had 240 individual participants, had only 30 groups in each condition - not enough for a well-powered study).
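The arithmetic behind that worry can be sketched roughly. The snippet below assumes a plain two-condition comparison with groups as the unit of analysis, a two-sided 5% significance level and a normal approximation, and it plugs in the d = 0.26 reported for study 2 as if it were the true effect. The preprint's own analysis may differ on any of these, so treat the output as a ballpark, not a reanalysis.

```python
from math import sqrt
from statistics import NormalDist

participants = 240
group_size = 4
conditions = 2

groups_per_condition = participants // group_size // conditions  # 30

def approx_power(d, n_per_condition, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_condition / 2)  # expected z-statistic under effect d
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

power = approx_power(d=0.26, n_per_condition=groups_per_condition)
print(f"Groups per condition: {groups_per_condition}")
print(f"Approximate power at d = 0.26: {power:.2f}")
```

Under these assumptions the power comes out well below the conventional 80% target, which is the sense in which 30 groups per condition is "not enough".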
Another positive indicator of reliability is that these studies replicate the findings of the 2018 study (but in chatrooms, rather than live). The report also probes the mechanism behind the effect by running an intervention (study 2), allowing stronger causal inference than the observational study 1. Fantastic!
Other things…
What we think of each other
This is a very striking finding.
“The U.S. is the only place we surveyed where more adults describe the morality and ethics of others living in the country as bad than good” (Pew on X). Here are the full results for the 25 countries surveyed:
You can only assume this is to do with the exceptional levels of affective polarisation in the US. Although it seems like it is Democrats who tip the average towards believing the majority are morally/ethically bad:
Democrats and independents who lean toward the Democratic Party are much more likely than Republicans and Republican leaners to rate fellow Americans as morally and ethically bad (60% vs. 46%)
Full report: In 25-Country Survey, Americans Especially Likely To View Fellow Citizens as Morally Bad (Pew, 2026-03-05)
Michael Hallsworth: Why Simplicity Can Be Strength in a Complex World
This is relevant to last week’s post (“Interception at 10,000 miles an hour”).
Michael Hallsworth applies the principle of simplicity to behavioural science itself:
We may think that practitioners are always implementing precise technical plans—and they certainly try. Yet much of the time they are satisficing or muddling through, trying to make progress with limited understanding in an uncertain world.
He concludes that simpler guidance, with strong affordances, is often more likely to be used, and so be more effective, than more comprehensive, but complex, frameworks.
Link: Why Simplicity Can Be Strength in a Complex World
Catch-up
Other recent posts from me
The effects of political disappointment. Testing how we feel about each other when democracy doesn’t go our way, a work in progress.
Good bias. Let’s untangle what people mean when they say the B word
Gambling with research quality. How you get 244 different ways to measure performance on the same test of decision making. And what it means for the reliability of behavioural science
PAPER: The Laziness of the Crowd: Effort Aversion Among Raters Risks Undermining the Efficacy of X’s Community Notes Program
Unfortunately the most plausible, hardest to properly check, falsehoods may be underserved by crowd-sourced content moderation:
We hypothesize that claims requiring less cognitive effort to evaluate, specifically, those that are obviously false and easy to refute, are more likely to receive public notes than claims that are more plausible and require greater effort to debunk. Using eighteen months of vaccine-related Community Notes data (2,250 posts) and ratings from 382 survey participants, we show that claims perceived as more difficult to fact-check are significantly less likely to receive notes that achieve “helpful”/public status.
Link: Wack, M., Warren, P., & Alam, M. (2026). The Laziness of the Crowd: Effort Aversion Among Raters Risks Undermining the Efficacy of X's Community Notes Program. arXiv preprint arXiv:2603.11120.
… And finally
Poling Upriver against the Wind (detail), 舟行人物図. Formerly attributed to: Shutoku, 18th–19th century.
via @zencolor
END
Comments? Feedback? Problem decomposition? I am tom@idiolect.org.uk and on Mastodon at @tomstafford@mastodon.online







