Legibility and Legitimacy
Reasonable People #56: On recognising the limits of reasoned decision making.
Lately I’ve been working with research funders, looking at how they decide which research proposals get money. Which researcher wouldn’t want to look behind the curtain at how this happens?
I was at the British Academy in London1 when I heard a story about the use of random numbers to allocate funding. This idea isn’t as unreasonable as it sounds, and we prefer to call it ‘Partial Randomisation’ to emphasise that the lottery is only applied to a subset of all the proposals a funder receives. Partial Randomisation solves a decision problem for the funder. Not only do they receive more proposals than they can fund, but even after the proposals have been reviewed they have more good proposals than they can fund. It’s a variant of a universal decision problem - too many options. What is a reasonable decision rule? Well, selecting randomly from the good options isn’t so unreasonable. For research funding, you can even argue that, given the uncertainty in review scores, many proposals are statistically indistinguishable in quality, so you really should randomise.
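To make the mechanism concrete, here is a minimal sketch of how partial randomisation might work, in Python. The function, cutoffs and proposal names are hypothetical illustrations, not any funder’s actual rule:

    import random

    def partial_randomisation(scored, n_awards, clear_cutoff, fundable_cutoff):
        """Fund unambiguous winners outright, then fill the remaining
        awards by lottery from the 'grey zone' of fundable proposals.
        `scored` maps each proposal to its mean review score."""
        clear = [p for p, s in scored.items() if s >= clear_cutoff]
        grey = [p for p, s in scored.items() if fundable_cutoff <= s < clear_cutoff]
        remaining = max(n_awards - len(clear), 0)
        return clear + random.sample(grey, min(remaining, len(grey)))

    # Three awards: "A" is a clear winner; two of "B", "C", "D" are drawn by lot.
    partial_randomisation({"A": 9.1, "B": 7.2, "C": 7.09, "D": 7.09, "E": 4.0},
                          n_awards=3, clear_cutoff=8.0, fundable_cutoff=6.0)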
At this meeting, one funder - a national research agency from outside the UK - described how they had trialled the use of partial randomisation in funding allocation and liked it. Maybe it reduces bias in the “grey zone” where proposal quality is ambiguous. Maybe it encourages applications from people outside the circle of the usual suspects who receive funding. Maybe it reduces the agony of deliberation at the point where there is only funding for one more proposal and two proposals both have an average score of 7.09. Certainly, partial randomisation is a release from a bind that all reasonable decision makers must face: yes, we want to use evidence and deliberation to guide our decisions, but not all aspects of all decisions can be arbitrated by the evidence, no matter how much deliberation. When that happens, it is reasonable to roll the dice. Better to recognise the arbitrariness of some decisions explicitly. That is the logic this research funder accepted.
Once you decide to randomise, you need the random numbers. Easy enough - they come in infinite supply with a single line of computer code. The first year this funder tried randomisation, they used a Python script to shuffle the list of all eligible proposals and pick out the winners. Every benefit I’ve mentioned is delivered by this method, but the funder has since abandoned it. In every subsequent year they’ve run the scheme, they’ve used slips of paper in a hat, with a nominated teller to pull slips out one by one and announce each winner to the room. “It’s not as efficient”, said the funder representative, “but we like to see the process happen”.
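For flavour, that first-year method needs barely more than the single line I mentioned. A sketch with placeholder names - I never saw the funder’s actual script:

    import random

    eligible_proposals = ["A", "B", "C", "D", "E"]   # proposals that passed review
    n_fundable = 2                                   # how many can be funded

    random.shuffle(eligible_proposals)               # the one line that matters
    winners = eligible_proposals[:n_fundable]        # the first names drawn are funded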
* * *
For another funder, I was privileged to be in a discussion about the choice of scoring rule. Each funding proposal receives reviewer scores, and the standard thing is to fund the highest quality proposals. But this simple-sounding principle can be implemented in multiple ways, all reasonable. For example, a panel can discuss the proposals and revise the scores (or make a decision informed, but not constrained, by them). Or you can just average the scores, but then you need a rule for how to deal with outlier or anomalous scores: that one reviewer who hated a proposal everyone else loved, or vice versa. Another option is to try to adjust the scores for the biases of individual reviewers. We know that some people tend to give high scores, and others tend to give low ones. It doesn’t seem fair if my proposal gets scored by mean reviewers and yours gets scored by generous ones. So maybe the funder should adjust scores to correct for biases, but this opens the can of worms of exactly how. You can only estimate reviewer tendencies from the scores they have given, and if a reviewer has seen four proposals and hated all of them, well, maybe they really did see four absolutely execrable proposals, and those are the scores the proposals deserved. It gets complicated quickly.
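To show how quickly the worms leave the can, here is one crude adjustment among many - centring each reviewer on the grand mean - sketched in Python. It is an illustration of the genre, not any funder’s actual rule:

    import numpy as np

    def adjust_for_reviewer_bias(scores):
        """Subtract each reviewer's estimated bias: the gap between their
        mean score and the grand mean. scores[r, p] is reviewer r's score
        for proposal p, with NaN where they didn't review it."""
        grand_mean = np.nanmean(scores)
        reviewer_means = np.nanmean(scores, axis=1, keepdims=True)
        return scores - (reviewer_means - grand_mean)

Even this simplest version bakes in a contestable assumption: that a reviewer whose scores run low is biased, rather than simply unlucky in the proposals they were sent.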
Maybe because I’m a psychologist, I went into our meeting strongly believing that we should adjust for bias, and I had a whole toolkit of algorithmic options for doing so. Maybe no single adjustment was perfect, my thinking went, but anything would be better than no adjustment at all, which leaves the success of individual proposals to the luck of the reviewers they were allocated. The meeting outcome surprised me. I argued hard for adjusting for reviewer bias. I invoked first principles, called on findings showing that reviewer biases are inevitable, demonstrated the algorithms. In the end we went for a scoring rule where we dropped the lowest reviewer score for each proposal, then took a simple average. Statistically, this has little to be said for it. It throws away information (what if that lowest review really did reflect the low quality of the proposal?) and fails to adjust for biases, in either direction, across the vast majority of reviewer scores. The surprise, though, was that I ended the meeting agreeing with this rule. I went in thinking that reason demanded we adjust scores, but I hadn’t accounted for the need to explain that reasoning. Not only were there too many possible ways of adjusting scores, each had its own quirks, costs and benefits. The “drop the lowest and then average” rule is suboptimal with respect to biases in the data, but it has the virtue of being very, very easy to explain. The funders care deeply about bias and fairness in proposal scoring. They want to be reasonable in their decisions, but they also need to be seen to be reasonable in their decisions.
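By contrast, the rule we settled on fits in a couple of lines (a sketch, not the funder’s actual implementation):

    def panel_score(reviewer_scores):
        """Drop the single lowest reviewer score, then take a simple
        average. Assumes each proposal has at least two scores."""
        kept = sorted(reviewer_scores)[1:]
        return sum(kept) / len(kept)

    panel_score([7, 8, 3, 8])   # the outlying 3 is dropped, giving 7.67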
This was a case where striving for the very highest standard of reasoning would have come at the cost of legibility. I still believe that score adjustment is desirable, but I can now see that the transparency and comprehensibility of why you do what you do can sometimes trade off against optimising to your own satisfaction. Reasons have use-value and exchange-value. In a social world, it is not enough to be reasonable; you have to convince others that you are reasonable, to explain yourself.
* * *
The way we talk about reason is coloured by its history - Greek philosophers with syllogisms, logic and mathematics, the Cartesian thinker working out the truth from first principles. From this view, the exchange of reasons is an afterthought, something you might do, but not something intrinsic to reason.
There’s a counter-story - reason exists first in exchange. Its elemental matter is the ability of reasons to be traded with others, not their inner logic. It’s an outside-in view.
The value of taking on this perspective is that it helps you see that what counts as reasonable varies depending on where you are looking from. The further a decision needs to travel from the decision makers to be accepted, the more transmissible the reasons need to be.
This helps me make my peace with the suboptimal but very transparent scoring rule at the second funder. It also explains the inefficiency of pulling proposals from a hat at the first. A line of code could do the randomisation in a fraction of the time, but that limits who can verify the randomisation. Yes, anyone could look at the code, but even if they understood the programming language, what could they tell someone who didn’t - “I looked and it was right”? Pulling the winners from a hat is not only an easily understandable and explainable mechanism; it also creates common knowledge. Everyone can say they saw and understood the selection as it happened.
The trade-off between in-theory-better decisions and more-easily-explainable decisions is provocative. Does it mean that the wider the audience - the greater the variety among those I want to perceive my decision as legitimate - the cruder I should make the rationale? And that maybe sometimes, if we want a better decision from a team, we shouldn’t force them to explain it?
These thoughts are tempting, and there may be cases where the limits on what can be explained mean we do have to adjust our reasons. But in most cases, I believe, the long term is best served by fixing on the best reasons, and adjusting our explanations to make them travel as far as they can. It’s another discipline provided by the outside-in view of reason: if we can’t explain ourselves in a way that other people think makes sense, maybe our reasons aren’t any good.
Catch-up service
RP#55 The truth about digital propaganda
Recommendations
Seminar series: The Cognitive Science of Generative AI organised by Ulrike Hahn at Birkbeck. Lots of good speakers coming in 2025. Recordings of past speakers here.
Kevin Dorst: A majority didn't vote for a liberal nightmare Essay from 25 Nov argues that the more informed you think you are about US politics, the more you should distrust your perception of the supporters of the party you oppose.
Kevin Munger: Video Killed the Social Media Star (Sept 2024)
"A politics based on social media is instead anti-ideological; rather than leaders, it rewards reactionary vibe-surfers attuned to the arbitrary whims of users amplified by platform architectures. The longer the social media whirlpool spins, the more people become alienated from the entire self-reinforcing enterprise—and the more vertiginous the gyre for those that remain. "
One for academics: Daniel Nettle on Staying in the Game:
"how do you stay in the game without winning [big prizes]? ...That’s what takes real grit, humanity, wisdom and technique: to just be there, quietly, purposefully, usefully, afflicted with neither pomposity nor despair, whatever the weather."
The Atlantic: The Business-School Scandal That Just Keeps Getting Bigger (Nov 2024). Dan Engber lays out the problematic state of TED-talk optimised psychology research
This crazy strong visual illusion. The horizontal lines look sloped but they are parallel. PARALLEL I tell you. A reminder that our perception isn’t the same as reality.
Science: Megastudy testing 25 treatments to reduce antidemocratic attitudes and partisan animosity. Let’s go Figure 2A:
“Megastudy identifies many efficacious treatments that reduce partisan animosity, support for undemocratic practices, and/or support for partisan violence. (A) Twenty-three treatments significantly reduced partisan animosity (n = 31,835), most efficaciously by presenting relatable, sympathetic individuals with different political beliefs or highlighting a common, cross-partisan identity”
And finally ….
RIP Michael Leunig (2 June 1945 - 19 December 2024), whose cartoons have regularly featured in this newsletter.
END
Comments? Feedback? Crude explanations for complex decisions? I am tom@idiolect.org.uk and on Mastodon at @tomstafford@mastodon.online
1 It was a private meeting, held in December 2024, so I am omitting identifying details of people and funding agencies. Some other details and quotes are filtered by the unreliability of my memory.