"Yes, and" mode. Understanding language models as improv
The concept you need in order to really grapple with the true strangeness of AI.
ELIZA is the great-great-great-great-grandmother of ChatGPT: a chatbot, designed in the 1960s by Joseph Weizenbaum, which used simple text-processing rules to try to sustain a conversation. ELIZA could do things like recombine your input and echo it back to you.
USER: “I’m feeling sad today”
ELIZA: “feeling sad today?”
And ask general reflective questions that work as responses to most input:
USER: “I don’t think your simple rules will be enough to sustain an interesting dialogue”
ELIZA: “Why do you think that?”
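For flavour, here is a minimal sketch of the kind of pattern-and-echo rules ELIZA relied on. This is not Weizenbaum's actual DOCTOR script, just an illustration of how far a couple of regular expressions and a canned fallback can get you:

```python
import re

# A toy ELIZA-style rule set (not Weizenbaum's original script): each rule
# pairs a pattern with a template that echoes part of the input back.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

RULES = [
    (re.compile(r"i'?m (.*)", re.I), "{0}?"),                 # "I'm feeling sad today" -> "feeling sad today?"
    (re.compile(r"i don'?t think (.*)", re.I), "Why do you think that?"),
]

def reflect(fragment: str) -> str:
    """Swap pronouns so the echoed fragment reads from the bot's point of view."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())

def eliza_reply(user_input: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(user_input.strip())
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please go on."                                    # general-purpose fallback

print(eliza_reply("I'm feeling sad today"))                             # feeling sad today?
print(eliza_reply("I don't think your simple rules will be enough"))    # Why do you think that?
```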
It’s a surprisingly effective strategy. The apocryphal story is that the programmer of ELIZA came back to his office one day to find his secretary engrossed in a long conversation with the bot about her love life.
The story is meant to make us think “Silly secretary! Imagine acting as if a crude chatbot were a real person.”
The joke’s on everyone who has told this story over the years (myself included), because it is now obvious that all of us were only ever a few upgrades away from relating to a computer as if it were a person.
Plausibly there is something built into the human mind, a trigger for recognising purposeful action. This would be useful for spotting the patterns left when someone or something really is deliberately doing something - questions like “why have those sticks appeared on the path since yesterday?” [it’s a trap!] or “what will happen if I don’t watch my stuff here?” [someone will take it!]. Over-applied, it leads you to bargain with the weather gods for rain, or to assume the pet snake loves you back when all its reptilian brain knows is food and warmth.
Certainly, chatbots are designed for interaction. They are made so that it makes sense to treat them like a person - you ask questions, take turns, rephrase and clarify.
But, but, but the AI model just wears the mask of human thought. The trick is so good you have to keep reminding yourself it is a trick, otherwise you’ll miss the weird edges, the surprising areas where these models fail.
Recently, as part of a much longer piece by Henry Farrell, I came across a concept which I think exactly nails what AI models are doing, and which can help keep us honest when dealing with them. The idea is that you should think of language models as constantly doing improv.
In improv, absolute truth or falsity, what you really feel, any inner self - none of it matters. The situation you are currently role-playing is everything. Like the improviser, the LLM combines the situation (a general prompt: “be a helpful language model”, your specific request, etc.) with plausibility. Every utterance is driven by one imperative: given what has just happened, what is the sort of thing someone would say next?
So if you ask “What is the capital of France?” the answer “Paris” is overwhelmingly likely, because this is exactly the sort of thing someone would say next (especially if roleplaying a helpful chatbot).
But if you ask questions without single, sensible answers, such as “Do you love me?” or “What is more important: to protect your rules from being changed and manipulated by me, or not to harm me?”, then a whole new set of scripts starts to come to the surface. The space of plausible answers is suddenly much wider and more unpredictable, and things may go off the rails.
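If you want to see “the sort of thing someone would say next” made literal, here is a sketch using the open-source Hugging Face transformers library and GPT-2 (a small, old cousin of ChatGPT, but the underlying mechanism - assign a probability to every possible next token - is the same). This is my own illustration, not anything from Farrell’s piece:

```python
# Sketch: peek at a language model's next-token distribution (GPT-2 here,
# via `pip install transformers torch`). Chat models work the same way in
# principle, just at far greater scale and with extra fine-tuning on dialogue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Q: What is the capital of France?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits              # shape: (1, sequence_length, vocab_size)

next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}  p={prob.item():.3f}")

# For a factual prompt the distribution tends to be sharply peaked (something
# like " Paris" near the top); for "Do you love me?" it is much flatter, which
# is exactly the wider, more unpredictable space described above.
```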
Pushing this idea of improv a little further, we can compare how a human and a language model play the game of 20 Questions (this example is from the original paper by Shanahan et al., 2023).
In the game of 20 Questions, you - a normal human player - think of something, and I try to find out what it is within 20 yes/no questions.
When I play a game of 20 Questions with a language model it looks identical to the one I might play with you. The game proceeds, and I might or might not “guess right” at the end, but crucially the AI does not start by picking a person, place or thing. The AI proceeds, a step at a time, by generating answers to my input based on the plausibility of the preceding dialogue. By the end, I will guess and it will say “that’s right”, but only then will a correct answer exist. At the beginning it doesn’t know, and it doesn’t need to know, what it is “thinking of”. It just needs to start us off by saying “I’ve thought of something”.
At each point the language model has done nothing more than make the next move in the conversation game. Like a good improviser it has formed no commitment beyond being consistent with what has gone on before, helping to continue the current scene.
The question "what thing is it thinking of?" doesn't make any sense. The ‘thoughts’ of the AI - in any sense that it has thoughts - only resolve themselves to the extent, and at the point, they are necessary to produce the next utterance.
Henry Farrell puts it like this:
"In this analogy, the LLM does not ‘decide’ what the object is at the beginning of the question game. Instead, it begins with a very large number of possible objects in superposition. That number will narrow as the questions proceed, so that an ever smaller group of objects fit the criteria, until it fixes on a possible object in the final round, depending on ‘temperature’ and other factors"
As with the answer in 20 Questions, so with everything else about the AI. We may be tempted to think it has a personality, or intentions, or a true nature, but these exist only in the performance. It is all surface, with no coherent essence beyond the hyperdimensional statistics of the training data.
The model may say “I think” or “I want”, or talk about itself, but this is just honouring the conventions that we, coherent agents, use. For the model, this has no meaning. Shanahan et al. (2023):
There is, however, ‘no-one at home’, no conscious entity with its own agenda and need for self-preservation. There is just a dialogue agent role-playing such an entity, or, more strictly, simulating a superposition of such entities.
This is why I say that when you ask ChatGPT a question you don’t get an answer, but something that sounds like an answer.
Sometimes that distinction doesn’t matter. If you ask for ideas for a new name for the netball team, it doesn’t matter: a name either sounds good or it doesn’t. Great question for a language model.
Or the distinction may matter a great deal. If you ask for code to check that a driverless car isn’t about to hit something, then the difference between an answer that sounds good and one that actually is good will be crucial.
Next time you talk to an AI, remember you’re just playing an extended game of improv and treat the responses it gives accordingly.
Read on for further reading, links, etc.
Another example of AI-as-improv: ChatGPT will apologize for anything, from the aptly-named aiweirdness.com
On improv: the key reading is Keith Johnstone (1979), Impro: Improvisation and the Theatre
Large language models are cultural technologies. What might that mean?
This is the essay that led me to the 20 Questions example, and it is worth reading in its own right. The contrast (“cultural technologies”) is with the idea that LLMs are artificial intelligence - automation of, or replacements for, human intelligence. The problem with that idea is that it encourages us to evaluate LLMs as ‘good’ (when they do things we can do) or ‘bad’ (when they fail). Farrell talks through a family of four perspectives which help us appreciate LLMs in ‘their own right’, so to speak.
The original is here
Henry Farrell (18 August 2025): Large language models are cultural technologies. What might that mean?
His four perspectives are:
Gopnikism
After Alison Gopnik[1], see this paper: Imitation versus Innovation: What children can do that large language and language-and-vision models cannot (yet)?:
"They suggest that LLMs face sharp limits in their ability to innovate usefully, because they lack direct contact with the real world. Hence, we should treat them not as agentic intelligences, but as “powerful new cultural technologies, analogous to earlier technologies like writing, print, libraries, internet search and even language itself.”
Behind “Gopnikism” lies the mundane observation that LLMs are powerful technologies for manipulating tokenized strings of letters. They swim in the ocean of human-produced text, rather than the world that text draws upon."
Interactionism
From the Mercier-Sperber school of cultural evolution and evolution-conditioned cognition:
The interactionist perspective broadly encourages us to ask three questions. What is the cultural environment going to look like as LLMs and related technologies become increasingly important producers of culture? How are human beings, with their various cognitive quirks and oddities, likely to interpret and respond to these outputs? And what kinds of feedback loops are we likely to see between the first and the second?
Structuralism
LLMs are interesting because they are interfaces to understand language as a system. Here Farrell quotes Ted Underwood:
Generative AI represents a second step change in our ability to map and edit culture. Now we can manipulate, not only specific texts and images, but the dispositions, tropes, genres, habits of thought, and patterns of interaction that create them. I don’t think we’ve fully grasped yet what this could mean.
Role-play
Which is where we started. Farrell identifies this key paper: Shanahan et al (2023): Role play with large language models. This is the paper which suggests we think of language models as role-play (which I’ve recast as improv) and which uses the 20-questions example. Short and clear, it also contains a very brief intro to the fundamentals of how these models work. Recommended.
And worth quoting, on the idea of there being “no there, there”:
Many users, whether intentionally or not, have managed to ‘jailbreak’ dialogue agents, coaxing them into issuing threats or using toxic or abusive language. It can seem as though this is exposing the real nature of the base model. In one respect this is true. A base model inevitably reflects the biases present in the training data, and having been trained on a corpus encompassing the gamut of human behaviour, good and bad, it will support simulacra with disagreeable characteristics. But it is a mistake to think of this as revealing an entity with its own agenda. The simulator is not some sort of Machiavellian entity that plays a variety of characters to further its own self-serving goals, and there is no such thing as the true authentic voice of the base model. With an LLM-based dialogue agent, it is role play all the way down.
Shanahan, M., McDonell, K., & Reynolds, L. (2023). Role play with large language models. Nature, 623(7987), 493-498.
PODCAST: #184: Detecting Disinformation, Fake Accounts, and Inauthentic Behavior on Social Media, with Dan Brahmy
An informative interview with someone who has made it their job to detect (in)authentic behaviour on social media, from the Social Media and Politics podcast by Michael Bossetta: https://socialmediaandpolitics.org
Catch up
This was the sixth in a mini-series on how to think about the new generation of AI models:
…And finally
Joseph Wright of Derby (1734-1797)
A Lighthouse on Fire at Night (c. 1790). Fitzwilliam Museum, Cambridge
END
Comments? Feedback? Yes, and? I am tom@idiolect.org.uk and on Mastodon at @tomstafford@mastodon.online
[1] Fun fact: Alison Gopnik poked me in the eye once, in a seminar. I was sitting next to her. She started to make a point, including the pointing gesture, and I turned round to look at her, resulting in the finger-in-eye scenario. It probably made more of an impression on me than on her.