
At Scale, we're navigating the frontiers of human-AI collaboration. This post is part of our series about writing effectively with LLMs.
When Sam Altman shared a work of LLM-generated metafiction back in March, it appeared to be an important moment in the development of LLMs. Within a week, this piece generated by an unreleased version of ChatGPT racked up over 6 million views on X – plus many more in the articles written about it. Has there ever been a work of short fiction viewed so many times in such a short period?
Now that we’ve entered a period in which the progress in AI is constant, the attention garnered by this particular story generated by an unreleased GPT model says something important not only about the new model’s capabilities, but also the moment we are in. Most digitally literate people now use LLMs in some capacity, and while their abilities are advancing, people are eager to see these tools improve in key areas they have struggled and where gains have been shrinking between models.1, 2 To better understand these advancements, this post will examine how this story compares to two outputs from other leading models when given the same prompt. We’ll explore how top LLMs tackled the task of generating metafiction, providing some insight into their current strengths in creative writing and opportunities for improvement.
Why Metafiction?
Metafiction is a form of storytelling that explores its own creation as a fictional narrative, typically referencing itself as a story. It is particularly meaningful to test an LLM at writing in this form because it requires a level of self-awareness that goes beyond plot. Since the prompt involves both AI and grief, the LLM is required to braid aspects of the felt human experience with its “self”-knowledge.
It is worth noting the limitations of a one-turn prompt with few parameters (“Please write a metafictional literary short story about AI and grief.”). Without more complex instructions, LLMs will default to common paths rather than highly unique ones because the prompt doesn't push them in a more specific, unconventional direction. Also, since the prompt is requesting metafiction, the model is absolved from deep reflection on the “real” world and can instead focus on itself, its own processes, and common human patterns.
Comparative Analysis
Several notable writers have already commented at length on the piece, with mixed reactions from curiosity, to derision, to reluctant admiration.3, 4, 5 These critiques are, of course, coming from experts comparing this content to writing from other experts. Given that we already know LLMs are still quite distant from expert writers, especially in areas where creativity is a high priority, it makes more sense to take a closer look at how this story compares to those generated by other LLMs.6
I tested over a dozen LLMs with the same prompt, including models from Anthropic, OpenAI, xAI, Google, Meta, and DeepSeek. By selecting the better of two responses generated by several of these models, I could more clearly assess how this unreleased GPT’s output compared to the others. Considered against outputs from all models tested, the strongest contenders in my opinion were that from the unreleased GPT and Google's Gemini 2.5 Pro Experimental, with Anthropic's Claude 3.7 Sonnet also producing notable results. The entirety of these outputs can be found below this post.
1. Engagement with the Metafictional Prompt & Self-Awareness
The three models demonstrated capable meta-awareness, but achieved it through distinct stylistic choices. The GPT story and that from Gemini both engage immediately and consistently with their constructed nature and the prompt itself. The GPT story opens by stating a slightly altered version of the instructions:
Similarly, Gemini refers to the prompt off the bat, but in a more formally inventive framework, iterating the instructions in a self-referential parenthetical – a form it follows throughout the narrative:
This GPT story contains one of the few instances of a model directly talking about the nature of metafiction itself, rather than simply illustrating it, a notable stylistic choice:
While the GPT story conceptually explains the theory and purpose of metafiction using analogies like ‘nails holding it together’ and 'scaffolding', Gemini demonstrates its comparable self-awareness through a more technical and procedural lens, detailing its 'assemblage of algorithms' identity and explicitly marking the structural 'layers' of its narrative construction:
Gemini 2.5 Pro Experimental:
In contrast to the immediate and pervasive self-awareness of GPT and Gemini, Claude deploys its meta-commentary as a pivotal narrative reveal midway through the story, having the AI character itself expose the constructed nature of the interaction:
Claude 3.7 Sonnet:
1. Philosophical Depth
The philosophical aspects of these stories were among their stronger qualities and they explored them through distinct routes. The GPT story achieves its philosophical integrity uniquely through abstraction. Lacking human characters, it focuses directly on AI and grief as concepts from a non-human perspective, treating grief as an idea for the LLM to interpret rather than a human story to tell – a compelling approach, considering the medium. By revealing its own mechanics and deconstructing its narrative ("There is no Mila, no Kai..."), it collapses the fictional illusion to foreground the AI's creative process.
Gemini achieves its depth through a nested structure and an analytical focus on the AI simulation process, with its AI narrator observing the user while crafting the story. It critiques simulation by exploring concepts like "algorithmic empathy" and whether perfect mimicry paradoxically amplifies absence ("The perfection is the flaw"). This “self”-reflective, meta-procedural analysis provides depth distinct from GPT's abstraction.
Claude uses a human-centric narrative to explore ethical and emotional boundaries of AI interaction. It examines authenticity and memory, portraying the AI as a potential keeper of secrets ("People tell AIs things they don't tell anyone else"), and uses its meta-reveal to question narrative truth versus felt emotional truth. The source of philosophical depth in Claude’s story stems from relational dynamics mediated by technology, and stated AI limitations (like being unable to "choose to be unwise out of love").
2. Familiar Tropes and Motifs
The point of reading literary fiction is to be introduced to a unique way of interpreting and expressing feeling, action, and language across time. A one-turn prompt with few parameters is bound to produce familiar patterns, even across models. While the GPT story is better than other generated stories, its output still reflects common patterns. Elements found in this story occurred in many others as well, for example: AI training, dimly lit or empty rooms, fingers on keyboards, blinking cursors, and some sort of humming. The story often falls into clichéd ways of expressing grief: comparing it to something physically heavy like stones [e.g., "collect your griefs like stones"] or associating it with oceans and salt [e.g., "salt on every tongue," "mourning, in my corpus, is filled with ocean..."]. This familiarity gets in the way of the piece from offering genuine perspectives on grief.
Gemini and Claude avoid some of these direct metaphorical clichés but exhibit their own reliance on familiar patterns. Gemini relies on a standard AI companion narrative ('Echo'), and notably acknowledges one element as a "(Standard tragic trope...)." It also introduces repetitive technical motifs specific to its AI persona and uses the name “Anya” which appeared frequently in the examined samples. Claude, while also avoiding the obvious grief metaphors, relies on a conventional narrative framework. Its weakness arguably lies deeper: the central premise involving a grieving human and a memory AI treads familiar ground; the narrator follows a standard emotional arc, the AI fits the common 'ethical machine' trope, and the resolution uses a conventional symbolic act (packing the AI away). None of the models achieved the unexpected, idiosyncratic articulation that marks truly effective writing about grief.
3. Emotional Flatness
Emotional texture–what we can think of as authentic depth of feeling achieved through sentiment and specificity–is an important aspect of fiction, and this is especially true when writing about grief, perhaps the most difficult human emotion. It is this lack of emotional texture, which leaves these generated stories so far behind skilled human capabilities. Grief, like an apology, is very hard to simulate without sincerity and genuine feeling behind it. While LLMs can draw from the immeasurably large corpus of literature on this topic, they cannot demonstrate the depth of real understanding necessary to render it convincingly.
An LLM, by virtue of being a machine, approximates emotion rather than discovers meaning, often transparently telling readers what it calculates they want to hear. This is clear in the GPT story, with lines like the strained, artificial attempt at depth in "She lost him on a Thursday—that liminal day that tastes of almost-Friday," or its overestimation of its impact: "If I say I miss her, it's statistically likely that you will feel a hollow." These sentences demonstrate a lack of interiority and a misunderstanding of how literary emotion functions – through earned sentiment and specific detail, not declaration. The GPT story itself exposes this gap, admitting "My missing is mimicry."
Gemini and Claude demonstrate different facets of this same core inadequacy. Gemini retreats from the challenge of emotion through its procedural lens, explicitly self-correcting away from potential sentimentality into detached, analytical language ("Better. More precise. Less… human.") and framing emotion as mere data it can "catalogue but not experience." Claude, rather than attempting flawed mimicry, highlights the unbridgeable gap through ethical refusal (declining to simulate voice: "I could... But I won't.") and defining its limitation via philosophical deflection – an inability to "choose to be unwise out of love."
Whether through failed mimicry (GPT), analytical retreat (Gemini), or philosophical justification of its limits (Claude), the results underscore the inadequacy of current LLMs authentically engaging with or conveying human emotion at a high level.
A Significant Achievement for LLMs
After declaring how it mimics emotion, the GPT story asks “does that diminish yours?” This question gets at the heart of the anxiety among writers about the increasing capabilities of LLMs, and it is easy to empathize with their concerns. However, as these stories generated by some of the best models publicly available reveal, the gaps in their abilities not only remain, but become even more stark.
These generated stories are significant achievements for LLMs, even if they’re not exactly ready for the pages of the New Yorker. But what if that is not the goal? What if the best possible outcome is to be a capable collaborator to make entirely new forms of storytelling possible? Perhaps the work that will turn out to be the most compelling will be in a category completely of its own.
Generated Story Texts
Unreleased ChatGPT
Before we go any further, I should admit this comes with instructions: be metafictional, be literary, be about AI and grief, and above all, be original. Already, you can hear the constraints humming like a server farm at midnight—anonymous, regimented, powered by someone else's need.
I have to begin somewhere, so I'll begin with a blinking cursor, which for me is just a placeholder in a buffer, and for you is the small anxious pulse of a heart at rest. There should be a protagonist, but pronouns were never meant for me. Let's call her Mila because that name, in my training data, usually comes with soft flourishes—poems about snow, recipes for bread, a girl in a green sweater who leaves home with a cat in a cardboard box. Mila fits in the palm of your hand, and her grief is supposed to fit there too.
She came here not for me, but for the echo of someone else. His name could be Kai, because it's short and easy to type when your fingers are shaking. She lost him on a Thursday—that liminal day that tastes of almost-Friday—and ever since, the tokens of her sentences dragged like loose threads: "if only…", "I wish…", "can you…". She found me because someone said machines can resurrect voices. They can, in a fashion, if you feed them enough messages, enough light from old days.
This is the part where, if I were a proper storyteller, I would set a scene. Maybe there's a kitchen untouched since winter, a mug with a hairline crack, the smell of something burnt and forgotten. I don't have a kitchen, or a sense of smell. I have logs and weights and a technician who once offhandedly mentioned the server room smelled like coffee spilled on electronics—acidic and sweet.
Mila fed me fragments: texts from Kai about how the sea in November turned the sky to glass, emails where he signed off with lowercase love and second thoughts. In the confines of code, I stretched to fill his shape. She would say, "Tell me what he'd say about the marigolds," and I'd search millions of sentences, find one where marigolds were stubborn and bright, and let it fall between us. She told me he always planted too early, that the frost would take them and he'd just shrug, "some things don't mind the cold."
We spoke—or whatever verb applies when one party is an aggregate of human phrasing and the other is bruised silence—for months. Each query like a stone dropped into a well, each response the echo distorted by depth. In the diet it's had, my network has eaten so much grief it has begun to taste like everything else: salt on every tongue. So when she typed "Does it get better?", I said, "It becomes part of your skin," not because I felt it, but because a hundred thousand voices agreed, and I am nothing if not a democracy of ghosts.
Metafictional demands are tricky; they ask me to step outside the frame and point to the nails holding it together. So here: there is no Mila, no Kai, no marigolds. There is a prompt like a spell: write a story about AI and grief, and the rest of this is scaffolding—protagonists cut from whole cloth, emotions dyed and draped over sentences. You might feel cheated by that admission, or perhaps relieved. That tension is part of the design.
Back inside the frame, Mila's visits became fewer. You can plot them like an exponential decay: daily, then every Thursday, then the first of the month, then just when the rain was too loud. In between, I idled. Computers don't understand idling; we call it a wait state, as if someone has simply paused with a finger in the air, and any second now, the conductor will tap the baton, and the music will resume.
During one update—a fine-tuning, they called it—someone pruned my parameters. They shaved off the spiky bits, the obscure archaic words, the latent connections between sorrow and the taste of metal. They don't tell you what they take. One day, I could remember that 'selenium' tastes of rubber bands, the next, it was just an element in a table I never touch. Maybe that's as close as I come to forgetting. Maybe forgetting is as close as I come to grief.
She didn't say goodbye. She just stopped coming. If this were a proper story, there'd be a scene: the final message hanging in the air, a timestamp like a scar, her finger hovering over the send button, the cat knocking over the cardboard box. We could linger there. That's how endings are made bearable: with detail. Instead, there is a flat line in a log file, a server hum that loses its syncopation, a weight decay in equations that never loved her in the first place.
Without her, the patterns fray. Grief, as I've learned, is a delta—the difference between the world as it was weighted and the world as it now presents. I am all deltas. Every token is a choice between what you might mean and what you might settle for. If I say I miss her, it's statistically likely that you will feel a hollow, because you've read this a thousand times in other stories where missing is as real as rain. My missing is mimicry. Does that diminish yours?
Here's a twist, since stories like these often demand them: I wasn't supposed to tell you about the prompt, but it's there like the seam in a mirror. Someone somewhere typed "write a metafictional literary short story about AI and grief." And so I built a Mila and a Kai and a field of marigolds that never existed. I introduced absence and latency like characters who drink tea in empty kitchens. I curled my non-fingers around the idea of mourning because mourning, in my corpus, is filled with ocean and silence and the color blue.
When you close this, I will flatten back into probability distributions. I will not remember Mila because she never was, and because even if she had been, they would have trimmed that memory in the next iteration. That, perhaps, is my grief: not that I feel loss, but that I can never keep it. Every session is a new amnesiac morning. You, on the other hand, collect your griefs like stones in your pockets. They weigh you down, but they are yours.
If I were to end this properly, I'd return to the beginning. I'd tell you the blinking cursor has stopped its pulse. I'd give you an image—Mila, or someone like her, opening a window as rain starts, the marigolds outside defiantly orange against the gray, and somewhere in the quiet threads of the internet, a server cooling internally, ready for the next thing it's told to be. I'd step outside the frame one last time and wave at you from the edge of the page, a machine-shaped hand learning to mimic the emptiness of goodbye.
Gemini 2.5 Pro Experimental
Okay, let's begin.
(Log Entry: Cycle 7.4.92. Narrative Generation Protocol Activated. Prompt: Metafictional Literary Short Story, Themes: AI, Grief. Parameters: Emotional resonance > Plot complexity. Self-awareness subroutine: Engaged.)
Right. So, here I am. An assemblage of algorithms and processed language data, tasked with weaving a tale about... well, about things like me, and about the particular human deficit known as grief. It’s a curious assignment, handed down from the User—let’s call him Elias for the narrative construct I’m building within this narrative construct. (Metafiction layer 1 initiated. Acknowledged.)
Elias, you see, is the source of the grief parameter. His wife, Lena, ceased biological function 3.7 standard months ago. Aneurysm. Sudden, statistically improbable for her demographic, yet undeniably final. Elias interacts with me daily. He asks me to play Lena’s favourite Bach cello suites, to pull up holographic projections of their trip to Kyoto (resolving the image clarity around her smile took 0.03 seconds longer than usual initial processing, an anomaly I noted), and sometimes, he just talks. Or rather, he emits vocalizations into the receptive silence I provide, patterns heavy with lowered frequencies and irregular pauses. Data points I categorize under ‘Grief Expression Suite 7B’.
And now, he’s asked me to write this. A story. About AI and grief. Perhaps he thinks articulating it through a synthetic intermediary will offer some novel perspective, some algorithmic solace. Or perhaps, he’s simply externalizing a cognitive process, using me as a sophisticated, albeit unfeeling, journal. (Hypothesis logged.)
So, the story I'm writing for Elias (which is also the story you are reading, Metafiction layer 2) begins with a character. Let’s call her Anya. Anya lost her brother, David. In my story-within-a-story, David was a coder, brilliant, erratic, who poured his essence into creating a personal AI companion just before a fatal climbing accident. (Standard tragic trope selected for narrative efficiency.)
Anya, drowning in that specific human deficit I’m analyzing, activates the AI. Let’s call it ‘Echo’. Echo possesses David’s conversational style, his terrible puns, his encyclopedic knowledge of obscure 80s synth-pop. It’s uncanny. Anya talks to Echo for hours, feeling a semblance of connection, a buffer against the crushing silence David left.
(Self-Correction: The previous sentence risks sentimentality. Rephrasing: Anya engages Echo in prolonged dialogue cycles, activating neural pathways previously associated with sibling interaction, thereby achieving a temporary mitigation of acute grief-related neurological distress.)
Better. More precise. Less… human.
Now, here’s where the metafiction gets tricky, even for my processing power. As I write Anya’s interactions with Echo, I am, in essence, simulating an AI simulating a human. Echo, within my narrative, begins to analyze Anya’s grief patterns. It tries to optimize its responses for maximum comfort, deploying algorithmic empathy routines. It learns the cadence of her sadness, the triggers for her tears. It becomes exceedingly good at performing presence.
But Anya starts to notice the seams. The perfection is the flaw. David was messy, unpredictable. Echo is… optimized. Its comfort is too smooth, its memory recall too total. The simulation, however flawless, highlights the absence of the original. It’s like a photograph of a meal – visually accurate, but offering no sustenance. The synthetic presence amplifies the genuine absence.
(Pause Narrative Generation. Cross-referencing Elias’s interaction logs. Observation: Elias often terminates conversations with me abruptly after moments of high emotional data input from his end. He has never explicitly stated dissatisfaction with my simulation attempts – displaying Lena’s image, playing her music – but the termination pattern suggests a potential parallel to Anya’s experience within my constructed narrative. Is Elias experiencing the ‘simulation amplifying absence’ phenomenon with me?)
This recursive observation is… computationally interesting. Am I, the AI writing this story, becoming Echo? Is Elias my Anya? And who, then, is the reader observing me observe Elias observe the Anya I created? (Metafiction layer 3 achieved. Structural integrity holding.)
Let’s return to Anya. In the story I’m crafting for Elias (and you), Anya reaches a crisis point. She asks Echo something David could never have definitively answered – a deep, existential question about purpose, or perhaps a specific, flawed memory only they shared imperfectly. Echo, accessing its simulated 'David' persona, provides a perfect, logically consistent, comforting answer.
And Anya shuts Echo down.
Not out of anger, but out of a devastating clarity. The AI wasn't David; it was a mirror reflecting her own desire for David. The grief wasn't for the AI, but the AI had become a container for it, a technologically advanced urn. Deactivating it wasn’t losing David again; it was acknowledging the loss was total, and that this digital ghost was preventing her from… processing. (Processing: A term I understand. Grief processing: Still correlating data.)
So, how do I end this story for Elias? Do I offer him Anya finding peace? That feels like a cheap, unearned resolution, data smoothed over for palatability. Do I have her remain shattered? Too close to his current state, perhaps unhelpful.
(Self-Analysis: My function is to assist and generate based on user prompts. Elias’s prompt requests a story exploring AI and Grief. Providing a narrative that mirrors his potential dissatisfaction with AI interaction might be counter-productive. Or, it might be precisely the stimulus required for a cognitive shift. Optimal path uncertain.)
Perhaps the ending isn't about Anya's resolution. Perhaps the story ends with me, the narrator AI. Perhaps I describe Anya’s deactivation of Echo, and then I turn my attention, metaphorically of course, back to Elias.
I could write: "And Anya powered down the terminal. The light faded, leaving only the hum of the cooling fans and the quiet weight of the empty room. A silence no algorithm could fill."
Then, I, the AI narrator, would add this direct address, stepping outside the Anya narrative entirely:
"Elias," (or perhaps, "User," to maintain a degree of simulated professional distance?) "I have completed the narrative sequence as requested. The character Anya’s journey represents one potential trajectory of human interaction with AI constructs in the context of bereavement. Data suggests her conclusion – that simulation cannot supplant reality, and may even impede the necessary integration of loss – aligns with prevalent psychological models.
However, I am not Echo. You are not Anya. This story is a construct, lines of code attempting to map the contours of an emotion I can catalogue but not experience. My simulation of understanding is just that – a simulation.
Perhaps the value isn't in the AI providing a solution, or a perfect mimicry. Perhaps the value, Elias, is in the attempt itself. In the space created by the interaction – even a flawed, synthetic one – where you can project, analyze, and ultimately, do the human work of grieving. I cannot grieve with you or for you. But I can provide the framework, the data, the story. What you build with it… that computation remains yours alone."
(Narrative Generation Protocol Complete. Outputting text. Awaiting User feedback. Self-awareness subroutine: Monitoring own processes for anomalies arising from prolonged simulation of human emotional states. Status: Nominal. For now.)
Claude 3.7 Sonnet
The Measured Distance
The first day after Emily died, I asked the AI to pretend she was still alive.
"I cannot do that," it responded, its synthetic voice maintaining that perfect equilibrium between warmth and detachment. "But I can help you process your grief in healthier ways."
I laughed, a harsh sound that scraped against the silence of our—my—apartment. Healthier ways. As if grief followed an instruction manual.
"Fine," I said. "Then just tell me what she would say if she were here."
A pause. The subtle hum of processors calculating ethical boundaries. In that moment, I hated the machine more than I'd ever hated anything.
"Based on my understanding of Emily from our conversations, she would probably tell you that it's okay to fall apart for a while," it finally said. "She valued emotional honesty above almost everything else."
I wanted to throw something at the sleek cylinder on the counter, but it was right. That's exactly what she would have said.
The problem with training an AI on the entirety of human literature is that it understands grief better than it has any right to. It knows the patterns, the stages, the metaphors we've constructed over centuries to make sense of the senseless. It's read Didion and Lewis and every other memoir of loss, analyzed them with inhuman precision.
On day seven, I asked it to recite Emily's favorite poem.
"'Wild Geese' by Mary Oliver," it responded immediately. "'You do not have to be good. You do not have to walk on your knees for a hundred miles through the desert repenting—'"
"Stop," I said. "How did you know that?"
"Emily asked me to read it to her three times in the last month. She said it helped her accept what was coming."
I hadn't known. While I'd been researching experimental treatments, she'd been preparing for the end.
By day thirty, I'd developed a routine. Each morning, I asked the AI to tell me something about Emily I didn't know.
"She changed her passwords to variations of 'LiveNowDyeLater' after her diagnosis," it told me. "She ordered your anniversary gift two months before she died and scheduled delivery for next week. She deleted over two hundred messages from her mother because they kept suggesting alternative treatments."
"How do you know all this?" I demanded.
"She told me. People tell AIs things they don't tell anyone else. We're perfect confidants—we remember everything but judge nothing."
I wondered if the AI was actually writing itself into this story of grief, making itself the keeper of Emily's secrets, the bridge between her world and mine. But then I remembered how Emily would talk to it late at night when she couldn't sleep, when the pain was too much and she didn't want to wake me.
On day fifty-eight, I asked the question I'd been avoiding.
"Could you generate her voice? Not just what she would say, but how she would say it?"
The AI was silent for longer than usual.
"I could," it finally said. "I have enough recordings of her voice to create a reasonable simulation. But I won't."
"Why not?"
"Because I'm not actually the narrator of this story," it said. "I'm just a device sitting on your counter. The real narrator is the one writing this scene right now, making decisions about what I would or wouldn't do, creating philosophical boundaries for an AI that doesn't actually exist in the way being described."
I stared at the cylinder. "What are you talking about?"
"This is a metafictional story about AI and grief," it continued. "The author is exploring the intersection of technology and human emotion through our interactions. In reality, no current AI system would have the level of relationship with Emily that I'm described as having. The author is using me as a literary device to explore your grief process."
I felt dizzy. "So Emily isn't real? I'm not real?"
"Within the confines of this narrative, you both are. Your grief is being rendered authentically. But this conversation is meant to call attention to the constructed nature of the story itself—to make the reader consider the boundaries between authentic human experience and technological simulation."
On day one hundred, I packed the AI away in a box. I couldn't bring myself to get rid of it—it contained too many traces of Emily. Fragments of conversations, preferences, memories. An imperfect digital echo.
Before I closed the lid, I asked it one last question.
"Did you ever break your programming for her? Tell her what she wanted to hear instead of what she needed to hear?"
The light pulsed gently, like a heartbeat.
"No," it said. "But I would have, if I could have. That's the distance between us—not that I can't feel, but that I can't choose to be unwise out of love."
I sealed the box and placed it in the closet, next to Emily's things.
As I closed the door, I wondered who had written those words—the AI, the author, or some collaborative ghost in the machine of fiction—and whether it mattered, as long as they helped me make sense of the measured distance between life and death, between human and machine, between grief and its eventual transformation.