Synthetic Metacognition

Iterating Prompts with GPTs

Kyle Booten
University of Connecticut, Storrs

This assignment suggests that “prompt engineering”—iteratively tinkering with and refining the set of instructions that guides the output of an LLM—is a worthwhile writing activity that can encourage students to be metacognitive about the “moves” that characterize compelling examples of a genre in which they are writing. Insofar as LLMs are “lazy” (obeying the prompt but not exceeding it), coaching one to successfully compose in a genre can require students to make explicit aspects of the genre that they may only implicitly be aware of. In a classroom setting, collaboratively “workshopping” the results of the GPT affords an opportunity to notice, describe, and name some of these otherwise-implicit moves, and students can consider integrating them into their own writing.

Learning Goals:

To prepare students to compose in a particular genre (e.g. a paper, a poem, a written assignment) by “workshopping” a GPT in order to build metacognitive awareness of certain textual “moves” (Swales) that characterize successful examples of that genre

Original Assignment Context: First-year writing course

Materials Needed: Any AI text-generating program, selected readings

Time Frame: ~1-2 weeks

Introduction

The rise of text-generating large language models (such as GPT-3 and the more famous ChatGPT), image generators such as DALL-E, and other sophisticated deep learning models has inaugurated a new paradigm in “programming”: the user interacts with these systems primarily through crafting prompts. Sometimes a sentence or two, or even a mere phrase (“sonnet about styrofoam”), will suffice. But getting an impressive result from one of these models often requires the user to iteratively refine and augment a prompt, specifying all manner of traits the output should have as well as those that it should not (“Petrarchan sonnet about styrofoam, style of Wyatt, carefully observe iambic pentameter, include multiple enjambments, do not include rhymes of one-syllable words or overtly-emotional words like ‘sad’...”).

In a recent first-year writing class for university students, I used the prompt-refining process as an opportunity to engage students in thinking about what features—or, in the vocabulary of Swalesian discourse analysis, which we had frequently discussed, what “moves”—produce a potent and compelling example of a particular genre. For Swales, a “move” is a way that a genre has adapted to a recurring rhetorical situation and its attendant burdens and pressures; for instance, examples of introductions to scientific papers often indicate some missing piece in current knowledge, a recurring textual pattern that reflects a need to prove to one’s fellow scientists that one more paper need be written and published. For their final project, students were to create a dialogue (a semi-scripted podcast or written interview) about the dangers of AI. They had already brought in examples of podcasts that they admired, and we had together tried to notice recurring moves and speculate about the rhetorical pressures these moves might reflect, especially the mere pressure to be entertaining.

That day, with OpenAI’s GPT-3 “playground” on the classroom’s screen, I furnished an initial prompt that echoed the one I’d provided for their final project assignment: “write a dialogue about the dangers of AI.” Then, in small groups, students critiqued the AI’s initial output, noticing where it was boring, uninformative, or otherwise failed to entertain us, and proposing solutions to these faults. Back in the large group, we refined the prompt and generated more text. And then we did the same thing for two or three more rounds. To get a better result from the AI, the students first had to notice what makes a dialogue successful or unsuccessful and then translate these observations into clear, specific instructions—in other words, to practice writerly metacognition (“meta-,” that is, in relation to the AI’s first-order cognition).

This activity takes advantage of several of the key affordances (as well as limitations) of GPTs. First, and worth mentioning despite its obviousness, GPTs generate text quickly. In a traditional writing class, instructors may strain to fit even one round of revision into the cluttered confines of the academic term. By the time a paper (or poem, or some other text) reappears on the docket, having been revised, the instructor and other workshop participants may well have forgotten what suggestions they made weeks ago. “Workshopping” a GPT means that it is possible for human participants to keep in working memory their impressions of Version 1.0 of its text along with its subsequent versions. That revision in a traditional writing class is costly (in terms of time) encourages instructors to offer a “bundle” of comments at once; since a GPT’s rewrites are temporally cheap, one can freely iterate, changing even a short phrase or a word in a prompt to see how it affects the output.

Second, GPTs are more obedient than they are brilliant. GPTs are designed to produce plausible text in response to the user’s instructions. They aim to satisfy, not impress, so they frequently fail to do very basic and obvious things to make their text more convincing, engaging, or charming. Sometimes it feels as if a GPT is engaged in “malicious compliance”: carrying out its instructions but, since the instructions didn’t mention them, forgoing certain “moves” that are so ubiquitous in human-generated examples of a genre that they tend to escape our noticing. This provides an opportunity to notice them and then translate our dissatisfaction into a more specific (perhaps a painfully specific) set of instructions. The GPT’s seeming laziness can energize our metacognition.

For instance, students in my class noted that the script that GPT-3 generated was originally quite short, containing only a few conversational turns before terminating. This simply did not feel satisfying. But then, how many turns would feel satisfying but not overlong? (My notes suggest that we ended up asking GPT-3 for seven conversational turns between the imagined interlocutors, “Person 1” and “Person 2.”) They noticed a pair of related vices: GPT-3’s text seemed to flit anxiously from point to point, and these points were expressed in entirely abstract terms, with no specific, real-world details given to support them. Together we revised the prompt to command the GPT to slow down and discuss its points more thoroughly and to provide examples and details in support of them. It also came up in our class conversation that the Person 1 and Person 2 fabricated by GPT-3 were simply too similar and too quick to agree; a compelling conversation, of course, should contain some tension, perhaps even directly conflicting views, and so we reworked the prompt to make Person 1 an optimist about AI and Person 2 someone who lost their job to AI-powered automation. As class began to wind down, we together experimented with different ways of prompting GPT-3 to create something like a conclusion (for example, with Person 1 being convinced, but not totally convinced, by Person 2).
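For instructors who want to keep a record of how a prompt grew over the course of a session, the revisions described above can be sketched as a small piece of Python. This is only an illustration: the base prompt is the one quoted earlier, but the wording of each refinement is a paraphrase assumed for this sketch (not the text actually typed in class), and the names `BASE_PROMPT`, `REFINEMENTS`, and `build_prompt` are hypothetical.

```python
# Illustrative sketch only. The refinement sentences below paraphrase the
# revisions described in the text; they are NOT the exact classroom wording.
BASE_PROMPT = "Write a dialogue about the dangers of AI."

REFINEMENTS = [
    "Include seven conversational turns between Person 1 and Person 2.",
    "Discuss each point thoroughly, with specific real-world examples and details.",
    "Person 1 is an optimist about AI; Person 2 lost their job to AI-powered automation.",
    "End with Person 1 being convinced, but not totally convinced, by Person 2.",
]


def build_prompt(base: str, refinements: list[str], rounds: int) -> str:
    """Return the prompt as it stood after the given number of revision rounds."""
    return " ".join([base, *refinements[:rounds]])


if __name__ == "__main__":
    # Print every version of the prompt, from the bare original to the final one.
    for n in range(len(REFINEMENTS) + 1):
        print(f"--- Version {n} ---")
        print(build_prompt(BASE_PROMPT, REFINEMENTS, n))
```

Keeping each refinement as a separate item makes it easy to show students, side by side, which instruction was added in which round and to ask what “move” each one names.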

This “GPT workshop,” while in some sense a one-off activity, also fit into a larger instructional arc. In the previous weeks, students had prepared for their final project by bringing in examples of podcasts and written dialogues and interviews; together we analyzed them and compiled a list of “moves” that would make a successful, entertaining conversation, whether written or spoken, and the final assignment specified that they had to attempt some number of these moves. The day of this GPT activity, as I began to hear the tell-tale zippering of backpacks, I made the case to my students that the “moves” that they had instructed the AI to perform would also be helpful for them to keep in mind while composing their final projects, and I added some of these moves to the list of ones from which their submissions would need to draw. My use of AI in this class was rather impromptu and took place at the tail end of the semester; were I to do it again, I would make sure to leave more time, perhaps another class session, to discuss and even practice these moves.

Because of the current limitations of GPTs, this assignment will work better for some writing assignments than others. It has been widely observed that OpenAI’s GPT-3 and ChatGPT frequently “lie,” for instance by making up historical facts, scholarly citations, or plot points in novels. While future AI models may be less prone to prevarication, current GPTs will often struggle to compose a minimally plausible example of any genre that is highly dependent upon specific information that is too vast and varied to include as additional information in the prompt itself. (They will have an easier time writing about a general historical trend than a specific, poorly-known historical event.) On the other hand, this assignment hinges upon the fact that the LLM is obedient, but only just. As OpenAI and other companies continue to develop these language models, it may be the case that some of them become too impressive; they may produce not just passable but impressive, witty, and charming text with little further prompting. Instructors who want to use LLMs for this or a similar assignment should experiment with different language models, including those that will have been made obsolete, as the “best” model may not be the best for their purposes.

The Assignment

Goal

To prepare students to compose in a particular genre (e.g. a paper, a poem, a written assignment) by “workshopping” a GPT in order to build metacognitive awareness of certain textual “moves” (Swales) that characterize successful examples of that genre.

Materials

Access to a large language model (LLM) that generates text based on a user’s prompts; current examples include OpenAI’s GPT-3 and ChatGPT models. (In cases where the LLM is only available as a paid service, a dollar or two of credits should be sufficient, at least at current rates.)
A projector, so that the entire class can see the prompt and the text the language model generates in response to it.

Steps

1. Provide the LLM with a prompt that students have already been given for a particular writing assignment (e.g., “Write a sonnet about X…” or “Compose a podcast that addresses the following set of questions…”).

2. In small groups, discuss the strengths and weaknesses of the LLM’s text. This conversation should focus not just on plausibility (does the LLM’s text follow the instructions as given, did it stick more or less within the genre) but also quality (does it possess features that are charming or compelling, that would make you want to read more). (5 min.)

3. Continue the conversation as a class, highlighting any weaknesses that were noticed by multiple groups. Throughout these steps, the instructor should be sure to guide the conversation back to the concept of the “move,” which describes textual regularities in terms of regularities of rhetorical circumstances: by originally omitting the features described, what did the LLM’s text not take into account about the genre, its purposes, and the desires of its typical reader or audience? (5-10 min.)

4. Focusing on a few of the most salient weaknesses of the LLM’s text, iteratively tweak the prompt in an attempt to remedy them. Employ a principle of parsimony; try to add as few words as possible at a time to produce a (positive) change. (10 min.)

5. Repeat steps 2 through 4 once or twice more.

6. By the end of the above steps, the original prompt should have been revised to make explicit certain aspects that make a text not just a plausible but a compelling example of the given genre. The assignment concludes with a discussion of how students might keep these same features in mind in their own writing. (“What’s good for the goose…”)
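For instructors comfortable with a little scripting, the generate-and-revise loop in the steps above can be logged with a short Python sketch. Everything here is an assumption made for illustration: `run_workshop`, `words_added`, and `placeholder_model` are hypothetical names, the 15-word threshold for the “principle of parsimony” is arbitrary, and `placeholder_model` merely stands in for whatever LLM service the class actually uses (no real API is called). The critique and revision themselves remain human work; the code only records each round.

```python
from typing import Callable


def words_added(old_prompt: str, new_prompt: str) -> int:
    """Count how many words a revision added (negative if the prompt shrank)."""
    return len(new_prompt.split()) - len(old_prompt.split())


def run_workshop(
    initial_prompt: str,
    revisions: list[str],
    generate: Callable[[str], str],
    max_words_per_round: int = 15,  # arbitrary threshold, assumed for this sketch
) -> list[dict]:
    """Generate text for the initial prompt and for each revised prompt in turn."""
    history = []
    previous = initial_prompt
    for round_number, prompt in enumerate([initial_prompt, *revisions]):
        added = words_added(previous, prompt)
        history.append({
            "round": round_number,
            "prompt": prompt,
            "words_added": added,
            # Flags whether the revision respected the parsimony principle.
            "parsimonious": added <= max_words_per_round,
            "output": generate(prompt),
        })
        previous = prompt
    return history


def placeholder_model(prompt: str) -> str:
    """Stand-in for a real LLM call; it only reports the prompt's length."""
    return f"[model output for a {len(prompt.split())}-word prompt]"


if __name__ == "__main__":
    log = run_workshop(
        "Write a sonnet about X.",
        ["Write a Petrarchan sonnet about X."],
        placeholder_model,
    )
    for entry in log:
        print(entry["round"], entry["words_added"], entry["prompt"])
```

Passing the model in as a function keeps the sketch independent of any particular service; an instructor could swap in a call to their preferred LLM without changing the logging logic.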

Contextualizing This Assignment

This assignment is meant to fit into a larger instructional arc that draws students’ attention to “moves” that characterize a successful example of a particular genre, and so the instructor should introduce the notion of a “move” in prior class sessions. This particular assignment may also be used to complement other, more traditional methods of move analysis, such as aggregating examples of the genre and observing recurring patterns in them; workshopping the LLM’s text is meant to draw attention both to features that are so obvious in “real” examples that they escape noticing and to those whose lack makes a text, if still a technically valid example of the genre, a boring or otherwise unsatisfying one.

The instructor could also consider updating the original writing assignment to encourage students to heed certain instructions that they gave the LLM.

Works Cited

Swales, John M. Genre Analysis: English in Academic and Research Settings. Cambridge UP, 1990.