Massachusetts Institute of Technology
Students are assigned to generate a paper about a highly specific, recent text technology, using a free Large Language Model, and then to reflect on the experience. Our goals: (1) highlight new aspects of the writing process, (2) see how text technologies prior to LLMs have influenced writing, and (3) encounter LLMs. While many students have now heard of LLMs and tried them out, it may actually be more helpful, now and in the future, to have an assignment that introduces a “raw” LLM (without the additional structures of ChatGPT and Bard).
Learning Goal: Critically discuss and gain understanding of an AI system through exploration
Original Assignment Context: Intermediate level creative writing course, The Word Made Digital
Materials Needed: Accessible transformer-based LLM (GPT-NeoX 20B was used in this assignment)
Time Frame: ~2 weeks
I developed a class, The Word Made Digital (21W.764J / CMS.609J / CMS.846), at MIT in 2008. It’s cross-listed as a subject in Comparative Media Studies and in Writing. I’ve taught it intermittently, eight times in all. In it, we bring poetics and computer science approaches to bear on digital textuality, with an emphasis on understanding non-narrative creative projects throughout history and on doing similar sorts of work. The Word Made Digital deals extensively with digital literary art and requires students to do some more or less traditional writing, but it is not one of our “Communication Intensive” classes here.
In fall 2022, amid growing buzz about new text generation technologies, I changed the second of two critical paper assignments and required that students generate rather than write their papers using a large language model (LLM). Although this was an experimental assignment, it suited the context of the course in many ways. Because the course involves studying text technologies and practices by doing creative work, we were able to deal with a hot topic in a way consistent with other classwork and with the understanding that developments in text technologies have a history.
Computer-generate a short critical paper using a Transformer-based LLM (Large Language Model) such as GPT-NeoX 20B and then write a brief (approximately 2 page) discussion of your experiences using this type of text generation. Specifically:
Context and Purpose
The Word Made Digital is, among other things, an arts class (in creative writing), and it is built around creative projects; there are four of them assigned, each with a mandatory draft stage for workshop discussion. These projects engage with many historical approaches to digital textuality, but before I added this assignment, they did not deal at all with LLMs, which were hot topics even before they recently became incendiary. This assignment was an experimental one, meant to offer students some experience with and perspective on these fairly recent models. Rather than having students read technical papers about the Transformer or other advances in language modeling, we took an approach consistent with our creative work and simply noodled around with a model to get some sense of how it worked. The main learning objective was the ability to critically discuss and gain understanding of an AI system through exploration of it.
All but one of the students used GPT-NeoX 20B, as suggested; the one who did not had access to GPT-3 and employed this model. The assignment was given before ChatGPT was released.
What the LLM was supposed to generate, in the case of this assignment, was a paper about a topic that is typically very recent and fairly esoteric, and in many cases has not been discussed in academic literature. My expectation was that generating a reasonable paper of this sort would be considerably more challenging than having an LLM write about AI in general or having it produce an essay about a very well-worn topic, such as World War II.
Students selected a reasonable array of digital forms, with some of these selections being more conventional and some more innovative. For instance, one student considered the Listserv as a means of communication, while another looked at text-based terminal email clients (mail user agents) with a particular focus on Pine, now known as Alpine. They had a wide variety of reactions to the text they generated. Some found it off-topic and incoherent. Some found that GPT-NeoX made downright bizarre statements about how, for instance, chat messages were essentially unlike other types of writing in that they were “timeless.” Some found that the system would produce instructional, how-to sorts of text but could not be guided into analysis of the digital form in question — this was the case with the attempt to analyze the Alpine mail client. Some others (including one student who worked on tweets, a classic format that has been widely discussed) found the output of the system informative in some ways, saying that it did extend initial knowledge of the topic.
One notable reaction from a student came after she found a factual error that was stated rather brazenly and confidently by GPT-NeoX. After this, she reported investigating everything it generated in a way that exceeded the fact-checking she would have done online with human-written documents. This led us to discuss whether the erroneous outputs of GPT-NeoX were any worse than those one might encounter in a typical Web search.
Most students generated text progressively, writing a prompt, reading the output produced in reply, and writing an additional prompt. One student, however, mentioned that he generated several different papers and pieced the results together, organizing bits of generated text to address the assignment in the way he thought was best. During our discussion, I noted that a similar technique was used by The Guardian in late 2020 when this newspaper published an op-ed purportedly written by AI. While this seemed to us rather deceptive in certain ways, it did represent how newspapers publish writing, with significant intervention by a rewrite desk and editors and (in the case of news stories) with many reporters often contributing to a single story. Finally, we discussed whether this technique provided insight into the writing process. Would we be willing to write several independently formulated drafts of an essay and then piece them together? Well, the answer seemed to be a clear no, as this would consume a huge amount of time, but something similar to this process can be employed in moving from a collection of notes to a rough draft.
Software and Skills Needed
This assignment asks students to use a free (libre) software LLM. Free software is a political movement concerned with freedom, not price; the sort of LLM used could also be described with terms like open source and open access, if one likes. To me, this is not just convenient but solves several problems: if students are required to use a closed, proprietary system such as ChatGPT, they contribute their labor to a company and help it improve its proprietary system. (This is true whether or not they pay for access.) Compelling students to do so in exchange for a grade is, in my view, unethical. Assigning students to use closed systems and contribute, for free, to the improvement of these systems in completely opaque ways is quite different from a supervised internship, where the company supports the student’s learning, the student provides work in return, and the arrangement is explicit and can be reviewed. The use of proprietary products and services is also unscientific in at least two ways: first, these systems are not documented in peer-reviewed papers, and many of the most basic things about them are kept secret; second, they are updated and changed by their companies all the time, so experiments cannot be repeated.
The actual technology of an LLM, which can be accessed through an open model such as GPT-NeoX, BLOOM, or the more recent Falcon 40B (available through Hugging Face), can be worth investigating. Some people say of the guitar that it has a low floor and a high ceiling: you can make some music with it without being very skilled, but you can also go on to become a virtuoso. I’m not sure that there’s as much range in LLMs, but using one with its default parameters certainly isn’t hard, and one can move on from that to adjust parameters and develop techniques for different sorts of computer-assisted writing.
Similar Future Assignments
Although this assignment’s general framework can serve in the future, this was an experiment, one intended to bring us into a first classroom encounter with LLMs. In future assignments, it will make sense to acknowledge that students will be aware of, and will have used, ChatGPT and similar systems. Instead of presenting this assignment as a first encounter with a chatbot like ChatGPT, a similar assignment can provide insight into the workings of an LLM specifically—an essential component, but only one component, of such systems. On the one hand, students can investigate how an LLM generates text without the reinforcement learning and other modifications to enforce good behavior that are imposed on corporate bot systems. On the other, they can explore the external parameters of these models to see how changes in temperature and top-k, among other settings, influence output.
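The effect of these two settings can be seen in miniature without any particular model. The sketch below (plain Python, written for illustration; the `sample_token` function and its signature are my own, not part of any library) shows how a sampler might apply temperature scaling and top-k filtering to a vector of logits before drawing a token:

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=0, rng=random):
    """Sample an index from logits after temperature scaling and top-k filtering.

    temperature < 1 sharpens the distribution toward the highest-scoring token;
    temperature > 1 flattens it. top_k > 0 keeps only the k highest-scoring
    tokens as candidates before sampling.
    """
    # Temperature scaling: divide each logit before the softmax.
    scaled = [l / temperature for l in logits]

    # Top-k filtering: mask out everything below the k-th largest logit.
    if top_k > 0:
        threshold = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= threshold else float("-inf") for l in scaled]

    # Numerically stable softmax over the surviving candidates.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Draw one index according to the resulting distribution.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1
```

With `top_k=1` or a very low temperature, the sampler effectively always picks the most probable token (greedy, repetitive output); raising the temperature or k admits less likely tokens and produces more varied, sometimes stranger, text — which is exactly the behavior students can probe when they adjust these settings on a real model.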
Availability of the Assignment
My current syllabi, with assignments — including the syllabus for The Word Made Digital — will always be available at my site, https://nickm.com, for others to read and adapt for their own purposes. A link to the most recent offering will be at https://nickm.com/classes/, and an archive of syllabi for previous classes will also be available there.