Using LLMs as Peer Reviewers for Revising Essays

Antonio Byrd
University of Missouri-Kansas City

In this assignment, undergraduates use large language models (LLMs) to assist in revising their essay drafts by asking LLMs to respond to common peer review prompts. Students learn prompt engineering and develop rhetorical judgments on the effectiveness of LLMs’ language analysis to heighten their revision processes. This assignment can be adapted to most disciplines and course levels.

Learning Goals:

Explain the ethical implications of using LLMs to generate content for writers
Create effective peer review prompts in LLMs to revise essays
Implement the most useful revision strategies from human and machine readers
Evaluate how LLMs support individual writing processes
Construct ethical frameworks for when and how to use LLMs in the writing process

Materials Needed:

Large language model with linguistic analysis feature such as GPT-3.5 or GPT-4
Essay from previous assignment

Original Assignment Context: end of final unit in advanced expository course on literacy studies and technology

Timeframe: ~2 weeks

Introduction

The popularity of large language models (LLMs) reinvigorates suspicions about original authorship and concerns that students may not achieve the outcomes of writing classrooms: developing rhetorical knowledge; critical thinking, reading, and composing; and writing processes. The unit for “Using LLMs as Peer Reviewers for Revising Essays” has students explore, through hands-on work, how LLMs reshape our conceptual understandings of plagiarism, copyright law, and remix culture. The assignment itself then directs students to think about how using LLMs changes their revision processes as text generation technologies become ever more included in our repertoire of literacy practices. By the end of the project, students will have problematized LLMs as tools for generating content and will have reframed LLMs as potential assistants that can be ethically integrated into their writing process. Considering the flaws of LLMs, such as their hallucination (stating untrue facts in a tone of certainty) and their use of White Standard English as their base variety of English, students will also have learned to judge LLMs’ usefulness as peer reviewers.

The assignment below revises an earlier version (described in more detail later in this introduction) that I first developed in fall 2021 for a 16-week online asynchronous course called English 305WI: Theory of Composition for junior and senior undergraduates. Theory of Composition explores the nature of writing through a literacy studies perspective. Across three expository writing projects, students explore two ideas. First, writing is a type of knowledge and practice that shapes how we perceive and interact with others. Second, writing changes as technology evolves, and that presents new challenges to how we write and live. Issues in writing and technology prompt us to discuss the implications for ethics and social justice.

Instructors may assign a conventional essay project that explores concepts and ideas most germane to the unit. Following typical writing processes, students construct a rough draft for peer review from classmates and the instructor. In this structured peer review, the student writer poses a variety of questions and concerns about their draft, and their classmates and instructor direct their feedback toward them. Students then discuss issues related to originality, writing, and technology after the completed peer review. These discussions are the focus for another unit in my class, and we spend three weeks before using LLMs to revise the essay. Other instructors need not devote so much time to this topic, especially if they have other concepts and ideas they need to teach in their course and they are more interested in using LLMs as peer reviewers. Instead of reading and discussing scholarly research on copyright law, privacy concerns about TurnItIn, and remix culture for two weeks, instructors may focus on multiple case studies about the ethical considerations of using text generation technologies. These case studies cover a range of topics, including reader manipulation, racism, authorial intent, assistance with processing grief, and how professional writers in marketing and journalism use technologies like Jarvis.AI to write more efficiently. The case studies prime students for thinking about the multiple problems that arise when LLMs are used to create content for writers and for no other purpose. The goal is not to impose instructors’ views on using LLMs but rather to show students discourse about these technologies from different perspectives. Discussion posts ask students to critically analyze these technologies and make an informed decision on how these technologies may or may not fit in their range of literacy practices.

What follows then is a new peer review session, this time with the language analysis of LLMs. Students revise their essay based on comments from both humans and machines; after completing the assignment, they may reflect on how well humans and machines assisted in their revision process.

Finally, the original version of this revised assignment had students use GPT-3’s text completion application. In 2021, writers could prompt GPT-3 with a sentence and GPT-3 would then generate several sentences connected to that original prompt. During revision, students input one to three sentences from their essay into GPT-3’s text completion application as if they were writers simply stumped on how to proceed with their writing and could use a little assistance. The text generator continued the conversation related to literacy experiences. Students could revise, edit, and/or delete this AI-generated text and add it to their second draft. They could also use the AI-generated text as inspiration to write something else to further their revision. I directed students to distribute GPT-3’s influence throughout 20% of the essay (300 words out of 1,500 words) and, as a form of citation, to bold the language that came directly from GPT-3 or that GPT-3 had inspired them to write. Students explained how GPT-3 shaped their revision process in a Statement on Goals and Choices.

The inspiration for this approach originally came from Vauhini Vara’s “Ghosts” (linked below), published in The Believer. In this article, Vara uses GPT-3 to generate nine essays on processing the grief of her sister’s death from cancer. She bolds the text she wrote herself and leaves unbolded the words GPT-3 generated to continue the narrative. The ninth essay is completely bold, suggesting that through multiple drafts Vara finds the words to narrate her grief. The word limit I imposed on students was meant to balance the responsibilities of the human writer with the assistance of GPT-3, giving students more agency over the technology. However, that balance favored the human, who had to process and extend the few words GPT-3 created into a new text.

Of course, LLMs since 2021 have become more advanced than mere text completion. The assignment description below takes into account these technological developments. My discussion in the following section reflects on what happened when I used the original 2021 assignment and then includes my rationale for the proposed revised assignment that instructors may adapt for their own teaching.

Discussion and Future Teaching

I focus on revision for two reasons. First, in the words of Ernest Hemingway, “all writing is rewriting.” Revision is a commonplace in teaching writing and the practice of writing. Second, students using text generation technology to create language for them concerns writing instructors. The original assignment explored how GPT-3 aids in revising a human-produced draft. But my use for this assignment was to extend students’ theories of writing. I frame for students the potential benefit of GPT-3’s predictive text generation in two ways. Not only does it make “recommendations” to the writer on how to proceed with revision, GPT-3’s generated sentences may attempt to make philosophical statements about the nature of writing, which would offer students new ways to think about their literacy archives and writing itself.

Students understood the benefits of using text generation technologies for writing, while being surprised that standards for plagiarism differ in public discourse. However, they ultimately pushed back against their use, holding to conventional perspectives on authorship, originality, and citation they learned from schooling. Deploying GPT-3 had one flaw, not with the technology itself but with the parameters placed on the human writer. Students needed substantive feedback on their writing to produce significant revision. Their revisions with the assistance of GPT-3’s text generation seemed cosmetic or minor, especially for students who did well on their first draft. Thus, the 300-word limit I imposed on students underutilized the strengths of GPT-3.

A revised assignment would leverage the full capabilities of LLMs like ChatGPT. Students write full drafts in Unit 1, and then use a combination of human rhetorical processes and the AI’s sophisticated language analysis to peer review their work. For example, students copy and paste portions of their draft – especially the ones they find most troubling or even passages they find particularly effective – into the LLM. Students would then experiment with prompt commands to analyze the text for tone, syntax, word choice, and other linguistic features that impact the rhetorical meaning of the draft. They may also command LLMs to clean up the language for clarity. The writer still bears responsibility for the text generated; they would make judgments on the analysis from the LLM, focusing on the accuracy of information and how well the model matches their authorial intent or aligns with their values. The learning goals therefore expand the conventional outcomes of writing instruction: students learn effective prompt engineering and rhetorical judgment of LLMs to heighten their revision processes. Instructors, in turn, would need time to learn with students how to write effective prompt commands.

This unit may include additional conversations on text generation technologies’ influences on the social, environmental, political, and financial spheres of life. For example, students might discuss and analyze how the commercialization and monetization of these technologies depend on the labor of globally marginalized communities, how paid subscriptions to access LLMs continue legacies of digital inequality, how LLMs’ powerful hardware contributes to climate change, and how these technologies handle data privacy. These concerns shift the conversation from the ethics of using LLMs for writing to the morality of using LLMs for writing.

The Assignment

Task

Revise your essay using a large language model (LLM) such as ChatGPT. First, revise your essay in response to peer review comments from myself and your classmates.

After you have completed revisions, write a variety of peer review prompts that ask the LLM to analyze or extract information from your essay’s paragraphs. For example, you may ask the LLM the same questions you posed to your human readers during the first round of peer review. Other prompts may include more general questions related to your thesis, introduction, organization of body paragraphs, evidence and analysis, citation, and conclusion. Here are sample questions, some from the University of Colorado-Denver’s Writing Center, others of my own construction.

Does the introduction provide enough context on the paper’s topic?
How can the thesis be more specific and complex?
Is every piece of evidence followed by analysis in the following paragraph?
How do the ideas in the paper progress?
How can the conclusion restate the thesis in a more complex way?
Describe the tone of the following paragraph.
Compare the tone of the first paragraph with the third paragraph.
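
For instructors or students who prefer to prepare prompts in a script before pasting them into a chat window, the sample questions above could be paired with a passage from the draft as in the following minimal Python sketch. The names REVIEW_QUESTIONS, build_review_prompt, and build_review_prompts, as well as the wording of the prompt itself, are hypothetical illustrations and not part of the assignment or of any LLM’s interface. The sketch only builds prompt text; it does not contact an LLM.

```python
# A minimal sketch, assuming prompts are pasted into a chat interface by hand.
# All names and the prompt wording are hypothetical, chosen for illustration.

REVIEW_QUESTIONS = [
    "Does the introduction provide enough context on the paper's topic?",
    "How can the thesis be more specific and complex?",
    "Describe the tone of the following paragraph.",
]


def build_review_prompt(question: str, passage: str) -> str:
    """Combine one peer review question with one passage from the draft."""
    return (
        "You are acting as a peer reviewer for a student essay.\n"
        f"Question: {question}\n"
        "Passage:\n"
        f'"""{passage.strip()}"""\n'
        "Point to specific sentences when you explain your answer."
    )


def build_review_prompts(questions, passage):
    """Return one prompt per question, all about the same passage."""
    return [build_review_prompt(question, passage) for question in questions]


if __name__ == "__main__":
    # Placeholder text; a student would paste a paragraph from their own draft.
    draft_paragraph = "Paste one paragraph from your draft here."
    for prompt in build_review_prompts(REVIEW_QUESTIONS, draft_paragraph):
        print(prompt)
        print("-" * 40)
```

Keeping each question in its own prompt, rather than sending every question at once, makes it easier to match each LLM response to the question that produced it when writing the reflection.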

Cast careful judgment on the responses from the LLM, as the analysis may include misinformation or show that the LLM did not understand the intent of your prompt command. Revise and edit your essay based on the analysis you receive. You may include the text generated by the LLM in your essay but you must use proper citation style.

Include the chat history with the LLM with your revised essay. Finally, in a Statement of Goals and Choices (SOGC), reflect on how your interaction with the LLM shaped your revision process. How did the effectiveness of the LLM’s feedback compare with the peer review from your human peers and instructor?

Purpose

The purpose this time has expanded. First, the original purpose from Unit 1 still stands: [Explain the original purpose of the essay assignment]. Your second purpose is to experiment with how writing technology more sophisticated than autocorrect and autocomplete can help you write more efficiently and clearly according to your stated audience, purpose, and goals. To meet these ends, this assignment has you start developing your competency in prompt engineering or prompt design with LLMs.

Audience

[Audiences may include scholars, instructors, administrators, peers, or other readers essential to the original essay assignment]. You may have other readers in mind to help you solidify your interest in this writing project, such as family members, friends, other teachers, or people you know in the community. Writing for them can help them understand your argument and lived experiences and why they matter.

Genre and Format

Format your essay in Times New Roman, 12-point font, double-spaced. Use MLA or APA format if you include outside sources. Your essay can be 1,200-1,500 words. This word count is a guideline meant to give you an idea of the scope of the project. Do not beat yourself up for not writing to the minimum. Quality over quantity! :)

When do I submit my revised draft?

Submit revised draft with chat history and SOGC by [Turn in Date and Time]

Assigned Texts

Hutson, M. (2021, March 3). Robo-writers: The rise and risks of language-generating AI. Nature. https://www.nature.com/articles/d41586-021-00530-0
Ferguson, K. (2021, September 7). Everything is a remix: Part 1 [Video]. YouTube. https://www.youtube.com/watch?v=MZ2GuvUWaP8&ab_channel=KirbyFerguson
Knight, W. (2019, February 14). An AI that writes convincing prose risks mass-producing fake news. MIT Technology Review. https://www.technologyreview.com/2019/02/14/137426/an-ai-tool-auto-generates-fake-news-bogus-tweets-and-plenty-of-gibberish/
Teagarden, A. (2019). Stories of plagiarism / theories of writing: How public cases of plagiarism reveal circulating theories of writing. Kairos: Rhetoric, Technology, and Pedagogy, 24(1). https://kairos.technorhetoric.net/24.1/topoi/teagarden/index.html
Morris, S. M., & Stommel, J. (2017, June 15). A guide for resisting edtech: The case against Turnitin. Hybrid Pedagogy. https://hybridpedagogy.org/resisting-edtech/
Puiu, T. (2020, October 14). The stunning GPT-3 AI is a better writer than most humans. ZME Science. https://www.zmescience.com/science/gpt-3-better-than-you-043252/
Schwarz, O. (2019, November). In 2016, Microsoft’s racist chatbot revealed the dangers of online conversation. IEEE Spectrum. https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation
Vara, V. (2021, August 9). Ghosts. The Believer. https://www.thebeliever.net/ghosts/

Supplemental Text

Calamity AI. (2020, November 12). A.I. written essay | peer-reviewed [Video]. YouTube. https://www.youtube.com/watch?v=Lto6exrpChQ&ab_channel=CalamityAI

Instructional Videos

Byrd, A. (2021, November 7). Unit 3 intro and week 12 module [Lecture Recording]. Canvas. https://umsystem.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=c97e1631-fda0-425d-9e71-afd201044336
Byrd, A. (2021, November 14). Why do we use automated writing? [Lecture Recording]. Canvas. https://umsystem.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=fc1ccb57-0261-4eec-962c-afd2011994a9

Acknowledgements

My thanks to Tim Laquintano for his thoughtful recommended revisions, and to the rest of the editorial team, Annette Vee and Carly Schnitzler. I also appreciate the internal review from John Silvestro. The Atlantic’s Object Lessons series inspired the 2019 version of this assignment before I used GPT-3 in 2021. I’m grateful to the writing instructor who published his Literacy Archive Essay assignment on WordPress. While I can no longer find the URL, readers can refer to Horror Tree’s “Literary Artifacts: What Are These and How to Use Them in Your Essays” (2022) as a resource. “Student Essay AI Co-Writing Public Demonstration” by Tristan Hanson partly inspired how I would redesign this revision assignment. The article details an informal experiment on AI co-writing run by S. Scott Graham, Casey Boyle, Hannah R. Hopkins, Ian Ferris, Tristan Hanson, Maclain Scott, Emma Allen, Lisa Winningham, and Walker Kohler.