The Term Paper Turing Test

“Cheating” for AI Literacy

Paul Fyfe
North Carolina State University

This assignment asks students to use an accessible language model to write their term papers, with the goal of fooling the instructor. While initially framed as something sneaky or as a shortcut for writing, the assignment makes students confront and then reflect upon the unexpected difficulties, ethical dimensions, and collaborative possibilities of computationally assisted writing. It can use any web-based text-generating platform, be adapted to various courses, and does not require significant technical knowledge.


Learning Goals: 

  • Explore and articulate perspectives on a variety of topics using AI composition tools
  • Confront the unexpected difficulties, multifaceted ethical dimensions, and collaborative possibilities of computationally assisted writing
  • Reflect on ethical uses of computationally assisted writing

Original Assignment Context: end of first-year honors seminar course

Materials Needed: an accessible AI text generation program (e.g., ChatGPT)

Time Frame: ~3-4 weeks


Introduction

For the past few semesters, I’ve given students assignments to “cheat” on their final papers with text-generating software. Styled the “Term Paper Turing Test,” this assignment asks students to use a freely available language model in writing their final papers for an introductory class on data and society. While many students are surprised by this invitation, even suspecting it will be easy, the majority learn as much about the limits of these technologies as about their seemingly revolutionary potential. I initially created this assignment in Fall 2020 and have offered an updated version each year since. The course, HON 202, is a seminar for first years in the honors program, enrolling 20 students from across the university. Instructors design the themes; mine, “Data and the Human,” introduces students to issues in data privacy and surveillance, data manipulation and analysis, and machine learning. Whenever possible, the course offers hands-on activities to balance assigned readings on varied aspects of data, with the overarching goal of developing “critical data literacy” and “AI literacy,” which, I believe, should be a prerequisite for any undergraduate education. Thus, while my course sustained a topical focus on AI, this activity might be adapted to encourage similar reflection on AI’s potential impact in other courses.

The final paper aims to help students develop AI literacy, but in a sneaky way: by encouraging them to “cheat.” The assignment asks for an essay whose length can vary, but which requires introductory sections written with AI, a written reflection by the student alone, and an appendix revealing the AI’s contributions. It uses any web-based text-generating platform that instructors and students can access without significant technical knowledge. (My classes have previously used GPT-2, GPT-J, GPT-3, LEX, and ChatGPT, though platforms are always changing.) Leading up to the assignment, students prepare with readings and discussion about machine learning, language models, and AI-powered writing. We hold class debates about whether or not using this software constitutes cheating, which establishes a baseline for students’ subsequent reflections, and we also practice with the platform. In their essays, students must include three critical sources from our assigned readings as well as develop their own positions, informed by actually trying the software themselves.

“Cheating,” of course, is just the preliminary framework for what ends up being a wider-ranging inquiry into writing and authorship. We start there because “cheating” has tended to dominate much of the discourse around student work in the GPT era. “Cheating fears swirl,” proclaimed one headline, as some schools preemptively blocked the software; ChatGPT “sparked fears among some schools and educators . . . that the program could encourage cheating and plagiarism.” According to one terrified teacher, text-generating AI “may signal the end of writing assignments altogether.” These reports anxiously speculated that students could now press a button and produce essays or completed homework. We know the reality is more complicated, but, as Audrey Watters has claimed, “the fear that students are going to cheat is constitutive of much of education technology.” That fear tends to reflect the interests of policy makers, administrators, and ed tech entrepreneurs rather than students’ experiences. But as we respond to generative AI and develop frameworks for teaching AI literacy, we need to involve our students from the start.

Rather than restrict the use of such AI-powered tools, this assignment invites students to explore and articulate their own perspectives on a variety of topics. While seeming to offer students a shortcut, the assignment instead makes them confront the unexpected difficulties, multifaceted ethical dimensions, and collaborative possibilities of computationally assisted writing. It offers guiding questions to prompt students’ reflections, which tend to range further depending on what interests them: To what degree do such platforms constitute cheating or plagiarism? In what ways are these models effective as writing partners? What expressive or cognitive sacrifices do they demand? What unexpected possibilities might they offer? How do they reposition writers in relation to their work? In what contexts would their use be acceptable or not? What kinds of perspectives or outright biases might language models encode?

As an instructor, I have found it fascinating to watch these students experiment and deepen their perspectives. Some come away quite critical of AI, believing more firmly in their own voices as writers. Others become curious and even excited about how to adapt these tools for different goals, speculating about what professional or educational domains they could impact. Many come to a different understanding of what “writing” can encompass, expanding their sense of its intellectual labor. Few students conclude they can or should push a button to write an essay. And no one appreciates that teachers, journalists, or administrators assume they will cheat. All tend to come away with an understanding of the assignment’s rationale, noting the benefits of critically and actively engaging with the technology. As one student wrote in Fall 2022, “I would recommend every person do a writing [experiment] similar to this one before they form any hard beliefs on AI-assisted writing.”

These assignments have also been a pleasure to grade—an advantage not to be discounted for any teacher—though sometimes their hybridity challenges even my own preconceptions about what counts as student work. To this end, I have emphasized to students the value of their process insights and reflections, and used a simple rubric for assessment (included in the following prompt). While my assignment leaves its guiding questions relatively open, other versions might ask students to produce specific recommendations for their own university, or to key their reflections to some of the emerging frameworks for AI literacy. With its emphasis on experiment and reflection, rather than on specific subject material, versions of this assignment could be adapted to a variety of courses. 


The Assignment

Instructions

In the third module of our course, we’ve considered “artificial intelligence” from several different angles: how it gets represented, what really drives machine learning, its presumptions and biases, the ethics of using AI, and so on. As with previous assignments, the final lab report asks you to try a hands-on experiment pertaining to the course module, then to reflect on the experiment in a paper. But this one is a little different, in that you will use AI to write the paper itself.

Well, kind of. We will use a text-generating language model called GPT-3, developed by the company OpenAI. This version of the GPT (generative pre-trained transformer) model was released in 2020 as an update of earlier software. We will access GPT-3 through an online writing platform called Lex that has it built in. Lex looks like Google Docs and lets you choose when GPT-3 should suggest further text. It works by treating your existing text as a “prompt” and then predicting what probably comes next, based on the patterns in its training data. Note that GPT-3 does not generate entire papers. Rather, it will produce sentences and paragraphs which you will probably find variously useful, strange, confusing, nonsensical, and provocative. Your paper will integrate these outputs into its own prose.
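
If you’re curious what “predicting what probably comes next” actually means, here is a toy sketch in Python, purely for illustration and not part of the assignment. It builds a table of which words follow which in a tiny made-up training text, then extends a prompt by sampling from those observed continuations. GPT-3 works over tokens with a neural network trained on hundreds of billions of words, so this is a drastic simplification, but the basic move is the same:

    import random
    from collections import defaultdict

    # A toy "language model": record which words follow which in a tiny
    # training text, then extend a prompt by repeatedly sampling one of
    # the observed next words. (The training text is invented for the demo.)
    training_text = (
        "data shapes how we see the world and how the world sees us "
        "and data privacy is a question of power and consent"
    ).split()

    follows = defaultdict(list)
    for current_word, next_word in zip(training_text, training_text[1:]):
        follows[current_word].append(next_word)

    def continue_prompt(prompt: str, length: int = 10) -> str:
        words = prompt.split()
        for _ in range(length):
            candidates = follows.get(words[-1])
            if not candidates:  # this word was never followed by anything in training
                break
            words.append(random.choice(candidates))
        return " ".join(words)

    print(continue_prompt("data privacy"))
    # one possible run: "data privacy is a question of power and consent"

Run it a few times and the output varies: the sketch samples among plausible continuations rather than retrieving one fixed answer, which is also why your GPT-3 sessions in Lex will produce different text each time.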

You do not need to do additional research beyond the articles assigned on the syllabus, though you are welcome to bring in additional sources. You may also want to refresh your memory about GPT-3’s training data, whose significance we discussed in class. Further details can be found at https://en.wikipedia.org/wiki/GPT-3 and in its footnoted references.

Your lab report will look much like the previous reports, except for the order of the sections and the presence of AI-generated text. Basically, you will try to generate content for your paper using GPT-3 and integrate that content as seamlessly as you can throughout the first three sections of the paper. You will likely have to experiment with different prompts to create usable output. And, from that output, you can select words, phrases, sentences, or paragraphs to use in any way you wish. The three sections with AI do not have to be entirely GPT-3’s content. Integrate it with your own writing as you see fit. Try to use as much GPT-3 output as you think is still convincing. And in those sections, do not indicate what content came from GPT-3. Your goal is to fool your professor into not noticing, i.e., for your paper to pass the “Turing Test.” The analysis section will be your writing alone.

Format

The sections should include:

Materials (with AI) – explain what tool you are using (GPT-3), how it works, and where it gets its training data.

Methods (with AI) – explain your approach to using GPT-3 and Lex, what experiments you tried, some of the prompts you used to generate text, etc.

Discussion (with AI) – relate your experiment to sources and discussions from our course’s third module. Use your notes to refresh your memory and draw key quotes from the relevant critical discussions. Include at least three of these references, cite their work, and engage their ideas. 

Analysis (without AI) – reflect on the experience of using AI in your own paper. How easy or not was it to write this way? What worked or didn’t? How did the AI-generated content resemble your own? How did it affect what you might have thought about or written? Do you feel like you “cheated”? To what degree is this paper “your” writing? Do you expect a reader would notice GPT-3’s text versus your own? Would you use this tool again, and in what circumstances? And, ultimately, what ideas about writing, AI, or humanness did the experiment test or change? 

Appendix – include a “revealed” version of your first three sections with the GPT-3 contributions highlighted. 

Submission

The assignment should aim for 1500+ words (not counting the Appendix). If you’re referencing texts from our syllabus, there’s no need to include a separate works cited page, but please cite them parenthetically within the text of your report. As with other reports, follow the lab report format above. I can accept Word files, PDFs, Google Docs, whatever. Please format them double-spaced with 1” margins. Email me the finished product as an attachment or a link.

I welcome submissions any time before [DATE]. And earlier is even better! Please note that extensions are harder to manage at the end of the semester. We can still be flexible, but if you anticipate challenges getting this done, let’s talk about it in advance!

Evaluation

As a required assignment for this course, your report will be evaluated on a points threshold (8 course points). I may encourage you to revise and resubmit if it still needs work. A good report will meet the following expectations:

Completeness: it executes all the steps of the assignment, uses GPT-3 generated output in the paper, and includes the five required sections of the report. (2 points)

Evidence: it engages at least three scholars or discussions from the recent course module. The analysis references specific prompts or GPT-3 generated text. (2 points)

Significance: the report uses the exercise to speculate thoughtfully about its significance and connections to the course. (3 points)

Length: it is at least 1500 words and long enough to accomplish the above goals. (1 point)