Timothy Laquintano
Lafayette College
This assignment asks undergraduate students to translate a complex policy document into plain English and then compare their output to the output of a large language model asked to do the same task. Students critically compare the semantic choices and sacrifices they made during the translation with the meaning lost during the machine translation, which attunes them to the risks and benefits of LLM output. It can be adapted to most disciplines and course levels.
Learning Goals:
Original Assignment Context: Mid-level undergraduate professional writing course
Materials Needed: A policy document relevant to course outcomes and content; Instructor access to a large language model (e.g., ChatGPT); The prompt for the LLM (e.g., “Please translate these paragraphs into a seventh-grade reading level”); Depending on their access, students can also use LLMs to create their own translations
Time Frame: ~2-3 weeks
Introduction
I have used this assignment in a mid-level undergraduate professional writing course to help students understand the output of large language models. A strength of the assignment, though, is its portability: instructors should find it useful in any discipline and at any level, especially if they have an interest in helping students learn to translate complex material for the reading public.
The assignment required me to find a policy paper related to course content. I then asked students to translate a portion of that paper into a seventh-grade reading level. Given contemporary English literacy rates in the US, this is the level at which a document will be understood by the vast majority of people. The students then compared their translation to one produced by an LLM given the same task.
I had students measure reading level with the Flesch-Kincaid test, which gauges the readability of a text and provides a grade-level rating and a reading-ease score. The test is built into Microsoft Word, which makes it relatively accessible. It measures the average number of words per sentence and the average number of syllables per word; Word also reports the percentage of sentences in the passive voice alongside the scores. If a document has short sentences and uses short words, it will have a higher readability score and a lower grade reading level. (The readability score and the grade-level score use the same measures with slightly different weights.) A policy document that scores low will work best for this assignment (something like a grade level of 14 or a readability score below 50). I used a 2016 policy document about artificial intelligence published by the Obama administration.
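For those curious about what the test computes under the hood, below is a minimal Python sketch of the two Flesch formulas. The syllable counter is a crude heuristic of my own (real tools use dictionaries or more careful rules), so its numbers will drift slightly from Word's, but the formula weights are the published ones.

```python
import re

def count_syllables(word: str) -> int:
    """Very rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # a trailing silent 'e' usually adds no syllable
    return max(count, 1)

def flesch_scores(text: str):
    """Return (reading ease, grade level) using the published weights."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade = 0.39 * wps + 11.8 * spw - 15.59
    return ease, grade

sample = ("The machine reads the text. It counts the words. "
          "Short words and short sentences raise the score.")
ease, grade = flesch_scores(sample)
print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")
```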
After I found an appropriate policy paper, I asked students to translate the executive summary into a seventh-grade or lower reading level. I asked them to do only the executive summary because the translation can be time-consuming: my students needed four to six hours of work to translate two single-spaced pages into a fifth-grade level. The next time I run the assignment I will require a seventh-grade level, for two reasons. First, I found the fifth-grade level too restrictive, and it tended to give students anxiety (some were checking their grade level after every sentence). Second, I prompted the GPT Davinci model to translate the text into a fifth-grade level a number of times, but its output never scored lower than seventh grade on the Flesch-Kincaid test. The assignment as written below reflects the change to a seventh-grade level.
The initial translation activity will spark a variety of interesting conversations with students: What kind of leakage in meaning does the translation have? What kinds of metaphors need to be invented to explain complex concepts at a seventh-grade level? What does it say about the power of writing if, in order to reach most Americans, we are restricted to writing at a low grade level? What is it like to write for two audiences (the human audience and the machine audience assessing the grade level)? What is the Flesch-Kincaid test actually measuring, and did it feel like an accurate measure of readability?
One of the peer reviewers of this assignment asked a helpful question: Can students find their own policy paper to translate? This would be possible. However, during my discussions with students, we compared the language of the original to their translations in minute, fine-grained detail. The quality of feedback I was able to give depended somewhat on knowing the language of the executive summary extremely well, because all students used the same policy paper. I was also able to talk with them about how other students solved translation problems that came up repeatedly during the assignment, since many students struggled with the same sections of the document.
After the students completed their translations, I fed two paragraphs of the policy document into a large language model with the prompt: “Please translate these paragraphs into a fifth-grade reading level.” I had to run the prompt several times, and the output never went below the seventh-grade level, although more powerful models (e.g., GPT-4) may be able to write at a lower level.
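For instructors who would rather script this step than paste text into a chat window, here is a minimal sketch of the same prompt as an API call. It assumes the current OpenAI Python SDK and uses a stand-in model name; my runs used the older Davinci model through the web interface, so outputs will vary.

```python
# A sketch of the translation prompt as an API call. Assumes the OpenAI
# Python SDK (v1+) and an OPENAI_API_KEY environment variable; the model
# name below is a stand-in, not the Davinci model used in the original run.
from openai import OpenAI

client = OpenAI()

policy_excerpt = "..."  # paste the two paragraphs of the executive summary here

response = client.chat.completions.create(
    model="gpt-4",  # any capable chat model; results vary by model
    messages=[{
        "role": "user",
        "content": ("Please translate these paragraphs into a fifth-grade "
                    "reading level:\n\n" + policy_excerpt),
    }],
)
print(response.choices[0].message.content)
```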
I provided students with the two original paragraphs from the policy document, the LLM translation, and their translation. I asked them to do a line-by-line comparison of the differences between the document and the two translations. Then I asked them to consider the meaning each translation lost and how their translation choices compared with the machine translation.
I asked students to write a brief report of their main takeaways from the exercise, including their assessment of which translation was more fluid and accurate. Most of them concluded that the machine had bested them, but most also concluded that they could beat the machine if this were a kind of writing they did every day. And, tellingly, most also concluded that if this were a kind of writing they did every day, they would prefer to have an LLM create the first draft and then work from there.
I’ve learned that students do better if the instructor provides some tips for success in advance. Warn students to consider the meaning of an entire paragraph before they begin translating it, and not to check readability until they finish the paragraph. Students who translate sentence by sentence and check the readability of each sentence as they go will create a laughably bad translation. (I mean that literally: one student laughed herself to tears in my office at how bad her work was when she translated line by line.) Remind students that they are not translating for a seventh grader; they are translating for someone who reads at a seventh-grade level, which includes well-educated adults who are English language learners. This is a subtle difference that can influence the metaphors students use to explain complex concepts. Finally, if you can find a model policy document that already has a very high readability score, it will help students understand what their final output should look like.
Here is the original assignment as I gave it to students, with the slight modification mentioned above: the target reading level has been changed to seventh grade.
Part I
Although many of us are under the impression that Western countries have achieved near-universal literacy, the reality is far more complex. People’s literacy can range from the ability to read the densest of professional texts to difficulty with subway signage. In the United States, literacy rates are also complicated by the enormous number of languages that people speak. Thus, if we want to reach as many people as possible with our written message, we cannot simply write in a college-level style. We have to aim for something like a seventh-grade reading level. This assignment is a translation exercise in which you take a complex text and translate it into a seventh-grade level as measured by the Flesch-Kincaid readability test. The test primarily measures sentence length and word length; Word also reports the percentage of sentences in the passive voice alongside the scores, on the assumption that passive voice makes a document more difficult to read. There is a reading-ease score measured on a scale of 0-100, with higher numbers being easier to read. There is also a grade-level score; in this case, the lower the number, the easier the text is to read.
For this assignment, if you use Microsoft Word, it will automatically assess readability when you run the spelling and grammar check (you may have to change some preferences depending on which version of Word you are using). If you do not have Word, there are a number of free Flesch-Kincaid checkers on the web.
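If you are comfortable with Python, one such free checker is the textstat package; here is a short sketch, assuming you have installed it with pip install textstat. Its scores may differ slightly from Word’s, since every tool tokenizes text a bit differently.

```python
# Check a draft's readability with the textstat package
# (pip install textstat). The filename here is just a placeholder.
import textstat

draft = open("my_translation.txt").read()

print("Reading ease:", textstat.flesch_reading_ease(draft))
print("Grade level:", textstat.flesch_kincaid_grade(draft))
```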
We are going to translate a document produced by the Obama administration in 2016 on preparing for the future of artificial intelligence. It is written for an audience of policy makers, meaning that it is not completely overrun with jargon, but it is not written for the general public either.
Your task is to take the executive summary and translate it into a seventh-grade reading level. An executive summary is essentially an abstract for busy people who do not have time to read the entire report. The summary currently sits at about a 30 on the reading-ease scale, or about a 14th-grade reading level. You are going to translate the first two and one-third pages, up to the end of the section on fairness, safety, and governance.
This is one assignment where a large language model could help you. However, this week, you MAY NOT use large language models to assist you with this task. Next week, we are going to use an LLM to translate the same summary, and we are going to compare our output to the output of the AI and see who did a better job.
Part II
For your first assignment, you translated three pages of a policy document into a seventh-grade reading level. This is actually one thing that large language models might be good at. So for this assignment, you are going to compare: 1) paragraphs two and three of the executive summary of the policy document; 2) paragraphs two and three of your translation; and 3) paragraphs two and three of the machine translation I provide below. I want you to compare them paragraph by paragraph with the questions below in mind.
After you rigorously compare the original and the two translations line by line, write a 1-2 page single-spaced report that answers the following questions: How did the human translation and the machine translation differ? And who, in your opinion, translated the document better?