Post-secondary Entry Writing Placement: A Brief Synopsis of Research

Richard H. Haswell

Texas A&M University, Corpus Christi

November, 2004

In most countries, placement into writing courses does not happen on admission into higher education. Testing of writing proficiency may be part of a qualification system for college, as with Britain's A-Level exams, but that proficiency is construed as acceptability for entrance, not readiness for instruction at some point within a curricular sequence. There is no post-secondary composition curriculum in which to be placed. But in the US, where open admissions stands as an ideal and sometimes as a reality (over 60% of two-year institutions currently endorse it), where millions of students are enrolled each year, and where writing courses usually form some sort of instructional sequence, placement into writing courses is the norm. The most recent survey found over 80% of public institutions using some form of it (Huot, 1994), quite similar to the findings of earlier surveys, which also report an equal portion of private institutions with exams in place (Lederman, Ryzewic, and Ribaudo, 1983; Noreen, 1977).
     Equally the norm, however, is a sense of displacement in many aspects of entry writing placement systems. The most pervasive of the disconnects is that the act of assessment leading to placement usually is standardized and decontextualized, whereas the effects of placement lodge a particular student in the concrete surround of a particular academic class. Traditionally the physical site of placement testing is removed from the college writing classroom and indeed from most normal life, more like an isolation chamber, perhaps a summer-orientation room full of computer monitors with shielded screens, or an eleventh-year schoolroom with proctors hawking students hunched over bubblesheets. This clash between ungrounded assessment and grounded teaching has energized placement, and the research into placement, from the beginning.
     Consider US post-secondary education's most characteristic writing course, freshman English. In 1874 Harvard University added a writing component to its entrance examinations, a short extemporaneous essay rated by teachers. More than half of the students failed, and many had to take "subfreshman" courses or undergo extra-curricular tutoring. Ten years later the outcomes were no better, and Harvard moved its sophomore forensic course to the first year, turning it into a remedial writing course required of everyone who didn't exempt out of it. Yet as placement pressures were shaping early curriculum, teachers were resisting such pressures. Frequently one form of their resistance was to distance themselves from placement by turning it over to someone else. The College Entrance Examination Board began its work in regularizing the certification of applicants in 1900. As documented by a number of fine studies touching upon the history of writing placement, the rest of the century saw testing firms grow ever more influential and departments of English grow ever more divided among using ready-made goods, running their own placement examinations, or forgoing placement altogether (Elliott, 2005; Mizell, 1994; Russell, 2002; Soliday, 2002; Trachsel, 1992; Wechsler, 1977).
     A good deal of the research into writing placement involves issues that inhere in all acts of writing evaluation, such as rater agreement, rater bias, reader response, test equivalency, construct validity, and social, cultural, and political influences. For these assessment issues, good research reviews are available: Ewell, 1991; Hartnett, 1978; Huot, 1990; Ruth and Murphy, 1988; Schriver, 1989; Speck, 1998. Here I focus on three assessment issues especially crucial for studies of placement per se: writer reliability, indirect measurement, and predictability. The first asks how much we can trust a single piece of writing from a person to reflect that person's level of writing as shown by further pieces. The answer, long demonstrated and long ignored in standard placement testing, is not much. In a study whose careful controls can still be admired, Kincaid (1953) found that 58% of Michigan State students significantly changed their score on writing a second essay one day after the first. A fifth of the lowest quartile rose from the bottom with their second essay, and about half of the highest quartile fell from the top. Diederich (1964) found a quarter of students at the University of Chicago changed their grade with a second essay. A number of later studies show that performance variability increases when a second prompt requires writing in a different mode or on a different topic (e.g., Hake, 1986; Quellmalz, Capell, and Chih-ping Chou, 1982). Although there are periodic calls for more test-retest studies, few have been done in the last thirty years. Testing firms are not about to find evidence that they need to pay for more raters to achieve a valid score, nor are colleges that give local tests eager to give more of them (although students allowed to retest their placement decision show a high rate of reversal; Freedman and Robinson, 1982).
     On the other hand, the issue of indirect versus direct testing of writing, of even longer heritage, is still unsettled. Teachers have always wanted students placed into their writing classes on evidence that they lack but can learn the kind of rhetorical skills the course actually covers. Normally the college writing curriculum does not center on mastery of spelling demons, naming of grammatical parts, recognition of learned vocabulary, and other skills measured by item examinations, that is, by indirect testing of writing proficiency. Unfortunately, direct testing, simply asking students to write an essay to show their essay-writing proficiency, runs into problems of rating and therefore of cost. To achieve acceptable scorer reliability, an essay has to be read independently at least by two scorers and sometimes a third when the first two don't agree. Historically the large testing firms, caught between efficiency and credibility, have waffled. For two decades after 1900 the College Entrance Examination Board read essays, but changed to short answer after Hopkins (1921) proved that there was more variability between raters than between essays. Eventually they switched back under pressure from teachers, but during and after WWII many schools (including Harvard) dropped the College Board's essay exams in favor of their Scholastic Aptitude Test, which measured verbal skills with machine-scored items. In 2004, under threat of a boycott of the SAT by the University of California, the College Board decided to add a 25-minute essay, holistically scored. Current holistic methods of scoring essays, which the College Board, ACT, Pearson Educational, and other testing firms simplify to the point it is cost effective, on the surface seem one viable resolution of the clash between teachers and testers.
     But only so long as a third issue of writing placement is kept under wraps, the issue of predictability. How well does holistic scoring or any other method of writing assessment predict the future performance of students in the courses into which it has placed them? As a sorting tool, what is its instructional validity or, to use Smith's useful term (1993), its adequacy? The answer is that all these methods, direct and indirect, have about the same predictive power, and it is painfully weak. This has been known for a long time. Only six years after the College Entrance Examination Board began operations, Thorndike (1906) began finding very low correlations between standing in the exams and standing in junior and senior classes. In 1927 A. D. Whitman found an average .29 correlation of College Entrance Examination Board essay scores with future college performance in courses. Huddleston (1954) reviewed fifteen studies of indirect tests and McKendy (1992) thirteen more studies of direct tests, all correlating test score with first-year college composition course grade or teacher appraisal of student work, and the range was from random to .4. Correlations can be increased with methods that institutions rarely apply in actual placement (averaging scores on several writing samples or running multiple regressions with variables such as high-school English grades and verbal proficiency scores) but all with little gain in predictability. A review of thirty or so more studies not reported in Huddleston and McKendy supports their finding, that for decades college writing placements have been made on scores that leave unexplained, at best, two thirds of the variance in future course performance, and, on average, nine-tenths of it. Yet supporters of the tests will sometimes put such a positive spin on discouraging outcomes that even the most statistically gullible shouldn't be fooled. Mzumara et al. (1999) write about a machine-scoring placement procedure that they implemented at their institution, "The placement validity coefficients for a sample drawn from Fall of 1998 averaged in the mid-teens, but this finding is still useful for placement purposes" (pp. 3-4). Useful, one supposes, to argue that the procedure should be scrapped, since such coefficients account for only about 2% of the variance in student performance in the courses.
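     The arithmetic behind these figures is the squaring of the correlation coefficient: the share of variance in course performance that a placement score accounts for is r squared, and the remainder is left unexplained. The short Python sketch below works through three of the coefficients mentioned above. It is only an illustration; the function name is hypothetical, and the value .15 is an assumed stand-in for "mid-teens," not a figure reported by Mzumara et al.

```python
def variance_explained(r: float) -> float:
    """Return the proportion of outcome variance accounted for by correlation r."""
    return r ** 2


# Coefficients taken from the paragraph above; 0.15 is an assumed
# reading of "mid-teens", not a number reported in the source.
correlations = {
    "Mzumara et al. (1999), mid-teens (assumed .15)": 0.15,
    "Whitman (1927), average .29": 0.29,
    "top of the Huddleston/McKendy range, .40": 0.40,
}

for label, r in correlations.items():
    explained = variance_explained(r)
    unexplained = 1 - explained
    print(f"{label}: {explained:.1%} explained, {unexplained:.1%} unexplained")
```

     On these assumptions a mid-teens coefficient explains roughly 2% of the variance, and even a correlation of .4 leaves more than four fifths of it unexplained.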
     Converting typical placement statistics to actual head count is a little dismaying. Many students are put in courses too hard for them or, worse, suffer the stigma of assignment to a remedial course when they could pass the regular course. Matzen and Hoyt (2004), for example, rated first-week course writing and calculated 62% of the students at their two-year college had been misplaced by standardized testing. Smith (1993), which goes further than any other study in plumbing and explaining the intricacies of the placement/curriculum interface, determined that 14% of his university's freshmen were being placed too low, and that was with a local placement system far better than most, where the teachers of the target courses read two-hour essays and assigned the writer to an actual course, not to a point on an abstract scale of writing quality. Even when the degree of misplacement may seem to impact very few students, Smith rightly points out that "for the students and for the teachers, 'very few' is too many" (p. 192).
     The placement system Smith designed, where scoring of essays is done by the teachers of the curriculum affected, is only one of several attempts by local writing programs to replace ungrounded national testing procedures with local contextualized ones and then to validate the new outcomes. Students with marginal scores can be assigned into the mainstream course but with required ancillary work in a writing center (Haswell, Johnson-Shull, and Wyche-Smith, 1994) or a "studio course" (Grego and Thompson, 1995). Or they can be assigned to a two-semester "stretch" course (Glau, 1996). Students can submit a portfolio of previous writing, allowing teacher-raters a better sense of where they should start (Huot, 2002; Willard-Traub et al., 1999). On evidence of essays written early in a course, misplaced students can be reassigned by their teachers to a more appropriate course (Galbato and Markus, 1995, report that teachers recommend retrofitting of one third of their students placed into their courses by indirect testing). At the "rising-junior" or halfway point in the undergraduate course of studies, portfolios of previous academic written work can be quickly reduced to a few problematical ones, which are evaluated by groups of expert readers that include a teacher in the student's major (Haswell, 1998). All of the studies cited here include careful validation of placement adequacy, although they do so with a variety of methods. Perhaps on ethical grounds, however, none uses the most informative method of validation, which would be to test the predictability of placement decisions by mainstreaming all of the students and conducting follow-up studies comparing putative test placement with actual course performance (for one study using that design, see Hughes and Nelson, 1991, which found 37% of the students whom ASSET scores would have put into basic writing passed the regular course).
     Mainstreaming, moreover, raises volatile issues connected with minority status, nonstandard dialect, bilingualism, second-language acquisition, cultural and academic assimilation, as well as the writing curriculum itself. Typically a much higher portion of non-majority students than is represented in the student body end up in basic writing courses. Does the placement apparatus serve a classist system of controlled access to higher education and of oppressive tracking within it? The evidence is mixed. Soliday and Gleason (1997) found that most students who would have been barred from general-education courses because of their poor performance on a placement writing test performed passably if allowed to take the courses, and Shor (2000) and Adams (1993) describe students who avoided their placement in basic writing and passed the mainstream course. On the other hand, retention studies have found that students who take basic writing are more likely to stay in college and graduate (Baker and Jolly, 1999; White, 1995). These questions about tracking necessarily involve issues of class, ethnicity, and language preference, and take us back to commercial testing of writing proficiency, on which minority students, first-language speakers or not, tend to perform worse than majority speakers, especially on indirect tests (e.g., Larose et al., 1998; Pennock-Román, 2002; Saunders, 2000; White and Thomas, 1981).
     Commercial firms excuse the poor predictability of their tests, direct or indirect, by arguing that academic performance is a "very fallible criterion" (Ward, Kline, and Flaugher, 1986). Yet from the beginning theorists have responded by pointing out that the tests themselves are fallible since they measure performance not potential, are "examinations of past work, rather than of the power for subsequent work" (Arthur T. Hadley in 1900, cited by Wechsler, 1977). As Williamson put it nearly a century later, teachers reading student writing truly for placement "do not judge students' texts; they infer the teachability of the students in their courses on the basis of the texts they read" (1993, p. 19). In Haswell's terms (1991), diagnosis, good placement practice, looks through a placement essay in order to predict the student's future performance, whereas pseudodiagnosis, poor placement practice, pretends to do that while actually only ranking the essay in comparison with other essays (pp. 334-335).
     Today the situation of college entry-level writing placement could be called schizophrenic with some justice. On the one hand, for reasons of expediency many open-admission schools in the USA, Canada, and Australia are now placing students with scores produced by computer analysis of essays (e.g., ETS's e-rater, the College Board's WritePlacer, ACT's e-Write), even though the scores correlate highly with human holistic ratings and therefore inherit the same weak predictability (Ericsson and Haswell, forthcoming). On the other hand, studies of local evaluation of placement essays show the process too complex to be reduced to directly transportable schemes, with teacher-raters unconsciously applying elaborate networks of dozens of criteria (Broad, 2003), using fluid, overlapping categorizations (Haswell, 1998), and grounding decisions on singular curricular experience (Barritt, Stock, and Clark, 1986). Perhaps in reaction to such dilemmas, some institutions are returning to a venerable though largely unvalidated method of placement, informed or directed self-placement, in which students decide their own curricular fate based upon information and advice provided by the program and upon their own sense of self-efficacy (Blakesley, 2002; Royer and Gilles (Eds.), 2003; Schendel and O'Neill, 1999). For an overview of current placement practices, see Haswell, 2005.
     All in all, the most solid piece of knowledge we have from writing-placement research is that systems of placement are not very commensurable. Winters (1978) tested the predictive validity of four measures of student essays (general-impression rate, the Diederich expository scale, Smith's Certificate of Secondary Education analytic scale, and t-unit analysis) and found that they would have placed students quite differently (on the first three, students judged by their teachers as low performing did better than those judged as high performing). Olson and Martin (1980) found that 1,002 (61%) of their entering students would be placed differently by indirect proficiency testing than by teacher rating of an essay, and 1,051 (64%) would be placed differently by that teacher assessment than by their own self-assessment. Meyer (1982) reports that faculty who read take-home essays placed 44% of the students in a lower course than the one into which they would have been placed by an indirect verbal-skills examination. Shell, Murphy, and Bruning (1986) found that a holistic evaluation of a 20-minute essay correlated with three measures of self-efficacy, the students' "confidence in being able to successfully communicate what they wanted to say," at .32, .17, and .13. Much can be made of this often confirmed finding of incommensurability, but one conclusion seems hard to gainsay. Educators who wish to measure writing promise, through whatever system of placement, should implement multiple measures and validate with multiple measures.

Works Cited

Adams, Peter Dow. 1993. Basic writing reconsidered. Journal of Basic Writing, 12.1, 22-36.

Baker, Tracey; Peggy Jolly. 1999. The "hard evidence": Documenting the effectiveness of a basic writing program. Journal of Basic Writing, 18.1, 27-39.

Barritt, Loren; Patricia T. Stock; Francelia Clark. 1986. Researching practice: Evaluating assessment essays. College Composition and Communication, 37.3, 315-327.

Blakesley, David. 2002. Directed self-placement in the university. WPA: Writing Program Administration, 25.2, 9-39.

Broad, Bob. 2003. What we really value: Beyond rubrics in teaching and assessing writing. Logan, UT: Utah State University Press.

Diederich, Paul B. 1964. Problems and possibilities of research in the teaching of written composition. In David H. Russell (Ed.), Research design and the teaching of English: Proceedings of the San Francisco Conference, 1963 (52-73). Champaign, IL: National Council of Teachers of English.

Elliott, Norbert. 2005. On a scale: A social history of writing assessment in America. New York: Peter Lang.

Ericsson, Patricia; Richard H. Haswell (Eds.). Forthcoming. Machine scoring of student essays: Truth and consequences.

Ewell, Peter T. 1991. To capture the ineffable: New forms of assessment in higher education. In Janet Bixby; Gerald Grant (Eds.), Review of research in education (No. 17) (75-126). Washington, D. C.: American Educational Research Association.

Freedman, Sarah Warshauer; William S. Robinson. 1982. Testing proficiency in writing at San Francisco State University. College Composition and Communication, 33.4, 393-398.

Galbato, Linda; Mimi Markus. 1995. A comparison study of three methods of evaluating student writing ability for student placement in introductory English courses. Journal of Applied Research in the Community College, 2.2, 153-167.

Glau, Gregory R. 1996. The "stretch program": Arizona State University's new model of university-level basic writing instruction. WPA: Writing Program Administration, 20.1-2, 79-91.

Grego, Rhonda C.; Nancy S. Thompson. 1995. The writing studio program: Reconfiguring basic writing/freshman composition. WPA: Writing Program Administration, 19.1-2, 66-79.

Hake, Rosemary. 1986. How do we judge what they write? In Karen L. Greenberg, Harvey S. Wiener, and Richard D. Donovan (Eds.), Writing assessment: Issues and strategies (153-167). New York: Longman.

Hartnett, Carolyn G. 1978. Measuring writing skills. ERIC Document Reproduction Service, ED 170 014.

Haswell, Richard H. 1991. Gaining ground in college writing: Tales of development and interpretation. Dallas, TX: Southern Methodist University Press.

Haswell, Richard H. 1998. Rubrics, prototypes, and exemplars: Categorization and systems of writing placement. Assessing Writing, 5.2, 231-268.

Haswell, Richard H. 2005. Post-secondary entrance writing placement.

Haswell, Richard H.; Lisa Johnson-Shull; Susan Wyche-Smith. 1994. Shooting Niagara: Making portfolio assessment serve instruction at a state university. WPA: Writing Program Administration, 18.1-2, 44-55.

Hopkins, L. T. 1921. The marking system of the College Entrance Examination Board. Harvard Monographs in Education, Series 1, No. 2.

Huddleston, Edith M. 1954. Measurement of writing ability at the college entrance level: Objective vs. subjective testing techniques. Journal of Experimental Education, 22, 165-213.

Hughes, Ronald Elliott; Carlene H. Nelson. 1991. Placement scores and placement practices: An empirical analysis. Community College Review, 19.1, 42-46.

Huot, Brian. 1990. The literature of direct writing assessment: Major concerns and prevailing trends. Review of Educational Research, 60.2, 237-263.

Huot, Brian. 2002. (Re)articulating writing assessment for teaching and learning. Logan, UT: Utah State University Press.

Kincaid, Gerald Lloyd. 1953. Some factors affecting variations in the quality of students' writing [dissertation]. East Lansing, MI: Michigan State University.

Larose, Simon; Donald U. Robertson; Roland Roy; Fredric Legault. 1998. Nonintellectual learning factors as determinants for success in college. Research in Higher Education, 39.3, 275-297.

Lederman, Marie Jean; Susan Remmer Ryzewic; Michael Ribaudo. 1983. Assessment and improvement of the academic skills of entering freshmen: A national survey. ERIC Document Reproduction Service, ED 238 973.

Matzen, Richard N.; Jeff E. Hoyt. 2004. Basic writing placement with holistically scored essays: Research evidence. Journal of Developmental Education, 28.1, 2-4, 6, 8, 10, 12, 34.

McKendy, Thomas. 1992. Locally developed writing tests and the validity of holistic scoring. Research in the Teaching of English, 26.2, 149-166.

Meyer, Russell J. 1982. Take-home placement tests: A preliminary report. College English, 44.5, 506-510.

Mizell, Linda Kay. 1994. Major shifts in writing assessment for college admission, 1874-1964 [dissertation]. Commerce, TX: East Texas State University.

Mzumara, Howard R.; Mark D. Shermis; Jason M. Averitt. 1999. Predictive validity of the IUPUI web-based placement test scores for course placement at IUPUI: 1998-1999. Indianapolis, IN: Indiana University Purdue University Indianapolis.

Noreen, Robert C. 1977. Placement procedures for freshman composition: A survey. College Composition and Communication, 28.2, 141-144.

Olson, Margot A.; Diane Martin. 1980. Assessment of entering student writing skill in the community college. ERIC Document Reproduction Service, ED 235 845.

Pennock-Román, Maria. 2002. Relative effects of English proficiency on general admissions tests versus subject tests. Research in Higher Education, 43.5, 601-623.

Quellmalz, Edys S.; Frank J. Capell; Chih-ping Chou. 1982. Effects of discourse and response mode on the measurement of writing competence. Journal of Educational Measurement, 19.4, 241-258.

Royer, Daniel J.; Roger Gilles (Eds.). 2003. Directed self-placement: Principles and practices. Cresskill, NJ: Hampton Press.

Russell, David R. 2002. Writing in the academic disciplines: A curricular history. 2nd ed. Carbondale, IL: Southern Illinois University Press.

Ruth, Leo; Sandra Murphy. 1988. Designing writing tasks for the assessment of writing. Norwood, NJ: Ablex.

Saunders, Pearl I. 2000. Meeting the needs of entering students through appropriate placement in entry-level writing courses. ERIC Document Reproduction Service, ED 447 505.

Schendel, Ellen; Peggy O'Neill. 1999. Exploring the theories and consequences of self-placement through ethical inquiry. Assessing Writing, 6.2, 199-227.

Schriver, Karen A. 1989. Evaluating text quality: The continuum from text-based to reader-focused methods. IEEE Transactions on Professional Communication, 32.4, 238-255.

Shell, Duane F.; Carolyn Colvin Murphy; Roger Bruning. 1986. Self-efficacy and outcome expectancy: Motivational aspects of reading and writing performance. ERIC Document Reproduction Service, ED 278 969.

Shor, Ira. 2000. Illegal illiteracy. Journal of Basic Writing, 19.1, 100-112.

Smith, William L. 1993. Assessing the reliability and adequacy of using holistic scoring of essays as a college composition placement technique. In Williamson, Michael M.; Brian Huot (Eds.), Validating holistic scoring for writing assessment: Theoretical and empirical foundations (142-205). Cresskill, NJ: Hampton Press.

Soliday, Mary; Barbara Gleason. 1997. From remediation to enrichment: Evaluating a mainstreaming project. Journal of Basic Writing, 16.1, 64-79.

Soliday, Mary. 2002. The politics of remediation: Institutional and student needs in higher education. Pittsburgh, PA: University of Pittsburgh Press.

Speck, Bruce W. 1998. Grading student writing: An annotated bibliography. Westport, CT: Greenwood Press.

Thorndike, E. L. 1906. The future of the College Entrance Examination Board. Educational Review, 31 (May), 470-593.

Trachsel, Mary. 1992. Institutionalizing literacy: The historical role of college entrance exams in English. Carbondale, IL: Southern Illinois University Press.

Ward, William C.; Roberta G. Kline; Jan Flaugher. 1986. College Board computerized placement tests: Validation of an adaptive test of basic skills. ERIC Document Reproduction Service, ED 278 677.

Wechsler, Harold. 1977. The qualified student: A history of selective college admissions in America. New York: John Wiley.

White, Edward M. 1995. The importance of placement and basic studies. Journal of Basic Writing, 14.2, 75-84.

White, Edward M.; Leon L. Thomas. 1981. Racial minorities and writing skills assessment in the California State University and Colleges. College English, 43.3, 276-283.

Whitman, A. D. 1927. The selective value of the examinations of the College Entrance Examination Board. School and Society, 25 (April 30), 524-525.

Willard-Traub, Margaret; Emily Decker; Rebecca Reed; Jerome Johnston. 1999. The development of large-scale portfolio placement assessment at the University of Michigan: 1992-1998. Assessing Writing, 6.1, 41-84.

Williamson, Michael M. 1993. An introduction to holistic scoring: The social, historical and theoretical context for writing assessment. In Williamson, Michael M.; Brian Huot (Eds.), Validating holistic scoring for writing assessment: Theoretical and empirical foundations (1-44). Cresskill, NJ: Hampton Press.

Winters, Lynn. 1978. The effects of differing response criteria on the assessment of writing competence. ERIC Document Reproduction Service, ED 212 659.