Making the Grade: Assessing Student Learning in Education


The Challenge

The federal No Child Left Behind Act (NCLB) requires that states implement annual tests of grade-level achievement in reading and in mathematics in grades 3 through 8 plus one high school grade. While the target of NCLB is the improvement of school systems, some educators are concerned that these achievement tests miss the improvement that individual students make from grade to grade. Measures of these gains are therefore of high interest. Creating them, however, presents a number of challenges.

Traditional approaches to training evaluation would recommend that students be given a pretest at the beginning of the school year followed by a posttest at the end of the school year. Both tests would focus on the material to be learned that year. With a few exceptions (e.g., Oregon), such a strategy is considered impractical in educational settings. As a result, alternative approaches have grown out of the testing industry, which is more accustomed to viewing education as an essentially continuous process where progress can be captured along a common achievement scale that runs across grades. With these approaches, there is no 'pretest' or 'posttest.' Rather, tests are administered at the end of the school year to locate students along a common vertical scale. However, pretest/posttest approaches are not incompatible with vertical scaling, nor are they impractical to implement using conventional educational testing designs.

What We Did

HumRRO researchers began the creation of a vertical scale for Florida's student testing system in 2000. Each grade's test covers the content standards for that particular grade. The key to constructing a vertical scale is to create special grade-level tests that are augmented with items from adjacent-grade tests. Performance on these common items can then be used to compare end-of-year achievement in the two adjacent grades and to construct a vertical scale.
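The common-item idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the procedure HumRRO used: it assumes each common item has a proportion-correct value for the lower-grade and upper-grade samples, takes the mean logit difference as the gap between adjacent grades, and chains those gaps into grade locations. The function names (`grade_gap`, `chain_scale`) and the choice of a logit metric are hypothetical; operational vertical scales are typically built with item response theory models.

```python
import math


def logit(p):
    """Log-odds of a proportion correct; p must be strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def grade_gap(lower_p, upper_p):
    """Mean logit difference on the common items (upper grade minus lower grade).

    lower_p and upper_p are parallel lists: the proportion of each grade's
    sample answering the same common item correctly.
    """
    diffs = [logit(u) - logit(l) for l, u in zip(lower_p, upper_p)]
    return sum(diffs) / len(diffs)


def chain_scale(gaps, base_grade=3):
    """Accumulate adjacent-grade gaps into one location per grade.

    The base grade is arbitrarily placed at 0.0 (an assumption made here
    only to fix the origin of the scale).
    """
    locations = {base_grade: 0.0}
    grade = base_grade
    for gap in gaps:
        grade += 1
        locations[grade] = locations[grade - 1] + gap
    return locations
```

Calling `chain_scale` with one gap per adjacent-grade pair returns a dictionary that maps each grade to its position on the common scale, so that a larger gap indicates a larger end-of-year difference between the two grades.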

Some testing companies argue against testing students on material that they have not been taught. Instead, they recommend that students be administered only items from the grade below during vertical scale construction. However, vertical scale construction, although cross-sectional in nature, can be placed in a training evaluation framework. Specifically, by administering higher grade items to lower grade students, one can compare grade-to-grade, end-of-year performance. Such an approach amounts to a pretest/posttest linking design in which students have an opportunity to show gains on content taught between the 'pretest' and 'posttest.' Conversely, administering lower grade items to higher grade students, as traditionally recommended, can be interpreted as a test of recall, forgetting, remediation, or some mix depending on individual student experiences, but not as a measure of gain on the material taught at the new grade. For these reasons, HumRRO recommended that samples of students in each grade be administered a subset of the easier items from the grade above.

What We Found

Adopting this approach enabled HumRRO to discover informative trends that a traditional vertical scaling approach would have missed. First, we found that differences between students from one grade to the next vary by item source, particularly for reading. Differences in performance were greater when items from the upper grade were used for the linking than when items from the lower grade were used.
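One way to examine this kind of item-source effect is to compute the grade-to-grade difference separately for linking items drawn from each grade. The sketch below is a minimal illustration, not HumRRO's analysis: the record layout (`source`, `p_lower`, `p_upper`) and the function name `gap_by_source` are hypothetical, and a simple difference in proportion correct stands in for whatever statistic an operational study would use.

```python
from statistics import mean


def gap_by_source(items):
    """Average grade-to-grade difference, split by where each item came from.

    items is a list of dicts with three assumed keys:
      'source'  - 'lower' or 'upper', the grade whose test supplied the item
      'p_lower' - proportion correct among lower-grade students
      'p_upper' - proportion correct among upper-grade students
    Returns a dict mapping each source to its mean difference, or None
    when no items from that source are present.
    """
    gaps = {}
    for source in ("lower", "upper"):
        diffs = [
            item["p_upper"] - item["p_lower"]
            for item in items
            if item["source"] == source
        ]
        gaps[source] = mean(diffs) if diffs else None
    return gaps
```

If the value for `'upper'` is noticeably larger than the value for `'lower'`, the linking result depends on item source in the way described above.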

Second, the data confirm that students within a grade are not all learning the same material. Some students are receiving remediation while others are receiving enriched curricula. Thus, lower performing students tend to show larger grade-to-grade differences on items from the lower grade; essentially, they are learning the prior grade's material. Conversely, we observed greater differences between grades for higher performing students on the items from the upper grade.

Finally, vertical scale construction data from two additional states show considerably more variability in student achievement within a grade than across grades. Students in the top and bottom thirds of a grade are more similar to students in the top and bottom thirds of adjacent grades than to students in the middle of their own grade. This calls into question what 'grade' and 'grade level' performance actually mean.
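A comparison of within-grade and across-grade variability can be made concrete by splitting the total variance of scale scores into the two components. The following is a minimal sketch under the assumption that scores for all grades sit on one common scale; the function name `variance_split` and the input layout are hypothetical and are not taken from the HumRRO analyses.

```python
from statistics import mean, pvariance


def variance_split(scores_by_grade):
    """Split total score variance into within-grade and between-grade parts.

    scores_by_grade maps a grade to a list of scores on a common scale.
    Returns (within, between); the two parts sum to the population
    variance of all scores pooled together.
    """
    all_scores = [s for scores in scores_by_grade.values() for s in scores]
    n = len(all_scores)
    grand_mean = mean(all_scores)

    # Within: size-weighted average of each grade's own variance.
    within = sum(len(v) * pvariance(v) for v in scores_by_grade.values()) / n

    # Between: size-weighted spread of grade means around the grand mean.
    between = sum(
        len(v) * (mean(v) - grand_mean) ** 2 for v in scores_by_grade.values()
    ) / n
    return within, between
```

A `within` value much larger than `between` would correspond to the pattern described above, in which students differ more from their own grade-mates than grades differ from one another.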

The implications of these findings for testing and for instruction are certainly challenging under political pressures to treat students within grades as being all alike. Students, on the other hand, learn at different rates and at different times. Our findings illustrate that educational evaluation methods, as well as our students, would be better served by acknowledging these differences.

For more information, contact
www.HumRRO.org
RESEARCH NOTES are intended to update our friends and colleagues about recent work performed by the Human Resources Research Organization (HumRRO).
 
Established in 1951, HumRRO is an independent, nonprofit corporation dedicated to the development and application of state-of-the-art scientific principles and technologies to solve the real-world challenges facing private and public sector organizations and educational institutions. Our professional staff is composed of psychologists with diverse expertise in strategic human resource management, personnel selection and classification, performance assessment and compensation, training and instructional design, educational research and evaluation, survey design and analysis, credentialing, and program and policy analysis. Our client base includes the military, government agencies, private industry, and professional associations. We are proud of our reputation for providing responsive, high-quality, and cost-effective services.
 
Further information regarding HumRRO can be obtained by calling Dr. Bill Strickland, President and CEO, at 703-549-3611.