Summary: A review of the literature pertaining to the effect and influence that interim assessments have on student achievement lacks quantitative data to determine the efficiency of their use in the classroom as a school reform tool. This study examined the strength and the direction of the relationships between interim pre and posttest assessments in language arts and mathematics in Grade 8 and student achievement on the New Jersey Grade 8 state standardized tests in those subjects. Analyses were conducted using simultaneous multiple regression models. All student data explored in this study pertained to 670 students in Grade 8 enrolled in four middle schools located in a suburban/urban central New Jersey community during the 2009-2010 academic school year. The results of the study revealed each school produced a combination of site specific results and the interim pretests accounted for the same or almost the same amount of variance in state test scores as the interim posttests.
Una revisión de la literatura que pertenece al efecto y la influencia que evaluaciones provisionales tienen en faltas de logro de estudiante los datos cuantitativos para determinar la eficiencia de su uso en el aula como una herramienta de reforma de escuela. Este estudio revisó la fuerza y la dirección de las relaciones entre interín pre y evaluaciones de examen posterior en artes de idioma y matemáticas en el Grado 8 y logro de estudiante en el Nueva Jersey Gradúan 8 estado las pruebas estandarizadas en esos sujetos. Analiza fueron realizados utilizando múltiples modelos simultáneos de retroceso. Todos los datos del estudiante exploraron en este estudio pertenecido a 670 estudiantes en el Grado 8 matriculado en cuatro colegios ubicados en una comunidad central suburbano/urbano de Nueva Jersey durante el 2008-2009 año escolar académico. Los resultados del estudio revelaron cada escuela produjo una combinación de sitio resultados específicos y las pruebas preliminares provisionales justificaron el mismo o casi la misma cantidad de variación en puntuaciones de estado como los exámenes posteriores provisionales.
School and district administrators use the results from state assessments and other standardized assessment tools, such as interim assessments, to make decisions about things like revisions to curriculum and instruction programs, teacher quality, overall student achievement, and student placements in special programs and academic ability groups (Tienken, 2008a). Increasingly, school principals are turning to the use of student results from interim assessment products to make predictions about which students might pass their state mandated tests of skills and knowledge and which might need additional interventions to pass the test. The practice of school administrators targeting interventions to specific groups of students based on test results, especially those closest to their state’s proficiency cut-score, is a national issue. Some have called the practice “educational triage” for the “bubble kids” (Booher-Jennings, 2005, p. 1).
School administrators around the nation are not immune to the pressures to raise student test scores in order to retain their jobs. There are multiple proposals for school administrator evaluation in place or under consideration in several states and all of them include the use of student test results as a major indicator for school administrator pay and job security. Accordingly school administrators in many states, including New Jersey, search for interventions to raise student achievement on state mandated tests. In some cases state education bureaucrats make recommendations to school administrators on the interventions they should use.
In a 2008 memo released by the former New Jersey Department of Education (NJDOE) Commissioner of Education, Lucille Davy recommended the use of the Learnia computer-based formative assessment product, vended by Pearson, and claimed that research demonstrated that the formative assessment product was able to predict students who would have trouble on the state test and that the product had a positive impact (cause and effect) on student achievement. The memo did not state whether contextual factors such as student demographics, grade levels, or socio-economics influenced the effectiveness of the product. The memo and marketing for the product made it seem as if context did not matter.
We note that although Pearson and the former NJ Commissioner of Education used the term “formative assessment” to describe the Learnia product, Learnia or other “formative assessment” products on the market do not conform with the empirical definition of formative assessment. In fact, most products on the market are interim assessments. As we define later in the manuscript, there are important differences between the two types of assessments.
As a result of the former NJ Commissioner’s recommendation, school administrators across the state were led to believe that computer-based pretest/posttest interim assessments were an effective intervention to predict student proficiency and raise student achievement regardless of school context and student demographic variables. Many school administrators mandated the use of interim assessments, the Learnia product in particular, in classrooms throughout their districts. During the 2007 through 2011 school years over 250 school districts in New Jersey, representing hundreds of schools, used the Learnia product. Learnia is the most popular formative assessment tool used in New Jersey and one of the most popular tools used nationwide. Millions of dollars have been spent on the product in New Jersey alone.
Nationally, middle school student proficiency percentages on state mandated tests are generally lower than proficiency percentages for elementary school student results. Therefore, there is a heightened sense of urgency on the part of middle school administrators to seek interventions to raise achievement. Our purpose for this study was to evaluate the influence of the Learnia pretest and posttest interim assessments in language arts and mathematics on Grade 8 student scale-scores on state mandated New Jersey Assessment of Skills and Knowledge (NJASK) in Grade 8 language arts and mathematics.
We guided the study with the overarching question: When controlling for other factors found in the extant literature to influence middle school student achievement, how much variance in middle school student achievement on the NJ ASK8 language arts and mathematics sections do the results from pretest and posttest interim assessments in language arts and mathematics explain?
The results from the existing literature suggest that principals and other school administrators use the student results from interim assessments to make decisions about which students receive targeted intervention and which do not. However, there is little empirical literature on the efficacy of interim assessments for such use. The results of this study will provide school administrators empirical data they can use to make decisions about which resources to purchase, if in fact they choose to purchase interim assessment resources, and the efficacy of those resources for decision making purposes. Several of the existing studies on the topic are methodologically flawed, use small samples, and/or use simple pre-experimental designs that are not appropriate for drawing conclusions about efficacy (e.g. Takacs, 2010). Our design and methodology represent improvements over much of the existing literature.
We draw upon production / function theory to guide our understanding of the topic and our analysis. The theory rests on the idea that the quantity of output (Q) is a function (f) of various inputs (X1, X2, X3…). The symbolic representation for the theory is Q=f(X1, X2, X3…). In the case of our study, Q represents the student results on the state mandated LA and math tests as a function of the various student and school inputs that contribute to student achievement. Interim assessments are one such input, among others.
A review of the empirical literature found support, albeit mixed in some cases, for four additional input variables: students’ socioeconomic status (e.g., Coleman, et al., 1966; Koretz, 2009; Sirin, 2005), specific formative assessment strategies (Butler, 1988; Schunk, 1996) student gender, (Else-Quest, et al., 2010; Willingham and Cole, 1997) and teacher advanced degree status (Goldhaber & Brewer, 2002; Michel, 2008). Figure 1 presents the final theoretical framework used to guide this study that includes some variables deemed by the NJDOE bureaucrats and the host district, and some of the variables found in the literature that influence student achievement.
Further analysis of district-specific characteristics of the initial variables included in this framework resulted in the elimination of teacher advanced degree status. We found based on an analysis of personnel records that over 90% of the teachers had a Masters degree in education. Thus, we later eliminated the variable of teacher degree status because at the school level there was almost no variance in the status of the language arts and mathematics teachers for the district.
An additional possible influential variable was identified: Academic Support Instruction (ASI), also known as Basic Skills Instruction. In an effort to specialize instruction based on the student’s individual needs district personnel provide ASI in language arts and mathematics in Grades 6-8. ASI Mathematics and ASI Language Arts courses are conducted during the regular school day and replace student’s mandatory mathematics and language arts classes. We added ASI as a variable because the district administration obviously believed it influenced achievement, otherwise they would not have expended tax dollars on the program. Although the empirical literature suggests mixed results for ASI-type programs nationally, the district administration at this site never conducted an evaluation of the local ASI program. Therefore, we could not know if it did or did not influence achievement and we added it into our models. Because there is ambiguity between the terms formative assessment and interim assessment we reviewed the empirical literature for each term and provide defining characteristics to resolve the ambiguity of terms and to assist the reader in understanding the results of our study.
According to Perie, Marion, and Gong (2007) “formative assessment is a process used by teachers and students during instruction that provides feedback to adjust ongoing teaching and learning to improve students’ achievement of intended instructional outcomes” (p.1). There are specific characteristics of formative assessments that separate them from interim assessments. For example, the frequency of administration and the manner in which the results permit teachers to modify instruction help to determine if the assessment is formative or interim.
State standardized test are considered summative assessments. Summative assessments are generally given one time at the end of a unit or at the end of the school year to evaluate students’ performance against some type of content standards. The results from these assessments do not provide teachers with information that could be integrated into instruction daily.
Asking questions in class is a type of formative assessment that can help teachers monitor and adjust their lessons in “real time” to improve student achievement of the intended instructional goals. The most effective formative assessment occurs frequently and it is structured to provide opportunities for the students to practice the technique of self-evaluation, and reflection in order to monitor and adjust their individual learning (Sadler, 1989; Schunk, 1996). When looking at studies involving web-based formative assessment programs, small (e.g. 0.03) or statistically non-significant effect sizes were reported (Buchanan, 2000; Sly, 1999; Velan, Rakesh, Mark, & Wakefield, 2002; Wang, 2007). One reason is that web-based formative assessment products are often stand alone programs. They are the end, so to speak. They are not integrated into the teaching and learning process. They are simply another activity to accomplish.
When looking at self-reflection and self-assessment as a formative assessment strategy integrated into the teaching and learning processes, statistically significant effect sizes of .73-1.97 have been reported across the Grades 4-12 spectrum (Sadler, 1989; Schunk, 1996; White & Frederiksen, 1998). The type of formative assessment strategy seems to matter greatly in terms of the influence on student achievement.
Perie, Marion, and Gong (2007) contend that formative assessment activities possess the following characteristics:
There has been an increase in use of interim assessments by schools and districts in the United States in an effort to improve student achievement (Goertz, Olah, & Riggan, 2009). Dunn and Mulvenon (2009) delineated the differences between formative and interim assessments:
Formative assessment is defined as assessment used by teachers and students to adjust teaching and learning, as compared to interim assessment that informs policymakers or educators at the classroom, school, or district level about student achievement levels and curriculum effectiveness. (p.5)
Perie, Marion, and Gong (2007) consider interim assessments to be:
Interim assessments (1) evaluate students’ knowledge and skills relative to a specific set of academic goals, typically within a limited time frame, and (2) are designed to inform decisions at both the classroom and beyond the classroom level, such as the school or district level… As such, the timing of the administration is likely to be controlled by the school or district rather than by the teacher, which therefore makes these assessments less instructionally relevant than formative assessments. (pp.1-2)
Goertz, Olah, and Riggan (2009) conducted an exploratory study in which they investigated the use of interim assessments, as well as the policies that support their use in the classroom. Included in their study were 45 elementary school teachers (Grade 3 and 5) selected by a purposive sample of nine schools located in two Pennsylvania school districts. The researchers purposely selected an urban and a suburban district to study in an effort to gather information on how “policy supports for assessment and instructional improvement function in these different environments” (p.2). The results showed “that interim assessments are useful but not sufficient to inform instructional improvements” (Goertz, Olah, & Riggan, 2009, p.8). The authors uncovered minimal evidence indicating that the interim assessments they “studied help teachers develop a deeper understanding of student’s mathematical learning—a precursor to instructional improvements. Most items in the assessments did not provide actionable information on students’ misunderstandings” (Goertz, Olah, & Riggan, 2009, p.8).
According to Perie, Marion, and Gong (2007) the best commercially prepared interim assessment programs can:
The purpose and intended uses of an assessment determine whether it is classified as a formative assessment or an interim assessment. Far from being definitive, the literature on the effectiveness of formative and interim assessments to raise student achievement is mixed. The results in the literature suggest that specific strategies of formative assessment might be helpful, but the literature provides less certain guidance for the use of interim assessment and calls into question any large-scale use of the strategy without further evaluation of its efficacy with children.

Figure 1. Production / Function Theoretical Framework
“Non-experimental research is frequently an important and appropriate mode of research in education” (Johnson, 2001, p.3) due largely in part to the inability of education researchers to perform randomized experiments and quasi-experiments on children. We used a non-experimental, cross-sectional, explanatory design. An explanatory study must meet the following criteria: (a) The researchers attempted to develop or test a theory about a phenomenon to explain “how” and “why” it operates; (b) The researchers attempted to explain how the phenomenon operates by identifying the possible factors that influence change in it (Johnson, 2001).
Neither cause nor trends in performance can be determined. The data for this study were collected from five middle schools that serve students who lived in home and neighborhood environments that exhibited lower middle class to upper middle class wealth characteristics. Generalizations should not be made to schools that serve less wealthy or wealthier populations than those served by the schools in the study.
We analyzed student level data by school building and not aggregated to the district level due to the differences in student demographic and achievement among the schools. The dependent variables were the Grade 8 student scale scores from the language arts and mathematics sections. The independent variables were student attendance, gender, eligibility for free or reduced lunch, eligibility for the ASI basic skills remediation program, pretest interim assessment scores in language arts and mathematics, and posttest interim assessment scores in language arts and mathematics.
All data explored in this study pertained to students in Grade 8 enrolled in four middle schools located in a ethnically and economically diverse suburban/urban central New Jersey community during the 2009-2010 academic school year. The final student sample population for this study was 670 Grade 8 students. Data from students who met the following criteria were included in the study (a) general education program; (b) received a valid score on both sections of the 2009 NJ ASK language arts and mathematics tests; (c) received a valid score on both sections of the Learnia pretest and posttest assessments in language arts and mathematics during the 2008-2009 school year; and (d) do not qualify for the district’s English language learners (ELL) program.
Table 1
Demographic Characteristics of Grade 8 Students by School 1 Reported in Number and Percentage of Population

Concerning sample size for regression, Field (2009) references Green (1991) for establishing a minimum acceptable sample size. Field wrote,
“…if you want to test the model overall, then he [Green] recommends a minimum sample size of 50 + 8k, where k is the number of predictors. So, with five predictors, you’d need a sample size of 50 + 40 = 90. If you want to test the individual predictors then he suggests a minimum sample size of 104 + k, so again taking the example of 5 predictors you’d need a sample size of 104 + 5 = 109” (Field, 2009, p.222).
We needed a minimum of 112 cases to meet Field’s (2009) and Green’s (1991) threshold for sample size to ensure proper power to test the full model. We exceeded the minimum number of cases required for each model. We needed a minimum of 214 cases at each site to ensure proper power to test individual predictors. We exceeded the minimum requirement in three of the four schools and failed to meet the minimum requirement by 14 cases in School C.
Instrumentation for the study consisted of the NJ ASK in Grade 8 and the computer-based interim assessment product Learnia. “The New Jersey Department of Education is required by federal law to ensure that the instruments it uses to measure student achievement for school accountability provide reliable results” (NJDOE, 2009, p. 116). Because high-stakes decisions are made often using solely a student’s test scores (Tienken, 2008a, b), it is important to discuss the standard error of measurement (SEM) associated with these assessments. When considering the conditional SEM, Tienken (2008b, citing Harville, 1991) summarized that it is “an estimate of the amount of error or lack of precision one must consider when interpreting a test score” (p.37). He stated that “the SEM describes how far the reported results may differ from a student’s true score” (p.37). To clarify on the issues of reliability and SEM, the NJDOE (2009) published in the following in the NJ ASK Technical Report:
Although the conceptualization of reliability and SEM is relatively straightforward, issues underlying the estimation of reliability are not. Reliability can be estimated via the correlation of scores on parallel forms or from test-retest data, or it can be estimated from a single test administration using any one of a variety of techniques (e.g., Cronbach, 1951; Kuder & Richardson, 1937). A very popular technique for estimating reliability from a single test administration is Cronbach’s coefficient alpha (p.117).
The Cronbach’s alpha for full-test reliability of the language arts and mathematics sections exceeds .80 and the conditional SEM is approximately 10 scale score points at the proficiency cut-score for individual scores.
As of 2010, Learnia was available to school districts in California, New Jersey, and Texas. Learnia is marketed as a web-based “formative” assessment tool produced by Pearson. Learnia is designed to help diagnose the academic strengths and weaknesses of individual students in language arts and mathematics.. Assessments delivered by Learnia align with the New Jersey Core Curriculum Content Standards (NJCCCS) as well as with the criteria set forth by Measurement Incorporated (MI), the corporate vendor that develops NJ ASK state mandated standardized test. Reliability data for Learnia were not available.
Student-level NJ ASK test scores were collected from the district database. The free and reduced lunch report, attendance records, and ASI class rosters were provided by the District’s Data Analyst. Learnia pre and post assessment results were collected via the computerized Learnia summary report feature. All reports were coded to guarantee confidentially. Each coded student identifier represented a record. Each complete report contained the following data unique to each record: NJ ASK test scores (language arts and math), Learnia pre assessment scores (language arts and math), Learnia post assessment scores (language arts and math), free and reduced lunch identification, attendance record, and ASI eligibility. Incomplete cases with records void of a least one component of data were excluded from the study.
Because of differences in the demographics and academic makeup of the students in the four middle schools, and to meet the simultaneous regression assumption of data independence, we analyzed the data from the four schools separately instead of aggregating the scores across the schools. Separate analyses were also necessary because the district leadership interpreted the test results on the individual school level and made decisions for each school based on that school’s results, not based on district aggregated test scores. Resources are allocated in the district based on the individual schools’ output, not the aggregate of the district’s middle schools. Although the district is categorized as lower middle class, the four schools used in the study have characteristics representative of working class to upper class socio-economics.
Because not all the independent variable were continuous (i.e., free/reduced lunch eligibility, attendance policy violations, and ASI eligibility), we used dichotomous dummy coding for categorical variables. The following recoding was used for the student variable of SES: 1 = eligible for free/reduced lunch, 0 = not eligible; for the student variable of attendance: 1= student exceed district policy of 16 absences, 0= student did not exceed 16 absences; for the school variable of ASI eligibility: 1= eligible for ASI services, and 0= not eligible. Gender was coded as 0=male, 1= non-male.
We ran scatter plots and correlation matrices in order to check the assumption of linear relationships of each predictor with the dependent variable. We conducted Pearson Product Moment correlations between the dependent variables and the predictor variables for each subject (M and LA) for each school to determine the relationships among the variables prior to conducting simultaneous multiple regression as a preliminary check for multicollinearity. We do report Pearson correlations for the sake of brevity but they can be requested from the authors. As part of the simultaneous multiple regression, all appropriate student and school variables were entered at the same time. From this procedure we were able to ascertain which predictors contributed statistically significantly to the multiple regressions. Multicollinearity issues were tested by computing the variance inflation factor (VIF) from the data loaded into the regression models. Multicollinearity was not an issue in the models as the tolerance values for the predictors were not exceedingly low (<1-R2) for any variable except NJ ASK8 Math. From the t value and the p value, we determined if one specific variable statistically significantly contributed to the prediction equation for NJ ASK scores from all the independent variables.
We present results by school and subject within each school.
A statistically significant model emerged (F= 31.209, p=< .001, Adjusted R² .588) and accounted for approximately 59% of the variance in student performance on the NJ ASK8 in language arts. Statistically significant predictor variables, their betas, and p values were as follows: (a) NJ ASK8 Math, .363, p<.001, (b) interim LA Posttest, .275, p<.001, (c) interim LA Pretest, .248, p<.001, and (d) gender, .152, p=.004.
NJ ASK8 Math achievement was the best predictor of NJ ASK8 LA achievement. However, the predictive power of the NJ ASK8 Math is suspect. It stands to reason that the true direction of the relationship comes from a student’s language arts skills and those skills influence his/her mathematics scores because of the amount of reading necessary to engage the NJ ASK8 mathematics test. The seemingly apparent influence of math achievement on language arts achievement might be a false finding and should be interpreted with caution.
Readers should instead focus on the almost identical strength of the interim LA pretests and posttests. We find it interesting that the interim LA pretest, given in September of a school year has similar predictive power as the posttest, usually given in March or April of the school year. Are both tests and the time taken from instruction necessary in this school?
A statistically significant model emerged (F = 31.866, p < .001, Adjusted R2 .594.) accounted for approximately 59% of the variance in NJ ASK8 math scores. Two statistically significant predictors variables were, (a) interim Math Posttest with a beta of .383, (p<.001) and a t value of 5.296, and (b) NJ ASK8 LA (p<.001) with a beta of .358 with a t value of 4.904.
A statistically significant model emerged (F = 36.329, p < .001, Adjusted R2 .639.) that accounted for approximately 64% of the variance in student performance on the NJ ASK8 LA. Statistically significant predictors variables, their betas, p values, and t values were as follows: (a) ASI, - .270, p<.001, t value of 4.732, (b) gender, .242, p<.001, t value of 5.132, (c) NJ ASK8 Math, . 227, p<.001, t value 2.773, (d) interim LA Posttest, .206, p=.001, t value of 3.509 , and (e) student SES, -.132, p=.005, t value of -2.875. However, as stated earlier in the results for School A, the finding for the strong prediction value of NJ ASK8 Math for NJ ASK8 LA is suspect.
A statistically significant model emerged (F = 49.333, p < .001, Adjusted R2 .707.) Approximately 71% of the variance in NJ ASK8 Math scores was explained by the variables in the model. Statistically significant predictors variables were as follows: (a) ASI,- .279, p<.001, t value of -5.539, (b) interim Math Posttest, .240, p=001, t value of 3.541, (c) interim Math Pretest, . 211, p=002, t value 3.093, (d) NJ ASK8 LA, .184, p=.007, t value of 2.733, and (e) interim LA Pretest,.115, p=.02, t value of 3.351. As with the interim LA pretests and posttests in School A, the interim math pretests and posttests in School B were similar in their prediction strength. Potentially, the interim pretest math scores from a test given in September could predict about as much of a student’s chance of being proficient on the state mandated test administered in May.
A statistically significant model explained (F = 20.246, p < .001, Adjusted R2 .614.) approximately 61% of the variance in student performance on the NJ ASK8 LA. Statistically significant predictor variables, their betas, p values, and t values were as follows: (a) NJ ASK8 Math .703, p<.001, t value of 7.310, (b) interim LA Pretest, .214, p=.003, t value of 3.067, (c) interim LA Posttest, .212, p=.005, t value of 2.884, and interim Math Posttest, -.189, p=.021, t value -2.347. However, as stated earlier in the results for School A, we question the finding for the strong prediction value of NJ ASK8 Math for NJ ASK8 LA.
The findings for the predictive strength on the interim LA pretests and posttests mirror those from School A. The beta and t values for the interim LA pretest and posttest in School C are almost identical, calling into the question the usefulness of the pretest / posttest assessment scheme in School C.
A statistically significant model emerged (F = 33.437, p < .001, Adjusted R2 .728.) and it accounted for approximately 73% of the variance in student performance on the NJ ASK8 Math. Statistically significant predictor variables, their betas, p values, and t values were as follows: (a) NJ ASK8 LA, .495, p<.001, t value of 7.310, (b) interim Math Posttest, .319, p<.001, t value of 5.163, (c) gender, -.208, p<. 001, t value -3.992, and (d) ASI, -.179, t value -3.230. The betas suggested that being female, not eligible for ASI services and doing well on the NJ ASK8 LA and interim math posttest predicted positive results for the mathematics portion.
A statistically significant model emerged (F = 30.316, p < .001, Adjusted R2 .559) and accounted for approximately 56% of the variance in NJ ASK8 LA scores. Statistically significant predictor variables, their betas, p values, and t values were as follows: (a) NJ ASK8 Math, .443, p<.001, t value of 5.956, (b) interim LA Pretest, .187, p<.001, t value of 3.750, (c) ASI, -.119, p=.033, t value -2.141, (d) interim LA Posttest, .113, p=.044, t value of 2.023, and (e) gender,.102, p=.033, t value -2.147. The betas suggested that not being in need of ASI services, being female, and doing well on the interim LA pretest and posttests accounted for the most variance in performance on the NJ ASK8 LA test. Interestingly, the interim LA pretest was a stronger predictor of achievement on the state LA test than the posttest.
A statistically significant model emerged (F = 45.910, p < .001, Adjusted R2 .660.) that indicated approximately 66% of the variance in NJ ASK8 Math scores was explained by the variables in the model. Statistically significant predictor variables, their betas, p values, and t values were as follows: (a) NJ ASK8 LA, .341, p<.001, t value of 5.956, (b) interim Math Pretest, .336, p<.001, t value of 5.999, (c) ASI, -.203, t value -4.301, (d) interim Math Posttest, .104, p=.056, t value 1.919. As we found in School B, the interim math pretest was a stronger predictor of student achievement on the state mandated math test than the interim posttest.
The results of the study revealed each school produced a combination of site-specific results and results common across sites regarding the strength of each independent variable to predict student achievement. The strength of the relationships between the various independent variables and the dependent variables varied greatly across sites. This suggests that the influence of interventions differ across sites due to contextual factors such as demographics of the community and student body. One size does not fit all in terms of interventions and influence on achievement.
The interim pre and posttests were statistically significant predictors of achievement across all schools for LA and Math, with the exception of School C where the math pretest was not a statistically significant predictor of achievement. Seemingly, in all cases, the interim pretest accounted for the same or approximately the same amount of variance in student output on the state mandated tests as did the posttest. If the pretest predicts as much or almost as much of the variance in student achievement as the posttest, of what value is the posttest, and pretesting/posttesting cycle for that matter? The Learnia interim assessment tool is marketed to school and district administrators as being able to provide high quality feedback to teachers from the pretest so that they can adjust their instruction before the posttest, and before the gran cru of summative assessments, the state mandated NJASK8. In this case, similar to Thorndike’s (1901; 1924) findings so many years ago, those who came to Grade 8 with the most academic achievement left with the most and Learnia testing could not influence the performance trajectory of those labeled as not-proficient on the pretest in order to become proficient on the posttest or the NJ ASK8.
Pearson recommends using Learnia as an important intervention to improve student achievement on state mandated tests. However, it is important to note that the reviewed research regarding interim assessments indicated that there is not a statistically significant relationship between the use of interim assessments and informed instructional change in the classroom (Goertz, Olah, & Riggan, 2009) nor is there a relationship between the types of assessment structures that are potentially available from the Learnia product and increased student achievement. More of an effort must be made to ensure that interim assessment products marketed to schools and interim assessment practices used in schools are vetted empirically by the corporation, not a common practice, and also vetted locally on a pilot basis to determine efficacy at various contextually unique school sites. The context in which the product is developed might not match the context in which it is deployed, and thus the results might not match those marketed by the company vending the product. Context matters.
The empirical literature revealed that the most effective classroom assessment in terms of influence on student achievement is formative assessment that is structured to provide opportunities for the students to practice the technique of self-evaluation, and reflection to help students learn to monitor and adjust their learning. For example, an experimental study conducted by Schunk (1996) revealed that students who had structured opportunities to practice the formative assessment technique of self-evaluation frequently, had greater motivation and increased achievement outcomes in comparison to those who did not participate in the practice of self-evaluation frequently. Generally, interim assessments do not provide those types of opportunities.
The existing empirical literature and the results from this study seem to suggest that the more proximal (closer to the student) the formative assessment activity is (i.e., self-evaluation), the greater the influence it has on learning (Sadler, 1989; Schunk, 1996) whereas the further the assessment practices, either formative or interim, are from the student (distal), the less influence they have on learning. The assessment product, Learnia, and products like it, are distal from the student. They are not formative assessments and the student is not actively involved in self-monitoring or self-assessing. It is unclear how the product, as currently used in the four schools in this study and marketed by the corporation facilitates reflection beyond superficial error identification.
School and district administrators might want to consider redeploying the funds earmarked for the distally developed corporate assessment products toward building more internal assessment capacity through teacher developed formative and interim assessment practices based on prevailing evidence. Doing so could help to build local assessment capacity and allow for customization of practices to meet the needs of the district’s diverse student population. Making formative and interim assessment activities more proximal to the student might help to increase the chances that those practices will actually lead to improved teaching and influence student achievement and more effective use of scarce public funds. At the very least. school administrators should know the efficacy of the products they bring into their buildings and they should understand the quality of the data from which they make decisions.