A little more on VAM in Pinellas
On Saturday, one of my colleagues, Curtis Krueger, wrote a story about the confusion and frustration of teachers in Pinellas County who received their value-added scores last week. The scores, based on a complicated statistical formula, compare predicted student performance with actual student performance in an attempt to evaluate how well a teacher is teaching.
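The core idea — comparing predicted performance with actual performance — can be sketched in very simplified form. This is only an illustration: real VAM formulas involve many covariates and error terms, and the student scores and predictions below are hypothetical.

```python
# Very simplified sketch of the idea behind a value-added score.
# Real VAM models are far more complex; these numbers are hypothetical.

students = [
    {"predicted": 310, "actual": 325},  # beat the prediction
    {"predicted": 298, "actual": 290},  # fell short of it
    {"predicted": 305, "actual": 312},  # beat it by a little
]

# A teacher's raw value-added is the average of (actual - predicted)
# across his or her students.
residuals = [s["actual"] - s["predicted"] for s in students]
value_added = sum(residuals) / len(residuals)

print(round(value_added, 1))  # positive: students beat their predictions
```

A positive average means the teacher's students, on the whole, did better than the model predicted; a negative one means they did worse.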
Under the Pinellas County School District's new system (as required by state law), about 65 percent of teachers were rated highly effective or effective, the top two categories. Fewer than 10 teachers were in the lowest category, unsatisfactory. (Keep in mind that these numbers could change. The district still is crunching the data.)
Compare those numbers to the 2010-2011 school year when 98.8 percent of the county's teachers were rated satisfactory. At many schools, not a single teacher was rated unsatisfactory. Stats like those - where just about no one is considered a "bad" teacher and teachers are rarely fired for poor classroom performance - are part of the reason that many states have started looking at new ways of evaluating teachers.
It seems highly unlikely, after all, that every teacher is effective.
Florida has taken a number of passes over the years at establishing merit pay for teachers - anyone remember MAP and STAR? - with Gov. Rick Scott signing the state's current merit pay law last year. The law puts a lot of weight on student test scores - 40 percent to 50 percent, depending on how many years of data are available - and the rest on the principal's observation.

But, of course, there's a huge caveat here. How do you evaluate a teacher based on student performance in a way that's fair, transparent and reliable? So far, no one has really come up with a way to do that.
Value-added measures, as used by school systems so far, tend to be unreliable. In New York, where there was a hugely controversial release earlier this year of scores with teachers' names, the margin of error in some cases was more than 50 percentage points. That kind of margin, critics said, rendered the results almost meaningless. (The release also created a media storm where some newspapers blasted the names of the "worst" teachers and their photos without delving too much into the enormous margin of error.)
Check out this 2010 study that found that value-added models had a 25 percent error rate, which went up when just one year of data was used.
A decent value-added system should look at more than one year of data - three or more - and more than one type of data. (Using performance on only one standardized test, for instance, isn't as informative as using several measurements.) It also should take into account the students being taught. Are they poor, disabled, struggling learners, gifted students?
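Why do extra years of data matter? A single year's estimate is noisy; averaging several years shrinks that noise. The toy simulation below makes the point with made-up numbers - the "true" effect and noise level are purely illustrative, not from any real district's model.

```python
# Sketch of why more years of data make a value-added estimate more stable.
# TRUE_EFFECT and NOISE are illustrative, not from any real model.
import random

random.seed(0)
TRUE_EFFECT = 5.0   # the teacher's "real" value-added, in score points
NOISE = 20.0        # year-to-year noise in a single-year estimate

def estimate(years):
    """Average `years` noisy single-year estimates for one teacher."""
    samples = [random.gauss(TRUE_EFFECT, NOISE) for _ in range(years)]
    return sum(samples) / years

def typical_error(years, trials=2000):
    """Average distance of the estimate from the true effect."""
    errors = [abs(estimate(years) - TRUE_EFFECT) for _ in range(trials)]
    return sum(errors) / trials

one_year = typical_error(1)
three_years = typical_error(3)
print(one_year, three_years)  # the three-year estimate is noticeably tighter
```

With three years averaged together, the typical miss is markedly smaller than with one - the same reason a margin of error built on a single year of scores can swamp the signal.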
A pure growth model could penalize teachers of gifted students, for instance. It often is easier to get big learning gains among low performers - because they have a lot of room to improve - but not so easy to move the needle among top performers. (Sometimes called the "ceiling and floor effects." See page two of this analysis for the state Department of Education.)
If the model looks only at overall performance and not growth, on the other hand, then a teacher could be unfairly penalized for working with a tougher population of students - those who are coming to school hungry or several grade levels behind or with behavior problems or chronic absenteeism. Same goes for the special education teacher.
(And how do you use test scores to evaluate an arts teacher or a drama teacher?)
The reliability of the data itself is hugely important. FCAT scores dropped dramatically this year because of changes to the test and passing scores. In some cases, state officials warned the public against making year-to-year comparisons at all.
All of which is to say that value-added measures are far from simple.
In Pinellas County, school officials wrote a little explainer piece about the value-added measures in which they say that no teacher is in danger of losing their job over the scores. It also says the most important piece of the new evaluation system is still the part completed by the principal. (See the district's document below.)
The document says that it's possible for a teacher to show state learning gains, but get a low VAM score. That's because a different method of calculation is being used. Here's what it says:
"Students can earn a state learning gain by going up an achievement level from the previous year, remaining at a high achievement level (3-5), or for students remaining at levels 1 and 2, going up enough developmental scale score (DSS) points to show one year’s worth of growth. Value-added is calculated using only the current and prior-year’s DSS scores and then performs its estimate of predicted growth compared to actual performance for each student. Teachers may have students at Level 1 and Level 2 who stayed at those levels but showed enough DSS growth to earn a state learning gain, but those same students may not have done better than predicted growth for other students with similar characteristics, which would contribute to lower VAM scores."
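The mismatch the district describes can be made concrete with a toy example. The thresholds and scores below are hypothetical - the real DSS cut points and the state's prediction model are far more involved - but they show how the two calculations can disagree for the same student.

```python
# Sketch of how a student can earn a state learning gain yet drag down
# a teacher's VAM score. All numbers here are hypothetical.

# Hypothetical: a Level 1 student needs +50 DSS points for a year's growth.
YEARS_GROWTH_DSS = 50

prior_dss = 1400       # last year's developmental scale score
current_dss = 1460     # this year's score: up 60 points
predicted_dss = 1475   # what the model predicted for similar students

# State learning gain: stayed at Level 1 but grew a year's worth.
state_learning_gain = (current_dss - prior_dss) >= YEARS_GROWTH_DSS

# VAM residual: actual minus predicted. Negative pulls the score down.
vam_residual = current_dss - predicted_dss

print(state_learning_gain, vam_residual)  # True, -15
```

The student grew enough to count as a state learning gain, yet still fell short of the growth the model predicted - so the same student helps one measure and hurts the other.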
The document says that the VAM model being used in Pinellas provides a more "defensible foundation" for teacher evaluation than methods judging teachers based on the percentage of students meeting a fixed standard of achievement. It also says that other school districts have similar challenges to Pinellas, yet have enjoyed better academic performance.
If you want to learn more about VAM, the Hechinger Report has written extensively about the issue. See this story here.