A weekend interview with Sam Foerster, chairman of the Florida Race to the Top value-added committee
As part of its Race to the Top effort, the Florida Department of Education has convened a committee to help pick a value-added model to evaluate teachers. The committee has begun reviewing methods to achieve that goal, and has already begun narrowing the field. Members are keenly aware of how controversial the concept is, and are trying to remain focused on coming up with a model that helps improve instruction without becoming mired in politics. Committee chairman Sam Foerster, associate superintendent of Putnam County schools, spoke with reporter Jeff Solochek about the effort.
Florida and the whole country seem to be headed toward using more of a growth model and value-added model to evaluate kids and evaluate teachers and schools. I'm interested in knowing where you are right now and what you see coming forward.
As a committee we convened for the first time a couple of weeks ago and were brought by a vendor ... American Institutes for Research ... eight variants of value-added models that are either in the literature as things we should have confidence in or are already in play in districts and/or states in America. Those eight models broke out into two basic categories, one being a covariate model and the other being what they described as a learning path model. And the assumptions that are built into each of those models are pretty dramatically different. ...
Can you give a simple, totally easy version of what the differences are?
I can try. I am not a statistician.
That would be better because you can give us your understanding of it.
Well, I can tell you that what I took from it was that in these two basic camps ... the covariate model is the simpler of the two. It works on the premise that you can use a student's prior-year FCAT score to predict with some certainty what a typical outcome for them should be in the following year. ... The idea is that the line fitted to the data is a description of a typical outcome. Kids that lie above that line are believed to have an above-expected outcome, and those students who fall below that line are those who we believe have demonstrated a lower-than-expected outcome. By looking at how the students fared relative to the expected outcome, and then aggregating that to the teacher, we come up with some understanding of how that teacher performed, above or below average. ...
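The procedure described here — fit a line predicting this year's score from last year's, treat each student's residual as their above- or below-expected outcome, then average residuals by teacher — can be sketched in a few lines. This is only an illustrative toy, not Florida's actual model: the teacher names and score values are invented, and real value-added models add many more covariates and error adjustments.

```python
# Toy sketch of the covariate model described in the interview.
# Step 1: least-squares line predicting current score from prior score.
# Step 2: residual = actual minus predicted ("above/below expected").
# Step 3: average residuals per teacher as a crude teacher effect.

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# (teacher, prior-year score, current-year score) -- invented numbers
students = [
    ("Smith", 300, 320), ("Smith", 350, 380),
    ("Jones", 310, 315), ("Jones", 340, 355),
]

a, b = fit_line([s[1] for s in students], [s[2] for s in students])

residuals = {}
for teacher, prior, current in students:
    residuals.setdefault(teacher, []).append(current - (a * prior + b))

teacher_effect = {t: sum(r) / len(r) for t, r in residuals.items()}
# With these numbers, Smith's students land above the fit line
# (positive average residual) and Jones's land below it.
```

The key property is that the model judges each student against students with similar starting points, not against a single statewide bar.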
Arguments can be made that different students should have different expectations for learning. Or instead of saying they should, we could say we can demonstrate that they have shown different histories of learning or achievement. And by studying groups of students we can see whether or not the performances of those subgroups vary from one another. And when we see that they are different, we can make a decision about whether or not we want to acknowledge that and level the field for the teacher. Or we could choose not to acknowledge that. ...
Who is the we? The teachers union? The school district? The state?
In this context it is deemed to be the work of the committee to have a discussion about what these differences mean and whether or not they should be acted on in terms of integration into the model. Because in some instances, in leveling the field you may actually have an unintended consequence. An example of that might be if you find that student attendance has a bearing on how much growth a student typically shows year over year, you might be tempted to say, if I have a student who is out of school x number of days, then I'm going to count that student less or I'm going to have a lower expectation of growth because I've seen that historically. The problem is what the work of the committee termed a perverse incentive: potentially you've got a teacher who has some control over whether or not that student is in class. If that teacher knows that once the student hits a threshold the student has a lower expectation of growth, there could be a perverse incentive for the teacher to encourage absenteeism. ...
So there are the two different kinds. Was there one that the committee felt was better yet?
Well, yeah. The committee was unified in its conviction that the covariate model was superior to the learning path model, for a few reasons. The learning path model is much more complex statistically, so it is much more difficult to describe exactly how it works to anybody, including ourselves. And we perceived that to be a real serious barrier to the model's utility. We're all hopeful that this type of information proves valuable to teachers. After all, I think why most of us think it's important is to inform instruction. And if the model that is being used to measure and record teacher effectiveness is so complex and nuanced, it won't be believed by the teachers whom we are trying to help. So we saw that as a problem.
We saw that our own inability, after many hours of having tried to understand the complexity, to articulate it clearly would be a problem. We didn't feel like we would be great ambassadors for it. And then additionally that type of model has some assumptions built into it that just seemed counter to our own instincts. One of those assumptions was that there is a teacher effect that is constant. As more and more information is accumulated ... the teacher effect is believed to be constant and is readjusted every time we have new information. ... The problem is that the information is then changed going backward. ... From a teacher evaluation standpoint and a legal defensibility standpoint, this seemed also to be problematic. ...
Did anybody talk at all about how controversial this whole idea is, and that there are concerns that it doesn't really say what you want it to say? The whole idea of value-added, there are questions about whether it will get you where you want to go. And there's the point that a school that's always been doing well is going to look like it's going to do less well because it has less room to grow. ... Did any of those kinds of things come up?
You know, it did come up in the context of what's being called the school effect, as opposed to the teacher effect. I've got to be honest with you, at that point some of the statistics really left me. ...
Are you getting any feedback from people in your community? ... Is anybody saying this is what they want or don't want? Because it's so political, that this has become an argument nationally over whether to move to these value-added models.
I work in Putnam County, and have been working with a simple growth model for a couple of years now. Our district is well downfield in terms of developing a model and introducing it to teachers and talking through what it means and doesn't mean. This thing that we do locally is different from the value-added model in that we don't really try to parse what is teacher effect and what is student effect. And while I can understand why that is important, particularly if you're using the instrument for evaluation, it's hard. Which is why I think the models that exist right now are imperfect in a number of ways. What we are trying to do from a value-added side is determine which is the best model. ... That having been said, the value-added models complicate things because they do try to parse teacher effect and student effect and to make accommodations for leveling the playing field in terms of having expectations for different kids. Whereas the simple growth model, which is what I am more familiar with ... doesn't speak to those distinctions but rather offers simple feedback to the teacher in terms of this is what happened in your classroom, presented without judgment or bias, really, but rather as a point of conversation as to what happened and why that result was what it was ... and what we think we can learn from it and apply to other teachers in our district.
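The simple growth model contrasted here is even easier to state: report each classroom's average year-over-year score change, with no attempt to separate teacher effect from student effect. A minimal sketch, using the same kind of invented (teacher, prior, current) data as above:

```python
# Toy sketch of a simple growth model: average score gain per classroom.
# No expectations, no leveling -- just "this is what happened in your
# classroom." Teacher names and scores are invented for illustration.
from collections import defaultdict

# (teacher, prior-year score, current-year score)
students = [
    ("Smith", 300, 320), ("Smith", 350, 380),
    ("Jones", 310, 315), ("Jones", 340, 355),
]

gains = defaultdict(list)
for teacher, prior, current in students:
    gains[teacher].append(current - prior)

classroom_growth = {t: sum(g) / len(g) for t, g in gains.items()}
# Smith: (20 + 30) / 2 = 25.0 points of average growth
# Jones: (5 + 15) / 2 = 10.0 points of average growth
```

Note the difference from the covariate approach: a classroom full of high-scoring students can post small raw gains yet still exceed expectations once starting points are accounted for, which is exactly the distinction the committee is weighing.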
I would say those discussions have been very productive, although somewhat nerve-racking. Because once you've looked at the information ... sometimes that information isn't what they want to see. Their results are not what they would have believed them to be. That conversation can be touchy. When approached with the proper motivation -- that is, how to get better -- that information is well received. I don't know if that answers your question.
Do you think that people really understand what it is you're trying to accomplish at the state level? Or do you think it can get bogged down, like when they released those value-added scores in Los Angeles and all the fallout that came as a result of that. ... Do you think this is something that can be accepted for what you want it to be? Or is it always going to be difficult to even get buy-in?
I'm going to say yes, and yes. Can we get there? Yes I believe we can. Is it always going to be difficult? I believe it will be. It's an emotional, loaded topic to begin with. And the math, even in the more simple circumstance, is complicated. So in my view the art is figuring out how to make the model sufficiently complex as to be useful, but not so complicated that nobody really understands it. And that's a challenge. I walked away from that first committee session feeling the burden that we try to do this well and have the right conversations and try to keep our eye on the ball. What I mean by eye on the ball is, is this going to be useful? ...
To answer your question as to whether the model is going to live up to its intended purpose, I don't know, partly because I'm not sure we all agree what the intended purpose is. And that's part of what I think our job is on the committee, to be certain that we at least articulate clearly what our intended purpose is.
Just hearing you say that makes me think, if I'm about to be evaluated in any way with a system that people are still arguing about and aren't sure what the intended purpose is, do we really need to do it?
Well, in my opinion, yeah. Because what it is bringing to the process is some measure of objectivity. I don't know anybody who would argue that having some objectivity in terms of outcome-based analysis isn't in order when you're trying to improve the efficacy of the system. And I am not talking education only. I am talking about any system you are trying to improve. ... You've got to measure what you are trying to improve and then make some judgments in terms of what you see to change your course of action. I am a staunch advocate that looking at outcomes matters.
Does that mean having more and more tests, which some people seem not to like?
I don't believe that it means that. What I would offer here is that we have to be careful with what we mean when we say more tests. Specifically there are lots of academic demands... and while some get tested one could argue too much, or enough, some don't get tested at all, really. So it's wide open in terms of what our outcomes are and whether they should be better. Because we have no metrics of what our outcomes are. ... Probably, what it really means is we've got to make better use of the information we already collect. ...
What do you have to do now?
The charge at this point is to get back together I think on the 19th and 20th of May to review the findings of AIR with respect to the three models that were selected .... Then we will get some idea of how that information played out ... Then our job is to make a recommendation of those three models and all of the various variables that were considered ... or relationships that were considered to be important enough to study in terms of correlations of outcomes in the growth model and reality on the ground in a school ...
Can I ask you a question? What do you think about the value-added model?
Well, I have just been hearing and reading and trying to learn about it. It just strikes me that there are so many questions about it. There are people who say it's definitely the way to go to evaluate in a more objective way. At the same time, there's been no real demonstrated model that definitely works, according to the experts who even sometimes try to create them.
When you ask can they work, how would you define work?
That's a good question. I would want to know that it actually measures what you say it measures, and that it provides information that benefits people in a way that they're not questioning the motivations or the value of it, as opposed to saying, This is something that is helping me to be a better teacher or helping me get the student to be a better student. And knowing that they're actually successful ... instead of wondering if this is just a way to "get" me.
I think those are great observations. I want to reinforce that I feel like a lot of people on the committee want this to be valuable and under no circumstance want to be perceived as a gotcha. But it's difficult with such an emotionally loaded concept to not have some perception on the part of teachers that it is about gotcha. That's why I think it's so important that we in our discussions talk through what is it that it's supposed to be about, and how can we within the constraints of what we're being asked to do be sure that happens. Our 'what is this about' is, does it inform instruction? Does it help our teachers become more effective teachers? That as I understand it ... seems to be the intention. Then there's this need to be absolutely right about the conclusion we come to, the statistical defensibility issue that comes up when this information is used for the purposes of evaluation and possible firing of teachers. Those two things seem to be in tension.
Because the more certain you are about a conclusion, the less specific you can be about the conclusion. And that's not helpful from an instructional standpoint. From an instructional standpoint you would hope you can be really specific, but you're going to have a substantial amount of error associated with an assessment for any given teacher. How that tension is resolved, I don't know exactly ... and I don't believe anyone else does either. Which is why they struggle coming up with a perfect model, if you will. I'm not sure one exists, because the tension between being insightful from an instructional standpoint and being defensible from an employment evaluation standpoint -- those two things being in conflict causes a problem. And where this lands I'm not sure.