A weekend interview with Kurt Geisinger, executive director of the Buros Center for Testing at the University of Nebraska. Geisinger has been working with the Florida Department of Education to monitor and improve the FCAT exam, following scoring problems with the 2006 third-grade reading test. He spoke with reporter Jeff Solochek after the release of this year's scores.
Did the test scoring and the test results go better this year?
I think that's right, and in part because they had a larger calibration sample, as we call it. Calibration is one of those things that is very technical, but it basically comes down to the kind of adjustment we make to bathroom scales, where you raise it up or lower it down a little bit depending on whether the items are a little easier or harder than the previous year's. ... If you have a test of 30 items, it's hard to generate a test that is exactly equivalent year after year. ...
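The scale adjustment Geisinger describes can be sketched with mean-sigma linear equating, a standard textbook technique for putting a new test form on an old form's scale. The function below is a hypothetical illustration only, not the FCAT's actual procedure.

```python
import statistics

def linear_equate(new_scores, anchor_new, anchor_old):
    """Map scores from a new test form onto the old form's scale,
    using the anchor items' means and spreads on each form - like
    re-zeroing a bathroom scale when the form is easier or harder."""
    mu_new, mu_old = statistics.mean(anchor_new), statistics.mean(anchor_old)
    sd_new, sd_old = statistics.stdev(anchor_new), statistics.stdev(anchor_old)
    slope = sd_old / sd_new
    return [slope * (x - mu_new) + mu_old for x in new_scores]

# If the anchors averaged one point higher on the old form (harder new form),
# every new-form score is shifted up by a point:
print(linear_equate([20], anchor_new=[10, 12, 14], anchor_old=[11, 13, 15]))
```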
Did you do them? Or did someone else?
No. We don't do them. We were hired for what I call an audit, which is to spot check some of the work. ... There are really four parties, and we were asked as part of our contract with the state of Florida to sort of listen in. So we were added to the e-mail lists that were going back and forth as they were working on the calibrations. Because it really is a very complicated mathematical and technical problem. And we thought they did a really fine job, and everything was within normal variation - even the one you are probably calling about, the fifth-grade reading. I mean, people don't quite understand sometimes that this is not a truly exact science. So you look at annual results for a state, and you say, if the kids in Florida's fifth grade were actually improving their reading one or two percent a year, could the test on a given day show a five percent decline? And that's actually not that improbable.
There are people in the schools, though, who are expressing their surprise at the fifth-grade reading scores, saying the results don't jibe with their reality. They want to look at them more closely to see if there was a problem. You're saying there is no problem?
Well, I'm saying you can have a five percent decline even when kids are improving, and that's not outside the normal variation. We shouldn't be, you know, making decisions on one or two or three percent. In this particular case, the state of Florida had eight grades tested, in both reading and math, and in 15 of those 16 results there was an increase from one year to the next. In one case there was a decline. That tells a pretty strong story, I would think. And that's where I would focus the discussion. But the reality is, in the one case where there was a decline, it was not so far out of range that it tells you, OK, they've really dropped in reading.
It's sort of like - I don't know about you, but I have a weight problem. So if I step up on a scale and one day it says 220 and the next day it says 219, I don't immediately jump up and down and say I've lost weight. And that's what I'm saying: we are within those scale calibrations that we really don't make a big deal about. On the other hand, if I've lost weight 15 days out of 16, then I'm feeling pretty good. If that makes sense.
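The bathroom-scale analogy can be put in numbers: a cohort whose true proficiency never falls can still post a one-year drop once measurement noise is layered on. The scores and noise values below are invented purely for illustration.

```python
# Hypothetical illustration: true proficiency rising one point a year,
# but each year's measurement carries a few points of error.
true_scores = [60 + yr for yr in range(5)]      # steady real improvement
noise       = [0.0, 2.0, -3.5, 1.0, -0.5]      # made-up measurement error
observed    = [t + n for t, n in zip(true_scores, noise)]

print(observed)
# Year 2 to year 3 shows a 4.5-point observed drop even though
# the true scores never declined.
print(observed[2] - observed[1])
```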
Absolutely. I think that's a way everybody could understand it.
And, let's see. The only artifact here at all that affected it - and this does relate to what happened in the third grade the previous year - is that in 2007 the anchor items that are used for the equating moved up in the test. And that's one of those things there's no way around: you can't always put the items in the exact same spot. If the students in the state knew that, say, questions 5-8 never really count because they were just for equating, the students would skip right over them. So the state has to move them around while keeping them in more or less the same spot. ... It doesn't make any sense to me at all why moving items up makes them a lot easier, but, frankly, it does. I don't know if it's fatigue, or whatever, but it does. And so what happened was that in 2007 the anchor items came earlier in the test than they did in 2008, in fifth-grade reading.
And reading is actually much harder to deal with in anchors than math, because math items are all discrete. In other words, each item stands more or less on its own. But since reading items always follow a passage, if you move one anchor item you're usually moving a bunch of them. ...
What about the fact that these fifth graders are the same kids who were the third graders who had the testing issues from before? Is there anything we should take from that?
I think that's just anomalous. Nothing that I know of at this time. I'm enough of a scientist that I'm not going to speculate. But my sense initially would be that it's just anomalous. Because in the fourth grade, as I recall ... they were more or less average.
So it's just a testing situation? It's just something that happened because a test is taken at one point in time?
I think so. Yeah.
So, as people here in Florida move forward with the FCAT, should they feel confident that the FCAT is doing what it is supposed to be doing? Or is there always going to be, or should there be, a lingering concern?
I think what the FCAT is experiencing is exactly what is happening with tests around the country. Tests are really good indicators of gross levels of performance, but they shouldn't be used as overly precise instruments. And unfortunately, we have nothing else. I mean, what else could you use that would give you a barometer of how the educational process is proceeding? There really isn't anything else. And so the bottom line is, it's the best technique we have but it's still imperfect. And I think they're taking steps to make it more and more perfect, frankly.