TAMPA — Everyone seems to have a bright idea in the tug-of-war to fix America's public schools.
Pay teachers more. Adopt a common curriculum. Give parents a voucher and let them pick the school.
But this spring, one solution is looming above all others, both nationally and in bills before the Florida Legislature. It rests on a simple claim: that it's possible to predict each student's performance on tests based on their track record, and then hold teachers accountable for making those annual predictions come true.
It's called value-added analysis. And the Hillsborough County School District is preparing to push the new science to its limits.
Nearly every Hillsborough student this spring will take exams in rarely tested areas like physical education and the arts. Such scores, along with those already collected on the Florida Comprehensive Assessment Test, will allow the district to rate virtually every classroom teacher using student tests.
"What it means is every student has their own starting line, and students are compared to themselves. That's a good thing," said Anna Brown, assessment director for the district's seven-year partnership with the Bill & Melinda Gates Foundation.
Teachers say the new emphasis on testing adds pressure to teach memorizable facts at the expense of exercise or creativity. They worry about being wrongly labeled and facing pay cuts or even termination as a result.
And experts say they're right to worry. Even those who find value-added methods useful say Hillsborough is venturing into uncharted waters by including non-academic subjects and special-needs students.
"All of these indicators are fallible," said Henry Braun, an education professor at Boston College and former vice president of research for Educational Testing Service. "I think we overestimate what statistical analysis can do for us."
Value-added is being used by hundreds of school districts nationwide, including New York City and Chicago. But research shows it's often inaccurate.
One federal Education Department study found such systems misclassify up to 35 percent of teachers in a single year. That error rate falls to 25 percent when three years' worth of data are used.
Steven Glazerman, a senior fellow at the consulting firm Mathematica, said it's not clear whether it's fair to use a 20-question test to determine 40 percent of an elementary art teacher's evaluation, as Hillsborough plans to do.
"Unfortunately, we don't know the answer," he said. "Because most of what we do know is based on the traditional grades in the traditional subjects. I'd have to say it is an open question."
Hillsborough officials say they won't rely solely on value-added. Starting this fall, such scores will make up 40 percent of a teacher's evaluation, rather than the 50 percent being considered by Florida legislators. And the district will use three years of scores to make decisions on teacher pay.
Observations by principals and peer evaluators will make up the remaining 60 percent in Hillsborough, with support from a $100 million Gates grant. Officials say their new system will be tougher than in previous years, when 99.5 percent of their 12,500 teachers were rated satisfactory or outstanding and one-third were called flawless.
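To make the weighting concrete, here is a minimal sketch of how a 40/60 split could combine into one rating. The 0-to-100 scale, the function name and the sample scores are assumptions for illustration only; the district's actual formula is not described here.

```python
def composite_rating(value_added_score, observation_score, value_added_weight_pct=40):
    """Blend two 0-100 scores; weights are whole percentages that sum to 100."""
    if not 0 <= value_added_weight_pct <= 100:
        raise ValueError("weight must be between 0 and 100 percent")
    observation_weight_pct = 100 - value_added_weight_pct
    return (value_added_weight_pct * value_added_score
            + observation_weight_pct * observation_score) / 100


# Hypothetical teacher: weak test-based score, strong classroom observations.
hillsborough = composite_rating(60, 90, value_added_weight_pct=40)
state_proposal = composite_rating(60, 90, value_added_weight_pct=50)
print(hillsborough, state_proposal)
```

With those made-up scores, the same teacher lands three points lower under the 50 percent weighting than under Hillsborough's 40 percent.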
It's true that value-added is imperfect, said David Steele, who oversees the district's Gates reforms.
"But is it better than what we have done?" he asked. "Is there more error built into value added? Or is there more error built into one principal sitting in his office, evaluating every person on his staff whether he's ever actually seen them teach or not?"
• • •
It's kind of like growing oak trees.
That was the analogy offered by Brown during a visit to teachers at Williams Middle School in Tampa.
She pointed to a picture of two trees, one of which had clearly done a better job of reaching its full, leafy potential. Would it be fair to judge their gardeners without knowing more about things like soil quality and climate?
"Gardener B must be superior, (because) he has the higher tree?" she asked. "I think we all know that doesn't tell the whole story."
In the same way, Brown said, University of Wisconsin statisticians will help Hillsborough to factor in variables like poverty or language fluency in predicting annual student gains.
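The general idea can be sketched in a few lines, though this is not the Wisconsin model, whose details are not given here. The sketch assumes a simple linear prediction from a prior-year score and a single low-income flag, then treats a teacher's value-added as the average gap between students' actual and predicted scores. All names and numbers below are hypothetical.

```python
from collections import defaultdict


def fit_least_squares(rows, targets):
    """Solve the normal equations (X'X)b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def value_added(students):
    """Average (actual - predicted) score per teacher."""
    # Predictors: intercept, prior-year score, low-income flag (assumed covariate).
    rows = [[1.0, s["prior"], s["low_income"]] for s in students]
    targets = [s["score"] for s in students]
    coef = fit_least_squares(rows, targets)
    gaps = defaultdict(list)
    for s, row in zip(students, rows):
        predicted = sum(c * x for c, x in zip(coef, row))
        gaps[s["teacher"]].append(s["score"] - predicted)
    return {teacher: sum(g) / len(g) for teacher, g in gaps.items()}


# Invented data: two teachers with identical student mixes and a built-in
# gap of +3 points (teacher A) and -3 points (teacher B).
profiles = [(50, 0), (60, 1), (70, 0), (80, 1)]
students = []
for teacher, effect in (("A", 3.0), ("B", -3.0)):
    for prior, low_income in profiles:
        students.append({
            "teacher": teacher, "prior": prior, "low_income": low_income,
            "score": 10 + 0.9 * prior - 5 * low_income + effect,
        })
print(value_added(students))
```

Because both invented classrooms have the same mix of students, the sketch recovers gaps of roughly plus and minus 3 points. Real classrooms are rarely balanced this way.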
But several national value-added specialists argued against Hillsborough's plan to use such scores as part of an automatic rating system.
Braun of Boston College said value-added often fails to account for things like a principal's weak leadership or school climate differences, lumping such factors into a teacher's score. He advised using it only to focus attention on potential concerns.
"What you (should) use it for is to do detective work," said Derek Briggs, an associate professor of education at the University of Colorado at Boulder. He favors his state's approach of using value-added methods to spot potential problems with schools — not individual teachers.
Jesse Rothstein, an associate professor at the University of California at Berkeley, said using value-added as part of evaluations can prompt teachers to change their teaching in unhealthy ways, dropping useful activities that aren't being measured by a narrow, simplistic test.
"You do have to worry that you create incentives for teachers to aim at the measure you're using, rather than aiming at being effective," he said.
• • •
"Okay, wait a minute!" called out PE teacher Tecca Kilmer. "I want you to take two fingers and check your pulse."
Her students at Turkey Creek Middle School were playing an energetic game of capture the flag on a wind-swept field. But now they dropped to one knee, pressed their necks and counted silently.
"What does it mean if it's beating faster than usual?" she asked.
"You're using more oxygen," said eighth-grader Logan Holland.
Even before Hillsborough won its Gates grant last year, PE teachers were teaching and testing more, both as part of a voluntary state merit-pay program and to set the pace in an age when every teacher must show they're making a difference.
Kilmer said she's not doing anything differently this year. But she worries that a single, written exam that tracks student knowledge — not physical improvements — can't capture all of what she teaches.
"It's good in the sense that it's looking at what students are learning," Kilmer said. "But a written test is not enough for PE."
Hillsborough arts teachers, too, say the new tests miss a lot.
"It's a tiny snapshot," said Frank Hannaway, a music teacher at MacFarlane Park Elementary.
He said he likes the new teacher observation system, which includes visits by peer evaluators with experience in the arts. But teachers want an evaluation that measures musical learning, and not just facts about music.
"We're working on that," said district arts supervisor Melanie Faulkner. Music students will listen to a song as part of an "experience-based" test, and art students will look at a picture.
"It's not writing definitions," she added. "We want children to apply what they've learned."
Elementary art classes have already been reduced to 30 minutes per week due to budget cuts. With the new tests, some teachers have been forced to cut back on projects, said Amy Klepal of Ballast Point Elementary.
"The biggest complaint I've heard from teachers is that it really takes away from the creative process for children," she said. "We're stopping more, we're talking more."
On a recent morning, her third-graders used a full period to finish collages. Klepal said she'd wait until an early-release day, when she sees each class for 15 minutes, to brush up on test topics like the difference between Van Gogh and Renoir.
"We want them to gain deeper meaning in their learning," she said. "We're talking about cultures, we're talking about history. But when you see a child once a week for 30 minutes, that's a tall order."
Tom Marshall can be reached at firstname.lastname@example.org or (813) 226-3400.