And a Music Teacher Shall Lead Them...

Photo credit: Emily Rose Bennett | MLive.com

A recent article from MLive on the state's efforts to "improve" teacher evaluation practices manages to get nearly everything wrong, with the only saving grace being the contributions of a brilliant music teacher, Mandy Mikita Scott. But first, let's go over what the article got wrong:

 

  • Venessa Keesler, deputy superintendent of accountability services at MDE, said measuring student growth is a "challenging science," but student growth percentiles represent a "powerful and good" way to tackle the topic. "When you don't have a pre- and post-test, this is a good way to understand how much a student has progressed," she said. Under the new law, 25 percent of a teacher's evaluation will be based on student growth through 2017-18. In 2018-19, the percentage will grow to 40 percent. State standardized tests, where possible, will be used to determine half that growth. In Michigan, state standardized tests – most of which focus on reading and math – touch only a minority of teachers. One study estimated that just 33 percent of teachers teach in grades and subjects covered by state standardized tests.

    What Dr. Keesler doesn't seem to understand is that the student growth percentiles she is referring to are nothing more than another name for Value-Added Measures, or VAM--a statistical method for predicting students' academic growth that has been thoroughly debunked: nearly every leading professional organization in education and statistics has issued statements against its use in making high-stakes decisions about teacher effectiveness (i.e., exactly what MDE is recommending it be used for in teachers' evaluations). The science here is more than challenging--it's deeply flawed, invalid and unreliable, and VAM's usefulness for determining teacher effectiveness rests largely on one now-suspect study, conducted by a researcher who has been discredited for "masking evidence of bias" in his research agenda.

    Dr. Keesler also glosses over the fact that these measures of student growth only apply to math and reading, subjects that account for less than a third of the classes being taught in the schools. If the idea of evaluating, for example, music and art teachers by using math and reading test scores doesn't make any sense to you, there's an (awful) explanation: "The idea is that all teachers weave elements of reading and writing into their curriculum. The approach fosters a sense of teamwork, shared goals and the feeling that 'we're all in this together,'" said Erich Harmsen, a member of GRPS' human resources department who focuses on teacher evaluations.

    While I'm all for teamwork, this "explanation" is, to be polite, simply a load of hooey. If Mr. Harmsen truly believed in what I'll call the "transitive property" of teaching and learning, then we would expect math and reading teachers to be evaluated using the results of student learning in music and art. Because what's good for the goose...right?

    The truth is, as any teacher knows, for evaluation to be considered valid, the measures must be related to the actual content that is taught in the teacher's class--you can't just wave some magical "we're all in this together" wand over the test scores that miraculously converts stuff taught in band class to wonderful, delicious math data. It just doesn't work that way, and schools that persist in insisting that it does are now getting sued for their ignorance.
  • Dr. Keesler goes on to say: "So much of the local data is actually unreliable and totally unfair to educators," she said. "It's not helpful in terms of improvement. What many other states have done is that they've actually put collaborative efforts together to develop new data sources for non-tested subjects." A 2013 study by researchers at the University of Michigan found that local assessments can vary among "teachers at the same grade, in the same school, teaching the same subjects." The study, which examined teacher evaluations in 13 districts across the state, said this diversity in assessment makes it nearly impossible to apply a uniform standard for judging teachers' success in promoting students' academic growth.

    What's missing from this "analysis" is any sort of justification for why "applying uniform standards for judging teachers' success" is important, necessary, or even meaningful.

    Teachers work with children, and these children are not standardized.

    Teachers work in schools, and these schools exist in communities that are not standardized.

    And teachers work with other teachers, custodians, secretaries, administrators, school board members, and other adults--none of whom are standardized.

    So why should teacher evaluation systems in communities as diverse as the Upper Peninsula and downtown Detroit use the same measures? And why is the finding that "local assessments can vary among 'teachers at the same grade, in the same school, teaching the same subjects'" a bad thing?

    The thing that we should be valuing in these children, schools and communities is their diversity--the characteristics, talents and interests that make them gloriously different from one another. A school in Escanaba shouldn't look like a school in Kalamazoo, and the curriculum in each school should be tailored to the community in which it resides. The only parties that benefit from "standardizing" education are the Michigan Department of Education and the testing companies that produce these tests, because standardizing makes their jobs easier. Standardizing teaching and learning doesn't help students, teachers or schools, so why are we spending so much time and money in a futile attempt to make Pearson and ETS's jobs easier?
  • Sandi Jacobs, senior vice president for state and district policy at the National Council on Teacher Quality, said state oversight is important. A new report by her organization says "states have a responsibility to make sure measures are meaningful by providing strong examples, requiring oversight and holding principals and districts accountable for the quality of performance indicators."

    Sometimes it's not about what is said so much as who is saying it. In this case, Ms. Jacobs' employer, the misleadingly named National Council on Teacher Quality, is an organization that has appointed itself to judge the quality of teacher education programs nationally, while holding exactly zero authority, credibility or expertise to do so. The NCTQ's "methods" for evaluating these programs consist largely of cruising websites and printed materials--they conduct no site visits, interview no faculty or students, and never leave their cozy offices to engage in the messy and complicated work that is true "program evaluation." The NCTQ was created by the conservative Thomas B. Fordham Foundation and is funded by generous grants from the Bill and Melinda Gates Foundation, which explains the group's obsession with "data-driven decision making" and its uncritical acceptance of the belief that any test score is a useful and valid piece of information when it comes to defunding and destroying the institutions that have traditionally prepared our nation's teachers.

    The NCTQ has a long history of unsavory practices, including the unauthorized use of federal funds to plant positive stories about NCLB in the media--a direct violation of federal policy, for which the NCTQ was cited. In a fascinating twist of fate, the recipient of nearly $250,000 of those funds was a media personality, Armstrong Williams, who is now the chief campaign consultant for the Presidential candidate and retired neurosurgeon, Dr. Ben Carson. So, the next time someone suggests to you that the education reformers are political "progressives," keep this little nugget in mind.
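
Setting aside the merits of the policy, the weighting scheme the article describes (25 percent of an evaluation from student growth through 2017-18, rising to 40 percent in 2018-19) is simple weighted arithmetic. Here is a minimal sketch in Python--the function name, the 0-100 scale, and the sample scores are all hypothetical illustrations, not anything specified by the law or the article:

```python
# Hypothetical sketch (not MDE's actual formula) of how a growth score
# might blend with other evaluation components under the weights the
# article describes. Scale and scores are invented for illustration.

def composite_score(growth, other, growth_weight=0.25):
    """Blend a student-growth rating with other evaluation components.

    growth, other: ratings on a common 0-100 scale (hypothetical).
    growth_weight: 0.25 through 2017-18, 0.40 from 2018-19 on,
    per the article's description of the law.
    """
    return growth_weight * growth + (1 - growth_weight) * other

# A teacher with a growth score of 60 but strong other measures (90):
print(composite_score(60, 90))        # 25 percent growth weighting
print(composite_score(60, 90, 0.40))  # 40 percent weighting, 2018-19 on
```

Note how mechanically the growth number--however dubious its origin--drags the composite down as its weight rises, which is precisely why the validity of that number matters so much.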

The article concludes with a much-needed breath of fresh air from Mandy Mikita Scott, a choral music teacher in the Rockford Public Schools:

With all the debate over how to measure student growth, it's easy for educators like Rockford Public Schools Choir Director Mandy Scott to be skeptical.

Scott said she believes teachers should have a say in how student growth is measured, and she's hopeful that evaluation methods will be fine-tuned to better reflect the heart of what she's teaching.

Currently, Scott quizzes her students on music theory, once at the start of the semester and again at the end. If her students' scores improve, they've shown growth. She says the system works, but she envisions a different approach. It would be interesting, she said, to record her students and see how their singing changes over the course of the year, but that's hard to put on a "spreadsheet."
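
Her pre-test/post-test approach is easy to make concrete. A minimal sketch, with entirely hypothetical quiz scores (the music-theory content itself obviously isn't captured by the arithmetic):

```python
# Sketch of the pre/post growth measure Ms. Scott describes: quiz at the
# start of the semester, quiz again at the end, and the gain is the
# "growth." All scores below are hypothetical.

def gains(pre, post):
    """Return each student's gain from pre-test to post-test."""
    return [after - before for before, after in zip(pre, post)]

pre_scores  = [55, 70, 62]   # start-of-semester music theory quiz
post_scores = [80, 85, 78]   # end-of-semester quiz
print(gains(pre_scores, post_scores))  # [25, 15, 16] -- all grew
```

The point isn't the subtraction; it's that both measurements are of the content she actually teaches.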


Ms. Scott cuts through the jargon and misdirection of the current rhetoric on teacher evaluation to get directly to the heart of the matter. The assessment strategies she is using are appropriate and related to the content that she's teaching. She isn't evaluating her students on their reading ability by administering a music theory test--she's using those music theory tests to understand what her students know about music theory.

What a concept.

She intuitively understands that while recording her students' singing test scores on a "spreadsheet" might be "interesting," the real value in administering these assessments is to know more about how her students' singing has changed "over the course of the year." And that converting a person's singing to a number is a reductionist act that fundamentally changes the nature of that evaluation.

Ms. Scott concludes: "At the moment, it doesn't feel like it's really touching the heart of what I'm doing. I feel like having a number on a page makes it very difficult. If it could be something like submit these recordings and let us take a listen to what we're doing, that could be kind of exciting."

And here is the true brilliance of Ms. Scott's commentary. Perhaps if we listened to the real experts on evaluation--teachers--instead of the self-appointed (NCTQ) and state-appointed (MDE) "experts", we could get back to "the heart of what we are doing" in our schools: encouraging our children to find their voices as scholars, musicians, artists, scientists, mathematicians, geographers, athletes, and citizens; allowing students to find and nurture their talents, interests and passions by offering a rich, diverse curriculum that values and privileges more than just math and reading; and helping our children to become more fully human, rather than simply "career and college ready."

Thank you to Mandy Mikita Scott, and all teachers who are committed to encouraging, helping and nurturing our children so they can find and follow their interests and passions, and who understand that learning is about much more than "numbers on a page."

Comments
  • #1

    Duane Swacker (Thursday, 07 January 2016 11:46)

    "And that converting a person's singing to a number is a reductionist act that fundamentally changes the nature of that evaluation."

    Exactly! As Noel Wilson explained in his never refuted nor rebutted 1997 dissertation “Educational Standards and the Problem of Error” found at: http://epaa.asu.edu/ojs/article/view/577/700

    Brief outline of Wilson’s “Educational Standards and the Problem of Error” and some comments of mine.

    1. A description of a quality can only be partially quantified. Quantity is almost always a very small aspect of quality. It is illogical to judge/assess a whole category only by a part of the whole. The assessment is, by definition, lacking in the sense that “assessments are always of multidimensional qualities. To quantify them as unidimensional quantities (numbers or grades) is to perpetuate a fundamental logical error” (per Wilson). The teaching and learning process falls in the logical realm of aesthetics/qualities of human interactions. In attempting to quantify educational standards and standardized testing the descriptive information about said interactions is inadequate, insufficient and inferior to the point of invalidity and unacceptability.

    2. A major epistemological mistake is that we attach, with great importance, the “score” of the student, not only onto the student but also, by extension, the teacher, school and district. Any description of a testing event is only a description of an interaction, that of the student and the testing device at a given time and place. The only correct logical thing that we can attempt to do is to describe that interaction (how accurately or not is a whole other story). That description cannot, by logical thought, be “assigned/attached” to the student as it cannot be a description of the student but the interaction. And this error is probably one of the most egregious “errors” that occur with standardized testing (and even the “grading” of students by a teacher).

    3. Wilson identifies four "frames of reference," each with distinct assumptions (epistemological bases) about the assessment process from which the "assessor" views the interactions of the teaching and learning process: the Judge Frame (think of a college professor who "knows" the students' capabilities and grades them accordingly); the General Frame (think of standardized testing that claims to have a "scientific" basis); the Specific Frame (think of learning by objective, as in computer-based learning, getting a correct answer before moving on to the next screen); and the Responsive Frame (think of an apprenticeship in a trade or a medical residency program, where the learner interacts with the "teacher" with constant feedback). Each category has its own sources of error, and more error is introduced when the assessor confuses and conflates the categories.

    4. Wilson elucidates the notion of “error”: “Error is predicated on a notion of perfection; to allocate error is to imply what is without error; to know error it is necessary to determine what is true. And what is true is determined by what we define as true, theoretically by the assumptions of our epistemology, practically by the events and non-events, the discourses and silences, the world of surfaces and their interactions and interpretations; in short, the practices that permeate the field. . . Error is the uncertainty dimension of the statement; error is the band within which chaos reigns, in which anything can happen. Error comprises all of those eventful circumstances which make the assessment statement less than perfectly precise, the measure less than perfectly accurate, the rank order less than perfectly stable, the standard and its measurement less than absolute, and the communication of its truth less than impeccable.”

  • #2

    Duane Swacker (Thursday, 07 January 2016 11:47)

    In other words all the logical errors involved in the process render any conclusions invalid.

    5. The test makers/psychometricians, through all sorts of mathematical machinations attempt to “prove” that these tests (based on standards) are valid-errorless or supposedly at least with minimal error [they aren’t]. Wilson turns the concept of validity on its head and focuses on just how invalid the machinations and the test and results are. He is an advocate for the test taker not the test maker. In doing so he identifies thirteen sources of “error”, any one of which renders the test making/giving/disseminating of results invalid. And a basic logical premise is that once something is shown to be invalid it is just that, invalid, and no amount of “fudging” by the psychometricians/test makers can alleviate that invalidity.

    6. Having shown the invalidity, and therefore the unreliability, of the whole process Wilson concludes, rightly so, that any result/information gleaned from the process is “vain and illusory”. In other words start with an invalidity, end with an invalidity (except by sheer chance every once in a while, like a blind and anosmic squirrel who finds the occasional acorn, a result may be “true”) or to put in more mundane terms crap in-crap out.

    7. And so what does this all mean? I’ll let Wilson have the second to last word: “So what does a test measure in our world? It measures what the person with the power to pay for the test says it measures. And the person who sets the test will name the test what the person who pays for the test wants the test to be named.”
    In other words, it attempts to measure "'something' and we can specify some of the 'errors' in that 'something' but still don't know [precisely] what the 'something' is." The whole process harms many students, as the social rewards for some are not available to others who "don't make the grade (sic)." Should American public education have the function of sorting and separating students so that some may receive greater benefits than others, especially considering that the sorting and separating devices, educational standards and standardized testing, are so flawed not only in concept but in execution?

    My answer is NO!!!!!

    One final note with Wilson channeling Foucault and his concept of subjectivization:

    “So the mark [grade/test score] becomes part of the story about yourself and with sufficient repetitions becomes true: true because those who know, those in authority, say it is true; true because the society in which you live legitimates this authority; true because your cultural habitus makes it difficult for you to perceive, conceive and integrate those aspects of your experience that contradict the story; true because in acting out your story, which now includes the mark and its meaning, the social truth that created it is confirmed; true because if your mark is high you are consistently rewarded, so that your voice becomes a voice of authority in the power-knowledge discourses that reproduce the structure that helped to produce you; true because if your mark is low your voice becomes muted and confirms your lower position in the social hierarchy; true finally because that success or failure confirms that mark that implicitly predicted the now self-evident consequences. And so the circle is complete.”
    In other words, students "internalize" what those "marks" (grades/test scores) mean, and since the vast majority of students have not developed the mental skills to counteract what the "authorities" say, they accept as "natural and normal" that "story/description" of themselves. Although paradoxical in a sense, "I'm an 'A' student" is almost as harmful as "I'm an 'F' student" in hindering students from becoming independent, critical and free thinkers. And having independent, critical and free thinkers is a threat to the current socio-economic structure of society.
