Ask the grumpies: Class size research

August 22, 2014 — nicoleandmaggie

Just curious whether you have any opinion on the Hoxby class size research (in Connecticut) that Gladwell discusses.

Here’s an interesting summary of class size research from Brookings. It is worth reading if you’re interested in the topic.

There’s a lot going on in class size research. It is, in fact, the running example throughout the Stock and Watson undergraduate econometrics textbook, because the question has been attacked with most of the standard econometric methods.

A couple of important things to note about external validity for these studies:

1. Natural experiments (and, indeed, standard experiments) are only as externally valid as the experiment itself. That means that a study that finds an effect on kindergarteners is not necessarily going to say much about high school students. We know a lot about class size in K-3; we don’t know as much about the middle grades or higher. This particular experiment is on 4th and 6th grade. It argues that it gets cumulative effects of class size by using cohort size, but when a cohort is expected to be a certain size, districts may plan around it, for example by moving bad teachers to small cohorts and good teachers to larger cohorts. They may do the same with aides when deciding where to make a class-size split, or they may make specific decisions about where to put the problem kids or whether to do tracking or clustering. That kind of planning could completely wash out the effect in a way that you would not see if all classes were restricted to a certain size because of a policy change, and it is more likely to be going on in the type of natural experiment that Hoxby examines in this study.

2. Class size decisions are not made in isolation. A policy asking for extra money from the federal government to reduce class size is going to produce different results than a policy that is forced to take that extra money out of another budget. Generally, research suggests that, believe it or not, most schools are doing the best that they can with the budgets that they have. When you give them an unfunded mandate, outcomes are hurt in ways that they wouldn’t be if you gave them a funded mandate. Hiring more new teachers and buying portables while taking money away from other programs may end up having a negative effect even if smaller class sizes are beneficial. The type of natural experiment Hoxby is looking at is one of these situations– the budget isn’t changing based on class size; they get the same $/kid whether they’re in a large cohort or a small cohort. The only thing that changes is the expense from economies of scale (whether they need one teacher/classroom or two). That’s a different situation than one in which expenses for everything else stay the same but the district gets extra money to hire more teachers and buy portables.
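To make the economies-of-scale point concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption made up for illustration: the cap of 25 students, the $10,000 per pupil, and the $80,000 per teacher/classroom are hypothetical numbers, not figures from Hoxby’s study, and the function names are invented here.

```python
import math
from typing import Tuple


def split_cohort(cohort_size: int, max_class_size: int) -> Tuple[int, float]:
    """Return (number of classes, average class size) under a hard class-size cap."""
    n_classes = math.ceil(cohort_size / max_class_size)
    return n_classes, cohort_size / n_classes


def budget_left_over(cohort_size: int, max_class_size: int,
                     funding_per_pupil: float, cost_per_class: float) -> float:
    """Money left for everything else when funding is a fixed amount per kid."""
    n_classes, _ = split_cohort(cohort_size, max_class_size)
    return cohort_size * funding_per_pupil - n_classes * cost_per_class


# Hypothetical numbers: cap of 25, $10,000 per pupil, $80,000 per teacher/classroom.
for cohort in (25, 26):
    n_classes, avg_size = split_cohort(cohort, 25)
    left_over = budget_left_over(cohort, 25, 10_000, 80_000)
    print(cohort, n_classes, avg_size, left_over)
```

With these made-up numbers, one extra kid takes the cohort from a single class of 25 to two classes of 13, and the money left over for everything else drops from $170,000 to $100,000. The smaller classes in this kind of natural experiment come bundled with a tighter budget for the rest of the school, which is not what a funded class-size mandate would look like.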

So, do Hoxby’s results mean that class size is unimportant? No. They just show that it seems to be unimportant in the type of situation that she’s studying, one in which variations in elementary school class-size are caused by variations in cohort size. That’s why there’s a large literature on this topic– the answer is different in different situations. We need a lot of experiments and natural experiments to get the full picture.
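The washing-out concern from point 1 can be sketched as a toy calculation. This is not Hoxby’s model or anyone’s estimate: the linear score function, the penalty of 0.5 points per student, and the teacher-quality offsets of plus or minus 2.5 points are all hypothetical, chosen only so that the offsetting is easy to see.

```python
def expected_score(class_size: int, teacher_quality: float,
                   size_penalty: float = 0.5, baseline: float = 100.0) -> float:
    """Toy model: each extra student costs size_penalty points; teacher quality adds points."""
    return baseline + teacher_quality - size_penalty * class_size


small, large = 15, 25

# No planning: identical teachers, so the gap reflects class size alone.
gap_without_sorting = expected_score(small, 0.0) - expected_score(large, 0.0)

# Planning: the district puts the weaker teacher with the small cohort
# and the stronger teacher with the large one (hypothetical offsets).
gap_with_sorting = expected_score(small, -2.5) - expected_score(large, 2.5)

print(gap_without_sorting, gap_with_sorting)
```

In this toy setup the small class scores 5 points higher when teachers are identical, and the gap is exactly zero once the district sorts teachers against cohort size. Class size matters by construction here, yet a comparison across cohorts would find nothing.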

Side note: Caroline Hoxby is one of my personal heroes. If I ever decide to give up this academia thing, I’m totally going to beg her for an RA job. She is an amazing economist. Also, rumor has it (aka several of her coauthors have mentioned) that she is one of those people who sleeps 4 hours/night every night because of low sleep need.


Fiona McQuarrie (@all_about_work) Says:
August 22, 2014 at 2:28 am

Thank you! This is very interesting and helpful.

Comradde PhysioProffe Says:
August 22, 2014 at 10:43 am

It sounds almost impossible to disentangle with so many confounding variables covarying with class size in these natural experiments. You really need a controlled prospective experiment.

nicoleandmaggie Says:
August 22, 2014 at 10:46 am
That was called the Tennessee STAR experiment. It’s why we know so much about K-3. And there were/are some problems with that kind of experiment too.

anandar Says:
August 22, 2014 at 1:11 pm

Yes, thank you for this post, very helpful. I will take all statements about the impact (or lack thereof) of class size with a particularly large dose of salt.

I would really be curious about how many of the studies equated “student learning” with “change in performance on standardized test scores.” As a parent, I of course wish my children had smaller class sizes, but not really because I think they would result in better standardized test performance– rather because if my daughter, say, was 1 of 20 rather than 1 of 25-30 (the norm in lower elementary in our CA urban public school), it would be easier for her to know/be known by and build a relationship with her teacher; it would be easier for the teacher to do more creative out-of-school or prep-intensive projects; and the risks of having a class that is poorly managed would be lessened. Only the last really seems like it would have an impact on standardized test scores, and the capacity to spend more time on “outside the box” learning could even cause test scores to go down (=less time on test prep).

I am generally a standardized test grump, but it bothers me the most when scores get used as an unacknowledged proxy for something bigger and better– like “student learning.”

Rosa Says:
August 22, 2014 at 10:43 pm
I would think personal relationships would affect test performance, both motivationally (“I want to do well and reflect well on this teacher and receive praise”) and in terms of the kinds of confidence that are important to test taking. That sure matches my personal experience – since standardized tests are inherently unrewarding, a personal relationship with someone who’s going to see or be affected by them would stop me from treating them as speed challenges or just picking random answers in a way just “have to take a test” wouldn’t.

nicoleandmaggie Says:
August 23, 2014 at 10:27 am
To answer, yes most of these studies do use standardized tests. However, many of the big studies occurred prior to NCLB, which means they were low stakes tests (so no incentive for teachers to cheat or to “teach to the test”) and they were external to the classes (again, so no “teaching to the test”). Things like the Iowa Test of Basic Skills and some other ones whose names I don’t remember because it’s not my area of expertise (though I do know a lot just because so much of applied micro is education).

There are a few education studies that use earnings as the outcome variable, mostly by Josh Angrist, IIRC. (And that recent one by Chetty et al. that’s getting a lot of mixed press lately.) There are problems with using earnings as an outcome though, because additional education will mess up your earnings profile. And being a hedge fund manager isn’t necessarily as worthy a goal as being a social worker, even if the former pays a heck of a lot more. And where do you put people who are financially independent or living on their spouses’ income– are they failures? Hard to say. We don’t have a lot of wealth info.

Much of the long-term preschool research uses, in addition to test scores and earnings, outcomes like teen pregnancy, jail time, and dropping out, i.e. things that most people would agree are actually bad.

- anandar Says:
  August 25, 2014 at 2:05 pm
  It is interesting that the impact Rosa is discussing– a personal relationship with a teacher being a motivator for doing well on a test– would only seem to me to make a difference if the testing WAS high stakes. When I was a public school student, long long ago, I took the (low stakes) Iowa Basics but it never occurred to me that my teachers looked at or cared about the results, so the quality of my relationship with a teacher wouldn’t have directly affected my motivation to perform on the test, I don’t think. I always looked at the results, however, and cared deeply, because I was the kind of kid who loved getting As/gold stars, and my scores on standardized tests always gave me that gold-star pleasure rush. Not a dynamic I particularly want to pass along to my kids.
  
  My other reason for being sceptical about standardized test results as a measure for evaluating ed policy options, fyi, is having a partner with a PhD in educational assessment and evaluation– he has helped both draft and evaluate standardized tests used by the biggie testing companies, and as a result has a very low opinion of their ability to test the things that are most important (i.e., he thinks they are not very thoughtful, are inherently biased, and don’t measure the kinds of learning that we most care about as parents). But there is no good solution that is as (relatively) cheap as standardized tests. While the long-term preschool data is helpful, it is never going to be a very efficient way to help with policy decisions like class size, especially when we have relatively quick yearly standardized test results that appear to be so powerfully explanatory.
  
  I do hope that Common Core-alignment will improve the quality of standardized testing, but my partner is pretty pessimistic…

Grumpy Rumblings (of the formerly untenured)
