Quixotic Reimagining of Standardized Tests (Part 1)

Life update: I got my driver’s license from the place where I learned to drive. Then I drove home from there with my mom, and it was zarking terrifying.

Early in the morning tomorrow, I have a small surgical operation, so I can’t sleep too late. (Well, it ended up being pretty late anyway. Darn.) Therefore I think I’m going to do something unprecedented on this blog for the daily posting streak: I’m going to post an incomplete non-expository post.

Yes, the only purpose of the title is to get initials that are four consecutive letters of the alphabet.

One of the more argumentative post sequences on my blog involved ranting against standardized tests.

My very first stab was probably the silly satire directed at the test everybody has to take that takes up two hours per day of an entire week. Once college became a thing in my life, I wrote a humblebrag rant after I took the SAT and then a summary post after I snagged this subject for an English class research paper and finished said paper.

It should be plenty clear that I am not ranting against this part of the system because it’s disadvantageous to me.

But it should also be said that I’ve read some convincing arguments for using standardized tests more in college admissions (Pinker, then Aaronson). Despite the imperfections of tests, they argue, the alternatives are likely to be less fair and more easily gamed. The fear that selecting only high test-scorers will yield a class of one-dimensional boring thinkers is unfounded. And the idea that standardized tests “reduce a human being to a number” may be uncomfortable for some, but it makes no sense to prioritize avoiding a vague feeling of discomfort over trusting reliable social science studies. Neither article, you will note, advocates selecting all of one’s college admits based on highest score. Just a certain unspecified proportion, one that’s probably a lot larger than it is today.

And although I wish the first article linked its studies, I mostly agree with their arguments. So this puts me in a tricky position. These positions I’ve expressed seem hard to reconcile! So, after arguing about all this with a friend who told me things like

I think you fail to understand how anti-intellectual american society is

(comments on this statement are also welcome) I think some clarifications and updates on how I feel are in order.

Firstly, in my community, I think the perception that standardized tests matter is much stronger than it is in the States. If not, at least the actions people take to have their children do well on standardized tests are more drastic. I know students who start doing standardized test prep by early ninth grade and classmates who spent full weeks of mornings and afternoons at summer study programs for the same purpose. These are the people my rants were primarily intended for.

Secondly, my primary beef with standardized tests is not argued from the perspective of the colleges; it’s argued from the perspective of the students. I don’t have strong beliefs about whether standardized tests assess people well. Okay, the research paper I said I wrote for school was argued from the former perspective and claimed in its thesis statement that the SAT “should be de-emphasized relative to other predictors for predicting success”. But, well, it’s a research paper school assignment, and I had to pick a topic for which I could cite lots of sources. You will also note that “de-emphasized” and “relative” are pretty waffly words. That was, in fact, already a compromise with the teacher; my earlier theses were even wafflier. I don’t stand very firmly behind that thesis statement. Even in the version I ended up putting on my blog, I exaggerated my position a bit. I’m not statistics-literate enough to know whether an “improvement of 0.08 correlation” deserves to be described by the adjective “marginal”.

This was my stubborn belief-persistent tweet in response to the SAT overhaul:

I’d like to recant the statement “Standardized tests are still bad”. It’s a great overgeneralization. The driving test I just took was pretty standardized, and studying for it and taking it left me a nervous wreck, but I would not advocate de-emphasizing it relative to other predictors of driving skill. I would not like to drive in a society where the other drivers do not have a standardized understanding of traffic rules and signal light customs and what all the weirdly colored arrow signs mean.

There’s also this parallel: many of my beloved math competitions or olympiads could be argued to be kind of like standardized tests. The IMO has a detailed appeal chain to ensure each paper is graded equally and a crude norming process in the proportions of medals it awards so that their values are approximately constant from year to year, right? (Of course the small number of problems inevitably makes luck play a huge role, but still.)

Of course, driving tests and math competitions are very different in structure and function from aptitude/achievement tests that are supposed to test college-readiness. The first test is very specific and narrowly tailored to test one skill — driving — and passing the test is required for certification to perform one narrow task — driving. Math competitions are also specific and target a narrow audience of people who are already interested in mathematics, and nobody is requiring that these competitions be taken to do anything (although presumably it does look good on one’s college applications), so (I hope) most contestants are contestants because they enjoy it. Meanwhile, the standardized tests for college mash together tests for various nebulous areas of skills as part of a gateway to college, where you might choose to study anything from physics to politics or poetry, and the degree to which you enjoy or require the skills you’ve been tested on will vary wildly. Standardization is a lot more acceptable to me without the business of “judging a fish by its ability to climb a tree”.

I’ll come back to this point, but the key takeaway is that standardization does not necessarily lead to unfair or arguably imprudent comparisons.

My far more strongly held belief was expressed with everything in the second half of the blog post: it sucks that students have to prepare for the SAT in particular, because practicing for the types of questions it poses is so soul-suckingly unproductive.

For example, when I read Aaronson’s (probably rhetorical) question:

…spots at the top universities are so coveted, and so much rarer than the demand, that no matter what you use as your admissions criterion, that thing will instantly get fetishized and turned into a commodity by students, parents, and companies eager to profit from their anxiety. If it’s grades, you’ll get a grades fetish; if sports, you’ll get a sports fetish; if community involvement, you’ll get soup kitchens sprouting up for the sole purpose of giving ambitious 17-year-olds something to write about in their application essays. […] So, given that reality, why not at least make the fetishized criterion one that’s uniform, explicit, predictively valid, relatively hard to game, and relevant to universities’ core intellectual mission?

this is what I wanted to reply: Since the people who will fetishize admissions criteria are guaranteed to do so, why not pick a criterion that aligns the goal of self-actualization with that of getting into a good college, for those people who don’t want a top college spot that badly? Measuring certain forms of extracurriculars, if it could be done fairly, lets students who want to get into a “good school” without selling their soul in the process know that they can pursue the activities they like without sacrificing college-quality points. Measuring a standardized test — or, at least, measuring the lifeless SAT — means those students have to allocate time and effort between preparing for it and preparing for excellence in whatever they’re interested in, with no way of achieving both.

But then I realized this response pretty much ignores all the beneficial qualities of standardized tests that the actual question listed, instead simply aiming to make the students’ life decisions easier. So this is probably not very convincing to admissions people, to put it mildly.

There are still several ways out. If standardized tests measure inherent aptitude that can’t be improved by studying, and everybody could be convinced of this, that would resolve this issue too. I think it would be somewhat cruel to match students to certain tiers of colleges without offering them any hope via ways to get ahead, but at least they could reassuredly spend the spare time of their middle- and high-school years for themselves. Unfortunately, even if standardized testing really isn’t amenable to coaching (my beliefs about this statement are quite weak), I am reasonably confident that social science studies could not produce evidence that’s strong enough to convince most people (in the society that I know and grew up in) not to have their child take those SAT or ACT classes advertised by the local test prep company. This would be much worse if colleges ever emphasize standardized test scores more than they do now.

So my conclusion is that standardized tests, and studying for said tests, will both still have to be things. But I don’t think my two goals are irreconcilable. If I had to design and implement a standardized test from scratch, one that I might hope could satisfy both institutions seeking objective metrics and students seeking a study experience that wasn’t completely awful, what would I imagine?

Part 2 will pick up here when I feel like it.


3 thoughts on “Quixotic Reimagining of Standardized Tests (Part 1)”

  1. Well, the benefits of standardized tests don’t really work out if the standardized test is not a useful one, and the SAT probably qualifies as not-a-useful-one. Also, I am decently convinced that there does not exist a test for which studying is not noticeably helpful (provided you aren’t already going to score really well anyway).

    But with respect to the effects of relying solely on a standardized test, admittedly I only have a very cursory and possibly false understanding of how it works, but presumably China and its gaokao can be used as a model (or any other nation which depends strongly on standardized tests for admission). Other than this, though, you can theorycraft all you want in every direction on whether people will just spend all their time studying or if they’ll do other things, but it really depends on exactly how dedicated people are going to be to studying for the standardized test.

    And with respect to fetishization of whatever criteria you use– perhaps if you set the right criteria, its fetishization might actually be good. To use a handwavy example, if colleges switched to admitting people based on their awesomeness, the consequence is that the youth of America (or whatever region) will become awesome, which really isn’t so bad. Or, at least, appear awesome. Of course awesomeness isn’t really something which can be evaluated in a reasonable ungameable manner, and in fact there might not exist such an “ideal criterion”.

    I also don’t know what admissions people actually want in their students. Because if what they want is a diverse interesting student body, obviously a standardized test won’t give them that unless the designers of the test are actually magical (or at least, it won’t optimize for it– in particular, the extracurricular activities of the student body will be left to chance, which admittedly might work but it presumably won’t be as good as actually optimizing for it somehow). Whereas if they just care about academics, obviously a (good) standardized test is the way to go.

    Although you know, maybe one key to a good standardized test which the SAT (and APs, and everything else people use) SORELY lacks, is a good difficulty gradient. Like maybe your standardized test is the amalgamation of USAMO, USACO, USAPhO, USNCO, USABO, NaCLO (why not), etc. including the qualifying exams (like maybe keep a few high-level questions, but obviously if all the test questions are unsolvable by most people the test won’t distinguish them– but presumably many people can legitly solve at least one AMC (8?) question), and maybe throw in language tests (oops I totally forgot about humanities haha), history tests, running tests, or swim tests, or musical instrument tests, or whatever. Admittedly this would make the test very grueling, unless the test format is “choose a limited number of these sections to complete” or the time limit is simply too short to complete all the sections. And if your local test prep center can train students to get a 42 on the USAMO, well, that’s probably not that terrible of a world.

