Archive for the ‘Statistics’ category

The Great Vote Fraud Data Mistake…A Cautionary Tale

May 11, 2017

Just in time for the latest, greatest Shitgibbon pursuit of all those not-good-people who got to vote for his opponent, Maggie Koerth-Baker brings the hammer down.  She’s written an excellent long-read over at FiveThirtyEight on what went wrong in the ur-paper that has fed the right-wing fantasy that a gazillion undocumented brown people threw the election to the popular-vote winner, but somehow failed to actually turn the result.

The nub of the problem lies with a common error in data-driven research: a failure to come to grips with the statistical properties (the weaknesses) of the underlying sample or set.  As Koerth-Baker emphasizes, this is hardly unusual, though usually not as consequential as it was when an undergraduate, working with her professor, found that, apparently, large numbers of non-citizens (14% of them) were registered to vote.

There was nothing wrong with the calculations they used on the raw numbers in their data set, drawn from a large survey of voters called the Cooperative Congressional Election Study. The problem, though, was that they failed to handle fully the implications of the fact that the people they were interested in, non-citizens, were too small a fraction of the total sample to eliminate the impact of what are called measurement errors. Koerth-Baker writes:

Non-citizens who vote represent a tiny subpopulation of both non-citizens in general and of the larger community of American voters. Studying them means zeroing in on a very small percentage of a much larger sample. That massive imbalance in sample size makes it easier for something called measurement error to contaminate the data. Measurement error is simple: It’s what happens when people answer a survey or a poll incorrectly. If you’ve ever checked the wrong box on a form, you know how easy it can be to screw this stuff up. Scientists are certainly aware this happens. And they know that, most of the time, those errors aren’t big enough to have much impact on the outcome of a study. But what constitutes “big enough” will change when you’re focusing on a small segment of a bigger group. Suddenly, a few wrongly placed check marks that would otherwise be no big deal can matter a lot.

This is what critics of the original paper say happened to the claim that non-citizens are voting in election-shaping numbers:

Of the 32,800 people surveyed by CCES in 2008 and the 55,400 surveyed in 2010, 339 people and 489 people, respectively, identified themselves as non-citizens. Of those, Chattha found 38 people in 2008 who either reported voting or who could be verified through other sources as having voted. In 2010, there were just 13 of these people, all self-reported. It was a very small sample within a much, much larger one. If some of those people were misclassified, the results would run into trouble fast. Chattha and Richman tried to account for the measurement error on its own, but, like the rest of their field, they weren’t prepared for the way imbalanced sample ratios could make those errors more powerful. Stephen Ansolabehere and Brian Schaffner, the Harvard and University of Massachusetts Amherst professors who manage the CCES, would later say Chattha and Richman underestimated the importance of measurement error — and that mistake would challenge the validity of the paper.
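To see why a handful of stray check marks matters so much here, a rough back-of-the-envelope sketch in Python may help. The respondent counts are the 2008 CCES figures quoted above; the error rate and the citizen turnout rate are purely hypothetical assumptions of mine, chosen for illustration, and do not come from the paper or its critics. The function name is made up too.

```python
def spurious_noncitizen_voters(total_respondents, self_reported_noncitizens,
                               citizen_error_rate, citizen_turnout):
    """Estimate how many citizen voters would show up as 'non-citizen voters'
    purely because they ticked the wrong citizenship box."""
    # Approximation: treat everyone who did not claim non-citizen status as a citizen.
    citizens = total_respondents - self_reported_noncitizens
    # Citizens who mistakenly land in the non-citizen group.
    misclassified = citizens * citizen_error_rate
    # Of those, the ones who voted look like non-citizen voters in the data.
    return misclassified * citizen_turnout


# Figures quoted from the article (2008 wave).
TOTAL_2008 = 32_800
NONCITIZENS_2008 = 339
REPORTED_VOTERS_2008 = 38

# Hypothetical assumptions, for illustration only.
ASSUMED_ERROR_RATE = 0.001   # 1 citizen in 1,000 ticks the wrong box
ASSUMED_TURNOUT = 0.70       # share of citizens who voted

spurious = spurious_noncitizen_voters(
    TOTAL_2008, NONCITIZENS_2008, ASSUMED_ERROR_RATE, ASSUMED_TURNOUT
)
share = spurious / REPORTED_VOTERS_2008

print(f"Spurious 'non-citizen voters' from box-ticking errors: {spurious:.1f}")
print(f"As a share of the {REPORTED_VOTERS_2008} found in 2008: {share:.0%}")
```

Under those assumed rates, about 23 of the apparent non-citizen voters could simply be citizens who mis-ticked a box. The same number of errors spread across all 32,800 respondents would barely register; concentrated in a group of 339, it can swamp the result.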

Koerth-Baker argues that Chattha (the undergraduate) and Richman, the authors of the original paper, are not really to blame for what came next: the appropriation of this result as a partisan weapon in the voter-suppression wars.  She writes, likely correctly in my view, that political science and related fields are more prone to problems of methodology, especially in handling the relatively new (to these disciplines) pitfalls of big-data, or even medium-data, research. The piece goes on to look at how and why this kind of not-great research can have such potent political impact, long after professionals within the field have recognized problems and moved on.  A sample of that analysis:

This isn’t the only time a single problematic research paper has had this kind of public afterlife, shambling about the internet and political talk shows long after its authors have tried to correct a public misinterpretation and its critics would have preferred it peacefully buried altogether. Even retracted papers — research effectively unpublished because of egregious mistakes, misconduct or major inaccuracies — sometimes continue to spread through the public consciousness, creating believers who use them to influence others and drive political discussion, said Daren Brabham, a professor of journalism at the University of Southern California who studies the interactions between online communities, media and policymaking. “It’s something scientists know,” he said, “but we don’t really talk about.”

These papers — I think of them as “zombie research” — can lead people to believe things that aren’t true, or, at least, that don’t line up with the preponderance of scientific evidence. When that happens — either because someone stumbled across a paper that felt deeply true and created a belief, or because someone went looking for a paper that would back up beliefs they already had — the undead are hard to kill.

There’s lots more at the link.  Highly recommended.  At the least, it will arm you for battle with Facebook natterers screaming about a non-existent voter fraud “emergency.”

Image: William Hogarth, The Humours of an Election: The Polling, 1754-55

How About A Little xkcd Haute Snark?

May 28, 2011

Honi soit qui mal y pense:

I got the heads-up on this from John Sundman, a Twitter buddy (@jsundmanus), who complains that it “is factually wrong; an astoundingly rare occurrence.”  Guess why.  (Sundman’s answer after the jump.)

“discussants are not holding beers.”

Lies, Damned Lies, Statistics: Andrew Sullivan, Brit Election edition

May 6, 2010

Update:  The original of this post mischaracterized the Treasury figures for government spending as a percentage of GDP; I repeatedly referred to them as the percentage of GDP due to the deficit year over year.  It’s been corrected below, and thus, in fact, tracks the figures Sullivan was citing.  The argument remains the same, though in a post piously demanding attention to what numbers tell you, I can’t say I’m not embarrassed.  Do not blog after too effusive a dinner party the night before; that’s my motto.

Thanks to friend-of-the-blog Lovable Liberal for the catch.

Andrew Sullivan has been blogging the Brit election extensively, and his reflexive loathing for Labour has come through on a number of occasions.

He has some considered loathing too, I’ll grant you, but he admits that “in my native land, unlike America, I have residual partisan loyalty…” to the party of his youth.

That means it’s just a bit hard to assign a root cause for his rote repetition of a favorite anti-Labour meme, that the party is a bunch of big government spendthrifts.

It could be Sullivan’s difficulty in dealing with facts presented in the form of quantified data (see for example, this old chestnut). Or it could be a leap to unexamined conclusions propelled by his self-acknowledged Tory partisanship. Or, perhaps most likely, both.

In any event, he parrots the charge that the 13 years of Labour government produced a spending regime that has dramatically changed the size and cost of British government.  He writes:

Britain’s debt piles higher – because 13 years of Labour’s reckless spending has neither solved the country’s social problems nor stabilized the country’s economy….

…And then he attempts to put meat on the bones of that “reckless spending” cliche by borrowing from The Wall St. Journal via The Corner:

Since 2000, public spending in Britain has grown faster as a share of GDP than any other country in the 28-member OECD — up 17 percentage points to 53% of GDP, compared to 15 points for Ireland and 10 points for Iceland

Sullivan might have wanted to consider his sources.  Doesn’t he know that any statistic with political consequence that emerges from The Wall St. Journal has to be considered guilty until proven innocent — that is, checked for oneself?  And by all that the FSM considers holy (semolina, for one), he of all people has had enough experience of The Corner to realize that they are what Ronald Reagan should have been talking about when he said “don’t trust and verify.” (What — RR didn’t say that? Sorry — ed.)

Shoulda, coulda, woulda … but here, he takes on face value a number that should have provoked more scrutiny.

That would be the date for the start of the time line, 2000.  Why 2000?  First, because that marked the lowest spending figure for all thirteen years of Labour governance, so choosing that date, rather than the start of Labour rule in 1997, would make any increase since that time loom larger in percentage terms.  This is called gaming your data.
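Here is a quick sketch in Python of how much the choice of starting point matters, using only the spending-as-share-of-GDP figures cited further down in this post (the 45.1% Thatcher inherited, the 36.6% Labour low, and the projected 48.1%). The dictionary labels and the function name are my own, and I’m assuming, as the argument here implies, that the 36.6% low is the 2000 figure.

```python
# Government spending as a percentage of GDP, as quoted in this post.
SPENDING_PCT_GDP = {
    "1978-9 (inherited by Thatcher)": 45.1,
    "2000 (Labour low, assumed year)": 36.6,
    "2009-10 (projected)": 48.1,
}


def point_change(series, start, end):
    """Change in percentage points between two labelled entries."""
    return round(series[end] - series[start], 1)


END = "2009-10 (projected)"

# Same endpoint every time; only the baseline moves.
for start in SPENDING_PCT_GDP:
    if start != END:
        change = point_change(SPENDING_PCT_GDP, start, END)
        print(f"{start} -> {END}: {change:+.1f} points")
```

Same endpoint, but the rise is 3.0 points measured from 1978-9 and 11.5 points measured from the low. Pick the trough as your baseline and any increase looks as large as it possibly can.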

And then there is the question of context and trend.  What should we make of that one number for spending in 2000?  Was it much different from other years’ and other governments’ budget work?  Did what came after trace a steady trend, or were there distinct outliers that need particular explanation?

I’m not going to pretend for a moment that I am an expert, or even knowledgeable, about British state finances.  But even from a state of near-total lack of information, it just isn’t that hard to find the broad outlines of the history of UK government spending.  A moment with Teh Google leads one, for example, to this.

So what happened?

Well, from 1997 to 2007-8, the Labour government spent at levels that ranged between a low of 36.6% and a high of 41.1% of GDP.

From 1990-1997, a Tory government led by John Major ran budgets that ranged from a low of 39.4% of GDP in the year he took over from Maggie Thatcher, to a high of 43.7% in 1993, from which it declined slowly to the number he handed off to Tony Blair.

Go back to the Thatcher years, and you see the same story.  She inherited a budget that accounted for 45.1% of GDP in FY 1978-9.  She brought in a slightly reduced percentage the next year, her government’s spending coming in at 44.7% of GDP in FY 1979-80, but that figure rose for the next several years, and only dropped back to the level she inherited in 1985-6.  Her high was 48.1 percent of GDP, and her best year was still above the best number achieved by Blair, with Brown as his Chancellor of the Exchequer: right around 39% for the Tories, compared with the Labour best figure of roughly 36 1/2 percent.

In other words:  for most of its time in office, Labour ran budgets well within the historical range of spending established over the previous 18 years of Tory rule.  Just not much change in it, and often below the levels of their Tory predecessors.

Repeat:  for most of Labour rule, spending was in a very familiar range.  You can debate whether Thatcher, Major, Blair and Brown were all drunken sailors ashore, but that’s a different question from whether Blair/Brown/Labour have a distinctively different record on spending than their friends on the right.  You can argue about who will best deal with the situation going forward, Cameron, Brown or Clegg, and that’s a different question too.  Nothing I’m writing here bears very much on it (except, perhaps, to call into question the presumption that Cameron will be more fiscally responsible than his peers; others have made that same point much more directly).

But hold on to the key point:  Most of the recent Labour record is one of ordinary, familiar spending levels, in line with the broad outlines of how British governments have approached spending for more than three decades.

Still, there is no doubt that the budget deficit is huge now, and the leap in government spending over Labour’s starting point quite noticeable.   From spending 41.1 percent of GDP in 2006-7, Labour governments produced budgets amounting to 43 percent of GDP in 2008-9, with spending levels that are projected to rise as high as 48.1% in 2009-10 and 2010-11, the same level as Thatcher’s high.

So, yes, a leap in government spending under Labour in the 2008-10 period, just as there has been a leap in spending and deficits under Obama’s administration around the same time.

Now, refresh my memory:  what happened in September of 2008?

Oh yeah. The global financial system went into cardiac arrest, the American real estate bubble burst, and economies around the world shuddered under the impact.  US and UK governments responded in classic Keynesian fashion, perhaps not expansively enough, and spent much more than they had been spending in order to pump capital into the banking system and cash into the daily economy.

Sullivan, of course, has lauded this on the American side, in grand tones and little posts.  He does not do so for poor Gordon Brown.

Why he didn’t isn’t really that important.

The fact that he didn’t is, as it is a specimen of a dangerously common failure of modern political reporting.

Here’s my credo:  Numbers matter.  Understanding what they do and don’t tell you in any encounter with them is the crucial task for any would-be serious political journalist; hell, for anyone who wants to take him or herself seriously as an observer of contemporary life.

Failure to do so means that you will get lots of your writing wrong — and you won’t know it, you can’t know it — until rude and wordy bastards like myself point it out (and one deigns to notice such gnats gnawing on the body politic).  But it matters, to audiences and to any writer who takes their craft seriously.

And in this story, here’s the bottom line:  it is certainly true that government deficit spending in 2010 in Britain (and the US) is much higher as a percentage of GDP than it was in 2000.  But it is so for a reason, and that reason is not the one either Brown’s or Obama’s critics say it is.  Stating that out loud, as often as needed, ought to be the job of someone who aspires to be “of no party or clique.”

That is all.

Image:  Martina Schettina, “Fibonacci’s Traum (Dream)” 2008.

Andrew Sullivan Gets Him Some Data: Contraception and Abortion edition.

October 20, 2009

Readers of this blog know that I usually slag off on Sullivan’s reluctance to engage data on issues in which he has strong views. I don’t believe, by and large, that he has a very solid grasp of either quantitative methods or scientific practice. (See, e.g., this post or this one.)

But when he’s onto something he does care about where the data and its manipulation matter, he can be like a dog to a bone, and today he’s done good.

The issue?  Whether contraception reduces the incidence of abortion.  His dissection of the dishonest manipulation of the research record (e.g. admixing a study of US women with a 197-country study) can be found here.

His conclusion:

Theocons cannot have it every which way. Practically speaking, if you really believe that all abortion is murder, a huge program of contraception education and access is the most practical life-saver out there. And yet the Catholic pro-lifers refuse to embrace it and go to these kinds of lengths to deny reality. By their own logic, they are the ones enabling the massacre of millions.

Exactly so.

The “every sperm is sacred” crowd has led to enormous suffering.  It’s good to see Andrew call them out on this one.*

I’m about to wrap up a three part post on Sullivan’s theodicy issues (part one and part two, for your delectation) and I’ve had some harsh things to say, with worse to come in the last section.  But when he is able to achieve some remove from his own internal conflicts on the vexing tensions in his faith, he is no dummy, not at all.  Credit where credit is due….

*Note:  I haven’t linked all the way through to the deeply disturbed person who calls herself the Anchoress…but if you want to see that with which Sullivan’s arguing, go ahead.  And surf through the second link; Elizabeth Pisani is as good as it gets on deflating the murderous hypocrisy and self-delusion of the better-to-die-than-have-safe-sex crowd.

Image:  Postcard published in 1909; photograph by Irvin M. Kline, 1907.

I know I’ve been AWOL and this isn’t about science at all, but…

March 17, 2009

I do find it really odd (I’ve just taken my medication for the day, and so can express myself so mildly) that the members of the same mass media organizations that worry whether or not President Barack Hussein Obama (I still love writing that) has taken on too much as he attempts to undo the disasters left for him by the previous criminal conspiracy administration [too many to link –ed.] actually complain when President Obama’s spokesman pokes a little fun at the previous administration’s consigliere Vice President after said consigliere eminence grise attempts to portray Obama’s efforts to stop torturing people as a bad thing.

So, it’s ok to accuse someone of endangering the country because he won’t destroy the ability to prosecute terrorists by torturing them to the point of insanity, but it’s not OK to compare the worst vice president in history to a drug-dependent, grotesquely obese, yakmeister most-popular conservative entertainer in the country?*

[I never pass up a chance to post this portrait of the de-facto dear leader of the Republican Party–ed.]

And, by the way, to fret when the new President shows himself able to actually think about two matters in the same day?

There are lots of structural reasons why the mainstream media is falling apart.  But the truth of the matter is that every major dead-tree and traditional broadcaster/cable net had, not too long ago, a serious brand, a name that could be seen as a destination.  CBS, the home to the correspondent who was shocked, shocked, to learn that the Obama administration might not take the former grossly over-promoted button man vice president seriously, was once the Tiffany Network, the place that Murrow and Cronkite called home, the network that proved it was possible to do real news reporting in that gossamer medium with all those moving pictures.

Now?  With honorable exceptions, it’s a self-parody, in which the only consolation is that it is not yet Fox “News” [sic –ed.].  (Late breaking:  ABC and MSNBC pile on to defend poor little Dick’s honor, with no sign that they saw anything amiss, or a mite disrespectful, in Mr. Cheney’s original feral fantasies.)

These are all still influential venues, which is of course the problem.  But, speaking as someone who grew up personally and professionally in the traditional medium (stints, early on, at Time magazine and all), this smells to me like the end game.  The institutions that are trumpeting this stuff the loudest are economically and psychologically tied to the idea of being platforms of mass media.  This kind of partisan commentary masquerading (poorly) as reporting is the staple of niche media.  And in that space, there is already plenty of competition, lean, mean and ready to eat the dinosaurs’ lunch.

So the next time someone complains about the fate of newspapers or the decline of venues for national conversation, you may (and I do) agree that there is a real loss there.  But recall as well that the supposed moderators of that national agora chose to piss it away in defense of an oilpatch chickenhawk with an eagerness to trade in others’ pain.

*Limbaugh apparently pulls a rating of around or just below 6, as in 6 percent of radio listeners tuned in to anything at the time of his broadcasts are listening to him.  That’s a serious number. Still, as this report shows, there are some curiosities lurking beneath the gross figures that undergird the fat man’s fat paycheck.

Image:  Cornelis de Vos, “The Triumph of Bacchus,” 17th c.

Comrade Fidel Hearts Him Some Statistical Reasoning

March 11, 2009

Who knew? (h/t Tom Verducci.)  Fidel intelligently criticizing play in the World Baseball Classic by the Japanese team and his own Cubans with a nod to the insights gained from sabermetric approaches to the game.

Ordinarily I’d go on here with some pieties about the importance of statistical reasoning, and note that baseball’s mathematicians have actually brought this notion into common public understanding more swiftly and completely than decades of science writing about risk and confidence and all the rest.  (Thank you Bill James, and all your intellectual heirs.)

I might be tempted to talk about the difficulty of modeling the kind of dynamic system that baseball represents, with its enormous range of possibilities to which is added the complication posed by the fact that the responses of people to the knowledge gained by the statistical study of baseball alters that terrain.

But I’m just going to sit back in wonder.  Fidel Castro has a blog?*

Either this means our brave new world (that has such digitalia in’t) is truly manifest…or else all of us here in bloggy pastures have truly jumped the shark.

Good morning, all.

Image:  Thomas Eakins, “Baseball Players Practicing,” 1875.

*The site seems legit, but I’m no kind of a Cuba expert, and I could see the joy possible in a fake Castro blog.  For the purposes of this post, I’m prepared to file this in the too-good-to-check bin of blogospheric delights.  But as with any claim on the internet, if you need to know the material contained within, channel a (very) little bit of Ronald Reagan here and Trust, But Verify.

A Population Lagniappe

September 4, 2008

After the reams of bytes expended on a tour through the fantasy of small-town America below, just enjoy yourselves with this:

This is a map of the US population density by night.  Click the link for a better view.  If this map were wall-sized, each white dot would represent 1,000 people.

Source:  U.S. Census Bureau.