The U.S.C./Los Angeles Times poll has consistently been an outlier, showing Donald Trump in the lead or near the lead.
Alone, he has been enough to put Mr. Trump in double digits of support among black voters. He can improve Mr. Trump’s margin by 1 point in the survey, even though he is one of around 3,000 panelists.
He is also the reason Mrs. Clinton took the lead in the U.S.C./LAT poll for the first time in a month on Wednesday. The poll includes only the last seven days of respondents, and he hasn’t taken the poll since Oct. 4. Mrs. Clinton surged once he was out of the sample for the first time in several weeks.
How has he made such a difference? And why has the poll been such an outlier? It’s because the U.S.C./LAT poll made a number of unusual decisions in designing and weighting its survey.
It’s worth noting that this analysis is possible only because the poll is extremely and admirably transparent: It has published a data set and the documentation necessary to replicate the survey.
Not all of the poll’s choices were bound to help Mr. Trump. But some were, and they combined with some very bad luck to produce one of the most persistent outliers in recent elections.
Tiny Groups, Big Weights
Just about every survey is weighted — adjusted to match the demographic characteristics of the population, often by age, race, sex and education, among other variables.
The U.S.C./LAT poll is no exception, but it makes two unusual decisions that combine to produce an odd result.
■ It weights for very tiny groups, which results in big weights.
A typical national survey usually weights to make sure it’s representative across pretty broad categories, like the right number of men or the right number of people 18 to 29.
The U.S.C./LAT poll weights for many tiny categories, like 18-to-21-year-old men, which U.S.C./LAT estimates make up around 3.3 percent of the adult citizen population. Weighting simply for 18-to-21-year-olds would be pretty bold for a political survey; weighting for 18-to-21-year-old men is really unusual.
On its own, there’s nothing necessarily wrong with weighting for small categories like this. But it’s risky: Filling up all of these tiny categories generally requires more weighting.
A run of the U.S.C./LAT poll, for instance, might have only 15 or so 18-to-21-year-old men. But for those voters to make up 3.3 percent of the weighted sample, these 15 voters have to count as much as 86 people — an average weight of 5.7.
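The arithmetic is simple enough to check directly. The short Python sketch below uses only the figures quoted above (15 respondents, a weighted count of 86, a 3.3 percent target). The function name and the implied weighted total are illustrative assumptions, not part of the poll's published code.

```python
def average_weight(target_count, n_respondents):
    """Average weight each respondent needs so the group reaches its target count."""
    return target_count / n_respondents


# Figures quoted in the article.
respondents = 15      # 18-to-21-year-old men in one run of the poll
target_count = 86     # how many people they must count as after weighting
target_share = 0.033  # their estimated share of the adult citizen population

avg = average_weight(target_count, respondents)

# Assumption: backing out the weighted total that makes 86 people equal 3.3 percent.
implied_total = target_count / target_share

print(f"average weight: {avg:.1f}")                    # about 5.7
print(f"implied weighted total: {implied_total:.0f}")  # roughly 2,600
```

The same division shows why broad categories are safer: a group with hundreds of respondents rarely needs an average weight far from 1.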
When you start considering the competing demands across multiple categories, it can quickly become necessary to give an astonishing amount of extra weight to particularly underrepresented voters — like 18-to-21-year-old black men.
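One common way to satisfy several weighting targets at once is raking, also called iterative proportional fitting. The article does not say that U.S.C./LAT uses this method, so the sketch below is only an illustration of how competing targets can pile extra weight onto a sparse group. Every count and target in it is invented for the example.

```python
def rake(counts, row_targets, col_targets, iterations=100):
    """Return one weight per (row, col) cell so weighted totals match both sets of targets."""
    weights = {cell: 1.0 for cell in counts}
    for _ in range(iterations):
        # Scale each row (age group) so its weighted total hits the row target.
        for row, target in row_targets.items():
            current = sum(counts[c] * weights[c] for c in counts if c[0] == row)
            for c in counts:
                if c[0] == row:
                    weights[c] *= target / current
        # Scale each column (race) so its weighted total hits the column target.
        for col, target in col_targets.items():
            current = sum(counts[c] * weights[c] for c in counts if c[1] == col)
            for c in counts:
                if c[1] == col:
                    weights[c] *= target / current
    return weights


# Hypothetical sample of 1,000 respondents: young people are badly underrepresented.
counts = {
    ("young", "black"): 2,
    ("young", "other"): 28,
    ("older", "black"): 98,
    ("older", "other"): 872,
}
# Hypothetical population targets, expressed as counts out of 1,000.
age_targets = {"young": 130, "older": 870}
race_targets = {"black": 120, "other": 880}

weights = rake(counts, age_targets, race_targets)
heaviest = max(weights, key=weights.get)
for cell, w in sorted(weights.items()):
    print(cell, round(w, 2))
print("heaviest cell:", heaviest)
```

In this toy sample the two young black respondents end up with the largest weight, because they sit at the intersection of two groups that both need to be weighted up.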
This wouldn’t be a problem with broader categories, like those 18 to 29, and very few national polls weight respondents up by more than eightfold or tenfold. The extreme weights for the 19-year-old black Trump voter in Illinois are not normal.
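To see how a single heavily weighted respondent can move a topline number, consider a back-of-the-envelope sketch in Python. All of the figures are hypothetical: the weight of 30 and the evenly split remainder of the sample are assumptions chosen so that the shift comes out near the 1-point effect described earlier, not values taken from the poll's data set.

```python
def trump_margin(trump_weight, clinton_weight):
    """Trump's lead in percentage points among weighted two-candidate support."""
    total = trump_weight + clinton_weight
    return 100.0 * (trump_weight - clinton_weight) / total


# Hypothetical: everyone else in the sample is evenly split.
others_trump = 1485.0
others_clinton = 1485.0

# Hypothetical: one Trump supporter who counts as much as 30 respondents.
heavy_weight = 30.0

margin_without = trump_margin(others_trump, others_clinton)
margin_with = trump_margin(others_trump + heavy_weight, others_clinton)

print(f"margin without him: {margin_without:+.1f}")
print(f"margin with him:    {margin_with:+.1f}")
print(f"shift from one respondent: {margin_with - margin_without:.1f} points")
```

With a weight near 1, the same respondent would move the margin by only a few hundredths of a point.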
■ It weights by past vote.
The U.S.C./LAT poll does something else that’s really unusual: It weights the sample according to how people said they voted in the 2012 election.
Its weights are such that Obama voters represent 27 percent of the sample and Romney voters represent 25 percent, reflecting the split of 51 to 47 percent among actual voters in 2012. The rest include those who stayed home or who are newly eligible to vote.
I’m not aware of any reputable public survey that weights self-reported past vote back to the actual reported results of an election.
You can read more about the U.S.C./LAT “past vote” issue in this August article, but the big problem is that people don’t report their past vote very accurately. They tend to over-report three things: voting, voting for the winner and voting for some other candidate. They under-report voting for the loser.
The same thing is true in the U.S.C./LAT poll. If the survey didn’t include a past vote weight, the past vote of its respondents would be Obama 38, Romney 30. This is a lot like national surveys that were published around the same time as the U.S.C./LAT poll, like those from NBC/WSJ or NYT/CBS News.
By emphasizing past vote, the pollsters might significantly underweight those who claim to have voted for Mr. Obama and give much more weight to people who say they didn’t vote.
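A rough calculation shows the size of the adjustment. The Python sketch below uses the shares quoted above (Obama 38 and Romney 30 before the past vote weight, 27 and 25 after it). Lumping everyone else into a single "rest" group and treating past vote as the only weighting variable are simplifying assumptions; the real poll weights on many variables at once.

```python
# Self-reported 2012 vote without the past vote weight (percent of sample).
raw = {"obama": 38.0, "romney": 30.0}
# Shares after the past vote weight is applied (percent of sample).
target = {"obama": 27.0, "romney": 25.0}

# Assumption: everyone else (nonvoters, newly eligible, other candidates) is one group.
raw["rest"] = 100.0 - sum(raw.values())
target["rest"] = 100.0 - sum(target.values())

# Ratio by which each group's total weight must change to hit the target.
factors = {group: target[group] / raw[group] for group in raw}
for group, factor in factors.items():
    print(f"{group}: {raw[group]:.0f}% -> {target[group]:.0f}% (x{factor:.2f})")

# The targets mirror the actual 2012 two-party split of 51 to 47.
target_two_party = target["obama"] / (target["obama"] + target["romney"])
actual_two_party = 51.0 / (51.0 + 47.0)
print(f"Obama two-party share: target {target_two_party:.3f}, actual {actual_two_party:.3f}")
```

Under these assumptions, self-described Obama voters are scaled down by roughly 30 percent, self-described Romney voters by less than 20 percent, and everyone else is scaled up by about half.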
Two Key Factors
These two factors, an overweighted sample and the use of past vote, seem to explain most of the difference between the U.S.C./LAT poll and other surveys.
If the poll were weighted to a generic set of census categories like most surveys (four categories of age, five categories of education, gender and four categories of race and Hispanic origin), Mrs. Clinton would have led in every iteration of the survey except the period immediately after the Republican convention. The U.S.C./LAT poll weights for all of these demographic categories; it just weights to smaller groups.