statistics | Chris Aldrich

Read Why Don’t Polls Have More Information About Black Voters? by Kevin Drum (Mother Jones)

Rashawn Ray wants us to stop treating African Americans as a monolithic group:

Black Americans vote on par or higher than their state population. They represent a significant share of Democratic voters, especially in states like South Carolina (nearly 60%). Despite representing this large voting bloc, polls such as Quinnipiac continue to frame black Americans as a monolithic group, while disaggregating white people by age, political identification and education.

I argue it is important to see the heterogeneity of black Americans. Others agree. Professor Eddie Glaude Jr said: “We have to be more nuanced in how we talk about black voters. I would love to see the breakdown of the Q poll. Age. Class. Etc.” Rolling Stone writer Jamil Smith said, “I’ve examined the newest Quinnipiac poll very thoroughly … and unfortunately, it does not break down black voters by age, class, education, or even gender. Just ‘Black.’ White respondents receive more nuanced treatment in the poll.”

The problem here is not one of racism, but of statistics. The average poll reaches about a thousand people. Of those, about 13 percent are likely to be black. If you then break things down by, say, age, you’ll have only about 30-40 respondents in each group. Unfortunately, as the group size goes down, the margin of error for each group goes up. In this case, the margin of error for each of the age groups is upwards of 15-20 percent, which makes the results useless. It would be a dereliction of duty to even report them.

Some polls oversample blacks and Hispanics to avoid this problem, but that’s expensive. It’s usually done infrequently, and only for surveys specifically aimed at reporting the views of one ethnic group. So don’t blame Quinnipiac for this. It’s a problem of arithmetic and money, not bad faith.
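A rough back-of-the-envelope check of the numbers in the excerpt above. This is only a sketch: it assumes simple random sampling, the worst-case proportion p = 0.5, a 95% confidence level (z = 1.96), and four age brackets. The bracket count is my assumption; the excerpt only says 30-40 respondents per group.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    p=0.5 is the worst case; z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


sample_size = 1000   # "about a thousand people", per the excerpt
black_share = 0.13   # "about 13 percent", per the excerpt
age_groups = 4       # assumption: number of age brackets

black_respondents = sample_size * black_share   # about 130 people
per_group = black_respondents / age_groups      # about 32 people per bracket

for label, n in [
    ("full sample", sample_size),
    ("black respondents", black_respondents),
    ("one age bracket", per_group),
]:
    print(f"{label}: n = {n:.0f}, margin of error = {margin_of_error(n):.1%}")
```

Under those assumptions the full sample comes out near 3 points, all black respondents together near 9 points, and a single age bracket in the high teens, which lines up with the 15-20 percent range quoted.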

📖 I’m 10% done reading Economy, Society, and Public Policy by CORE Team

Finished chapter one. I like that this text has so many linked resources, but some of the links to the sister texts make me think I’d be getting a deeper and more technical understanding by reading them instead of this more introductory text. Still, this has some tremendous value even as a refresher.

Annotations from Unit 1 Capitalism and democracy: Affluence, inequality, and the environment

Government bodies also tend to be more limited in their capacity to expand if successful, and are usually protected from failure if they perform poorly. ❧

They can expand in different ways, however. Think about the expansion of the empires of Egypt, Rome, and the Mongols in the 13th century. What caused them to cease growing and decrease? What allowed them to keep increasing?
Annotated on February 10, 2020 at 04:50PM

Capitalism is an economic system that can combine centralization with decentralization. ❧

How can we analogize this with the decentralization of the web and its economy?
Annotated on February 10, 2020 at 04:50PM

Market competition provides a mechanism for weeding out those who underperform. ❧

Note how this has failed in the current Gilded Age of the United States, where it is possible for things to be “too big to fail”.
Annotated on February 10, 2020 at 04:50PM

First, because capital goods do not fall from the sky: all countries that have successfully moved from poverty to affluence have done so, of necessity, by accumulating large amounts of capital. We will also see that a crucial feature of capitalism is who owns and controls the capital goods in an economy. ❧

Annotated on February 10, 2020 at 03:11PM

Yet some things that we value are not private property—for example, the air we breathe and most of the knowledge we use cannot be owned, bought, or sold. ❧

Annotated on February 10, 2020 at 04:49PM

We should be sceptical when anyone claims that something complex (capitalism) ‘causes’ something else (increased living standards, technological improvement, a networked world, or environmental challenges), just because we can see there is a correlation. ❧

Great and ridiculous examples of this can be found at https://www.tylervigen.com/spurious-correlations
Annotated on February 10, 2020 at 08:59PM

Figure 1.16 ❧

Note the dramatic inconsistency of the scale on the left-hand side. What is going on here?
Annotated on February 10, 2020 at 09:23PM

Firms should not be owned and managed by people who survive because of their connections to government or their privileged birth: Capitalism is dynamic when owners or managers succeed because they are good at delivering high-quality goods and services at a competitive price. This is more likely to be a failure when the other two factors above are not working well. ❧

Here is where we’re likely to fail in the United States by following the example of Donald Trump, who ostensibly has survived solely off the wealth of his father’s dwindling empire. With that empire gone, he’s now turning to creating wealth by associating with the government. We should carefully follow where this potentially leads the country.
Annotated on February 10, 2020 at 09:31PM

In some, their spending on goods and services as well as on transfers like unemployment benefits and pensions, accounts for more than half of GDP. ❧

What is government spending as a proportion of US GDP at present?
Annotated on February 10, 2020 at 09:34PM

James Bronterre O’Brien, told the people: ‘Knaves will tell you that it is because you have no property, you are unrepresented. I tell you on the contrary, it is because you are unrepresented that you have no property …’ ❧

Great quote.
Annotated on February 10, 2020 at 09:53PM

Annotated The Dan MacKinlay family of variably-well-considered enterprises by Dan MacKinlay (danmackinlay.name)

A statistician is the exact same thing as a data scientist or machine learning researcher with the differences that there are qualifications needed to be a statistician, and that we are snarkier. ❧

Bookmarked The Top 3 Books to Get Started with Data Science Right Now (towardsdatascience.com)

Python For Data Analysis
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow is by far the best book to get started with machine learning.
Introduction to Statistical Learning.

🎧 Steven Strogatz Bonus – What to Do When Things Keep Changing! | Clear+Vivid with Alan Alda

Listened to Steven Strogatz Bonus - What to Do When Things Keep Changing! by Alan Alda from Clear+Vivid with Alan Alda

Alan Alda wanted to get off the island quickly. Steven Strogatz explains how an 18th century British clergyman could have helped. In this short bonus episode, Steven helps Alan understand something that he’s wondered about for years.

Quadrilateral equation?? Did he mean the Pythagorean theorem?

There’s a reasonable basic discussion of Bayesian statistics here.

👓 Scientists rise up against statistical significance | Nature

Read Scientists rise up against statistical significance by Valentin Amrhein, Sander Greenland & Blake McShane (Nature)

Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.

👓 Differential privacy, an easy case | accuracyandprivacy.substack.com

Read Differential privacy, an easy case (accuracyandprivacy.substack.com)

By law, the Census Bureau is required to keep our responses to its questionnaires confidential. And so, over decades, it has applied several “disclosure avoidance” techniques when it publishes data — these have been meticulously catalogued by Laura McKenna

I could envision some interesting use cases for differential privacy like this within an IndieWeb framework for aggregated data potentially used for web discovery.

👓 Cornell researcher who studied what we eat and why will step down after six studies are retracted | Los Angeles Times

Read Cornell researcher who studied what we eat and why will step down after six studies are retracted (Los Angeles Times)

Cornell University says Brian Wansink will step down at the end of the academic year after a review of his work turned up many problems.

👓 Curve-Fitting | xkcd

Read Curve-Fitting (xkcd.com)

Cauchy-Lorentz: "Something alarmingly mathematical is happening, and you should probably pause to Google my name and check what field I originally worked in."

I love that it’s all the exact same data points…

👓 Why Le’Veon Bell Might Make More Money If He Ends His Holdout Now | FiveThirtyEight

Read Why Le’Veon Bell Might Make More Money If He Ends His Holdout Now by Josh Hermsmeyer (FiveThirtyEight)

Last weekend, Steelers running back Le’Veon Bell sat out the first game of the regular season rather than play under the NFL franchise tag. Slated to earn $14.5 million in guaranteed money in 2018, Bell loses out on $855,529 each week he fails to report. The franchise tag would make Bell the third highest paid running back in the NFL this season — but only if he actually plays. Around the league, there is a wide range of speculation on how long Bell’s holdout will last. ESPN’s Adam Schefter reports that his sources believe Bell could be back by the end of September, while others note his holdout could conceivably last through Week 10.

👓 Zip codes vs census tracts | Nelson’s log

Read Zip codes vs census tracts (Nelson's log)

A lot of digital maps use zip codes as a binning feature. Election maps, property value maps, pollution maps. But while zip codes are convenient and familiar there’s a much better set of poly…

👓 Take This Cheat Sheet To The Ballpark To Decide When To Leave | FiveThirtyEight

Read Take This Cheat Sheet To The Ballpark To Decide When To Leave (FiveThirtyEight)

According to our statistical model, based on 2010-2015 regular season inning-by-inning scoring data, you should leave after the sixth inning if the leading team is ahead by four or more runs. There is a less than 5 percent chance that the other team will deliver a miracle comeback. If the run differential exceeds two at the top of the ninth, it’s safe to head to the exits. What about blowouts in the first inning? If your time is that precious — and you’re willing to view the money spent on tickets as a sunk cost — our advice is to rev up your car’s engine if the leading team jumps ahead by six runs or more. In developing the cheat sheet, we tolerate a 5 percent false positive error rate. Take the 2016 season as an example. The impatient fan who took our advice would have left early in 1,750 games, but in 61 of those games, the eventual winner came from behind to win, and so the fan missed out on some later-inning excitement. For that season, our model attained an accuracy rate of 97 percent.

We really need the other bound, since seeing the exciting last-minute comebacks is one of the best parts of baseball!
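A quick sanity check of the 2016 figures quoted above. This is only a sketch: the excerpt doesn't say exactly how the accuracy rate is defined, so I'm assuming it is the share of early exits that did not miss a comeback.

```python
# Figures quoted in the excerpt for the 2016 season.
early_exits = 1750       # games the impatient fan would have left early
missed_comebacks = 61    # of those, games where the trailing team came back to win

# Assumption: accuracy = share of early exits that did NOT miss a comeback.
miss_rate = missed_comebacks / early_exits
accuracy = 1 - miss_rate

print(f"missed comebacks: {miss_rate:.1%}")
print(f"accuracy: {accuracy:.1%}")

# The miss rate should sit under the 5 percent tolerance the article mentions.
print("within 5% tolerance:", miss_rate < 0.05)
```

On that reading the miss rate is about 3.5 percent, which rounds to the 97 percent accuracy quoted and stays under the 5 percent tolerance.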

👓 A statistical analysis of the art on convicts’ bodies | The Economist

Read A statistical analysis of the art on convicts’ bodies (The Economist)

What can be learned from a prisoner’s tattoos

Some crazy stuff in here.