Read How I did a Twitter giveaway, got 10K+ new followers and discovered you can hack most giveaways to win them (levels.io)
It was almost New Year's Eve and I wanted to do something special on Twitter. I had 69,800 followers and, because I admittedly am an imperfect and superficial human addicted to vanity metrics, I wanted to get to 70,000 followers before midnight struck and 2020 began. To celebrate…

My friend Marc again to the rescue. He suggested that since there were 10,000+ people RT’ing and following, I could just pick a random follower from my current total follower list (78,000 at this point), then go to their profile to check whether they RT’d it. If they didn’t, get another random follower and repeat until you find someone. With 78,000 followers this should take about eight tries.

Technically he said it would be random among those who retweeted, but he actually chose from a much smaller subset: people who were BOTH following him AND who retweeted it. Oops!
Annotated on January 13, 2020 at 01:10PM

So, based on your write-up it sounds like you’re saying that if one retweeted but wasn’t following you, one had no chance of winning. This means a few thousand people still got lost in the shuffle. Keep in mind that some states have laws regarding lotteries, giveaways, and games like this. Hopefully they don’t apply to you or your jurisdiction.
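The sampling trick Marc suggested amounts to rejection sampling. A minimal sketch, with hypothetical names and the post's numbers (10,000 retweeters among 78,000 followers, so roughly 78,000/10,000 ≈ 8 expected tries):

```python
import random

def pick_winner(followers, retweeters, rng=random.Random(42)):
    """Draw uniformly from the follower list until we hit someone
    who also retweeted; return the winner and how many draws it took."""
    tries = 0
    while True:
        tries += 1
        candidate = rng.choice(followers)
        if candidate in retweeters:
            return candidate, tries

# Hypothetical data: 78,000 followers, of whom 10,000 retweeted
followers = [f"user{i}" for i in range(78_000)]
retweeters = set(followers[:10_000])

winner, tries = pick_winner(followers, retweeters)
```

Note that, exactly as the annotation above points out, this procedure can only ever select from the intersection of followers and retweeters, never from retweeters who don't follow.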

Read Yet another view of the negative binomial by John D. Cook (johndcook.com)

One of the shortcomings of the Poisson distribution is that its variance exactly equals its mean. It is common in practice for the variance of count data to be larger than the mean, so it’s natural to look for a distribution like the Poisson but with larger variance. We start with a Poisson random variable X with mean λ, but then we make λ itself random and suppose that λ comes from a gamma(α, β) distribution. Then the marginal distribution on X is a negative binomial distribution with parameters r = α and p = 1/(β + 1).

The previous post said that the negative binomial is useful because it has more variance than the Poisson. The derivation above explains why the negative binomial should have more variance than the Poisson.

Bookmarked The Top 3 Books to Get Started with Data Science Right Now (towardsdatascience.com)
Python For Data Analysis
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow is by far the best book to get started with machine learning.
Introduction to Statistical Learning.

👓 Scientists rise up against statistical significance | Nature

Read Scientists rise up against statistical significance by Valentin Amrhein, Sander Greenland & Blake McShane (Nature )
Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories call for an end to hyped claims and the dismissal of possibly crucial effects.

👓 A Songwriting Mystery Solved: Math Proves John Lennon Wrote ‘In My Life’ | NPR

Read A Songwriting Mystery Solved: Math Proves John Lennon Wrote 'In My Life' (NPR | Weekend Edition Saturday)

Over the years, Lennon and McCartney have revealed who really wrote what, but some songs are still up for debate. The two even debate between themselves — their memories seem to differ when it comes to who wrote the music for 1965's "In My Life."

Mathematics professor Jason Brown spent 10 years working with statistics to solve the magical mystery. Brown's findings were presented on Aug. 1 at the Joint Statistical Meeting in a presentation called "Assessing Authorship of Beatles Songs from Musical Content: Bayesian Classification Modeling from Bags-Of-Words Representations."

👓 How an Ex-Cop Rigged McDonald’s Monopoly Game and Stole Millions | The Daily Beast

Read How an Ex-Cop Rigged McDonald’s Monopoly Game and Stole Millions (The Daily Beast)
Jerome Jacobson and his network of mobsters, psychics, strip-club owners, and drug traffickers won almost every prize for 12 years, until the FBI launched Operation ‘Final Answer.’
A great little story here. I can see why Matt and Ben bought it.

👓 If You Say Something Is “Likely,” How Likely Do People Think It Is? | Harvard Business Review

Read If You Say Something Is “Likely,” How Likely Do People Think It Is? (Harvard Business Review)
Why you should use percentages, not words, to express probabilities.

Highlights, Quotes, & Marginalia

Phil Tetlock, a professor of psychology at the University of Pennsylvania, who has studied forecasting in depth, suggests that “vague verbiage gives you political safety.”  

This result is consistent with analysis by the data science team at Quora, a site where users ask and answer questions. That team found that women use uncertain words and phrases more often than men do, even when they are just as confident.  

A large literature shows that we tend to be overconfident in our judgments.  

The best forecasters make lots of precise forecasts and keep track of their performance with a metric such as a Brier score.  
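The Brier score mentioned in that last highlight is simple to compute: the mean squared difference between your probability forecasts and the 0/1 outcomes. A minimal sketch with made-up forecasts:

```python
import numpy as np

def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and the
    0/1 outcomes; 0 is perfect, 0.25 matches always saying 50%."""
    forecasts = np.asarray(forecasts, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return np.mean((forecasts - outcomes) ** 2)

# Hypothetical forecaster: three probability forecasts and what happened
print(brier_score([0.9, 0.7, 0.2], [1, 1, 0]))  # ≈ 0.047
```

Because the score punishes confident misses quadratically, tracking it over many forecasts rewards exactly what the article recommends: precise probabilities rather than vague verbiage.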

👓 Voting me, voting you: Eurovision | The Economist (Espresso)

Read Voting me, voting you: Eurovision (Economist Espresso)
The competition, whose finals play out tonight, is as famed for its politics as its cheesy…
I often read the Economist’s Espresso daily round-up, but don’t explicitly post that I do. I’m making an exception in this case because I find the voting partnerships mentioned here quite interesting. It might be worth delving into some of the underlying voting statistics for potential application to other real-life examples. I’m also enamored of the nice visualization they provide. I wonder what the overlap of this data with other related world politics looks like?

👓 Science’s Inference Problem: When Data Doesn’t Mean What We Think It Does | New York Times

Read Science’s Inference Problem: When Data Doesn’t Mean What We Think It Does by James Ryerson (nytimes.com)
Three new books on the challenge of drawing confident conclusions from an uncertain world.
Not sure how I missed this when it came out two weeks ago, but glad it popped up in my reader today.

This has some nice overview material for the general public on probability theory and science, but given the state of research, I’d even recommend this and some of the references to working scientists.

I remember bookmarking one of the texts back in November. This is a good reminder to circle back and read it.

👓 How 4,000 Physicists Gave a Vegas Casino its Worst Week Ever | Physics Buzz

Read How 4,000 Physicists Gave a Vegas Casino its Worst Week Ever (physicsbuzz.physicscentral.com)
What happens when several thousand distinguished physicists, researchers, and students descend on the nation’s gambling capital for a conference? The answer is "a bad week for the casino"—but you'd never guess why. The year was 1986, and the American Physical Society’s annual April meeting was slated to be held in San Diego. But when scheduling conflicts caused the hotel arrangements to fall through just a few months before, the conference's organizers were left scrambling to find an alternative destination that could accommodate the crowd—and ended up settling on Las Vegas's MGM Grand.
Totally physics clickbait. The headline should have read: “Vegas won’t cater to physics conferences anymore because they’re too smart to gamble.”

👓 Sliced And Diced: The Inside Story Of How An Ivy League Food Scientist Turned Shoddy Data Into Viral Studies | Buzzfeed

Read Sliced And Diced: The Inside Story Of How An Ivy League Food Scientist Turned Shoddy Data Into Viral Studies by Stephanie M. Lee (BuzzFeed)
Brian Wansink won fame, funding, and influence for his science-backed advice on healthy eating. Now, emails show how the Cornell professor and his colleagues have hacked and massaged low-quality data into headline-friendly studies to “go virally big time.”
This article is painful to read and has some serious implications both for science in general and for the issue of repeatability. I suspect that this is an easily caught, flagrant case and that it probably only scratches the surface. The increased competition in research and the academy is sure to create more cases like this in the future.

We really need people to begin publishing their negative results and to do a better job of understanding and practicing statistics. Science is already not “believed” by far too many in the United States; we really don’t need bad actors like this eroding the solid foundations we’ve otherwise built.

🔖 Computational Social Scientist Beware: Simpson’s Paradox in Behavioral Data by Kristina Lerman

Bookmarked Computational Social Scientist Beware: Simpson's Paradox in Behavioral Data by Kristina Lerman (arxiv.org)
Observational data about human behavior is often heterogeneous, i.e., generated by subgroups within the population under study that vary in size and behavior. Heterogeneity predisposes analysis to Simpson's paradox, whereby the trends observed in data that has been aggregated over the entire population may be substantially different from those of the underlying subgroups. I illustrate Simpson's paradox with several examples coming from studies of online behavior and show that aggregate response leads to wrong conclusions about the underlying individual behavior. I then present a simple method to test whether Simpson's paradox is affecting results of analysis. The presence of Simpson's paradox in social data suggests that important behavioral differences exist within the population, and failure to take these differences into account can distort the studies' findings.
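The aggregation problem the abstract describes is easy to reproduce synthetically. In this sketch (all numbers invented), the trend between x and y is negative within each of two subgroups, but pooling them flips the sign:

```python
import numpy as np

rng = np.random.default_rng(1)

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

# Two hypothetical subgroups, each with slope -1 plus noise,
# but offset so the group means line up with a positive trend
x1 = rng.uniform(0, 3, 500); y1 = 1 - x1 + rng.normal(0, 0.2, 500)
x2 = rng.uniform(4, 7, 500); y2 = 8 - x2 + rng.normal(0, 0.2, 500)

print(slope(x1, y1))                        # negative within group 1
print(slope(x2, y2))                        # negative within group 2
print(slope(np.r_[x1, x2], np.r_[y1, y2]))  # positive in aggregate
```

This is exactly the situation Lerman's test is meant to flag: the aggregate response says the opposite of what every underlying subgroup is doing.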

🔖 Ten Great Ideas about Chance by Persi Diaconis and Brian Skyrms

Bookmarked Ten Great Ideas about Chance (Princeton University Press)
In the sixteenth and seventeenth centuries, gamblers and mathematicians transformed the idea of chance from a mystery into the discipline of probability, setting the stage for a series of breakthroughs that enabled or transformed innumerable fields, from gambling, mathematics, statistics, economics, and finance to physics and computer science. This book tells the story of ten great ideas about chance and the thinkers who developed them, tracing the philosophical implications of these ideas as well as their mathematical impact. Persi Diaconis and Brian Skyrms begin with Gerolamo Cardano, a sixteenth-century physician, mathematician, and professional gambler who helped develop the idea that chance actually can be measured. They describe how later thinkers showed how the judgment of chance also can be measured, how frequency is related to chance, and how chance, judgment, and frequency could be unified. Diaconis and Skyrms explain how Thomas Bayes laid the foundation of modern statistics, and they explore David Hume’s problem of induction, Andrey Kolmogorov’s general mathematical framework for probability, the application of computability to chance, and why chance is essential to modern physics. A final idea―that we are psychologically predisposed to error when judging chance―is taken up through the work of Daniel Kahneman and Amos Tversky. Complete with a brief probability refresher, Ten Great Ideas about Chance is certain to be a hit with anyone who wants to understand the secrets of probability and how they were discovered.
h/t Michael Mauboussin

📺 Are University Admissions Biased? | Simpson’s Paradox Part 2 | YouTube

Watched Are University Admissions Biased? | Simpson's Paradox Part 2 by Henry Reich (youtube.com)

Simpson's Paradox Part 2. This video is about how to tell whether or not university admissions are biased using statistics: aka, it's about Simpson's Paradox again!
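The admissions version of the paradox works the same way as any other instance: per-group rates can all favor one group while the pooled rates favor the other. Toy counts in the spirit of the 1973 Berkeley study (all numbers hypothetical, not the real data):

```python
# dept: (men_applied, men_admitted, women_applied, women_admitted)
depts = {
    "lenient":   (800, 560, 100, 80),   # men 70%, women 80%
    "selective": (200, 40, 900, 200),   # men 20%, women ~22%
}

for name, (ma, m_in, wa, w_in) in depts.items():
    print(name, m_in / ma, w_in / wa)   # women's rate higher in both

# Aggregate rates flip because women mostly applied to the
# selective department
men_rate = sum(v[1] for v in depts.values()) / sum(v[0] for v in depts.values())
women_rate = sum(v[3] for v in depts.values()) / sum(v[2] for v in depts.values())
print(men_rate, women_rate)  # men 0.60, women 0.28
```

So the pooled admission rate alone can't distinguish departmental bias from differences in where each group applies, which is the question the video digs into.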

REFERENCES:
Original Berkeley Grad Admissions Paper
Interactive Simpson’s Paradox Explainer
No Lawsuit, But Yes, Berkeley Study on Gender Bias

Statistics on college majors by gender:
https://nces.ed.gov/programs/digest/2016menu_tables.asp
http://www.npr.org/sections/money/2014/10/28/359419934/who-studies-what-men-women-and-college-majors
http://www.randalolson.com/2014/06/14/percentage-of-bachelors-degrees-conferred-to-women-by-major-1970-2012/

Earnings by college major

Wall Street Journal Article on Simpson’s Paradox