The blog looks like it’s off to a good start! Wonder what I should write about today?

# Tag: statistics

## A Long-Sought Proof, Found and Almost Lost | Quanta Magazine

*(Quanta Magazine)*

When a German retiree proved a famous long-standing mathematical conjecture, the response was underwhelming.

## Revealed: how US billionaire helped to back Brexit | The Guardian

*(The Guardian)*

Robert Mercer, who bankrolled Donald Trump, played key role with ‘sinister’ advice on using Facebook data

## How One 19-Year-Old Illinois Man Is Distorting National Polling Averages | The New York Times

*(nytimes.com)*

The U.S.C./Los Angeles Times poll has consistently been an outlier, showing Donald Trump in the lead or near the lead.

Alone, he has been enough to put Mr. Trump in double digits of support among black voters. He can improve Mr. Trump’s margin by 1 point in the survey, even though he is one of around 3,000 panelists.

He is also the reason Mrs. Clinton took the lead in the U.S.C./LAT poll for the first time in a month on Wednesday. The poll includes only the last seven days of respondents, and he hasn’t taken the poll since Oct. 4. Mrs. Clinton surged once he was out of the sample for the first time in several weeks.
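The mechanism behind this is ordinary survey weighting: each answer is multiplied by a weight, and a respondent from a rarely sampled demographic cell can end up with a very large one. A toy sketch, using an entirely hypothetical panel and weight (the poll's real weighting scheme is more involved), shows how one extra respondent can move a 3,000-person margin by about a point:

```python
def weighted_margin(votes, weights):
    """Candidate A's lead over B, in percentage points, using survey weights.

    votes: list of "A"/"B" answers; weights: one survey weight per respondent.
    """
    total = sum(weights)
    a = sum(w for v, w in zip(votes, weights) if v == "A")
    b = sum(w for v, w in zip(votes, weights) if v == "B")
    return 100.0 * (a - b) / total


# Hypothetical panel: 3,000 respondents split evenly, all with weight 1.
votes = ["A"] * 1500 + ["B"] * 1500
weights = [1.0] * 3000
print(f"{weighted_margin(votes, weights):.2f}")  # 0.00

# One extra respondent for A carrying a hypothetical weight of 30.
print(f"{weighted_margin(votes + ['A'], weights + [30.0]):.2f}")  # 0.99
```

Because the poll keeps only the last seven days of respondents, such a panelist shifts every window that includes him and drops out all at once when he stops responding.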

## Jordan Ellenberg don’t know stat | Rick’s Ramblings

*(Rick's Ramblings sites.duke.edu)*

There follows a discussion of flipping coins and the fact that frequencies have more random variation when the sample size is small, but he never stops to see if this is enough to explain the observation.

My intuition told me it did not, so I went and got some brain cancer data.

Jordan Ellenberg is called out a bit by Rick Durrett for one of his claims in the best seller *How Not To Be Wrong: The Power of Mathematical Thinking*.

I remember reading that section of the book and mostly breezing through the argument, treating it as a simple example making a limited but direct point. Durrett decided to delve into the applied math a bit further.

These are some of the subtle issues one eventually comes across when experts read others’ works that were written primarily for much broader audiences.
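Durrett's objection is about whether small-sample noise is large enough to explain the observed pattern. The noise itself is easy to see in a quick simulation. In this sketch every region has the same hypothetical true rate, case counts are drawn from a Poisson distribution (a standard approximation for a rare disease), and the only difference between the two groups is population size; none of the numbers come from Ellenberg's or Durrett's data.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def observed_rates(population, true_rate, n_regions, rng):
    """Observed cases per 100,000 in regions that all share the same true rate."""
    lam = population * true_rate
    return [poisson(lam, rng) / population * 1e5 for _ in range(n_regions)]


def theoretical_sd(population, true_rate):
    """Poisson standard deviation of the observed rate, per 100,000."""
    return math.sqrt(true_rate / population) * 1e5


rng = random.Random(2016)
TRUE_RATE = 5e-5  # hypothetical: 5 cases per 100,000 in every region
small = observed_rates(2_000, TRUE_RATE, 500, rng)
large = observed_rates(1_000_000, TRUE_RATE, 500, rng)

for label, pop, rates in [("small", 2_000, small), ("large", 1_000_000, large)]:
    print(f"{label}: min={min(rates):.1f} max={max(rates):.1f} "
          f"sd={statistics.pstdev(rates):.2f} (theory {theoretical_sd(pop, TRUE_RATE):.2f})")
```

Every region has identical risk, yet the small regions typically produce both the lowest possible observed rate (zero) and rates several times the true value; the theoretical standard deviations are about 15.8 versus 0.71 per 100,000. A natural next step, in the spirit of Durrett's objection, is to check whether real county data spread out more than this baseline predicts.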

I also can’t help thinking that one paints a target on one’s back with a book title like that…

BTW, the quote of the day has to be:

… so I went and got some brain cancer data.

## 🔖 100 years after Smoluchowski: stochastic processes in cell biology

*(arxiv.org)*

100 years after Smoluchowski introduces his approach to stochastic processes, they are now at the basis of mathematical and physical modeling in cellular biology: they are used for example to analyse and to extract features from large number (tens of thousands) of single molecular trajectories or to study the diffusive motion of molecules, proteins or receptors. Stochastic modeling is a new step in large data analysis that serves extracting cell biology concepts. We review here the Smoluchowski's approach to stochastic processes and provide several applications for coarse-graining diffusion, studying polymer models for understanding nuclear organization and finally, we discuss the stochastic jump dynamics of telomeres across cell division and stochastic gene regulation.
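One routine way stochastic models are used to extract features from single-molecule trajectories is to estimate a diffusion coefficient from the mean squared displacement, which for free two-dimensional Brownian motion grows as MSD(t) = 4Dt. A minimal sketch on simulated data; the parameters are hypothetical, and this is a standard textbook estimator rather than a method taken from the paper:

```python
import math
import random


def simulate_trajectory(D, dt, n_steps, rng):
    """2-D Brownian path: each coordinate step is Normal(0, sqrt(2*D*dt))."""
    x = y = 0.0
    sigma = math.sqrt(2 * D * dt)
    path = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        path.append((x, y))
    return path


def estimate_D(trajectories, dt, lag=1):
    """Estimate D from the pooled MSD at a given lag, using MSD(lag*dt) = 4*D*lag*dt in 2-D."""
    total, count = 0.0, 0
    for path in trajectories:
        for i in range(len(path) - lag):
            dx = path[i + lag][0] - path[i][0]
            dy = path[i + lag][1] - path[i][1]
            total += dx * dx + dy * dy
            count += 1
    return (total / count) / (4 * lag * dt)


rng = random.Random(1906)
D_TRUE, DT = 0.1, 0.01  # hypothetical units: µm²/s and s
trajs = [simulate_trajectory(D_TRUE, DT, 100, rng) for _ in range(1000)]
print(f"estimated D = {estimate_D(trajs, DT):.4f} (true value {D_TRUE})")
```

With real data, localization error and confinement bend the MSD curve, which is where the richer stochastic models reviewed in the paper come in.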

### References

“100 years after Smoluchowski: stochastic processes in cell biology,” *arXiv*, 26-Dec-2016. [Online]. Available: https://arxiv.org/abs/1612.08381. [Accessed: 03-Jan-2017]

## Warren Weaver Bot!

*(Twitter)*

This is the signal for the second.

How can you *not* follow this Twitter account?!

Now I’m waiting for a Shannon bot and a Wiener bot. Maybe a John McCarthy bot would be apropos too?!

## 🔖 Advanced Data Analysis from an Elementary Point of View by Cosma Rohilla Shalizi

*(stat.cmu.edu)*

Advanced Data Analysis from an Elementary Point of View

by Cosma Rohilla Shalizi

This is a draft textbook on data analysis methods, intended for a one-semester course for advanced undergraduate students who have already taken classes in probability, mathematical statistics, and linear regression. It began as the lecture notes for 36-402 at Carnegie Mellon University.

By making this draft generally available, I am not promising to provide any assistance or even clarification whatsoever. Comments are, however, welcome.

The book is under contract to Cambridge University Press; it should be turned over to the press before the end of 2015. A copy of the next-to-final version will remain freely accessible here permanently.

Table of contents:

I. Regression and Its Generalizations

- Regression Basics
- The Truth about Linear Regression
- Model Evaluation
- Smoothing in Regression
- Simulation
- The Bootstrap
- Weighting and Variance
- Splines
- Additive Models
- Testing Regression Specifications
- Logistic Regression
- Generalized Linear Models and Generalized Additive Models
- Classification and Regression Trees

II. Distributions and Latent Structure

- Density Estimation
- Relative Distributions and Smooth Tests of Goodness-of-Fit
- Principal Components Analysis
- Factor Models
- Nonlinear Dimensionality Reduction
- Mixture Models
- Graphical Models

III. Dependent Data

- Time Series
- Spatial and Network Data
- Simulation-Based Inference

IV. Causal Inference

- Graphical Causal Models
- Identifying Causal Effects
- Causal Inference from Experiments
- Estimating Causal Effects
- Discovering Causal Structure

Appendices

- Data-Analysis Problem Sets
- Reminders from Linear Algebra
- Big O and Little o Notation
- Taylor Expansions
- Multivariate Distributions
- Algebra with Expectations and Variances
- Propagation of Error, and Standard Errors for Derived Quantities
- Optimization
- chi-squared and the Likelihood Ratio Test
- Proof of the Gauss-Markov Theorem
- Rudimentary Graph Theory
- Information Theory
- Hypothesis Testing
- Writing R Functions
- Random Variable Generation

Planned changes:

- Unified treatment of information-theoretic topics (relative entropy / Kullback-Leibler divergence, entropy, mutual information and independence, hypothesis-testing interpretations) in an appendix, with references from chapters on density estimation, on EM, and on independence testing
- More detailed treatment of calibration and calibration-checking (part II)
- Missing data and imputation (part II)
- Move d-separation material from “causal models” chapter to graphical models chapter as no specifically causal content (parts II and IV)?
- Expand treatment of partial identification for causal inference, including partial identification of effects by looking at all data-compatible DAGs (part IV)
- Figure out how to cut at least 50 pages
- Make sure notation is consistent throughout: insist that vectors are always matrices, or use more geometric notation?
- Move simulation to an appendix
- Move variance/weights chapter to right before logistic regression
- Move some appendices online (i.e., after references)?

(Text last updated 30 March 2016; this page last updated 6 November 2015)

## 16w5113: Stochastic and Deterministic Models for Evolutionary Biology | Banff International Research Station

*(Banff International Research Station)*

A BIRS / Casa Matemática Oaxaca workshop arriving in Oaxaca, Mexico, on Sunday, July 31, and departing on Friday, August 5, 2016

Evolutionary biology is a rapidly changing field, confronted with many societal problems of increasing importance: the impact of global change, emerging epidemics, antibiotic-resistant bacteria… As a consequence, a number of new problems have appeared over the last decade, challenging the existing mathematical models. There is thus a demand in the biology community for new mathematical models allowing a qualitative or quantitative description of complex evolution problems. In particular, in the societal problems mentioned above, evolution often interacts with phenomena of a different nature: interaction with other organisms, spatial dynamics, age structure, invasion processes, environments that are heterogeneous in time and space… The development of mathematical models able to deal with those complex interactions is an ambitious task. Evolutionary biology is interested in the evolution of species. This process is a combination of several phenomena, some occurring at the individual level (e.g., mutations), others at the level of the entire population (competition for resources), which often consists of a very large number of individuals. The presence of very different scales is indeed at the core of theoretical evolutionary biology, and at the origin of many of the difficulties that biologists face. The development of new mathematical models thus requires joint work by three different communities of researchers: specialists in partial differential equations, specialists in probability theory, and theoretical biologists. The goal of this workshop is to gather researchers from each of these communities who are currently working on closely related problems. These communities usually have few interactions, and this meeting would give them the opportunity to discuss and work on a few biological themes that are especially challenging mathematically and play a crucial role in biological applications.

The role of a spatial structure in models for evolution: The introduction of a spatial structure in evolutionary biology models is often challenging. It is however well known that local adaptation is frequent in nature: field data show that the phenotypes of a given species change considerably across its range. The spatial dynamics of a population can also have a deep impact on its evolution. Assessing e.g. the impact of global changes on species requires the development of robust mathematical models for spatially structured populations.

The first type of model used by theoretical biologists for this kind of problem is the IBM (individual-based model), which describes the evolution of a finite number of individuals, each characterized by its position and a phenotype. The mathematical analysis of IBMs in spatially homogeneous situations has provided several methods that have been successful in the theoretical biology community (see the theory of Adaptive Dynamics). By contrast, very few results exist so far on the qualitative properties of such models for spatially structured populations.

The second class of mathematical approaches to this type of problem is based on “infinite-dimensional” reaction-diffusion equations: the population is structured by a continuous phenotypic trait that affects its ability to disperse (diffusion) or to reproduce (reaction). This type of model can be obtained as a large-population limit of IBMs. The main difficulty with these models (in the simpler case of asexual populations) is the term modeling competition for resources, which appears as a nonlocal competition term. This term prevents the use of classical reaction-diffusion tools such as the comparison principle and sliding methods. Recently, promising progress has been made based on tools from elliptic equations and/or Hamilton-Jacobi equations. However, the effects of small populations cannot be observed in such models. The extension of these models and methods to include these effects will be discussed during the workshop.
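To make the nonlocal term concrete, here is one commonly studied form of such an equation, written as an illustrative assumption rather than as a model taken from the workshop description. Let $n(t,x,\theta)$ be the density of individuals at time $t$ and position $x$ carrying trait $\theta$; the trait sets a dispersal rate $D(\theta)$ and a growth rate $r(x,\theta)$, and $\mu$ is a hypothetical mutation coefficient:

$$
\partial_t n(t,x,\theta) = D(\theta)\,\Delta_x n(t,x,\theta) + \mu\,\partial_\theta^2 n(t,x,\theta) + n(t,x,\theta)\big(r(x,\theta) - \rho(t,x)\big),
\qquad
\rho(t,x) = \int n(t,x,\theta')\,d\theta'.
$$

The competition term $\rho(t,x)$ couples every trait present at location $x$, so growth at one trait value depends on the whole trait distribution there; that coupling is why the comparison principle available for scalar reaction-diffusion equations no longer applies directly.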

Eco-evolution models for sexual populations: An essential question already posed by Darwin and Fisher, which remains unanswered for the moment (although it continues to intrigue evolutionary biologists), is: “Why is sexual reproduction maintained?” Indeed, this mode of reproduction is very costly, since it requires a large number of gametes, mating, and the choice of a compatible partner. During meiosis, half of the genetic information is lost. Moreover, the males have to be fed, and during mating individuals are easy prey for predators. A partial answer is that recombination plays a major role by more efficiently eliminating deleterious mutations and by increasing diversity. Nevertheless, this theory is not completely satisfying, and much research is devoted to understanding the evolution of sexual populations and to comparing asexual and sexual reproduction. Several models exist for the influence of sexual reproduction on evolving species. The difficulty compared with asexual populations is that a detailed description of the genetic basis of phenotypes is required, in particular one that includes recombination. For sexual populations, recombination plays a major role and is essential to understand. All models require strong biological simplifications; the development of relevant mathematical methods for such mechanisms therefore requires joint work by mathematicians and biologists. This workshop will be an opportunity to set up such collaborations.

The first type of model considers a small number of diploid loci (typically one locus and two alleles), while the rest of the genome is treated as fixed. One can then define the fitness of every combination of alleles. While this allows the modeling of specific sexual effects (such as dominant/recessive alleles), it neglects the rest of the genome (and it is known that phenotypes are typically influenced by a large number of loci). The opposite approach is to consider a large number of loci, each having a small additive effect on the phenotype under consideration. This approach neglects many microscopic phenomena (epistasis, dominant/recessive alleles…), but allows the derivation of a deterministic model, called the infinitesimal model, in the case of a large population. The construction of a good mathematical framework for intermediate situations would be an important step forward.
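The many-loci limit can be seen in a toy simulation. This sketch assumes, hypothetically, 400 unlinked loci where each allele adds or subtracts the same small effect, with no dominance or epistasis; it is a caricature of the additive setting, not a derivation of the infinitesimal model. Offspring of one pair scatter around the mid-parent value with a roughly Gaussian spread, which is the behavior the infinitesimal model takes as its starting point.

```python
import random
import statistics

N_LOCI = 400                   # hypothetical number of unlinked loci
EFFECT = 1.0 / N_LOCI ** 0.5   # each allele contributes +EFFECT or -EFFECT


def random_genotype(rng):
    """A diploid genotype: two haplotypes, each a list of allele effects."""
    return [[rng.choice((EFFECT, -EFFECT)) for _ in range(N_LOCI)] for _ in range(2)]


def trait(genotype):
    """Purely additive phenotype: the sum of all allele effects (no dominance, no epistasis)."""
    return sum(genotype[0]) + sum(genotype[1])


def gamete(genotype, rng):
    """Free recombination: each locus independently passes on one of its two alleles."""
    return [genotype[rng.randrange(2)][i] for i in range(N_LOCI)]


def offspring(mother, father, rng):
    """One gamete from each parent forms the child's two haplotypes."""
    return [gamete(mother, rng), gamete(father, rng)]


rng = random.Random(7)
mother, father = random_genotype(rng), random_genotype(rng)
midparent = (trait(mother) + trait(father)) / 2
kids = [trait(offspring(mother, father, rng)) for _ in range(500)]

print(f"mid-parent value: {midparent:.3f}")
print(f"offspring mean:   {statistics.mean(kids):.3f}")
# Population variance of the trait here is 2 * N_LOCI * EFFECT**2 = 2; the variance
# within one family should come out near half of that (the segregation variance).
print(f"offspring var:    {statistics.variance(kids):.3f}")
```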

The evolution of recombination and sex is very sensitive to the interaction between several evolutionary forces (selection, migration, genetic drift…). Modeling these interactions is particularly challenging, and our understanding of the evolution of recombination is often limited by strong assumptions regarding demography, the relative strength of these different evolutionary forces, the lack of spatial structure… The development of a more general theoretical framework based on new mathematical developments would be particularly valuable.

Another problem that has received little attention so far and is worth addressing is the modeling of genetic material exchange in asexual populations. This phenomenon is frequent in microorganisms: horizontal gene transfer in bacteria, reassortment or recombination in viruses. These phenomena share some features with sexual reproduction. It would be interesting to see whether their effect can be treated as a perturbation of existing asexual models. This would be particularly interesting in spatially structured populations (e.g., viral epidemics), since the mathematical analysis of spatially structured asexual populations is improving rapidly.

Modeling in evolutionary epidemiology: Mathematical epidemiology has been developing for more than a century. Yet the integration of population genetics phenomena into epidemiology is relatively recent. Microbial pathogens (bacteria and viruses) are particularly interesting organisms because their short generation times and high mutation rates allow them to adapt relatively quickly to changing environments. As a consequence, ecological (demography) and evolutionary (population genetics) processes often occur at the same pace. This raises many interesting problems.

A first challenge is modeling the spatial dynamics of an epidemic. The parasites can evolve as the epidemic spreads through a new host population, either to adapt to a heterogeneous environment or because they themselves modify the environment as they invade. The applications of such studies are numerous: antibiotic management, agriculture… One aspect of this problem to which our workshop can make a significant contribution (thanks to the diversity of its participants) is the evolution of pathogen diversity. During the large expansion produced by an epidemic, there is a loss of diversity in the invading parasites, since most pathogens originate from a few parents. The development of mathematical models for these phenomena is challenging: only a small number of pathogens are present ahead of the epidemic front, while the number of parasites rapidly becomes very large after infection. The interaction between a stochastic micro scale and a deterministic macro scale is apparent here and deserves rigorous mathematical analysis.
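The loss of diversity at an expanding front can be caricatured by repeated founder events: if only a handful of pathogens occupy the leading edge, neutral resampling alone quickly leaves the descendants of very few original founders. A minimal sketch with a fixed, hypothetical front size and no selection or explicit space; it is ordinary neutral Wright-Fisher resampling, not one of the models the workshop has in mind:

```python
import random


def surviving_lineages(n_front, generations, rng):
    """Track how many original founder lineages remain at a small leading edge.

    Each generation, the n_front individuals at the edge are offspring of parents
    drawn at random (with replacement) from the previous edge; every individual
    carries the label of the founder it descends from.
    """
    front = list(range(n_front))  # founder label carried by each individual
    history = [len(set(front))]
    for _ in range(generations):
        front = [rng.choice(front) for _ in range(n_front)]
        history.append(len(set(front)))
    return history


rng = random.Random(3)
history = surviving_lineages(n_front=20, generations=100, rng=rng)
print("distinct founder lineages every 10 generations:", history[::10])
```

The count can only fall, since a new edge is drawn entirely from lineages already present; a behind-the-front population that grows to millions would inherit that reduced diversity.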

Another interesting phenomenon is the effect of a sudden change of environment on a population of pathogens. Examples of such situations include the antibiotic treatment of an infected patient, or the transmission of a parasite to a new host species (for instance, transmission of avian influenza to human beings). Related experiments, called evolutionary rescue experiments, are relatively easy to perform. So far, this question has received limited attention from the mathematical community. The key is to estimate the probability that a mutant well adapted to the new environment existed in the original population, or will appear soon after the environmental change. Interactions between biologists who specialize in these questions and mathematicians should lead to new mathematical problems.
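The probability mentioned above can be sketched with a branching process. Under hypothetical, deliberately simplified assumptions (Poisson offspring, rescue mutants at mutation-selection balance before the change, a geometrically shrinking wild type after it, and independent establishment of every mutant copy), rescue comes either from standing variation or from new mutations:

```python
import math


def establishment_probability(mean_offspring, tol=1e-12, max_iter=100_000):
    """Survival probability of a Galton-Watson lineage with Poisson(mean_offspring) offspring.

    Solves s = 1 - exp(-m * s) by fixed-point iteration from s = 1; the iterates
    decrease to the largest root, which is 0 when m <= 1 (convergence is slow at m = 1).
    """
    s = 1.0
    for _ in range(max_iter):
        new = 1.0 - math.exp(-mean_offspring * s)
        if abs(new - s) < tol:
            return new
        s = new
    return s


def rescue_probability(N, mu, cost, decline, mean_offspring_mutant):
    """Rough rescue probabilities under the simplified assumptions stated above.

    N: population size at the change; mu: per-individual mutation rate to the rescue type;
    cost: fitness cost of the mutant in the old environment; decline: per-generation
    wild-type growth factor after the change (0 <= decline < 1).
    """
    if not 0 <= decline < 1:
        raise ValueError("decline must satisfy 0 <= decline < 1")
    p_est = establishment_probability(mean_offspring_mutant)
    standing = N * mu / cost            # expected mutant copies at mutation-selection balance
    new = N * mu / (1.0 - decline)      # expected new mutants from N * decline**t, summed over t
    return {
        "standing": 1.0 - math.exp(-standing * p_est),
        "new": 1.0 - math.exp(-new * p_est),
        "total": 1.0 - math.exp(-(standing + new) * p_est),
    }


print(f"{establishment_probability(1.1):.3f}")  # 0.176, close to Haldane's rule of thumb 2s = 0.2
print(rescue_probability(N=1e6, mu=1e-7, cost=0.1, decline=0.9, mean_offspring_mutant=1.1))
```

The Poisson form 1 - exp(-expected successes) is what lets the two sources of rescue simply add in the exponent.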

## To Understand God’s Thought…

## Review of The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t

*(Amazon.com)*

**Category**: Business & Economics

**Publisher**: Penguin Press HC

**Published**: September 27, 2012

**Format**: Hardcover, 534 pages

**Source**: personal library

The founder of FiveThirtyEight.com challenges myths about predictions in subjects ranging from the financial market and weather to sports and politics, profiling the world of prediction to explain how readers can distinguish true signals from hype, in a report that also reveals the sources and societal costs of wrongful predictions.

**Started Reading**: May 25, 2013

**Finished Reading**: October 13, 2013

Given the technical nature of what Nate Silver does, and some of the early mentions of the book, I had higher hopes for its technical portions. As usual for a popular text, I was left wanting a lot more. Again, the lack of any math left much to be desired. I wish technical writers could get away with even a handful of equations, but wishing just won’t make it so.

The first few chapters sounded a bit more technical, but the book eventually devolved into a more journalistic view of statistics, prediction, and forecasting in general within the areas of economics, political elections, weather forecasting, earthquakes, baseball, poker, chess, and terrorism. I have a feeling he lost a large part of his audience in the first few chapters by discussing the 2008 economic meltdown first, instead of opening with baseball or poker and then getting into politics and economics.

While the discussion around each of these bigger topics is intrinsically interesting, and there were a few tidbits I hadn’t heard or read about previously, on the whole the book wasn’t as novel as I had hoped. I think it should be required reading for all politicians, however, as I too often get the feeling that none of them think at this level.

There was some reasonably good philosophical discussion of Bayesian versus Fisherian statistics, but it was far too short and could have been fleshed out considerably. I still prefer David Applebaum’s historical and philosophical discussion of probability in *Probability and Information: An Integrated Approach*, though he surprisingly didn’t mention R. A. Fisher directly in his coverage.

It was interesting to run across additional mentions of power laws in the realms of earthquakes and terrorism after reading Melanie Mitchell’s *Complexity: A Guided Tour* (review here), but I’ll have to find some texts that describe the mathematics in full detail. There was a surprisingly large amount of discussion skirting around topics within complexity without delving into them in any substantive form.
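The mathematics behind those power-law claims starts with estimating the exponent of a heavy-tailed distribution. As a minimal sketch, assuming a continuous power law $p(x) \propto x^{-\alpha}$ above a known cutoff $x_{\min}$ (with real earthquake or casualty data the cutoff must be estimated too, and the power law should be tested against alternatives), the standard maximum-likelihood estimator is $\hat\alpha = 1 + n / \sum_i \ln(x_i / x_{\min})$. The data here are simulated, not real:

```python
import math
import random


def sample_power_law(alpha, x_min, n, rng):
    """Draw n values from p(x) ∝ x**(-alpha), x >= x_min, by inverting the CDF."""
    # P(X > x) = (x / x_min)**(1 - alpha); 1 - rng.random() lies in (0, 1], avoiding 0.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(xs, x_min):
    """Maximum-likelihood exponent and its approximate standard error for the tail above x_min."""
    tail = [x for x in xs if x >= x_min]
    n = len(tail)
    alpha_hat = 1.0 + n / sum(math.log(x / x_min) for x in tail)
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err


rng = random.Random(42)
xs = sample_power_law(alpha=2.5, x_min=1.0, n=50_000, rng=rng)
alpha_hat, se = fit_alpha(xs, x_min=1.0)
print(f"alpha_hat = {alpha_hat:.3f} ± {se:.3f} (true 2.5)")
```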

For those with a pre-existing background in science and especially probability theory, I’d recommend skipping this and simply reading Daniel Kahneman’s book *Thinking, Fast and Slow*. Kahneman’s work is referenced several times and his book seems less intuitive than some of the material Silver presents here.

This is the kind of text that should be required reading in high school civics classes. Perhaps it might motivate more students to take an interest in statistics and science-related pursuits, as these are almost always at the root of most political and policy-related questions at the end of the day.

Personally, I’d give this three stars, but the broader public should view it as at least a four-star book, if not five, as there is some truly great material here. Unfortunately, a lot of it was old hat or retread material for me.