statistics | Chris Aldrich

❤️ darenw tweet A time lapse for every hit of Ichiro’s MLB career

Liked a tweet by

I got a ton of requests for this.... A time lapse for every hit of Ichiro's @mlb career. pic.twitter.com/w8uhzlSnp0
— Daren Willman (@darenw) May 6, 2018

👓 Science’s Inference Problem: When Data Doesn’t Mean What We Think It Does | New York Times

Read Science’s Inference Problem: When Data Doesn’t Mean What We Think It Does by James Ryerson (nytimes.com)

Three new books on the challenge of drawing confident conclusions from an uncertain world.

Not sure how I missed this when it came out two weeks ago, but glad it popped up in my reader today.

This has some nice overview material for the general public on probability theory and science, but given the state of research, I’d even recommend this and some of the references to working scientists.

I remember bookmarking one of the texts back in November. This is a good reminder to circle back and read it.

📖 Read chapter one of Weapons of Math Destruction by Cathy O’Neil

📖 Read chapter one of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil

I don’t think she’s used the specific words in the book yet, but O’Neil is fundamentally writing about social justice and transparency. To a great extent both governments and increasingly large corporations are using these Weapons of Math Destruction inappropriately. Often it may be the case that the algorithms are so opaque as to be incomprehensible by their creators/users, but, as I suspect in many cases, they’re being used to actively create social injustice by benefiting some classes and decimating others. The evolving case of Facebook’s involvement in potentially shifting the outcome of the 2016 Presidential election especially via “dark posts” is an interesting case in point with regard to these examples.

In some sense these algorithms are like viruses running rampant in a large population without the availability of antibiotics to tamp down or modify their effects. Without feedback mechanisms and the ability to see what is going on as it happens the scale issue she touches on can quickly cause even greater harm over short periods of time.

I like that one of the first examples she uses for modeling is that of preparing food for a family. It’s simple, accessible, and generic enough that the majority of people can relate directly to it. It has lots of transparency (even more than her sabermetrics example from baseball). Sadly, however, there is a large swath of the American population that is poor, uneducated, and living in such horrific food deserts that they may not grasp the subtleties of even this simple model. As I was reading, it occurred to me that there is a reasonable political football that gets pushed around from time to time in many countries that relates to food and food subsidies. In the United States it’s known as the Supplemental Nutrition Assistance Program (aka SNAP) and it’s regularly changing, though fortunately for many it has some nutritionists who help to provide a feedback mechanism for it. I suspect it would make a great example of the type of Weapon of Math Destruction she’s discussing in this book. Those who are interested in a quick overview of it and some of the consequences can find a short audio introduction to it via the Eat This Podcast episode How much does a nutritious diet cost? Depends what you mean by “nutritious” or Crime and nourishment Some costs and consequences of the Supplemental Nutrition Assistance Program which discusses an interesting crime related sub-consequence of something as simple as when SNAP benefits are distributed.

I suspect that O’Neil won’t go as far as to bring religion into her thesis, so I’ll do it for her, but I’ll do so from a more general moral philosophical standpoint which underpins much of the Judeo-Christian heritage so prevalent in our society. One of my pet peeves of moralizing (often Republican) conservatives (who often both wear their religion on their sleeves as well as beat others with it–here’s a good recent case in point) is that they never seem to follow the Golden Rule which is stated in multiple ways in the Bible including:

He will reply, ‘Truly I tell you, whatever you did not do for one of the least of these, you did not do for me.

Matthew 25:45

In a country that (says it) values meritocracy, much of the establishment doesn’t seem to put as much value, if any, into these basic principles as they would like to indicate that they do.

I’ve briefly highlighted the application of mathematical game theory in relation to the Golden Rule before, but from a meritocracy perspective, why can’t it operate at all levels? By this I’ll make tangential reference to Cesar Hidalgo’s thesis in his book Why Information Grows in which he looks not at just individuals (person-bytes), but larger structures like firms/companies (firmbytes), governments, and even nations. Why can’t these larger structures have their own meritocracy? When America “competes” against other countries, why shouldn’t it be doing so in a meritocracy of nations? Doing this requires that we as individuals (as well as corporations, city, state, and even national governments) help each other out to do what we can’t do alone. One often hears the aphorism that “a chain is only as strong as its weakest link.” Why then would we actively go out of our way to create weak links within our own society, particularly as many in government decry the cultures and actions of other nations which we view as trying to defeat us? To me the statistical mechanics of the situation require that we help each other to advance the status quo of humanity. Evolution and the Red Queen Hypothesis dictate that humanity won’t regress back to the mean; it may be regressing itself toward extinction otherwise.

Highlights, Quotes, & Marginalia

Chapter One – Bomb Parts: What is a Model

You can often see troubles when grandparents visit a grandchild they haven’t seen for a while.

Highlight (yellow) page 22 | Location 409-410
Added on Thursday, October 12, 2017 11:19:23 PM

Upon meeting her a year later, they can suffer a few awkward hours because their models are out of date.

Highlight (yellow) page 22 | Location 411-412
Added on Thursday, October 12, 2017 11:19:41 PM

Racism, at the individual level, can be seen as a predictive model whirring away in billions of human minds around the world. It is built from faulty, incomplete, or generalized data. Whether it comes from experience or hearsay, the data indicates that certain types of people have behaved badly. That generates a binary prediction that all people of that race will behave that same way.

Highlight (yellow) page 22 | Location 416-420
Added on Thursday, October 12, 2017 11:20:34 PM

Needless to say, racists don’t spend a lot of time hunting down reliable data to train their twisted models.

Highlight (yellow) page 23 | Location 420-421
Added on Thursday, October 12, 2017 11:20:52 PM

the workings of a recidivism model are tucked away in algorithms, intelligible only to a tiny elite.

Highlight (yellow) page 25 | Location 454-455
Added on Thursday, October 12, 2017 11:24:46 PM

A 2013 study by the New York Civil Liberties Union found that while black and Latino males between the ages of fourteen and twenty-four made up only 4.7 percent of the city’s population, they accounted for 40.6 percent of the stop-and-frisk checks by police.

Highlight (yellow) page 25 | Location 462-463
Added on Thursday, October 12, 2017 11:25:50 PM

So if early “involvement” with the police signals recidivism, poor people and racial minorities look far riskier.

Highlight (yellow) page 26 | Location 465-466
Added on Thursday, October 12, 2017 11:26:15 PM

The questionnaire does avoid asking about race, which is illegal. But with the wealth of detail each prisoner provides, that single illegal question is almost superfluous.

Highlight (yellow) page 26 | Location 468-469
Added on Friday, October 13, 2017 6:01:28 PM

judge would sustain it. This is the basis of our legal system. We are judged by what we do, not by who we are.

Highlight (yellow) page 26 | Location 478-478
Added on Friday, October 13, 2017 6:02:53 PM

(And they’ll be free to create them when they start buying their own food.) I should add that my model is highly unlikely to scale. I don’t see Walmart or the US Agriculture Department or any other titan embracing my app and imposing it on hundreds of millions of people, like some of the WMDs we’ll be discussing.

You have to love the obligatory parental aphorism about making your own rules when you have your own house.
Yet the US SNAP program does just this. It could be an interesting example of this type of WMD.
Highlight (yellow) page 28 | Location 497-499
Added on Friday, October 13, 2017 6:06:04 PM

three kinds of models.

namely: baseball, food, recidivism
Highlight (yellow) page 27 | Location 489-489
Added on Friday, October 13, 2017 6:08:26 PM

The first question: Even if the participant is aware of being modeled, or what the model is used for, is the model opaque, or even invisible?

Highlight (yellow) page 28 | Location 502-503
Added on Friday, October 13, 2017 6:08:59 PM

many companies go out of their way to hide the results of their models or even their existence. One common justification is that the algorithm constitutes a “secret sauce” crucial to their business. It’s intellectual property, and it must be defended,

Highlight (yellow) page 29 | Location 513-514
Added on Friday, October 13, 2017 6:11:03 PM

the second question: Does the model work against the subject’s interest? In short, is it unfair? Does it damage or destroy lives?

Highlight (yellow) page 29 | Location 516-518
Added on Friday, October 13, 2017 6:11:22 PM

While many may benefit from it, it leads to suffering for others.

Highlight (yellow) page 29 | Location 521-522
Added on Friday, October 13, 2017 6:12:19 PM

The third question is whether a model has the capacity to grow exponentially. As a statistician would put it, can it scale?

Highlight (yellow) page 29 | Location 524-525
Added on Friday, October 13, 2017 6:13:00 PM

scale is what turns WMDs from local nuisances into tsunami forces, ones that define and delimit our lives.

Highlight (yellow) page 30 | Location 526-527
Added on Friday, October 13, 2017 6:13:20 PM

So to sum up, these are the three elements of a WMD: Opacity, Scale, and Damage. All of them will be present, to one degree or another, in the examples we’ll be covering

Think about this for a bit. Are there other potential characteristics?
Highlight (yellow) page 31 | Location 540-542
Added on Friday, October 13, 2017 6:18:52 PM

You could argue, for example, that the recidivism scores are not totally opaque, since they spit out scores that prisoners, in some cases, can see. Yet they’re brimming with mystery, since the prisoners cannot see how their answers produce their score. The scoring algorithm is hidden.

This is similar to anti-class action laws and arbitration clauses that prevent classes from realizing they’re being discriminated against in the workplace or within healthcare. On behalf of insurance companies primarily, many lawmakers work to cap awards from litigation as well as to prevent class action suits which show much larger inequities that corporations would prefer to keep quiet. Some of the recent incidents like the cases of Ellen Pao, Susan J. Fowler, or even Harvey Weinstein are helping to remedy these types of things despite individuals being pressured to stay quiet so as not to bring others to the forefront and show a broader pattern of bad actions on the part of companies or individuals. (This topic could be an extended article or even book of its own.)
Highlight (yellow) page 31 | Location 542-544
Added on Friday, October 13, 2017 6:20:59 PM

the point is not whether some people benefit. It’s that so many suffer.

Highlight (yellow) page 31 | Location 547-547
Added on Friday, October 13, 2017 6:23:35 PM

And here’s one more thing about algorithms: they can leap from one field to the next, and they often do. Research in epidemiology can hold insights for box office predictions; spam filters are being retooled to identify the AIDS virus. This is true of WMDs as well. So if mathematical models in prisons appear to succeed at their job—which really boils down to efficient management of people—they could spread into the rest of the economy along with the other WMDs, leaving us as collateral damage.

Highlight (yellow) page 31 | Location 549-552
Added on Friday, October 13, 2017 6:24:09 PM

Guide to highlight colors

Yellow–general highlights and highlights which don’t fit under another category below
Orange–Vocabulary word; interesting and/or rare word
Green–Reference to read
Blue–Interesting Quote
Gray–Typography Problem
Red–Example to work through

I’m reading this as part of Bryan Alexander’s online book club.

📗 Started reading Weapons of Math Destruction by Cathy O’Neil

📖 Read introduction of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil

Based on the opening, I’m expecting some great examples, many of which are going to be as heavily biased as things like redlining seen in lending practices in the last century. They’ll come about as the result of missing data, missing assumptions, and even incorrect assumptions.

I’m aware that one of the biggest problems in so-called Big Data is that one needs to spend an inordinate amount of time cleaning up the data (often by hand) to get something even remotely usable. Even with this done I’ve heard about people not testing out their data and then relying on the results only to later find ridiculous error rates (sometimes over 100%!)

Of course there is some space here for the intelligent mathematician, scientist, or quant to create alternate models to take advantage of overlays in such areas, and particularly markets. By overlay here, I mean the gambling definition of the word in which the odds of a particular wager are higher than they should be, thus tending to favor an individual player (who typically has more knowledge or information about the game) rather than the house, which usually relies on a statistically biased game or by taking a rake off of the top of a parimutuel financial structure, or the bulk of other players who aren’t aware of the inequity. The mathematical models based on big data (aka Weapons of Math Destruction or WMDs) described here, particularly in financial markets, are going to often create such large inequities that users of alternate means can take tremendous advantage of the differences for their own benefits. Perhaps it’s the evolutionary competition that will more actively drive these differences to zero? If this is the case, it’s likely that it’s going to be a long time before they equilibrate based on current usage, especially when these algorithms are so opaque.
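To make the gambling sense of “overlay” above a bit more concrete, here is a minimal sketch assuming a simple win/lose wager. The function names and all of the numbers are hypothetical illustrations of my own, not anything taken from the book.

```python
def expected_value(true_prob: float, decimal_odds: float, stake: float = 1.0) -> float:
    """Expected profit of a win/lose wager.

    decimal_odds is the total payout per unit staked on a win (stake included).
    """
    win_profit = stake * (decimal_odds - 1.0)
    return true_prob * win_profit - (1.0 - true_prob) * stake


def is_overlay(true_prob: float, decimal_odds: float) -> bool:
    """A wager is an overlay when the offered odds pay more than the true odds warrant."""
    return expected_value(true_prob, decimal_odds) > 0.0


# Hypothetical event that truly happens 25% of the time.
fair_odds = 1.0 / 0.25                    # break-even payout for a 25% event
print(expected_value(0.25, fair_odds))    # expected profit at fair odds
print(is_overlay(0.25, 5.0))              # pays more than fair: favors the bettor
print(is_overlay(0.25, 3.5))              # pays less than fair: favors the house
```

The point of the sketch is only that an overlay exists whenever someone's estimate of the true probability is better than the one baked into the offered odds, which is exactly the opening an opaque, badly calibrated model leaves for others.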

I suspect that some of this book will highlight uses of statistical errors and logical fallacies like cherry picking data, but which are hidden behind much more opaque mathematical algorithms thereby making them even harder to detect than simple policy decisions which use the simpler form. It’s this type of opacity that has caused major market shifts like the 2008 economic crash, which is still heavily unregulated to protect the masses.

I suspect that folks within Bryan Alexander’s book club will find the example of Sarah Wysocki to be very compelling and damning evidence of how these big data algorithms work (or don’t work, as the case may be). In this particular example, there are so many signals which are difficult to measure, if they can be measured at all, that the thing they’re attempting to measure is so swamped with noise as to be unusable. Equally interesting, but not presented here, would be the alternate case of someone tremendously incompetent (perhaps who is cheating as indicated in the example) who actually scored tremendously high on the scale and was kept in their job.

Highlights, Quotes, & Marginalia

Introduction

Do you see the paradox? An algorithm processes a slew of statistics and comes up with a probability that a certain person might be a bad hire, a risky borrower, a terrorist, or a miserable teacher. That probability is distilled into a score, which can turn someone’s life upside down. And yet when the person fights back, “suggestive” countervailing evidence simply won’t cut it. The case must be ironclad. The human victims of WMDs, we’ll see time and again, are held to a far higher standard of evidence than the algorithms themselves.

Highlight (yellow) – Introduction > Location xxxx
Added on Sunday, October 9, 2017

[WMDs are] opaque, unquestioned, and unaccountable, and they operate at a scale to sort, target or “optimize” millions of people. By confusing their findings with on-the-ground reality, most of them create pernicious WMD feedback loops.

Highlight (yellow) – Introduction > Location xxxx
Added on Sunday, October 9, 2017

The software is doing its job. The trouble is that profits end up serving as a stand-in, or proxy, for truth. We’ll see this dangerous confusion crop up again and again.

Highlight (yellow) – Introduction > Location xxxx
Added on Sunday, October 9, 2017

I’m reading this as part of Bryan Alexander’s online book club.

👓 Big names in statistics want to shake up much-maligned P value | Nature

Read Big names in statistics want to shake up much-maligned P value by Dalmeet Singh Chawla (Nature)

One of scientists’ favourite statistics — the P value — should face tougher standards, say leading researchers.

The related articles listed at the bottom, many of which I’d previously read, also give some great additional background.

The blog looks like it’s off to a good start! Wonder what I should write about today?

A Long-Sought Proof, Found and Almost Lost | Quanta Magazine

Read A Long-Sought Proof, Found and Almost Lost by Natalie Wolchover (Quanta Magazine)

When a German retiree proved a famous long-standing mathematical conjecture, the response was underwhelming.

Revealed: how US billionaire helped to back Brexit | The Guardian

Read Revealed: how US billionaire helped to back Brexit by Carole Cadwalladr (The Guardian)

Robert Mercer, who bankrolled Donald Trump, played key role with ‘sinister’ advice on using Facebook data

How One 19-Year-Old Illinois Man Is Distorting National Polling Averages | The New York Times

Read How One 19-Year-Old Illinois Man Is Distorting National Polling Averages (nytimes.com)

The U.S.C./Los Angeles Times poll has consistently been an outlier, showing Donald Trump in the lead or near the lead.

Alone, he has been enough to put Mr. Trump in double digits of support among black voters. He can improve Mr. Trump’s margin by 1 point in the survey, even though he is one of around 3,000 panelists.

He is also the reason Mrs. Clinton took the lead in the U.S.C./LAT poll for the first time in a month on Wednesday. The poll includes only the last seven days of respondents, and he hasn’t taken the poll since Oct. 4. Mrs. Clinton surged once he was out of the sample for the first time in several weeks.
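A rough sketch of how a single heavily weighted panelist can move a weighted poll by about a point. The excerpt above doesn’t give his weight, so the weight of 30 below, the even split of the rest of the panel, and the function name are all assumptions chosen purely for illustration.

```python
def weighted_margin(responses):
    """Return candidate A's margin over B in percentage points.

    responses is a list of (choice, weight) pairs with choice "A" or "B".
    """
    total = sum(w for _, w in responses)
    a = sum(w for c, w in responses if c == "A")
    b = sum(w for c, w in responses if c == "B")
    return 100.0 * (a - b) / total


# Hypothetical panel: 2,999 panelists with weight 1, split almost evenly.
panel = [("A", 1.0)] * 1499 + [("B", 1.0)] * 1500

# One extra panelist from a rarely sampled demographic gets a large weight
# (30 is an assumed value, not a figure from the excerpt).
heavy = ("A", 30.0)

without_him = weighted_margin(panel)
with_him = weighted_margin(panel + [heavy])
print(round(without_him, 2), round(with_him, 2))
```

Under these made-up numbers the margin shifts by roughly one percentage point depending on whether he is in the sample, which is the order of magnitude the article describes.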


Jordan Ellenberg don’t know stat | Rick’s Ramblings

Read Jordan Ellenberg don’t know stat by Rick Durrett, Ph.D. (Rick's Ramblings sites.duke.edu)

There follows a discussion of flipping coins and the fact that frequencies have more random variation when the sample size is small, but he never stops to see if this is enough to explain the observation.

My intuition told me it did not, so I went and got some brain cancer data.

Jordan Ellenberg is called out a bit by Rick Durrett for one of his claims in the best seller How Not To Be Wrong: The Power of Mathematical Thinking.

I remember reading that section of the book and mostly breezing through that argument primarily as a simple example with a limited, but direct point. Durrett decided to delve into the applied math a bit further.
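The coin-flipping point in the quoted passage can be sketched quickly. This is my own minimal illustration, assuming independent flips; it is not Durrett’s or Ellenberg’s calculation, and the function names are made up for the example. The observed frequency of heads in n flips has standard deviation sqrt(p(1-p)/n), so small samples swing much more widely.

```python
import math
import random


def proportion_sd(p: float, n: int) -> float:
    """Standard deviation of the observed frequency of heads in n independent flips."""
    return math.sqrt(p * (1.0 - p) / n)


def simulate_frequencies(p: float, n: int, trials: int, seed: int = 0) -> list:
    """Observed frequency of heads in each of `trials` experiments of n flips."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]


for n in (10, 100, 1000):
    freqs = simulate_frequencies(0.5, n, trials=2000)
    spread = max(freqs) - min(freqs)
    # Columns: sample size, theoretical sd, observed range of frequencies
    print(n, round(proportion_sd(0.5, n), 4), round(spread, 3))
```

Whether that extra variation is large enough to explain a particular observation is exactly the question Durrett says was never checked, and it takes real data to answer.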

These are some of the subtle issues one eventually comes across when experts read others’ works which were primarily written for much broader audiences.

I also can’t help thinking that one paints a target on one’s back with a book title like that…

BTW, the quote of the day has to be:

… so I went and got some brain cancer data.

🔖 100 years after Smoluchowski: stochastic processes in cell biology

Bookmarked 100 years after Smoluchowski: stochastic processes in cell biology (arxiv.org)

100 years after Smoluchowski introduces his approach to stochastic processes, they are now at the basis of mathematical and physical modeling in cellular biology: they are used for example to analyse and to extract features from large number (tens of thousands) of single molecular trajectories or to study the diffusive motion of molecules, proteins or receptors. Stochastic modeling is a new step in large data analysis that serves extracting cell biology concepts. We review here the Smoluchowski's approach to stochastic processes and provide several applications for coarse-graining diffusion, studying polymer models for understanding nuclear organization and finally, we discuss the stochastic jump dynamics of telomeres across cell division and stochastic gene regulation.

65 pages, J. Phys A 2016 [1]

References

[1]

D. Holcman and Z. Schuss, “100 years after Smoluchowski: stochastic processes in cell biology,” arXiv, 26-Dec-2016. [Online]. Available: https://arxiv.org/abs/1612.08381. [Accessed: 03-Jan-2017]
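As a toy version of the “diffusive motion” and single-trajectory analysis mentioned in the abstract, here is a minimal sketch of free two-dimensional Brownian motion and a naive estimate of the diffusion coefficient from simulated trajectories. This is the standard textbook free-diffusion case under my own assumed parameters, not the methods of the paper, and the function names are hypothetical.

```python
import math
import random


def simulate_trajectory(D, dt, steps, rng):
    """2D Brownian path: each coordinate gets an independent N(0, 2*D*dt) increment."""
    sigma = math.sqrt(2.0 * D * dt)
    x = y = 0.0
    path = [(x, y)]
    for _ in range(steps):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        path.append((x, y))
    return path


def estimate_D(paths, dt):
    """Estimate D from one-step squared displacements, using E[|dr|^2] = 4*D*dt in 2D."""
    total, count = 0.0, 0
    for path in paths:
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            total += (x1 - x0) ** 2 + (y1 - y0) ** 2
            count += 1
    return total / count / (4.0 * dt)


rng = random.Random(1)
# Assumed values: D = 0.5 (arbitrary units), 100 trajectories of 200 steps each.
paths = [simulate_trajectory(D=0.5, dt=0.01, steps=200, rng=rng) for _ in range(100)]
print(estimate_D(paths, dt=0.01))
```

With many short trajectories the estimate should land close to the value of D used in the simulation, which is the basic idea behind extracting physical parameters from large sets of single-molecule tracks.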

Warren Weaver Bot!

Liked Someone has built a Warren Weaver Bot! by

Weaverbot (Twitter)

This is the signal for the second.

How can you not follow this twitter account?!

Now I’m waiting for a Shannon bot and a Wiener bot. Maybe a John McCarthy bot would be apropos too?!

🔖 Advanced Data Analysis from an Elementary Point of View by Cosma Rohilla Shalizi

Bookmarked Advanced Data Analysis from an Elementary Point of View by Cosma Rohilla Shalizi (stat.cmu.edu)

Advanced Data Analysis from an Elementary Point of View
by Cosma Rohilla Shalizi

This is a draft textbook on data analysis methods, intended for a one-semester course for advanced undergraduate students who have already taken classes in probability, mathematical statistics, and linear regression. It began as the lecture notes for 36-402 at Carnegie Mellon University.

By making this draft generally available, I am not promising to provide any assistance or even clarification whatsoever. Comments are, however, welcome.

The book is under contract to Cambridge University Press; it should be turned over to the press before the end of 2015. A copy of the next-to-final version will remain freely accessible here permanently.

Complete draft in PDF

Table of contents:

I. Regression and Its Generalizations

Regression Basics

The Truth about Linear Regression

Model Evaluation

Smoothing in Regression

Simulation

The Bootstrap

Weighting and Variance

Splines

Additive Models

Testing Regression Specifications

Logistic Regression

Generalized Linear Models and Generalized Additive Models

Classification and Regression Trees

II. Distributions and Latent Structure

Density Estimation

Relative Distributions and Smooth Tests of Goodness-of-Fit

Principal Components Analysis

Factor Models

Nonlinear Dimensionality Reduction

Mixture Models

Graphical Models

III. Dependent Data

Time Series

Spatial and Network Data

Simulation-Based Inference

IV. Causal Inference

Graphical Causal Models

Identifying Causal Effects

Causal Inference from Experiments

Estimating Causal Effects

Discovering Causal Structure

Appendices

Data-Analysis Problem Sets

Reminders from Linear Algebra

Big O and Little o Notation

Taylor Expansions

Multivariate Distributions

Algebra with Expectations and Variances

Propagation of Error, and Standard Errors for Derived Quantities

Optimization

chi-squared and the Likelihood Ratio Test

Proof of the Gauss-Markov Theorem

Rudimentary Graph Theory

Information Theory

Hypothesis Testing

Writing R Functions

Random Variable Generation

Planned changes:

Unified treatment of information-theoretic topics (relative entropy / Kullback-Leibler divergence, entropy, mutual information and independence, hypothesis-testing interpretations) in an appendix, with references from chapters on density estimation, on EM, and on independence testing

More detailed treatment of calibration and calibration-checking (part II)

Missing data and imputation (part II)

Move d-separation material from “causal models” chapter to graphical models chapter as no specifically causal content (parts II and IV)?

Expand treatment of partial identification for causal inference, including partial identification of effects by looking at all data-compatible DAGs (part IV)

Figure out how to cut at least 50 pages

Make sure notation is consistent throughout: insist that vectors are always matrices, or use more geometric notation?

Move simulation to an appendix

Move variance/weights chapter to right before logistic regression

Move some appendices online (i.e., after references)?

(Text last updated 30 March 2016; this page last updated 6 November 2015)

16w5113: Stochastic and Deterministic Models for Evolutionary Biology | Banff International Research Station

Bookmarked Stochastic and Deterministic Models for Evolutionary Biology (Banff International Research Station)

A BIRS / Casa Matemática Oaxaca Workshop arriving in Oaxaca, Mexico Sunday, July 31 and departing Friday August 5, 2016

Evolutionary biology is a rapidly changing field, confronted with many societal problems of increasing importance: impact of global changes, emerging epidemics, antibiotic resistant bacteria… As a consequence, a number of new problems have appeared over the last decade, challenging the existing mathematical models. There exists thus a demand in the biology community for new mathematical models allowing a qualitative or quantitative description of complex evolution problems. In particular, in the societal problems mentioned above, evolution is often interacting with phenomena of a different nature: interaction with other organisms, spatial dynamics, age structure, invasion processes, time/space heterogeneous environment… The development of mathematical models able to deal with those complex interactions is an ambitious task. Evolutionary biology is interested in the evolution of species. This process is a combination of several phenomena, some occurring at the individual level (e.g. mutations), others at the level of the entire population (competition for resources), often consisting of a very large number of individuals. The presence of very different scales is indeed at the core of theoretical evolutionary biology, and at the origin of many of the difficulties that biologists are facing. The development of new mathematical models thus requires joint work by three different communities of researchers: specialists of partial differential equations, specialists of probability theory, and theoretical biologists. The goal of this workshop is to gather researchers from each of these communities, currently working on closely related problems. Those communities usually have few interactions, and this meeting would give them the opportunity to discuss and work around a few biological themes that are especially challenging mathematically, and play a crucial role for biological applications.

The role of a spatial structure in models for evolution: The introduction of a spatial structure in evolutionary biology models is often challenging. It is however well known that local adaptation is frequent in nature: field data show that the phenotypes of a given species change considerably across its range. The spatial dynamics of a population can also have a deep impact on its evolution. Assessing e.g. the impact of global changes on species requires the development of robust mathematical models for spatially structured populations.

The first type of models used by theoretical biologists for this type of problems are IBM (Individual Based Models), which describe the evolution of a finite number of individuals, characterized by their position and a phenotype. The mathematical analysis of IBM in spatially homogeneous situations has provided several methods that have been successful in the theoretical biology community (see the theory of Adaptive Dynamics). On the contrary, very few results exist so far on the qualitative properties of such models for spatially structured populations.
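For readers who haven’t met an individual-based model before, here is a deliberately tiny sketch of the idea described above: a finite set of individuals, each carrying a position and a phenotype. Everything specific in it (the Gaussian survival rule, the local optimum equal to position, two reproduction attempts per parent, the population cap, and all parameter values) is my own assumption for illustration and is not taken from the workshop description.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Individual:
    position: float   # location along a 1D habitat
    phenotype: float  # one quantitative trait


def next_generation(population, rng, capacity=300, mut_sd=0.05, disp_sd=0.1):
    """Selection toward a local optimum (assumed equal to position), then asexual
    reproduction with mutation and dispersal, then a cap on population size."""
    offspring = []
    for ind in population:
        mismatch = ind.phenotype - ind.position
        survival = math.exp(-0.5 * mismatch ** 2)
        for _ in range(2):  # each parent gets two reproduction attempts
            if rng.random() < survival:
                offspring.append(Individual(
                    position=ind.position + rng.gauss(0.0, disp_sd),
                    phenotype=ind.phenotype + rng.gauss(0.0, mut_sd),
                ))
    if len(offspring) > capacity:
        # Crude density regulation: keep a random subset of the offspring
        offspring = rng.sample(offspring, capacity)
    return offspring


rng = random.Random(42)
population = [Individual(position=rng.uniform(-1, 1), phenotype=0.0) for _ in range(300)]
for _ in range(50):
    population = next_generation(population, rng)
print(len(population))
```

Even a toy like this shows why the analysis is hard: position and phenotype are coupled through selection, and the population is finite, so the randomness never averages out completely.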

The second class of mathematical approach for this type of problem is based on “infinite dimensional” reaction-diffusion: the population is structured by a continuous phenotypic trait that affects its ability to disperse (diffusion) or to reproduce (reaction). This type of model can be obtained as a large population limit of IBM. The main difficulty of these models (in the simpler case of asexual populations) is the term modeling the competition for resources, which appears as a non-local competition term. This term prevents the use of classical reaction-diffusion tools such as the comparison principle and sliding methods. Recently, promising progress has been made, based on tools from elliptic equations and/or Hamilton-Jacobi equations. The effects of small populations cannot, however, be observed in such models. The extension of these models and methods to include these effects will be discussed during the workshop.

Eco-evolution models for sexual populations: An essential question already stated by Darwin and Fisher, which remains unanswered for the moment (although it continues to intrigue evolutionary biologists), is: “Why is sexual reproduction maintained?” Indeed, this mode of reproduction is very costly, since it requires a large number of gametes, mating, and the choice of a compatible partner. During the meiosis phase, half of the genetic information is lost. Moreover, the males have to be fed, and during sexual mating individuals are easy prey for predators. A partial answer is that recombination plays a main role by better eliminating deleterious mutations and by increasing diversity. Nevertheless, this theory is not completely satisfying, and much research is devoted to understanding the evolution of sexual populations and the comparison between asexual and sexual reproduction. Several models exist to model the influence of sexual reproduction on evolving species. The difficulty compared to asexual populations is that a detailed description of the genetic basis of phenotypes is required, in particular one that includes recombination. For sexual populations, recombination plays a main role and is essential to understand. All models require strong biological simplifications; the development of relevant mathematical methods for such mechanisms then requires joint work by mathematicians and biologists. This workshop will be an opportunity to set up such collaborations.

The first type of model considers a small number of diploid loci (typically one locus and two alleles), while the rest of the genome is considered as fixed. One can then define the fitness of every combination of alleles. While allowing the modeling of specific sexual effects (such as dominant/recessive alleles), this approach neglects the rest of the genome (and it is known that phenotypes are typically influenced by a large number of loci). An opposite approach is to consider a large number of loci, each locus having a small and additive impact on the considered phenotype. This approach then neglects many microscopic phenomena (epistasis, dominant/recessive alleles…), but allows the derivation of a deterministic model, called the infinitesimal model, in the case of a large population. The construction of a good mathematical framework for intermediate situations would be an important step forward.

The evolution of recombination and sex is very sensitive to the interaction between several evolutionary forces (selection, migration, genetic drift…). Modeling these interactions is particularly challenging, and our understanding of the evolution of recombination is often limited by strong assumptions regarding demography, the relative strength of these different evolutionary forces, the lack of spatial structure… A more general theoretical framework based on new mathematical developments would be particularly valuable.

Another problem that has received little attention so far and is worth addressing is the modeling of genetic material exchanges in asexual populations. Such exchanges are frequent in micro-organisms: horizontal gene transfers in bacteria, reassortment or recombination in viruses. These phenomena share some features with sexual reproduction. It would be interesting to see whether their effect can be treated as a perturbation of existing asexual models. This would be of particular interest in spatially structured populations (e.g. viral epidemics), since the mathematical analysis of spatially structured asexual populations is improving rapidly.

Modeling in evolutionary epidemiology: Mathematical epidemiology has been developing for more than a century. Yet the integration of population genetics phenomena into epidemiology is relatively recent. Microbial pathogens (bacteria and viruses) are particularly interesting organisms because their short generation times and large mutation rates allow them to adapt relatively fast to changing environments. As a consequence, ecological (demography) and evolutionary (population genetics) processes often occur at the same pace. This raises many interesting problems.

A first challenge is the modeling of the spatial dynamics of an epidemic. A parasite can evolve during an epidemic in a new host population, either to adapt to a heterogeneous environment or because it itself modifies the environment as it invades. The applications of such studies are numerous: antibiotic management, agriculture… An aspect of this problem to which our workshop can make a significant contribution (thanks to the diversity of its participants) is the evolution of pathogen diversity. During the large expansion produced by an epidemic, there is a loss of diversity in the invading parasites, since most pathogens originate from a few parents. The development of mathematical models for these phenomena is challenging: only a small number of pathogens are present ahead of the epidemic front, while the number of parasites rapidly becomes very large after the infection. The interaction between a stochastic micro scale and a deterministic macro scale is apparent here, and deserves a rigorous mathematical analysis.

Another interesting phenomenon is the effect of a sudden change of the environment on a population of pathogens. Examples include the antibiotic treatment of an infected patient, or the transmission of a parasite to a new host species (transmission of avian influenza to human beings, for instance). Related experiments are relatively easy to perform, and are called evolutionary rescue experiments. So far, this question has received limited attention from the mathematical community. The key is to estimate the probability that a mutant well adapted to the new environment existed in the original population, or will appear soon after the environmental change. Interactions between biologists specializing in these questions and mathematicians should lead to new mathematical problems.
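To make the rescue probability concrete, here is a rough sketch under strong simplifying assumptions that are not part of the workshop description. It assumes that the number of pre-existing adapted mutants is Poisson with mean `n0 * standing_freq`, that the wild type declines exponentially at rate `decline_rate` while producing new mutants at rate `mutation_rate` per individual per unit time, and that each mutant lineage establishes independently with probability `p_est`. All names and parameter values are hypothetical.

```python
import math


def establishment_probability(birth_rate, death_rate):
    """Survival probability of a single lineage in a linear birth-death process."""
    if birth_rate <= death_rate:
        return 0.0  # a non-growing lineage goes extinct with probability 1
    return 1.0 - death_rate / birth_rate


def rescue_probability(n0, standing_freq, mutation_rate, decline_rate, p_est):
    """Probability that at least one adapted mutant lineage establishes.

    Assumptions (illustrative only): Poisson numbers of mutants,
    independent lineages, and a wild-type population n0 * exp(-decline_rate * t),
    so the total wild-type "individual-time" is n0 / decline_rate.
    """
    expected_standing = n0 * standing_freq                # mutants already present
    expected_de_novo = n0 * mutation_rate / decline_rate  # mutants arising during decline
    expected_successes = (expected_standing + expected_de_novo) * p_est
    return 1.0 - math.exp(-expected_successes)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    p_est = establishment_probability(birth_rate=2.0, death_rate=1.0)
    print(rescue_probability(n0=1e6, standing_freq=1e-7,
                             mutation_rate=1e-7, decline_rate=0.5, p_est=p_est))
```

The two terms in the exponent correspond to the two routes named in the text: a mutant that existed in the original population, and one that appears soon after the environmental change.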

To Understand God’s Thought…

To understand God’s thought, we must study statistics, for these are the measure of His purpose.

Florence Nightingale, OM, RRC (1820-1910), English social reformer and statistician, founder of modern nursing, renaissance woman
in Florence Nightingale’s Wisdom, New York Times, 3/4/14