Machine Learning (ML) is one of the most exciting and dynamic areas of modern research and application. The purpose of this review is to provide an introduction to the core concepts and tools of machine learning in a manner easily understood and intuitive to physicists. The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, and generalization before moving on to more advanced topics in both supervised and unsupervised learning. Topics covered in the review include ensemble models, deep learning and neural networks, clustering and data visualization, energy-based models (including MaxEnt models and Restricted Boltzmann Machines), and variational methods. Throughout, we emphasize the many natural connections between ML and statistical physics. A notable aspect of the review is the use of Python notebooks to introduce modern ML/statistical packages to readers using physics-inspired datasets (the Ising Model and Monte-Carlo simulations of supersymmetric decays of proton-proton collisions). We conclude with an extended outlook discussing possible uses of machine learning for furthering our understanding of the physical world as well as open problems in ML where physicists may be able to contribute. (Notebooks are available at this https URL)
Recommendation engines are everywhere. They let Netflix suggest shows you might want to watch. They let Spotify build you a personalised playlist of music you will probably like. They turn your smartphone into a source of endless hilarity and mirth. And, of course, there’s IBM’s Watson, recommending all sorts of “interesting” new recipes. As part of his PhD project on machine learning, Jaan Altosaar decided to use a new mathematical technique to build his own recipe recommendation engine. The technique is similar to the kind of natural language processing that powers predictive text on a phone, and one of the attractions of using food instead of English is that there are only 2000–3000 ingredients to worry about, instead of more than 150,000 words. The results so far are fun and intriguing, and can only get better.
Back in February I had retweeted something interesting from physicist and information theorist Michael Nielsen:
“Augmented cooking with machine intelligence”, with interesting remarks on generating food analogies… https://t.co/UluYk6p8TV
— michael_nielsen (@michael_nielsen) February 2, 2017
I found the article so interesting that, after a brief conversation around it, I thought to recommend it to my then-new friend Jeremy Cherfas, whose Eat This Podcast I had just recently started to enjoy. Mostly I thought he would find it as interesting as I did, though I hardly expected he’d turn it into a podcast episode. I’ve been plowing through back episodes in his catalog, but this morning I fortunately ran out of downloaded episodes in the car, so I started streaming the most recent one and found a lovely surprise: a podcast produced on a tip I made.
While he surely must have been producing the episode for some time before I started supporting the podcast on Patreon last week, I must say that having an episode made from one of my tips is the best backer thank-you I’ve ever received from a crowdfunded project.
Needless to say, I found the subject fascinating. In part it reminded me of a section of Hervé This’s book The Science of the Oven (eventually I’ll get around to posting a review with more thoughts) and some of his prior research, which I was apparently reading on Christmas Day this past year. On page 118 of the text, This discusses the classic French sauces of Escoffier’s students Louis Saulnier and Theodore Gringoire and notes that a physical chemical analysis of them shows there to be only twenty-three kinds. He continues:
A system that I introduced during the European Conference on Colloids and Interfaces in 2002 offers a new classification, based on the physical chemical structure of the sauce. In it, G indicates a gas, E an aqueous solution, H a fat in the liquid state, and S a solid. These “phases” can be dispersed (symbol /), mixed (symbol +), superimposed (symbol θ), included (symbol @). Thus, veal stock is a solution, which is designated E. Bound veal stock, composed of starch granules swelled by the water they have absorbed, dispersed in an aqueous solution, is thus described by the formula (E/S)/E.
The author goes on to describe in a bit more detail how the scientist-cook could then create a vector space of all combinations of foods from a physical-state perspective. A classification system like this could be expanded and bolted on top of the database created by Jaan Altosaar, then refined to provide even more realistic recipes of the type discussed in the podcast. The combinatorics of the problem are enormous, but my guess is that in actual practice the constraints shrink the space of possible solutions dramatically. It’s somewhat like the huge number of combinations of the As, Cs, Ts, and Gs in our DNA that could be imagined, of which only a vastly smaller subset could be found in a living human being.
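To get a rough feel for how quickly that space of sauce formulas grows, here is a minimal Python sketch. It is my own illustration, not anything from the book or from Jaan's work. It assumes that every formula is either a single phase (G, E, H, S) or two smaller formulas joined by one of the four operators, that order matters on both sides of an operator, and that no physical constraints apply. The function names are hypothetical, and the formulas are written fully parenthesized, so the bound veal stock above appears as ((E/S)/E).

```python
from itertools import product

# Phases and operators as listed in the quoted passage.
PHASES = ("G", "E", "H", "S")      # gas, aqueous solution, liquid fat, solid
OPERATORS = ("/", "+", "θ", "@")   # dispersed, mixed, superimposed, included


def count_formulas(depth):
    """Count formulas with nesting depth <= depth.

    Assumption (mine): operators are binary and ordered, and nothing is
    ruled out on physical grounds, so this is an upper bound.
    """
    n = len(PHASES)
    for _ in range(depth):
        # A formula is either a bare phase or (left op right).
        n = len(PHASES) + len(OPERATORS) * n * n
    return n


def enumerate_formulas(depth):
    """List every fully parenthesized formula with nesting depth <= depth."""
    if depth == 0:
        return list(PHASES)
    smaller = enumerate_formulas(depth - 1)
    combined = [f"({a}{op}{b})" for a, op, b in product(smaller, OPERATORS, smaller)]
    return list(PHASES) + combined


for d in range(4):
    print(d, count_formulas(d))
# 0 4
# 1 68
# 2 18500
# 3 1369000004
```

Even three levels of nesting gives over a billion candidate formulas under these assumptions, while the analysis quoted above finds only twenty-three kinds among the classic French sauces, which is the kind of drastic pruning I have in mind.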
The additional byproduct of catching this episode was that it finally reminded me why I had thought the name Jaan Altosaar was so familiar to me when I read his article. It turns out I know Jaan and some of his previous work. Sometime back in 2014 I had corresponded with him regarding his fantastic science news site Useful Science which was just then starting. While I was digging up the connection I realized that my old friend Sol Golomb had also referenced Jaan to me via Mark Wilde for some papers he suggested I read.