This is a video I've been wanting to do for a while (in part because I've wanted to learn Morse Code myself, for years!) and I've also had many requests for it.
Over recent years, new light has been shed on aspects of information processing in cells. The quantification of information, as described by Shannon’s information theory, is a basic and powerful tool that can be applied to various fields, such as communication, statistics, and computer science, as well as to information processing within cells. It has also been used to infer the network structure of molecular species. However, the difficulty of obtaining sufficient sample sizes and the computational burden associated with the high-dimensional data often encountered in biology can result in bottlenecks in the application of information theory to systems biology. This article provides an overview of the application of information theory to systems biology, discussing the associated bottlenecks and reviewing recent work.
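As a reminder to myself of what is actually being quantified, here is a minimal sketch of the two basic Shannon quantities, entropy and mutual information, estimated directly from counts. The function names and the toy "signal" and "response" lists are my own placeholders rather than anything from the article, and this plug-in estimator is only the simplest option; it is known to be biased when samples are scarce, which is one face of the sample-size bottleneck the abstract mentions.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in (count-based) estimate of Shannon entropy, in bits."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Estimate I(X;Y) = H(X) + H(Y) - H(X,Y) from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy data: a binary input and two candidate outputs.
signal = [0, 0, 1, 1, 0, 1, 0, 1]
response_copy = list(signal)                 # output tracks the input exactly
response_indep = [0, 1, 0, 1, 0, 0, 1, 1]    # output ignores the input

print(mutual_information(signal, response_copy))
print(mutual_information(signal, response_indep))
```

With only eight samples the toy numbers come out clean, but on real, high-dimensional data the joint histogram inside `mutual_information` becomes sparse very quickly, so more careful estimators are usually needed.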
Different quantities that go by the name of entropy are used in variational principles to infer probability distributions from limited data. Shore and Johnson showed that maximizing the Boltzmann-Gibbs form of the entropy ensures that probability distributions inferred satisfy the multiplication rule of probability for independent events in the absence of data coupling such events. Other types of entropies that violate the Shore and Johnson axioms, including nonadditive entropies such as the Tsallis entropy, violate this basic consistency requirement. Here we use the axiomatic framework of Shore and Johnson to show how such nonadditive entropy functions generate biases in probability distributions that are not warranted by the underlying data.
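For my own reference, the standard textbook forms of the two entropies make the additivity point concrete. This is only a sketch in my own notation (Boltzmann's constant set to 1, and u, v as the marginals of two independent systems), not the derivation from the paper.

```latex
% Boltzmann-Gibbs (Shannon) entropy
S_{BG}[p] = -\sum_i p_i \ln p_i

% Tsallis entropy with index q; it reduces to S_{BG} as q \to 1
S_q[p] = \frac{1 - \sum_i p_i^{\,q}}{q - 1}

% For independent events the joint distribution factorizes: p_{ij} = u_i v_j.
% The Boltzmann-Gibbs entropy is then additive:
S_{BG}[p] = S_{BG}[u] + S_{BG}[v]

% The Tsallis entropy picks up a cross term that vanishes only at q = 1:
S_q[p] = S_q[u] + S_q[v] + (1 - q)\, S_q[u]\, S_q[v]
```

The extra cross term is what "nonadditive" refers to: for q different from 1, the entropy of the joint system is not the sum of the parts even when the parts are independent.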
Zero-width characters are invisible, ‘non-printing’ characters that are not displayed by the majority of applications. For example, I’ve inserted 10 zero-width spaces into this sentence, can you tell? (Hint: paste the sentence into Diff Checker to see the locations of the characters!). These characters can be used to ‘fingerprint’ text for certain users.
A cool little trick with text for embedded steganography, security, or other communication purposes.
This could even be used for pseudo-private communication via Webmention: just hide your messages inside public ones.
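To make the trick concrete, here is a minimal sketch of how a short tag could be hidden in ordinary text. The choice of U+200B for a 0 bit and U+200C for a 1 bit, the helper names `hide`, `reveal` and `strip_hidden`, and the sample strings are all my own assumptions for illustration, not the scheme used in the linked article. It also assumes the cover text contains no zero-width characters of its own.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE, used here to encode a 0 bit (assumed convention)
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, used here to encode a 1 bit (assumed convention)


def hide(cover: str, secret: str) -> str:
    """Insert the secret, as invisible characters, after the first word of cover."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if bit == "1" else ZERO for bit in bits)
    head, sep, tail = cover.partition(" ")
    return head + payload + sep + tail


def reveal(text: str) -> str:
    """Collect the zero-width characters in text and decode them back to a string."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def strip_hidden(text: str) -> str:
    """Remove the two marker characters, e.g. before pasting text elsewhere."""
    return text.replace(ZERO, "").replace(ONE, "")


cover = "A perfectly ordinary sentence."
stamped = hide(cover, "user-42")
print(reveal(stamped))
print(strip_hidden(stamped) == cover)
```

Some editors and publishing platforms normalize or strip these characters, so a round trip through any given site is not guaranteed.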
This is the first time I’ve ever seen someone indicate that they’ve done this in the wild.
I’ll also admit that this is a really great looking blogroll too! I’m going to have to mine it for the bunch of feeds that I’m not already following.
The media's "epistemic crisis," algorithmic biases, and the radio's inherent, historical misogyny.
In hearings this week, House Democrats sought to highlight an emerging set of facts concerning the President’s conduct. On this week’s On the Media, a look at why muddying the waters remains a viable strategy for Trump’s defenders. Plus, even the technology we trust for its clarity isn’t entirely objective, especially the algorithms that drive decisions in public and private institutions. And, how early radio engineers designed broadcast equipment to favor male voices and make women sound "shrill."
Cathy O’Neil has a great interview on her book Weapons of Math Destruction. I highly recommend everyone read it, but if for some reason you can’t do it this month, this interview is a good starting place for repairing that deficiency.
In section three, I’ll note that I’ve studied the areas of signal processing and information theory in great depth, but have never run across the fascinating history of how we physically and consciously engineered women out of radio and broadcast in quite the way discussed here. I recall the image of “Lena” being nudged out of image processing recently, but the engineering wrongs here are far more serious and pernicious.
President and William H. Miller Professor of Complex Systems
Maxwell’s Demon is a famous thought experiment in which a mischievous imp uses knowledge of the velocities of gas molecules in a box to decrease the entropy of the gas, which could then be used to do useful work such as pushing a piston. This is a classic example of converting information (what the gas molecules are doing) into work. But of course that kind of phenomenon is much more widespread — it happens any time a company or organization hires someone in order to take advantage of their know-how. César Hidalgo has become an expert in this relationship between information and work, both at the level of physics and how it bubbles up into economies and societies. Looking at the world through the lens of information brings new insights into how we learn things, how economies are structured, and how novel uses of data will transform how we live.
César Hidalgo received his Ph.D. in physics from the University of Notre Dame. He currently holds an ANITI Chair at the University of Toulouse, an Honorary Professorship at the University of Manchester, and a Visiting Professorship at Harvard’s School of Engineering and Applied Sciences. From 2010 to 2019, he led MIT’s Collective Learning group. He is the author of Why Information Grows and co-author of The Atlas of Economic Complexity. He is a co-founder of Datawheel, a data visualization company whose products include the Observatory of Economic Complexity.
My interest was also piqued by the mention of Lynne Kelly’s work, which I’m now knee deep into. I suspect it could dramatically expand what we think of as the capacity of a personbyte, though the limit of knowledge there still exists. The idea of mnemotechniques within indigenous cultures certainly expands our picture of how knowledge worked in prehistory and of what we classically think of and frame as collective knowledge or collective learning.
I also think there are some interesting connections with Dr. Kelly’s mentions of social equity in prehistorical cultures and the work that Hidalgo mentions in the middle of the episode.
There are a small handful of references I’ll want to delve into after hearing this, though it may take time to pull them up unless they’re linked in the show notes.
hat-tip: Complexity Digest for the reminder that this is in my podcatcher. 🔖 November 22, 2019 at 03:28PM
This newsletter has not been written by a GPT-2 text generator, but you can now find a lot of artificially created text that has been.
For those not familiar with GPT-2, it is, according to its creators OpenAI (a socially conscious artificial intelligence lab overseen by a nonprofit entity), “a large-scale unsupervised language model which generates coherent paragraphs of text.” Think of it as a computer that has consumed so much text that it’s very good at figuring out which words are likely to follow other words, and when strung together, these words create fairly coherent sentences and paragraphs that are plausible continuations of any initial (or “seed”) text.
This isn’t a very difficult problem and the underpinnings of it are well laid out by John R. Pierce in *[An Introduction to Information Theory: Symbols, Signals and Noise](https://amzn.to/32JWDSn)*. In it he has a lot of interesting tidbits about language and structure from an engineering perspective including the reason why crossword puzzles work.
November 13, 2019 at 08:33AM
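The "which words are likely to follow other words" idea can be sketched in a few lines. To be clear about the swap: this is a word-level bigram Markov chain of the sort Shannon and Pierce describe, not GPT-2's neural architecture, and the tiny corpus and the names `train_bigrams` and `generate` are placeholders I made up for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often each other word immediately follows it."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, seed, length=12, rng=None):
    """Extend the seed word by repeatedly sampling a follower in proportion to its count."""
    rng = rng or random.Random()
    out = [seed]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:  # the last word was never seen with a follower
            break
        choices, weights = zip(*options.items())
        out.append(rng.choices(choices, weights=weights)[0])
    return " ".join(out)


# Hypothetical toy corpus; a real model would be trained on vastly more text.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigrams(corpus)
print(generate(model, "the", length=8, rng=random.Random(0)))
```

Every adjacent pair of words in the output has appeared somewhere in the training text, which is why even this crude model produces locally plausible phrases while wandering globally.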
The most interesting examples have been the weird ones (cf. HI7), where the language model has been trained on narrower, more colorful sets of texts, and then sparked with creative prompts. Archaeologist Shawn Graham, who is working on a book I’d like to preorder right now, An Enchantment of Digital Archaeology: Raising the Dead with Agent Based Models, Archaeogaming, and Artificial Intelligence, fed GPT-2 the works of the English Egyptologist Flinders Petrie (1853-1942) and then resurrected him at the command line for a conversation about his work. Robin Sloan had similar good fun this summer with a focus on fantasy quests, and helpfully documented how he did it.
Circle back around and read this when it comes out.
Similarly, these other references should be an interesting read as well.
November 13, 2019 at 08:36AM
From this perspective, GPT-2 says less about artificial intelligence and more about how human intelligence is constantly looking for, and accepting of, stereotypical narrative genres, and how our mind always wants to make sense of any text it encounters, no matter how odd. Reflecting on that process can be the source of helpful self-awareness—about our past and present views and inclinations—and also, some significant enjoyment as our minds spin stories well beyond the thrown-together words on a page or screen.
And it’s not just happening with text; it also happens with speech, as I’ve written before in “Complexity isn’t a Vice: 10 Word Answers and Doubletalk in Election 2016.” In fact, in that case, looking at transcripts actually helps to reveal that the emperor had no clothes, because there’s so much missing from the speech that the text doesn’t have enough space to fill in the gaps the way the live speech did.
November 13, 2019 at 08:43AM
This demo enables forensic inspection of the visual footprint of a language model on input text to detect whether a text could be real or fake.
The data and virtual unwrapping results on the En-Gedi scroll. See the following papers for more information:
Seales, William Brent, et al. "From damage to discovery via virtual unwrapping: Reading the scroll from En-Gedi." Science Advances 2.9 (2016): e1601247. (Web Article)
Segal, Michael, et al. "An Early Leviticus Scroll From En-Gedi: Preliminary Publication." Textus 26 (2016): 1-30. (PDF)
The written word has been used throughout history to chronicle and contemplate the human experience, but many valuable texts are “lost” to us due to damage. The words of these documents and the knowledge they seek to impart are locked behind the destruction and decay wrought by time and injury, while the physical manuscripts themselves form an “invisible library” of sorts, closeted away on dark shelves, well-protected but prevented from proffering knowledge and encouraging inquiry. For more than 20 years, Dr. Seales has been working to create and use hi-tech, non-invasive tools to rescue these lost texts from the brink of oblivion and restore them to humanity. We call this innovative process “virtual unwrapping.”
The eruption of Mt. Vesuvius buried the city of Herculaneum under twenty meters of volcanic material, simultaneously destroying the Herculaneum scrolls through carbonization and preserving them by protecting them from the elements. Unwrapping the scrolls would damage them, but researchers are anxious to read the texts. Researchers from the University of Kentucky collaborated with the Institut de France and SkyScan to digitally unwrap and preserve the scrolls. To learn more about the EDUCE project, go to http://cs.uky.edu/dri.