> oh my gosh: how did I not know about this counting-out system called Yan-Tan...?!
>
> I was looking up something about the nursery rhyme Hickory Dickory Dock, which some people think is a counting-out rhyme, and that led to this British sheep-counting system: https://t.co/NWfJgyicB6
>
> — Laura Gibbs (@OnlineCrsLady), November 15, 2019
This newsletter has not been written by a GPT-2 text generator, but you can now find a lot of artificially created text that has been.
For those not familiar with GPT-2, it is, according to its creators OpenAI (a socially conscious artificial intelligence lab overseen by a nonprofit entity), “a large-scale unsupervised language model which generates coherent paragraphs of text.” Think of it as a computer that has consumed so much text that it’s very good at figuring out which words are likely to follow other words, and when strung together, these words create fairly coherent sentences and paragraphs that are plausible continuations of any initial (or “seed”) text.
The underlying idea isn’t new or especially difficult, and its foundations are well laid out by John R. Pierce in *[An Introduction to Information Theory: Symbols, Signals and Noise](https://amzn.to/32JWDSn)*. The book is full of interesting tidbits about language and structure from an engineering perspective, including why crossword puzzles work.
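Pierce describes Shannon-style approximations of English built from word-following-word statistics, which is the same basic idea GPT-2 scales up enormously. As a minimal sketch of the principle (a toy bigram model, not anything resembling GPT-2's actual architecture, and assuming only the Python standard library):

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """For each word, record which words follow it and how often
    (repeats in the list stand in for higher probability)."""
    words = text.split()
    follows = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        follows[prev].append(nxt)
    return follows

def generate(follows, seed, length=10, rng=None):
    """Continue the seed word by repeatedly sampling a plausible next word."""
    rng = rng or random.Random(0)
    out = [seed]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:  # dead end: the model never saw this word mid-text
            break
        out.append(rng.choice(options))
    return " ".join(out)

corpus = ("the cat sat on the mat and the cat saw the dog "
          "and the dog sat on the log")
model = train_bigrams(corpus)
print(generate(model, "the"))
```

The output is locally plausible (every adjacent word pair occurred in the training text) while globally meandering, which is the toy version of the "fairly coherent sentences" effect described above.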
November 13, 2019 at 08:33AM
The most interesting examples have been the weird ones (cf. HI7), where the language model has been trained on narrower, more colorful sets of texts, and then sparked with creative prompts. Archaeologist Shawn Graham, who is working on a book I’d like to preorder right now, *An Enchantment of Digital Archaeology: Raising the Dead with Agent Based Models, Archaeogaming, and Artificial Intelligence*, fed GPT-2 the works of the English Egyptologist Flinders Petrie (1853–1942) and then resurrected him at the command line for a conversation about his work. Robin Sloan had similar good fun this summer with a focus on fantasy quests, and helpfully documented how he did it.
Circle back around and read this when it comes out.
Similarly, these other references should be an interesting read as well.
November 13, 2019 at 08:36AM
From this perspective, GPT-2 says less about artificial intelligence and more about how human intelligence is constantly looking for, and accepting of, stereotypical narrative genres, and how our mind always wants to make sense of any text it encounters, no matter how odd. Reflecting on that process can be the source of helpful self-awareness—about our past and present views and inclinations—and also, some significant enjoyment as our minds spin stories well beyond the thrown-together words on a page or screen.
And it’s not just happening with text; it happens with speech too, as I’ve written before in “Complexity isn’t a Vice: 10 Word Answers and Doubletalk in Election 2016.” In that case, looking at transcripts actually helped reveal that the emperor had no clothes: so much was missing from the speech that the bare text couldn’t fill in the gaps the way the live delivery did.
November 13, 2019 at 08:43AM
This demo enables forensic inspection of the visual footprint of a language model on input text to detect whether a text could be real or fake.
The rapid improvement of language models has raised the specter of abuse of text generation systems. This progress motivates the development of simple methods for detecting generated text that can be used by and explained to non-experts. We develop GLTR, a tool to support humans in detecting whether a text was generated by a model. GLTR applies a suite of baseline statistical methods that can detect generation artifacts across common sampling schemes. In a human-subjects study, we show that the annotation scheme provided by GLTR improves the human detection-rate of fake text from 54% to 72% without any prior training. GLTR is open-source and publicly deployed, and has already been widely used to detect generated outputs.
From pages 111–116; Florence, Italy, July 28 - August 2, 2019. Association for Computational Linguistics
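GLTR’s core statistic is simple: for each word in a text, ask how highly the language model ranked that word among its predictions given the preceding context. Machine-sampled text tends to stick to top-ranked words; human writing reaches further down the list. GLTR itself uses GPT-2’s own probabilities; as a rough illustration of the rank idea only, here is a toy version where a bigram frequency table stands in for the language model (my stand-in, not GLTR’s method):

```python
from collections import Counter, defaultdict

def train(text):
    """Bigram frequency table: for each word, a Counter of the words
    observed following it."""
    words = text.split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

def token_ranks(table, text):
    """For each word after the first, its rank among the model's
    predictions for that position (1 = the model's top choice;
    None = never seen after the previous word)."""
    words = text.split()
    ranks = []
    for prev, nxt in zip(words, words[1:]):
        ordered = [w for w, _ in table[prev].most_common()]
        ranks.append(ordered.index(nxt) + 1 if nxt in ordered else None)
    return ranks
```

A passage whose every word is the model’s top pick scores all 1s and reads “machine-like”; frequent high ranks (or misses) suggest the surprising word choices typical of human writing. GLTR presents exactly this kind of per-word coloring, just computed with a far stronger model.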
For years, tech companies have relied on a rhetorical sleight of hand. It’s not working anymore.
Medieval scholar: "Sorry, folks, 'proto-Romance language' is not a thing."
The Voynich manuscript is a famous medieval text written in a mysterious language that so far has proven to be undecipherable. Now, Gerard Cheshire, a University of Bristol academic, has announced his own solution to the conundrum in a new paper in the journal Romance Studies. Cheshire identifies the mysterious writing as a "calligraphic proto-Romance" language, and he thinks the manuscript was put together by a Dominican nun as a reference source on behalf of Maria of Castile, Queen of Aragon. Apparently it took him all of two weeks to accomplish a feat that has eluded our most brilliant scholars for at least a century.
"Meritocracy" was coined as satire; the messaging for and against Medicare for All; and Dutch economic historian Rutger Bregman.
A college admissions scandal has highlighted what people refer to as "the myth of meritocracy." But actually, meritocracy itself is a myth. This week, On the Media looks at the satirical origins of the word and what they tell us about why the US embraces it. Plus, the messaging for and against Medicare for All, as well as a historical look at why we don't have universal healthcare. And economic historian and Tucker Carlson antagonist Rutger Bregman.
Loved hearing about the early origins of the meaning of meritocracy. Obviously we haven’t come close to helping level the playing field.
Seeing “crisis” and “creators” so close together in this text, I can’t help thinking the neologism “crisis creators” is what we should be talking about instead of “crisis actors”—a phrase that seems to have been coined by exactly those crisis creators!
The Atlas of Endangered Alphabets is a collection of “indigenous and minority writing systems”, gathered in hopes of documenting these scripts and reviving interest in them. From the about page:
In 2009, when I started work on the first series of carvings that became the Endangered Alphabets Project, times were dark for indigenous and minority cultures. The lightning spread of television and the Internet were driving a kind of cultural imperialism into every corner of the world. Everyone had a screen or wanted a screen, and the English language and the Latin alphabet (or one of the half-dozen other major writing systems) were on every screen and every keyboard. Every other culture was left with a bleak choice: learn the mainstream script or type a series of meaningless tofu squares.
Yet 2019 is a remarkable time in the history of writing systems. In spite of creeping globalization, political oppression, and economic inequalities, minority cultures are starting to revive interest in their traditional scripts. Across the world, calligraphy is turning writing into art; letters are turning up as earrings, words as pendants, proverbs as clothing designs. Individuals, groups, organizations and even governments are showing interest in preserving and protecting traditional writing systems, or even creating new ones, as a way to take back their cultural identity.
The earliest fragments of English reveal how interconnected Europe has been for centuries, finds Cameron Laux. He traces a history of the language through 10 objects and manuscripts.
They suggest not just inadequate manners or polish, but inadequate thought.
An open letter to newsrooms everywhere
In this episode, Haley interviews Natalia Komarova, Chancellor's Professor of the School of Physical Sciences at the University of California, Irvine. Komarova talks with Haley at the Ninth International Conference on Complex Systems about her presentation, which explored using applied mathematics to study the spread of mutants, as well as the evolution of popular music.
There’s some interesting-sounding research described here. Be sure to circle back around to some of her papers.
“If he'd gone to some proper cockney, like me, we'd have got a bit more background.”