distant reading | Chris Aldrich

👓 Humane Ingenuity 9: GPT-2 and You | Dan Cohen | Buttondown

Read Humane Ingenuity 9: GPT-2 and You by

Dan Cohen (buttondown.email)

This newsletter has not been written by a GPT-2 text generator, but you can now find a lot of artificially created text that has been.

For those not familiar with GPT-2, it is, according to its creators OpenAI (a socially conscious artificial intelligence lab overseen by a nonprofit entity), “a large-scale unsupervised language model which generates coherent paragraphs of text.” Think of it as a computer that has consumed so much text that it’s very good at figuring out which words are likely to follow other words, and when strung together, these words create fairly coherent sentences and paragraphs that are plausible continuations of any initial (or “seed”) text.

This isn’t a very difficult problem and the underpinnings of it are well laid out by John R. Pierce in *[An Introduction to Information Theory: Symbols, Signals and Noise](https://amzn.to/32JWDSn)*. In it he has a lot of interesting tidbits about language and structure from an engineering perspective including the reason why crossword puzzles work.
November 13, 2019 at 08:33AM

The most interesting examples have been the weird ones (cf. HI7), where the language model has been trained on narrower, more colorful sets of texts, and then sparked with creative prompts. Archaeologist Shawn Graham, who is working on a book I’d like to preorder right now, An Enchantment of Digital Archaeology: Raising the Dead with Agent Based Models, Archaeogaming, and Artificial Intelligence, fed GPT-2 the works of the English Egyptologist Flinders Petrie (1853-1942) and then resurrected him at the command line for a conversation about his work. Robin Sloan had similar good fun this summer with a focus on fantasy quests, and helpfully documented how he did it.

Circle back around and read this when it comes out.

Similarly, these other references should be an interesting read as well.
November 13, 2019 at 08:36AM

From this perspective, GPT-2 says less about artificial intelligence and more about how human intelligence is constantly looking for, and accepting of, stereotypical narrative genres, and how our mind always wants to make sense of any text it encounters, no matter how odd. Reflecting on that process can be the source of helpful self-awareness—about our past and present views and inclinations—and also, some significant enjoyment as our minds spin stories well beyond the thrown-together words on a page or screen.

And it’s not just happening with text, but it also happens with speech as I’ve written before: Complexity isn’t a Vice: 10 Word Answers and Doubletalk in Election 2016 In fact, in this mentioned case, looking at transcripts actually helps to reveal that the emperor had no clothes because there’s so much missing from the speech that the text doesn’t have enough space to fill in the gaps the way the live speech did.
November 13, 2019 at 08:43AM

Latin Pedagogy and the Digital Humanities

I’ve long been a student of the humanities (and particularly the classics) and have recently begun reviewing over my very old and decrepit knowledge of Latin. It’s been two decades since I made a significant study of classical languages, and lately (as the result of conversations with friends like Dave Harris, Jim Houser, Larry Richardson, and John Kountouris) I’ve been drawn to reviewing them for reading a variety of classical texts in their original languages. Fortunately, in the intervening years, quite a lot has changed in the tools relating to pedagogy for language acquisition.

Jenny's Second Year Latin — A copy of Jenny’s Latin text which I had used 20 years ago and recently acquired a new copy for the pittance of $3.25.

Internet

The biggest change in the intervening time is the spread of the internet which supplies a broad variety of related websites with not only interesting resources for things like basic reading and writing, but even audio sources apparently including listening to the nightly news in Latin. There are a variety of blogs on Latin as well as even online courseware, podcasts, pronunciation recordings, and even free textbooks. I’ve written briefly about the RapGenius platform before, but I feel compelled to mention it as a potentially powerful resource as well. (Julius Caesar, Seneca, Ovid, Cicero, et al.) There is a paucity of these sources in a general sense in comparison with other modern languages, but given the size of the niche, there is quite a lot out there, and certainly a mountain in comparison to what existed only twenty years ago.

Software

There has also been a spread of pedagogic aids like flashcard software including Anki and Mnemosyne with desktop, web-based, and even mobile-based versions making learning available in almost any situation. The psychology and learning research behind these types of technologies has really come a long way toward assisting students to best make use of their time in learning and retaining what they’ve learned in long term memory. Simple mobile applications like Duolingo exist for a variety of languages – though one doesn’t currently exist for classical Latin (yet).

Digital Humanities

The other great change is the advancement of the digital humanities which allows for a lot of interesting applications of knowledge acquisition. One particular one that I ran across this week was the Dickinson College Commentaries (DCC). Specifically a handful of scholars have compiled and documented a list of the most common core vocabulary words in Latin (and in Greek) based on their frequency of appearance in extant works. This very specific data is of interest to me in relation to my work in information theory, but it also becomes a tremendously handy tool when attempting to learn and master a language. It is a truly impressive fact that, simply by knowing that if one can memorize and master about 250 words in Latin, it will allow them to read and understand 50% of most written Latin. Further, knowledge of 1,500 Latin words will put one at the 80% level of vocabulary mastery for most texts. Mastering even a very small list of vocabulary allows one to read a large variety of texts very comfortably. I can only think about the old concept of a concordance (which was generally limited to heavily studied texts like the Bible or possibly Shakespeare) which has now been put on some serious steroids for entire cultures. Another half step and one arrives at the Google Ngram Viewer.

The best part is that one can, with very little technical knowledge, easily download the DCC Core Latin Vocabulary (itself a huge research undertaking) and upload and share it through the Anki platform, for example, to benefit a fairly large community of other scholars, learners, and teachers. With a variety of easy-to-use tools, shortly it may be even that much easier to learn a language like Latin – potentially to the point that it is no longer a dead language. For those interested, you can find my version of the shared DCC Core Latin Vocabulary for Anki online; the DCC’s Chris Francese has posted details and a version for Mnemosyne already.

[Editor’s note: Anki’s web service occasionally clears decks of cards from their servers, so if you find that the Anki link to the DCC Core Latin is not working, please leave a comment below, and we’ll re-upload the deck for shared use.]

What tools and tricks do you use for language study and pedagogy?

Read Vocabulary Study with Mnemosyne by Chris Francese (Dickinson College Commentaries)

Learning any language involves acquiring a large amount of vocabulary. For this reason, I think it is very useful for Latin and Greek students to put time and effort into systematic vocabulary study.

I’ve added a copy of the DCC Core Latin Vocabulary to the Anki platform for those interested in utilizing it there instead of on Mnemosyne. The cards can be found/downloaded at: https://ankiweb.net/shared/info/1342288910. My personal thanks to the DCC for posting and sharing the results of their research and work in this manner. This is a brilliant example of the concept of digital humanities.