Vocabulary notebooks, Criminally Insane Asylum Patients, Zettelkasten, the Thesaurus Linguae Latinae, and Digital Dictionaries

A Sixth Grade Vocabulary Notebook

The sixth grade language arts class at the school in Altadena, CA, which my daughter attends, has a weekly set of vocabulary exercises which they keep in a simple composition notebook. Each week the teacher picks two vocabulary words (e.g., passage, intelligent) and throughout the week the students fill in bits of knowledge about each word. On Monday they write down the word, a preliminary definition of it in their own words, a quick sketch or drawing of their perception of the word, and any prior knowledge they have of it. On Tuesday they revisit the words, look up dictionary definitions, and write them down in their notebooks. On Wednesday they compose an original sentence using each word. Thursday finds them filling in spaces under each word with its morphology and its variations with prefixes and suffixes. Finally, on Friday they complete the weekly exercise by writing down synonyms and antonyms for the week’s words.

When I saw their notebooks at a recent open house night, they immediately reminded me of the now partially forgotten practice among lexicographers and grammarians of excerpting (ars excerpendi) and collecting examples of sentences and words on slips of paper. Examples of this can be seen in the editing and creation of the Oxford English Dictionary, the Thesaurus Linguae Latinae (Latin for Thesaurus of the Latin Language), and the Wörterbuch der ägyptischen Sprache (German for Dictionary of the Egyptian Language).

I first became aware of the practice when reading Simon Winchester’s entertaining book The Professor and the Madman: A Tale of Murder, Insanity, and the Making of the Oxford English Dictionary. In the book, Winchester describes the pigeonhole and slip system that Oxford professor James Murray and collaborators used to create the Oxford English Dictionary (OED). The editors of the dictionary put out a call to readers to note down interesting everyday words they found in their reading along with example sentences and source references. They then collected these words alphabetically into pigeonholes and from there were able to collectively compile their magisterial dictionary, which uses the collected example sentences. While the book is tangentially about the creation of the OED, the heart of its fascinating story focuses on Dr. William C. Minor, a Civil War veteran and a convicted murderer living in Britain in the Broadmoor Criminal Lunatic Asylum, who began a long written correspondence with James Murray by sending in over ten thousand slips with words from his personal reading. Many years of correspondence went by before the dictionary editor realized that his collaborator was in an insane asylum. The 1998 book was ultimately turned into a 2019 movie starring Mel Gibson and Sean Penn.

Movie poster for The Professor and the Madman featuring large period photos of both Sean Penn and Mel Gibson comprising most of the image with a silhouette of a large castle-like sanitorium with a sun setting below them.

Thesaurus Linguae Latinae

Somewhat similar to the compilation of the Oxford English Dictionary which predated it is the ongoing compilation of the Thesaurus Linguae Latinae (TLL). An academic research project begun in 1894 and projected to be finished by a team of international scholars sometime around 2050, the TLL is a massive dictionary written entirely in Latin which contains every instance of every known Latin word in every known medium (manuscripts, scrolls, artworks, coins, buildings, monuments, graffiti, etc.) from the beginning of the language down to the 2nd century CE and from then on, every lexicographically significant instance from that time until the 6th century CE.

The Thesaurus Linguae Latinae used the Meusel system for creating zettel (a German word meaning slip) by utilizing double folio sheets onto which they copied text in hectographic ink which can be reproduced by lithography before cutting them up into individual slips. It took approximately five years of collecting and excerpting material before the researchers of the TLL began writing “articles”, by which they mean individual entries in their dictionary of Latin words. Because of the time-consuming work to research and write individual articles, researchers are individually credited within the Thesaurus for their work on individual words.

Between the 2nd and 6th centuries CE, the Thesaurus Linguae Latinae doesn’t excerpt every single word in written Latin, just what the researchers thought was lexicographically significant. As an example, they didn’t excerpt all of Saint Augustine’s works because if they had, the collection would have been approximately 50% larger because Augustine was such a prolific writer.

The magisterial zettelkasten (German for slip box) which powers the Thesaurus Linguae Latinae is befittingly housed on the top floors of the Residenz, the former palace of the Bavarian royal family, now a part of the Bavarian Academy (Bayerische Akademie der Wissenschaften) in Munich, Germany.

An example slip in the TLL for the word “sentio”.

The slips in the TLL’s collection are organized alphabetically by headword (or catchword) in a box in the top right hand side of the card and then secondarily by their appearance or publication in chronological time, which is indicated in a box on the top left of each slip. The number of copies of each slip is written in the bottom left hand corner and circled. Within the text excerpts on the cards themselves, occurrences of the word are underlined in red.
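For the digitally inclined, the filing order described above amounts to a two-level sort. Here is a minimal sketch in Python, assuming each slip can be reduced to a headword and an approximate year; the Slip class, its field names, and the placeholder passages are my own inventions for illustration, not anything the TLL itself uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slip:
    """Hypothetical record for one paper slip (field names are assumptions)."""
    headword: str  # catchword from the box at the top right
    year: int      # approximate date from the box at the top left (negative = BCE)
    copies: int    # circled number from the bottom left corner
    excerpt: str   # the copied passage


def file_order(slips):
    """Sort slips alphabetically by headword, then chronologically."""
    return sorted(slips, key=lambda s: (s.headword, s.year))


# Placeholder slips, not real TLL data
slips = [
    Slip("sentio", 50, 1, "placeholder passage B"),
    Slip("requiro", -40, 2, "placeholder passage C"),
    Slip("sentio", -60, 3, "placeholder passage A"),
]

for slip in file_order(slips):
    print(slip.headword, slip.year, slip.excerpt)
```

Sorting on the tuple (headword, year) keeps all slips for one word together and puts the earliest attestations first, which is what makes tracing a word's meaning over time practical.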

Basic statistics regarding the Thesaurus:

  • comprised of approximately 55,000 ancient Latin vocabulary words
    • 10,000,000+ slips
    • stored in about 6,500 boxes
    • with approximately 1,500 slips per box
  • excerpted from a library of 32,000 volumes
  • contributors: 375 scholars from 20 different countries, with:
    • 12 Indo-European language specialists
    • 8 Romance language specialists
    • 100 proof-readers
  • approximately 44,000 words published in their dictionary already
    • published content: 70% of the entire vocabulary
    • print run: 1,350 copies
    • Publisher: consortium of 35 academies from 27 countries on 5 continents
  • Longest words which remain to be compiled into the dictionary
    • non / 37 boxes of ca. 55,500 slips
    • qui, quae, quod / 65 boxes of ca. 96,000 slips
    • sum, esse, fui / 54.5 boxes of ca. 81,750 slips
    • ut / 35 boxes of ca. 52,500 slips
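The slip counts in the list above appear to follow from the box counts at roughly 1,500 slips per box. Here is a small Python sketch of that arithmetic, assuming every box is uniformly full (which is my simplification, not a claim from the TLL):

```python
SLIPS_PER_BOX = 1_500  # approximate figure quoted in the list above

# Box counts for the longest words still to be compiled, from the list above
remaining_boxes = {
    "non": 37,
    "qui, quae, quod": 65,
    "sum, esse, fui": 54.5,
    "ut": 35,
}


def estimated_slips(boxes, per_box=SLIPS_PER_BOX):
    """Estimate a slip count from a box count, assuming uniformly full boxes."""
    return int(boxes * per_box)


for word, boxes in remaining_boxes.items():
    print(f"{word}: {boxes} boxes -> ca. {estimated_slips(boxes):,} slips")

print(f"whole archive: ca. {estimated_slips(6_500):,} slips")
```

The estimate for qui, quae, quod comes out slightly above the quoted ca. 96,000 slips, which suggests the real boxes are not all equally full; the other three words and the archive total (just under ten million) line up with the quoted figures.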

As a point of comparison, the upper end of prolific academic researchers and note takers who use index card collections for their lifelong research (25-40 year careers) have compiled collections of 90,000 (Niklas Luhmann), 70,000+ (Gotthard Deutsch), 30,000 (Hans Blumenberg), 27,000+ (S.D. Goitein) and 12,500 slips (Roland Barthes). This means that there are individual Latin words in the TLL that have more slips than these researchers produced in their research lifetimes.

A sample of the note cards being used to compile the TLL. Courtesy of Samuel Beckelhymer.

Living languages

While many think of Latin as a “dead language”, something one notices quickly about the articles in the TLL is that words changed meanings over the span of time which they were in use. Linguists call this change in word meaning over time semantic shift. Many articles focus on these subtle changes and different meanings over time. Often words with only a few hundred attestations in the corpus of the language will be quoted and cited in articles about them with every example of use along with their contexts to help highlight these subtleties. Just like people had the choice of which words to use in the ancient world, we have those same choices today and this is where the use of modern dictionaries and thesauruses can make our words and word choices more exciting.

Normally, a dictionary just tells you what words mean—and of course we do that—but the scale of the project gives us the space and opportunity to say what we’re not sure of too. This is important because it leaves the door open for further scholarship and it gives the reader choices rather than dictating to them what to think. The dictionary can be a catalyst for more research and this is what makes the dictionary a living thing.—⁠⁠Adam Gitner, a TLL scholar

Slip box for the word ‘requiro’ © Adam Gitner
TLL slip archive © Adam Gitner

For those interested in more details on the TLL, Kathleen Coleman’s presentation on YouTube is a fantastic resource and primer on what is in it, how they built it and current work:

TLL Podcast and the Wordhord

Based on the history and usage of the Latin word horreum, which is featured in the first episode of the Thesaurus Linguae Latinae podcast, I can’t help but think that the word is not only ever so apropos for an introduction to the TLL, but also makes an excellent Latin translation for the idea of a card index in English or a Zettelkasten in German: “My horreum is a storehouse or treasury for my thoughts and ideas which nourishes my desire to discover and build upon my knowledge.” One might also notice that the Latin word horreum is cognate with the fun Old English word “wordhord” that one encounters in classics like Beowulf and which roughly translates as one’s brain or memory, especially for words.

Wörterbuch der ägyptischen Sprache (A Dictionary of the Egyptian Language)

Like the Thesaurus Linguae Latinae the Wörterbuch der ägyptischen Sprache was an international collaborative zettelkasten project. Started in 1897, it was finally published as five volumes in 1926.

The structure of the filing system for the Wörterbuch der ägyptischen Sprache (Wb) was designed based on the work done for the Thesaurus Linguae Latinae started three years earlier. Texts in the collection were roughly divided into passages of about 30 words and written in hieroglyphic form on postcard-sized slips of paper. The heading contained the designation of the text and the body included the texts’ context (inscriptions, etc.) as well as a preliminary translation of the passage.

These passages were then cross-referenced with other occurrences of the hieroglyphics to provide better progressive translations which ultimately appeared in the final manuscript. As a result some of the translations on the cards were incomplete as work proceeded and cross-comparisons of individual words were puzzled out.

A slip showing a passage of text from the victory stele of Sesostris III at the Nubian fortress of Semna. The handwriting is that of project leader Adolf Erman, who had “already struggled with the text as a high school student”.

With support from the German Research Foundation, the 1.5 million sheets of the Wörterbuch der ägyptischen Sprache began to be digitized and put online in 1997. The Digitized Card Archive (DZA) of the Dictionary of the Egyptian Language (Wörterbuch der ägyptischen Sprache) has been available on the Internet since 1999. The archive can be searched at: https://aaew.bbaw.de/tla/servlet/DzaIdx. Since 2004, the materials and query functions have been integrated into the larger Thesaurus Linguae Aegyptiae project at https://aaew.bbaw.de/thesaurus-linguae-aegyptiae.

Wörterbuch der ägyptischen Sprache by Adolph Erman and Hermann Grapow can be viewed online using the Wb. browser at https://aaew.bbaw.de/tla/servlet/WbImgBrowser. Links from reference points within the dictionary go directly to corresponding slips of paper in the digitized slip archive.

Although he’s a fictional character, one could suppose that, given his areas of specialization in archaeology, Indiana Jones would certainly have been aware of the Wörterbuch, would likely have used it, and may even have worked on it as a young college student.

The method used for indexing the Wörterbuch der ägyptischen Sprache and the Thesaurus Linguae Latinae is now generally known as a key word in context (KWIC) index. The design of these sorts of indices is now a subject within the realm of computer science and database design. Given that the work on the TLL has taken over 100 years, could it be possible that digital versions might speed up the process of excerpting, collating, and writing articles in the future? Perhaps these examples might be used for compiling other languages in the future.
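To make the idea of a key word in context index concrete, here is a minimal sketch in Python. It assumes plain-text passages keyed by a source label; the function name, the context width, and the second sample sentence are my own illustrative choices, and a real lexicographic index would also need lemmatization and dating.

```python
import re
from collections import defaultdict


def kwic_index(texts, width=30):
    """Map each lowercased word to (source, left context, word, right context)."""
    index = defaultdict(list)
    for source, text in texts.items():
        for match in re.finditer(r"\w+", text):
            start, end = match.span()
            left = text[max(0, start - width):start]
            right = text[end:end + width]
            index[match.group().lower()].append((source, left, match.group(), right))
    return index


# Two short sample passages; the second is made up for illustration
texts = {
    "sample-1": "Each sound is mapped onto a visual representation, or viseme.",
    "sample-2": "A viseme is the visual counterpart of a speech sound.",
}

index = kwic_index(texts)
for source, left, word, right in index["viseme"]:
    print(f"{source}: {left:>30}[{word}]{right}")
```

Each printed line plays the role of one paper slip: the keyword with enough surrounding text to judge its sense, plus a pointer back to the source.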

Modern day practice: Wordnik and Hypothes.is

Having looked at some historical word and idea collecting practices, how might one do this sort of work in a modern, digital world? A similar word collecting scheme is happening on the internet now, though perhaps with a bit more focus on interesting neologisms (and hopefully without many insane asylum patients). The lovely folks at the online dictionary Wordnik have been using the digital annotation tool Hypothes.is to collect examples of words as they happen in the wild. One can create a free account on the Hypothes.is service and quickly and easily begin collecting words for their dictionary efforts by highlighting example sentences and tagging them with “wordnik” and “hw-[InsertFoundWordHere]”.

So for example, I was reading about the clever new animations in the language app Duolingo and came across a curious new word (at least to me): viseme.

To create accurate animations, we generate the speech, run it through our in-house speech recognition and pronunciation models, and get the timing for each word and phoneme (speech sound). Each sound is mapped onto a visual representation, or viseme, in a set we designed based on linguistic features.

So I clicked on my handy browser extension for Hypothes.is, highlighted the sentence with a bit of context, and tagged it with “wordnik” and “hw-viseme”. The “hw-” prefix ostensibly means “head word” which is how lexicographers refer to the words you see defined in dictionaries.

Then the fine folks at Wordnik are able to access the public annotations matching the tag “wordnik”, and use Hypothes.is’ API to pull in the collections of new words for inclusion into their ever-growing corpus of examples. Lexicographers can then use examples of words appearing in context to define, study, and research their meanings and their shifts in meaning over time.
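I don't know the details of Wordnik's actual pipeline, but the tagging convention is easy to work with programmatically. The following Python sketch assumes the public Hypothes.is search endpoint with its tag and limit parameters, and it assumes annotation records carry a tags list and a TextQuoteSelector with an exact field; the sample record is hypothetical and no network request is made.

```python
from urllib.parse import urlencode

API_SEARCH = "https://api.hypothes.is/api/search"  # public search endpoint (assumed)


def search_url(tag="wordnik", limit=50):
    """Build a search URL for public annotations carrying a given tag."""
    return f"{API_SEARCH}?{urlencode({'tag': tag, 'limit': limit})}"


def headwords(annotation):
    """Pull the words out of any 'hw-' tags on one annotation record."""
    return [t[3:] for t in annotation.get("tags", []) if t.startswith("hw-")]


def quoted_text(annotation):
    """Return the highlighted passage if a TextQuoteSelector is present."""
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return selector.get("exact")
    return None


# Hypothetical record shaped like one row of a search response
sample = {
    "uri": "https://example.com/article",
    "tags": ["wordnik", "hw-viseme"],
    "target": [{"selector": [{
        "type": "TextQuoteSelector",
        "exact": "Each sound is mapped onto a visual representation, or viseme",
    }]}],
}

print(search_url())
print(headwords(sample), quoted_text(sample))
```

Keeping the headword in its own tag means the example sentence can be filed under the right word without any guessing about which word in the highlight was the interesting one.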

Since I’ve collected interesting new words and neologisms for ages anyway, this has been a quick and easy method of helping out other like-minded wordhoarders along the way. (Note how this last sentence has brought wordhord back into more active usage with a tinge of shift?!) In addition to the ability to help out others, a side benefit of the process is that the collected words are all publicly available for reading and using in daily life! You can not only find the public page for Wordnik words on Hypothes.is, but you can subscribe to it via RSS to see all the clever and interesting neologisms appearing in the English language as collected in real time! So if you’re the sort who enjoys touting new words at cocktail parties, a rabid cruciverbalist who refuses to be stumped by this week’s puzzle, or a budding lexicographer yourself, you’ve now got a fantastic new resource! I’ve found it to be far more entertaining and intriguing than any ten other word-of-the-day efforts I’ve seen in published calendar or internet form.

If you like, there’s also a special Hypothes.is group you can apply to join to more easily aid in the effort. Want to know more about Wordnik and their mission? Check out their informative Kickstarter page.

Expanding the sixth grade practice

The basic pedagogic exercise I’ve described above is an incredibly solid base for nearly any school-aged child. But with some of the historical context we’ve explored, the weekly word notebook exercise could be expanded. Some could be done during the week while others could be done at a later date/time, which could serve as potential (spaced repetition) reminders to students as they see words throughout the year potentially for bonus points.

What is the earliest attestation (evidence or proof of existence) of a word?

Can students find attestations of their words during their weekly reading or reading later in the year?

What is the word’s etymology? What other words sound like it or are related to it? What words are cognate to it in other languages they might be studying/learning? These could be collected too.

What new and interesting words are students coming across that they haven’t seen before in their own reading? Bonus points for doing additional words they find themselves, or for adding them to the queue of words the teacher assigns in future weeks.

Double bonus points for finding new words in their reading that are neologisms which aren’t in the dictionary yet. Can they find and add words to the Wordnik dictionary using Hypothes.is?

Instead of using a notebook for their supplemental wordhord, students might try the older practice of keeping their words on index cards and storing them in a zettelkasten just like the OED, the TLL, or the Wb. A shoebox works nicely and can be fun to decorate, but there are fancier boxes out there. Here they might also be used as flashcards for occasional review. Students can index them alphabetically and perhaps their example sentences may come in handy later in life while they’re doing their own writing (see Draft No. 4 and boxing words.) Perhaps their collections will come in handy at the end of high school when they take the SAT or the ACT tests? Might their collections rival those of famed academics like Niklas Luhmann, Gotthard Deutsch, Hans Blumenberg, S.D. Goitein or Roland Barthes? Maybe they’ll become professional lexicographers and help to finish up work on the TLL later in life?

For a fun math exercise, can students calculate how long it would take them (individually or as a class) to copy out 10,000,000 slips for their words at the pace of two or three words a week? How many notebooks would this require? Would they fit into their classroom, their house, their library, or their school?
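For teachers who want to check the students' answers, here is one way to set up that calculation in Python. It assumes one slip per word, 52 working weeks a year, and a class of 30 students; all three are my assumptions, so adjust them to taste (a school year is closer to 36 weeks).

```python
TOTAL_SLIPS = 10_000_000
WEEKS_PER_YEAR = 52  # assumption: no vacations; try 36 for a school year


def years_to_copy(slips_per_week, students=1, total=TOTAL_SLIPS):
    """Years needed to copy `total` slips at a steady weekly pace per student."""
    weeks = total / (slips_per_week * students)
    return weeks / WEEKS_PER_YEAR


for pace in (2, 3):
    alone = years_to_copy(pace)
    together = years_to_copy(pace, students=30)  # hypothetical class size
    print(f"{pace} per week: {alone:,.0f} years alone, "
          f"{together:,.0f} years as a class of 30")
```

At two slips a week a single student would need on the order of 96,000 years, which puts the scale of the TLL's archive into perspective.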

What other ideas might one add to such a classroom exercise?


Watched Lecture 3 of 36: Introduction to the Subjunctive Mood by Hans-Friedrich Mueller from Latin 101: Learning a Classical Language | The Great Courses
See how the long vowel "a" is the key to the present subjunctive mood in verbs such as pono (I place). The subjunctive expresses doubt or potential, and you explore its use by the poet Catullus in one of the most famous love poems to survive from the ancient world.

“Linguam Latīnam discunt, ut in Rōmā antīquā vīvant.” They learn the Latin language, so that they may live in ancient Rome. Intellectually that is. As a way to forget about the present troubles, which is actually a pretty good reason to learn Latin.

How did he know!?!

Notes on my wiki.

Watched Lecture 2 of 36: Introduction to Third-Conjugation Verbs by Hans-Friedrich Mueller from Latin 101: Learning a Classical Language | The Great Courses
Begin your adventure in Latin verbs with the third conjugation, practicing the present tense indicative of ago (I do). Learn the four principal parts of ago (the key words that allow you to conjugate any form) as well as the imperative endings that permit you to issue commands.
Notes on my wiki.
Watched Lecture 1 of 36: Pronouncing Classical Latin by Hans-Friedrich Mueller from Latin 101: Learning a Classical Language | The Great Courses
Salvete! Greetings! Ease into your study of Latin by admiring its beauty and impressive history. Then focus on the letters and sounds of the restored classical pronunciation, which approximates the way Latin was spoken in the classical era. Finally, cover the rules of accents.
Finished lecture one. I quite like the classical pronunciation.

Notes on my wiki.

I was doing some reading and thinking about how one might translate the idea of blogging into Latin. I tried entering “I am blogging.” into Google translate just to see what would come out. Perhaps it’s just a glitch in their translation algorithm, but the response felt apropos to me.

A screen capture of Google Translate's attempt to translate "I am blogging." into Latin. It outputs "Ego nullam dolore."

“Ego nullam dolore.” translated back into English is “I have no pain.”

🎧 The Power Of Categories | Invisibilia (NPR)

Listened to The Power Of Categories from Invisibilia | NPR.org
The Power Of Categories examines how categories define us — how, if given a chance, humans will jump into one category or another. People need them, want them. The show looks at what categories provide for us, and you'll hear about a person caught between categories in a way that will surprise you. Plus, a trip to a retirement community designed to help seniors revisit a long-missed category.
The transgender/sexual dysphoria story here is exceedingly interesting because it could potentially have some clues to how those pieces of biology work and what shifts things in one direction or another. How is that spectrum created/defined? A few dozen individuals like that could help provide an answer.

The story about the Indian retirement community in Florida is interesting, but it also raises the question (unasked, in the episode at least) of the detriment it can do to a group of people to be led by some of the oldest members of their community. The Latin words senīlis (“of or pertaining to old age”) and senex (“old”) are the roots of words like senate, senescence, senility, senior, and seniority, and though it’s nice to take care of our elders, the younger generations should take a hard look at the unintended consequences which may stem from this.

In some sense I’m also reminded about Thomas Kuhn’s book The Structure of Scientific Revolutions and why progress in science (and yes, society) is held back by the older generations who are still holding onto outdated models. Though simultaneously, they do provide some useful “brakes” on both velocity of change as well as potential ill effects which could be damaging in short timeframes.

Collective learning has potentially been growing at the expense of a shrinking body of diverse language

Yesterday, I saw an interesting linguistic exercise:

Short activity to show how flexible our language is and how difficult collective learning would have been for our non sapiens ancestors.

Step 1: As a class, choose 200 random words. (I had 15 kids choose 14 words each)

Step 2: Answer the following questions using only the words listed:

  1. How should we try to kill that mammoth?
  2. Explain why you should marry me.
  3. Give directions for a simple task.
  4. Come up with a plan to improve our cave.
  5. Describe a physical landscape.
  6. Come up with your own question!
Chris Scaturo
on February 3 at 8:44am in Yammer Group on Big History: Unit 6 – Early Humans Group

I have to imagine that once the conceptualization of language and some basic grammar existed, word generation was a much more common thing than it is now. It’s only been since the time of Noah Webster that humans have been actively standardizing things like spelling. If we can use Papua New Guinea as a model of pre-agrarian society and consider that almost 12% of extant languages on the Earth are spoken in an area about the size of Texas (and with about 1/5th the population of Texas too), then modern societies are actually severely limiting language (creation, growth, diversity, creativity, etc.) [cross reference: A World of Languages – and How Many Speak Them (Infographic)]

Consider that the current rate of language extinction is about one every 14 days, which puts us on a course to lose about half of the 7,100 languages on the planet right now before the end of the century. Collective learning has potentially been growing at the expense of a shrinking body of diverse language! In the paper “Global distribution and drivers of language extinction risk” the authors indicate that of all the variables tested, economic growth was most strongly linked to language loss.

To help put this exercise into perspective, we can look at the corpus of extant written Latin (a technically dead language):

“It is a truly impressive fact that memorizing and mastering about 250 words in Latin will allow one to read and understand 50% of most written Latin. Further, knowledge of 1,500 Latin words will put one at the 80% level of vocabulary mastery for most texts. Mastering even a very small list of vocabulary allows one to read a large variety of texts very comfortably.”

with data from Dickinson College Commentaries

These numbers become even smaller when considering ancient Greek texts.

Another interesting measurement is the vocabulary of a modern 2 year old who typically has a 50-75 word vocabulary while a 4 year old has 250-500 words, which is about the level of the exercise.

As a contrast, consider the message in this TED Youth Talk from last year by Erin McKean, which students should be able to relate to:

And of course, there’s the dog Chaser, which 60 Minutes recently reported has a vocabulary of over 1,000 words. (Are we now destroying variants of “dog language” for English too?!)

Hopefully the evolutionary value of the loss of the multiple languages will be more than balanced out by the power of collective learning in the long run.

Brief Review: The Swerve: How the World Became Modern by Stephen Greenblatt

The Swerve: How the World Became Modern by Stephen Greenblatt

My rating: 4 of 5 stars

Stephen Greenblatt provides an interesting synthesis of history and philosophy. Greenblatt’s love of the humanities certainly shines through. This stands as an almost over-exciting commercial for not only reading Lucretius’s “De Rerum Natura” (“On the Nature of Things”), but in motivating the reader to actually go out to learn Latin to appreciate it properly.

I would have loved more direct analysis and evidence of the immediate impact of Lucretius in the 1400’s as well as a longer in-depth analysis of the continuing impact through the 1700’s.

The first half of the book is excellent at painting a vivid portrait of the life and times of Poggio Bracciolini which one doesn’t commonly encounter. I’m almost reminded of Stacy Schiff’s Cleopatra: A Life, though Greenblatt has far more historical material with which to paint the picture. I may also be biased that I’m more interested in the mechanics of the scholarship of the resurgence of the classics in the Renaissance than I was of that particular political portion of the first century BCE. Though my background on the history of the time periods involved is reasonably advanced, I fear that Greenblatt may be leaving out a tad too much for the broader reading public who may not be so well versed. The fact that he does bring so many clear specifics to the forefront may more than compensate for this however.

In some interesting respects, this could be considered the humanities counterpart to the more science-centric story of Owen Gingerich’s The Book Nobody Read: Chasing the Revolutions of Nicolaus Copernicus. Though Simon Winchester is still by far my favorite nonfiction writer, Greenblatt does an exceedingly good job of narrating what isn’t necessarily a very linear story.

Greenblatt includes lots of interesting tidbits and some great history. I wish it had continued on longer… I’d love to have the spare time to lose myself in the extensive bibliography. Though the footnotes, bibliography, and index account for about 40% of the book, the average reader should take a reasonable look at the quarter or so of the footnotes which add some interesting additional background and subtleties to the text as well as to some of the translations that are discussed therein.

I am definitely very interested in the science behind textual preservation which is presented as the underlying motivation for the action in this book. I wish that Greenblatt had covered some of these aspects in the same vivid detail he exhibited for other portions of the story. Perhaps summarizing some more of the relevant scholarship involved in transmitting and restoring old texts as presented in Bart Ehrman and Bruce Metzger’s The Text of the New Testament: Its Transmission, Corruption & Restoration would have been a welcome addition given the audience of the book. It might also have presented a more nuanced picture of the character of the Church and their predicament presented in the text as well.

Though I only caught one small reference to modern day politics (a prison statistic for America which was obscured in a footnote), I find myself wishing that Greenblatt had spent at least a few paragraphs or even a short chapter drawing direct parallels to our present-day political landscape. I understand why he didn’t broach the subject as it would tend to date an otherwise timeless feeling text and generally serve to dissuade a portion of his readership and in particular, the portion which most needs to read such a book. I can certainly see a strong need for having another short burst of popularity for “On the Nature of Things” to assist with the anti-science and overly pro-religion climate we’re facing in American politics.

For those interested in the topic, I might suggest that this text has some flavor of Big History in its DNA. It covers not only a fairly significant chunk of recorded human history, but has some broader influential philosophical themes that underlie a potential change in the direction of history which we’ve been living for the past 300 years. There’s also an intriguing overlap of multidisciplinary studies going on in terms of the history, science, philosophy, and technology involved in the multiple time periods discussed.

This review was originally posted on GoodReads.com on 7/8/2014.

Latin Pedagogy and the Digital Humanities

I’ve long been a student of the humanities (and particularly the classics) and have recently begun reviewing my very old and decrepit knowledge of Latin. It’s been two decades since I made a significant study of classical languages, and lately (as the result of conversations with friends like Dave Harris, Jim Houser, Larry Richardson, and John Kountouris) I’ve been drawn to reviewing them for reading a variety of classical texts in their original languages. Fortunately, in the intervening years, quite a lot has changed in the tools relating to pedagogy for language acquisition.

Jenny's Second Year Latin
A copy of Jenny’s Latin text, which I used 20 years ago. I recently acquired a new copy for the pittance of $3.25.


The biggest change in the intervening time is the spread of the internet, which supplies a broad variety of related websites with not only interesting resources for things like basic reading and writing, but even audio sources, apparently including the nightly news in Latin. There are a variety of blogs on Latin as well as online courseware, podcasts, pronunciation recordings, and even free textbooks. I’ve written briefly about the RapGenius platform before, but I feel compelled to mention it as a potentially powerful resource as well (Julius Caesar, Seneca, Ovid, Cicero, et al.). There is a paucity of these sources in a general sense in comparison with other modern languages, but given the size of the niche, there is quite a lot out there, and certainly a mountain in comparison to what existed only twenty years ago.


There has also been a spread of pedagogic aids like flashcard software including Anki and Mnemosyne with desktop, web-based, and even mobile-based versions making learning available in almost any situation. The psychology and learning research behind these types of technologies has really come a long way toward assisting students to best make use of their time in learning and retaining what they’ve learned in long term memory. Simple mobile applications like Duolingo exist for a variety of languages – though one doesn’t currently exist for classical Latin (yet).

Digital Humanities

The other great change is the advancement of the digital humanities, which allows for a lot of interesting applications of knowledge acquisition. One in particular that I ran across this week was the Dickinson College Commentaries (DCC). Specifically, a handful of scholars have compiled and documented a list of the most common core vocabulary words in Latin (and in Greek) based on their frequency of appearance in extant works. This very specific data is of interest to me in relation to my work in information theory, but it also becomes a tremendously handy tool when attempting to learn and master a language. It is a truly impressive fact that memorizing and mastering about 250 words in Latin will allow one to read and understand 50% of most written Latin. Further, knowledge of 1,500 Latin words will put one at the 80% level of vocabulary mastery for most texts. Mastering even a very small list of vocabulary allows one to read a large variety of texts very comfortably. I can’t help but think of the old concept of a concordance (which was generally limited to heavily studied texts like the Bible or possibly Shakespeare), which has now been put on some serious steroids for entire cultures. Another half step and one arrives at the Google Ngram Viewer.
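For the curious, the kind of coverage figure quoted above can be computed from any tokenized corpus with a few lines of Python. This is only a sketch of the general technique and not the DCC's actual method: it assumes the text has already been split into words (ideally lemmatized), and the toy corpus below is invented for illustration.

```python
from collections import Counter


def coverage(tokens, top_n):
    """Fraction of all tokens accounted for by the top_n most frequent words."""
    counts = Counter(tokens)
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / len(tokens)


def words_needed(tokens, target):
    """Smallest number of most-frequent words whose coverage reaches target."""
    counts = Counter(tokens)
    total, running = len(tokens), 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running / total >= target:
            return rank
    return len(counts)


# Toy corpus only; a real study would use a large body of lemmatized Latin
tokens = "et in et non est et in qui est et ut non et in est".split()

print(f"top 2 words cover {coverage(tokens, 2):.0%} of the toy corpus")
print(f"words needed for 50% coverage: {words_needed(tokens, 0.5)}")
```

Because a handful of function words dominate any natural-language text, the coverage curve rises steeply at first and then flattens, which is why a few hundred words go such a long way.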

The best part is that one can, with very little technical knowledge, easily download the DCC Core Latin Vocabulary (itself a huge research undertaking) and upload and share it through the Anki platform, for example, to benefit a fairly large community of other scholars, learners, and teachers. With a variety of easy-to-use tools, shortly it may be even that much easier to learn a language like Latin – potentially to the point that it is no longer a dead language. For those interested, you can find my version of the shared DCC Core Latin Vocabulary for Anki online; the DCC’s Chris Francese has posted details and a version for Mnemosyne already.
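As a rough illustration of how little technical knowledge is needed, here is a Python sketch that turns a vocabulary list into the kind of tab-separated text file that Anki can import. The column names and the two sample rows are hypothetical; the real DCC download has its own layout, so the field names would need to be adjusted.

```python
import csv
import io


def to_anki_tsv(rows, front="headword", back="definition"):
    """Render rows as tab-separated text, one flashcard (front, back) per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[front], row[back]])
    return out.getvalue()


# Hypothetical rows standing in for a downloaded vocabulary list
rows = [
    {"headword": "et", "definition": "and"},
    {"headword": "non", "definition": "not"},
]

print(to_anki_tsv(rows), end="")
```

Saving that text to a file and importing it creates one card per line, with the first field on the front and the second on the back.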

[Editor’s note: Anki’s web service occasionally clears decks of cards from their servers, so if you find that the Anki link to the DCC Core Latin is not working, please leave a comment below, and we’ll re-upload the deck for shared use.]

What tools and tricks do you use for language study and pedagogy?

Read Vocabulary Study with Mnemosyne by Chris Francese (Dickinson College Commentaries)
Learning any language involves acquiring a large amount of vocabulary. For this reason, I think it is very useful for Latin and Greek students to put time and effort into systematic vocabulary study.
I’ve added a copy of the DCC Core Latin Vocabulary to the Anki platform for those interested in utilizing it there instead of on Mnemosyne. The cards can be found/downloaded at: https://ankiweb.net/shared/info/1342288910. My personal thanks to the DCC for posting and sharing the results of their research and work in this manner. This is a brilliant example of the concept of digital humanities.