Vocabulary notebooks, Criminally Insane Asylum Patients, Zettelkasten, the Thesaurus Linguae Latinae, and Digital Dictionaries

A Sixth Grade Vocabulary Notebook

The sixth grade language arts class at the school in Altadena, CA, which my daughter attends, has a weekly set of vocabulary exercises which they keep in a simple composition notebook. Each week the teacher picks two vocabulary words (e.g., passage, intelligent) and throughout the week the students fill in bits of knowledge about the words themselves. On Monday they write down each word, a preliminary definition of it in their own words, a quick sketch or drawing of their perception of the word, and any prior knowledge they have of it. On Tuesday they revisit the words, look up dictionary definitions, and write them down in their notebooks. On Wednesday they compose an original sentence using the words. Thursday finds them filling in spaces under each word with their morphologies and variations with prefixes and suffixes. Finally, on Friday they complete the weekly exercise by writing down synonyms and antonyms for the week’s words.

When I saw their notebooks at a recent open house night, I was immediately reminded of the now partially forgotten practice among lexicographers and grammarians of excerpting (ars excerpendi) and collecting examples of sentences and words on slips of paper. Examples of this can be seen in the editing and creation of the Oxford English Dictionary, the Thesaurus Linguae Latinae (Latin for Thesaurus of the Latin Language), and the Wörterbuch der ägyptischen Sprache (German for Dictionary of the Egyptian Language).

I first became aware of the practice when reading Simon Winchester’s entertaining book The Professor and the Madman: A Tale of Murder, Insanity, and the Making of the Oxford English Dictionary. In the book, Winchester describes the pigeonhole and slip system that Oxford professor James Murray and collaborators used to create the Oxford English Dictionary (OED). The editors of the dictionary put out a call to readers to note down interesting everyday words they found in their reading along with example sentences and source references. They then collected these words alphabetically into pigeonholes and from there were able to collectively compile their magisterial dictionary, which uses the collected example sentences. While tangentially about the creation of the OED, the heart of the fascinating story in the book focuses on Dr. William C. Minor, a Civil War veteran and a convicted murderer living in Britain in the Broadmoor Criminal Lunatic Asylum, who began a long written correspondence with James Murray by sending in over ten thousand slips with words from his personal reading. Many years of correspondence went by before the dictionary editor realized that his collaborator was in an insane asylum. The 1998 book was ultimately turned into a 2019 movie starring Mel Gibson and Sean Penn.

Movie poster for The Professor and the Madman featuring large period photos of both Sean Penn and Mel Gibson comprising most of the image with a silhouette of a large castle-like sanitorium with a sun setting below them.

Thesaurus Linguae Latinae

Somewhat similar to the compilation of the Oxford English Dictionary which predated it is the ongoing compilation of the Thesaurus Linguae Latinae (TLL). An academic research project begun in 1894 and projected to be finished by a team of international scholars sometime around 2050, the TLL is a massive dictionary written entirely in Latin which contains every instance of every known Latin word in every known medium (manuscripts, scrolls, artworks, coins, buildings, monuments, graffiti, etc.) from the beginning of the language down to the 2nd century CE and from then on, every lexicographically significant instance from that time until the 6th century CE.

The Thesaurus Linguae Latinae used the Meusel system for creating zettel (a German word meaning slip) by utilizing double folio sheets onto which they copied text in hectographic ink which can be reproduced by lithography before cutting them up into individual slips. It took approximately five years of collecting and excerpting material before the researchers of the TLL began writing “articles”, by which they mean individual entries in their dictionary of Latin words. Because of the time-consuming work to research and write individual articles, researchers are individually credited within the Thesaurus for their work on individual words.

Between the 2nd and 6th centuries CE, the Thesaurus Linguae Latinae doesn’t excerpt every single word in written Latin, just what the researchers thought was lexicographically significant. As an example, they didn’t excerpt all of Saint Augustine’s works: he was such a prolific writer that doing so would have made the collection approximately 50% larger.

The magisterial zettelkasten (German for slip box) which powers the Thesaurus Linguae Latinae is befittingly housed on the top floors of the Residenz, the former palace of the Bavarian royal family, now a part of the Bavarian Academy (Bayerische Akademie der Wissenschaften) in Munich, Germany.

An example slip in the TLL for the word “sentio”.

The slips in the TLL’s collection are organized alphabetically by headword (or catchword) in a box in the top right hand side of the card and then secondarily by their appearance or publication in chronological time, which is indicated in a box on the top left of each slip. The number of copies of each slip is written in the bottom left hand corner and circled. Within the text excerpts on the cards themselves, occurrences of the word are underlined in red.

Basic statistics regarding the Thesaurus:

  • comprised of approximately 55,000 ancient Latin vocabulary words
    • 10,000,000+ slips
    • stored in about 6,500 boxes
    • with approximately 1,500 slips per box
  • excerpted from a library of 32,000 volumes
  • contributors: 375 scholars from 20 different countries, with:
    • 12 Indo-European language specialists
    • 8 Romance language specialists
    • 100 proof-readers
  • approximately 44,000 words published in their dictionary already
    • published content: 70% of the entire vocabulary
    • print run: 1,350 copies
    • Publisher: consortium of 35 academies from 27 countries on 5 continents
  • Longest words remaining to be compiled into the dictionary
    • non / 37 boxes of ca. 55,500 slips
    • qui, quae, quod / 65 boxes of ca. 96,000 slips
    • sum, esse, fui / 54.5 boxes of ca. 81,750 slips
    • ut / 35 boxes of ca. 52,500 slips
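As a rough check on these figures, the per-word slip counts follow from multiplying the number of boxes by the approximate 1,500 slips per box. Here is a minimal Python sketch using only the numbers in the list above; the variable and function names are my own, chosen for illustration.

```python
# Approximate slips per box, taken from the statistics above.
SLIPS_PER_BOX = 1_500

# Boxes devoted to the longest words still awaiting articles (from the list above).
boxes_per_word = {
    "non": 37,
    "qui, quae, quod": 65,
    "sum, esse, fui": 54.5,
    "ut": 35,
}


def estimate_slips(boxes: float, slips_per_box: int = SLIPS_PER_BOX) -> int:
    """Estimate the number of slips held in a given number of boxes."""
    return round(boxes * slips_per_box)


estimates = {word: estimate_slips(boxes) for word, boxes in boxes_per_word.items()}
total_archive = estimate_slips(6_500)

for word, slips in estimates.items():
    print(f"{word}: ca. {slips:,} slips")
print(f"whole archive: ca. {total_archive:,} slips")
```

Because 1,500 slips per box is itself an approximation, these estimates won't match every quoted figure exactly. The 6,500 boxes work out to roughly 9.75 million slips, in the neighborhood of the 10,000,000+ quoted above.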

As a point of comparison, the upper end of prolific academic researchers and note takers who use index card collections for their lifelong research (25-40 year careers) have compiled collections of 90,000 (Niklas Luhmann), 70,000+ (Gotthard Deutsch), 30,000 (Hans Blumenberg), 27,000+ (S.D. Goitein) and 12,500 slips (Roland Barthes). This means that there are individual Latin words in the TLL that have more slips than these researchers produced in their research lifetimes.

A sample of the note cards being used to compile the TLL. Courtesy of Samuel Beckelhymer.

Living languages

While many think of Latin as a “dead language”, something one notices quickly about the articles in the TLL is that words changed meanings over the span of time which they were in use. Linguists call this change in word meaning over time semantic shift. Many articles focus on these subtle changes and different meanings over time. Often words with only a few hundred attestations in the corpus of the language will be quoted and cited in articles about them with every example of use along with their contexts to help highlight these subtleties. Just like people had the choice of which words to use in the ancient world, we have those same choices today and this is where the use of modern dictionaries and thesauruses can make our words and word choices more exciting.

Normally, a dictionary just tells you what words mean—and of course we do that—but the scale of the project gives us the space and opportunity to say what we’re not sure of too. This is important because it leaves the door open for further scholarship and it gives the reader choices rather than dictating to them what to think. The dictionary can be a catalyst for more research and this is what makes the dictionary a living thing.—⁠⁠Adam Gitner, a TLL scholar

Slip box for the word ‘requiro’ © Adam Gitner
TLL slip archive © Adam Gitner

For those interested in more details on the TLL, Kathleen Coleman’s presentation on YouTube is a fantastic resource and primer on what is in it, how they built it, and current work.

TLL Podcast and the Wordhord

Based on the history and usage of the Latin word horreum, which is featured in the first episode of the Thesaurus Linguae Latinae podcast, I can’t help but think that not only is the word ever so apropos for an introduction to the TLL, but it also makes an excellent word for translating the idea of a card index in English or Zettelkasten in German into Latin: “My horreum is a storehouse or treasury for my thoughts and ideas which nourishes my desire to discover and build upon my knowledge.” One might also notice that the Latin word horreum is cognate with the fun Old English word “wordhord” that one encounters in classics like Beowulf and which roughly translates as one’s brain or memory, especially for words.

Wörterbuch der ägyptischen Sprache (A Dictionary of the Egyptian Language)

Like the Thesaurus Linguae Latinae, the Wörterbuch der ägyptischen Sprache was an international collaborative zettelkasten project. Started in 1897, it was finally published as five volumes in 1926.

The structure of the filing system for the Wörterbuch der ägyptischen Sprache (Wb) was designed based on the work done for the Thesaurus Linguae Latinae started three years earlier. Texts in the collection were roughly divided into passages of about 30 words and written in hieroglyphic form on postcard-sized slips of paper. The heading contained the designation of the text and the body included the texts’ context (inscriptions, etc.) as well as a preliminary translation of the passage.

These passages were then cross-referenced with other occurrences of the hieroglyphics to provide better progressive translations which ultimately appeared in the final manuscript. As a result some of the translations on the cards were incomplete as work proceeded and cross-comparisons of individual words were puzzled out.

A slip showing a passage of text from the victory stele of Sesostris III at the Nubian fortress of Semna. The handwriting is that of project leader Adolf Erman, who had “already struggled with the text as a high school student”.

With support from the German Research Foundation, the 1.5 million sheets of the Wörterbuch der ägyptischen Sprache began to be digitized and put online in 1997. The Digitized Card Archive (DZA) of the Dictionary of the Egyptian Language (Wörterbuch der ägyptischen Sprache) has been available on the Internet since 1999. The archive can be searched at: https://aaew.bbaw.de/tla/servlet/DzaIdx. Since 2004, the materials and query functions have been integrated into the larger Thesaurus Linguae Aegyptiae project at https://aaew.bbaw.de/thesaurus-linguae-aegyptiae.

Wörterbuch der ägyptischen Sprache by Adolf Erman and Hermann Grapow can be viewed online using the Wb. browser at https://aaew.bbaw.de/tla/servlet/WbImgBrowser. Links from reference points within the dictionary go directly to corresponding slips of paper in the digitized slip archive.

Although he’s a fictional character, one could suppose that, given his areas of specialization in archaeology, Indiana Jones would certainly have been aware of the Wörterbuch, would likely have used it, and may even have worked on it as a young college student.

The method used for indexing the Wörterbuch der ägyptischen Sprache and the Thesaurus Linguae Latinae is now generally known as a key word in context (KWIC) index. The design of these sorts of indices is now a subject within the realm of computer science and database design. Given that the work on the TLL has taken over 100 years, could it be possible that digital versions might speed up the process of excerpting, collating, and writing articles in the future? Perhaps these examples might be used for compiling other languages in the future.
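To make the idea concrete, here is a minimal Python sketch of a key word in context index. It is an illustration only, not the procedure used for the TLL or the Wb: the function name, the fixed context width, and the two sample passages are my own assumptions, and it sorts by headword alone rather than secondarily by date as the TLL slips do.

```python
import re


def kwic_index(passages, width=30):
    """Build a simple key word in context (KWIC) index.

    `passages` maps a source label to its text. The result maps each
    lowercased word to a list of (source, left context, word, right context).
    """
    index = {}
    for source, text in passages.items():
        for match in re.finditer(r"\w+", text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            entry = (source, left, match.group(), right)
            index.setdefault(match.group().lower(), []).append(entry)
    return index


# Two short Latin passages chosen only for illustration.
passages = {
    "Caesar, Gal. 1.1": "Gallia est omnis divisa in partes tres",
    "Vergil, Aen. 1.1": "arma virumque cano Troiae qui primus ab oris",
}
index = kwic_index(passages)

# Print every occurrence, filed alphabetically by headword like a slip box.
for headword in sorted(index):
    for source, left, word, right in index[headword]:
        print(f"{headword:>10} | {left}[{word}]{right} ({source})")
```

Each printed line plays the role of one slip: the headword for filing, the occurrence marked within its surrounding text, and the source it was excerpted from.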

Modern day practice: Wordnik and Hypothes.is

Having looked at some historical word and idea collecting practices, how might one do this sort of work in a modern, digital world? A similar word collecting scheme is currently happening on the internet now, though perhaps with a bit more focus on interesting neologisms (and hopefully without many insane asylum patients.) The lovely folks at the online dictionary Wordnik have been using the digital annotation tool Hypothes.is to collect examples of words as they happen in the wild. One can create a free account on the Hypothes.is service and quickly and easily begin collecting words for their dictionary efforts by highlighting example sentences and tagging them with “wordnik” and “hw-[InsertFoundWordHere]”.

So for example, I was reading about the clever new animations in the language app Duolingo and came across a curious new word (at least to me): viseme.

To create accurate animations, we generate the speech, run it through our in-house speech recognition and pronunciation models, and get the timing for each word and phoneme (speech sound). Each sound is mapped onto a visual representation, or viseme, in a set we designed based on linguistic features.

So I clicked on my handy browser extension for Hypothes.is, highlighted the sentence with a bit of context, and tagged it with “wordnik” and “hw-viseme”. The “hw-” prefix ostensibly means “head word” which is how lexicographers refer to the words you see defined in dictionaries.

Then the fine folks at Wordnik are able to access the public annotations matching the tag “wordnik”, and use Hypothes.is’ API to pull in the collections of new words for inclusion into their ever-growing corpus of examples. Lexicographers can then use examples of words appearing in context to define, study, and research their meanings and their shifts in meaning over time.
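For the curious, here is a rough Python sketch of what pulling those annotations might look like. I don't know how Wordnik actually ingests them, so treat this as an assumption-laden illustration: it uses the public Hypothes.is search endpoint with its `tag` and `limit` parameters, the helper names are hypothetical, and the sample row is hand-made in the general shape the API returns so that nothing here needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_SEARCH = "https://api.hypothes.is/api/search"


def search_url(tag="wordnik", limit=50):
    """Build a search URL for public annotations carrying `tag`."""
    return f"{API_SEARCH}?{urlencode({'tag': tag, 'limit': limit})}"


def fetch_rows(tag="wordnik", limit=50):
    """Fetch matching annotations (requires network access)."""
    with urlopen(search_url(tag, limit)) as response:
        return json.load(response)["rows"]


def headword_examples(rows):
    """Map each hw- headword to the highlighted passages that use it."""
    examples = {}
    for row in rows:
        quotes = [
            selector["exact"]
            for target in row.get("target", [])
            for selector in target.get("selector", [])
            if selector.get("type") == "TextQuoteSelector"
        ]
        for tag in row.get("tags", []):
            if tag.startswith("hw-"):
                examples.setdefault(tag[3:], []).extend(quotes)
    return examples


# A hand-made row, so this example runs without calling the API.
sample_rows = [{
    "uri": "https://example.com/article",
    "tags": ["wordnik", "hw-viseme"],
    "target": [{"selector": [{
        "type": "TextQuoteSelector",
        "exact": "Each sound is mapped onto a visual representation, or viseme",
    }]}],
}]
examples = headword_examples(sample_rows)
print(examples)
```

Swapping `sample_rows` for `fetch_rows()` would run the same grouping over live public annotations.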

Since I’ve collected interesting new words and neologisms for ages anyway, this has been a quick and easy method of helping out other like-minded wordhoarders along the way. (Note how this last sentence has brought wordhord back into more active usage with a tinge of shift?!) In addition to the ability to help out others, a side benefit of the process is that the collected words are all publicly available for reading and using in daily life! You can not only find the public page for Wordnik words on Hypothes.is, but you can subscribe to it via RSS to see all the clever and interesting neologisms appearing in the English language as collected in real time! So if you’re the sort who enjoys touting new words at cocktail parties, a rabid cruciverbalist who refuses to be stumped by this week’s puzzle, or a budding lexicographer yourself, you’ve now got a fantastic new resource! I’ve found it to be far more entertaining and intriguing than any ten other word-of-the-day efforts I’ve seen in published calendar or internet form.

If you like, there’s also a special Hypothes.is group you can apply to join to more easily aid in the effort. Want to know more about Wordnik and their mission? Check out their informative Kickstarter page.

Expanding the sixth grade practice

The basic pedagogic exercise I’ve described above is an incredibly solid base for nearly any school-aged child. But with some of the historical context we’ve explored, the weekly word notebook exercise could be expanded. Some of the additions below could be done during the week, while others could be done at a later date, serving as (spaced repetition) reminders to students as they see words throughout the year, potentially for bonus points.

What is the earliest attestation (evidence or proof of existence) of a word?

Can students find attestations of their words during their weekly reading or reading later in the year?

What is the word’s etymology? What other words sound like it or are related to it? What words are cognate to it in other languages they might be studying/learning? These could be collected too.

What new and interesting words are students coming across that they haven’t seen before in their own reading? Bonus points for doing additional words they find themselves, or for adding them to the queue of words the teacher assigns in future weeks.

Double bonus points for finding new words in their reading that are neologisms which aren’t in the dictionary yet. Can they find and add words to the Wordnik dictionary using Hypothes.is?

Instead of using a notebook for their supplemental wordhord, students might try the older practice of keeping their words on index cards and storing them in a zettelkasten just like the OED, the TLL, or the Wb. A shoebox works nicely and can be fun to decorate, but there are fancier boxes out there. Here they might also be used as flashcards for occasional review. Students can index them alphabetically and perhaps their example sentences may come in handy later in life while they’re doing their own writing (see Draft No. 4 and boxing words.) Perhaps their collections will come in handy at the end of high school when they take the SAT or the ACT tests? Might their collections rival those of famed academics like Niklas Luhmann, Gotthard Deutsch, Hans Blumenberg, S.D. Goitein or Roland Barthes? Maybe they’ll become professional lexicographers and help to finish up work on the TLL later in life?

For a fun math exercise, can students calculate how long it would take them (individually or as a class) to copy out 10,000,000 slips for their words at the pace of two or three words a week? How many notebooks would this require? Would they fit into their classroom, their house, their library, or their school?
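For anyone who wants to check their students’ answers, here is a small Python sketch of the arithmetic. It assumes one slip per word, 52 working weeks a year, and a hypothetical class of 30 students; none of those numbers come from the classroom itself, so adjust them to taste.

```python
SLIPS_NEEDED = 10_000_000
WEEKS_PER_YEAR = 52   # assumes no holidays; a real school year is shorter
CLASS_SIZE = 30       # hypothetical class size


def years_to_copy(slips_per_week, copyists=1):
    """Years needed to copy SLIPS_NEEDED slips at a steady weekly pace."""
    weeks = SLIPS_NEEDED / (slips_per_week * copyists)
    return weeks / WEEKS_PER_YEAR


for pace in (2, 3):
    alone = years_to_copy(pace)
    together = years_to_copy(pace, CLASS_SIZE)
    print(f"{pace} slips/week: {alone:,.0f} years alone, "
          f"{together:,.0f} years as a class of {CLASS_SIZE}")
```

Under these assumptions a single student writing two slips a week would need roughly 96,000 years, and even a whole class working together would need more than 3,000.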

What other ideas might one add to such a classroom exercise?

References

Forschung: Der Thesaurus linguae Latinae. Munich, Germany: Bayerische Akademie der Wissenschaften, 2019. https://www.youtube.com/watch?v=C3Eqt2QBKNs.

Coleman, Kathleen. “The Thesaurus Linguae Latinae.” Paideia Lectures 2022, 2022. https://www.youtube.com/watch?v=s98hTIOW1Ug.

Pinkerton, Byrd. “The Ultimate Latin Dictionary: After 122 Years, Still At Work On The Letter ‘N.’” NPR, May 14, 2016, sec. Parallels. https://www.npr.org/sections/parallels/2016/05/14/476873307/the-ultimate-latin-dictionary-after-122-years-still-at-work-on-the-letter-n.

The Professor and the Madman. 35mm film, Biography, Drama, History. Voltage Pictures, Fábrica de Cine, Definition Films, 2019.

Smith, Chris. “Thesaurus Linguae Latinae: How the World’s Largest Latin Lexicon Is Brought to Life.” De Gruyter Conversations, July 5, 2021. https://blog.degruyter.com/thesaurus-linguae-latinae-how-the-worlds-largest-latin-lexicon-is-brought-to-life/.

Winchester, Simon. The Professor and the Madman: A Tale of Murder, Insanity, and the Making of the Oxford English Dictionary. 1st ed. New York: Harper, 1998.

Replied to a tweet by Jan Knorr (Twitter)
Reclipped is on my radar, but I haven’t experimented with it yet. For YouTube annotation, I quite like https://docdrop.org/ which dovetails w/ @Hypothes_is. For other online video I will often use their page annotations w/ timestamps/media fragments.

Hypothes.is as a Digital Zettelkasten for Neologism and Word Collection for Wordnik

A little while ago, one of the followers of my Hypothes.is account, where I actively mark up my reading with highlights, annotations, and notes, asked me why I was tagging seemingly random sentences with “wordnik” and other odd tags that started with “hw-”. Today I thought I’d write out the explanation of the habits around one of my side hobbies of word collecting.

Some background

In the book The Professor and the Madman: A Tale of Murder, Insanity, and the Making of the Oxford English Dictionary, Simon Winchester describes the pigeonhole and slip system that professor James Murray used to create the Oxford English Dictionary. The editors essentially put out a call to readers to note down interesting everyday words they found in their reading along with example sentences and references. They then collected these words alphabetically into pigeonholes and from there were able to collectively compile their magisterial dictionary. Those who are fans of the various methods of knowledge collection and management represented by the index card-based commonplace book or the zettelkasten will appreciate this scheme as a method of collectively finding and collating knowledge. It’s akin to Paul Otlet and Henri La Fontaine’s work on creating the Mundaneum, but focused on the niche area of lexicography and historical linguistics.

Book cover of The Professor and the Madman featuring a sepia toned image of a seated professor in a full beard and a mustache holding a book.

The Professor and the Madman is broadly the fascinating story of Dr. W. C. Minor, an insane asylum patient, who saw the call to collect words and sentences and began a written correspondence with James Murray by sending in over ten thousand slips with words from his personal reading.

Wordnik and Hypothes.is

A similar word collecting scheme is currently happening on the internet now, though perhaps with a bit more focus on interesting neologisms (and hopefully without me being cast as an insane asylum patient.) The lovely folks at the online dictionary Wordnik have been using the digital annotation tool Hypothes.is to collect examples of words as they happen in the wild. One can create a free account on the Hypothes.is service and quickly and easily begin collecting words for the effort by highlighting example sentences and tagging with “wordnik” and “hw-[InsertFoundWordHere]”.

So for example, this morning I was reading about the clever new animations in the language app Duolingo and came across a curious new word (at least to me): viseme.

To create accurate animations, we generate the speech, run it through our in-house speech recognition and pronunciation models, and get the timing for each word and phoneme (speech sound). Each sound is mapped onto a visual representation, or viseme, in a set we designed based on linguistic features.

So I clicked on my handy browser extension for Hypothes.is, highlighted the sentence with a bit of context, and tagged it with “wordnik” and “hw-viseme”. The “hw-” prefix ostensibly means “head word” which is how lexicographers refer to the words you see defined in dictionaries.

Then the fine folks at Wordnik are able to access the public annotations matching the tag “wordnik”, and use Hypothes.is’ API to pull in the collections of new words for inclusion into their ever-growing corpus.

Since I’ve collected interesting new words and neologisms for ages anyway, this has been a quick and easy method of helping out other like-minded word collectors along the way. In addition to the ability to help out others, a side benefit of the process is that the collected words are all publicly available for reading and using in daily life! You can not only find the public page for Wordnik words on Hypothes.is, but you can subscribe to it via RSS to see all the clever and interesting neologisms appearing in the English language as collected in real time! So if you’re the sort who enjoys touting new words at cocktail parties, a rabid cruciverbalist who refuses to be stumped by this week’s puzzle, or a budding lexicographer yourself, you’ve now got a fantastic new resource! I’ve found it to be far more entertaining and intriguing than any ten other word-of-the-day efforts I’ve seen in published or internet form.

If you like, there’s also a special Hypothes.is group you can apply to join to more easily aid in the effort. Want to know more about Wordnik and their mission? Check out their informative Kickstarter page.

 

A Small 10,000+ Annotations Party

I recently hit the 10,000+ annotations mark on the fantastic Hypothes.is platform, and in celebration, the kind team at Hypothesis sent me a care package with a lovely card, some sticky flags, some stickers, and some chocolate to see me through the next 10,000.

If I’m honest, I get so much value and joy out of annotating online, I should have been the one to send them the care package.

Thanks Hypothes.is! I’ll see you in the margins.

Replied to a post by Ton Zijlstra (zylstra.org)
Is it possible to annotate links in Hypothes.is that are in the Internet Archive? My browser bookmarklet for it doesn’t work on such archived pages. I can imagine that there are several javascript or iframe related technical reasons for it. An information related reason may be that bringing togeth...
The ability to annotate archived material on the Internet Archive with Hypothes.is is definitely possible, and I do it from time to time. I’m not sure which browser or annotation tool (via, browser extensions, other) you’re using, but it’s possible that one or more combinations may have issues allowing you to do it or not. The standard browser extension on Chrome has worked well for me in the past.

Hypothes.is has methods for establishing document equivalency, to which archive.org apparently conforms. I did an academic experiment a few years back with an NYT article about books where you’ll see equivalent annotations on the original, the archived version, and a copy on my own site that has a rel="canonical" link back to the original as well:

  • https://www.nytimes.com/2017/01/16/books/obamas-secret-to-surviving-the-white-house-years-books.html
  • https://web.archive.org/web/20170119220705/https://www.nytimes.com/2017/01/16/books/obamas-secret-to-surviving-the-white-house-years-books.html
  • https://boffosocko.com/2017/01/19/obamas-secret-to-surviving-the-white-house-years-books-the-new-york-times/

I don’t recommend doing the rel-canonical trick on your own site frequently as I have noticed a bug, which I don’t think has been fixed.
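To illustrate what a rel="canonical" declaration looks like from a tool’s point of view, here is a small Python sketch that reads one out of a page. It only shows how a copy announces its original; how Hypothes.is itself decides that two documents are equivalent isn’t covered here, and the class name, function name, and sample markup are all my own hypothetical examples.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_url(html):
    """Return the canonical URL declared in `html`, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup for a copy of an article hosted on another site.
page = """
<html><head>
  <title>A copy of the article</title>
  <link rel="canonical" href="https://example.com/original-article">
</head><body>...</body></html>
"""
print(canonical_url(page))
```

A page without the link simply yields `None`, which is the usual case for pages that don’t claim to be copies of anything.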

The careful technologist, with one tool or another, will see that I and a couple of others have been occasionally delving into the archive and annotating Manfred Kuehn’s work. (I see at least one annotation from 2016, which was probably native on his original site before it was shut down in 2018.) I’ve found some great gems and leads into some historical work from his old site. In particular, he’s got some translations from German texts that I haven’t seen in other places.

How to Make Notes and Write, a handbook by Dan Allosso and S.F. Allosso

A new handbook on note making and writing

I wasn’t expecting it until next week or shortly thereafter, but just in time for the new academic year, Dan Allosso has finished a major rewrite on his and S.F. Allosso’s earlier edition of A Short Handbook for writing essays in the Humanities and Social Sciences. This expanded edition has several new chapters on note making (notice that this is dramatically different than note taking) using a zettelkasten-based (or card index or fichier boîte if you prefer) approach similar to that practiced by Beatrice Webb, Marcel Mauss, Claude Lévi-Strauss, Roland Barthes, Hans Blumenberg, Mortimer J. Adler, and Walter Benjamin among many others.

The focus of the book is on note making for actively producing tangible outputs (essays, papers, theses, monographs, books, etc.), something on which a few recent texts in the related productivity space haven’t delivered. While ostensibly focused on the humanities and social sciences in terms of examples, the methods broadly apply to all fields. In fact, some of the methods draw historically on some of the practices fruitfully used by Bacon, Newton, Leibniz, Linnaeus, and many others in the sciences since.

This isn’t your father’s note making system…†

While many students (especially undergraduates and graduate students) may eschew this sort of handbook as something they think they “already know”, I can assure you that they do not and will benefit from the advice contained therein, particularly the first half. I’ve often heartily recommended Sönke Ahrens’ book How to Take Smart Notes: One Simple Technique to Boost Writing, Learning and Thinking to many in the past, but I think Allosso’s version, while similar in many respects, is clearer, shorter, and likely more easily realized by new practitioners.

There’s more detail in Dr. Allosso’s announcement video:

Availability

How to Make Notes and Write is available at Minnesota State’s Pressbooks site for reading online or for download as a .pdf or .epub. If you’d like a physical copy, it’s also available for purchase on Amazon.

For those in the educational spaces, Dr. Allosso has given the book a Creative Commons license (CC BY-NC-SA 4.0), so that people can use it as an Open Educational Resource (OER) in their classes and work.

For teachers who are using social annotation with tools like Hypothes.is in their classrooms, Allosso’s book is an excellent resource for what students can actively do with all those annotations once they’ve made them. (Here’s a link to my annotated copy of a recent working draft if you care to “play along”.)

† Unless of course your father happens to be Salvatore Allosso, but even then…

Hypothes.is, a web annotation tool, as an off-label zettelkasten?!

It wasn’t built specifically to be used as one, but is anyone actively using Hypothes.is as an off-label zettelkasten just by itself?

The platform has most of the basic functionalities one would want for a digital ZK:

  1. Simple note taking,
  2. Notes are editable and update-able,
  3. There’s a tag functionality for indexing notes,
  4. One can add links to other notes to cross-link them if one likes,
  5. There’s a reasonably good search functionality,
  6. Data export is available if you want to move.

The interesting piece is that if many are doing this in public, then folks can reply to others’ notes and even cross-link their public notes. (They’ve got the ability to have public and private notes, as well as groups for collaboration, which are functionalities most ZK don’t have as well.)

I was reminded of this potential off-label use case when someone replied to an older note with a quote/comment and it nudged me to add a note to link the two together. Who’s up for a public, social zettelkasten practice?

For those who need an example to look at, try my “digital ZK”: https://hypothes.is/users/chrisaldrich or their public timeline: https://hypothes.is/search.

I know some have mentioned Hypothes.is as an annotation tool for their note taking before. There are a few who use it as an online platform for notes and then they leverage the platform’s API or feeds to export their data to their tool of choice. (I’ve used Hypothes.idian for Obsidian to do just this.)
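As a sketch of what such an export step could look like, here is a bit of Python that turns one annotation into a plain Markdown note. This is not how Hypothes.idian or any other existing tool works; the function name and the note layout are my own assumptions, and the sample annotation is hand-made in the general JSON shape the Hypothes.is API returns.

```python
def annotation_to_markdown(row):
    """Render one annotation (in the API's JSON shape) as a Markdown note."""
    quotes = [
        selector["exact"]
        for target in row.get("target", [])
        for selector in target.get("selector", [])
        if selector.get("type") == "TextQuoteSelector"
    ]
    sections = []
    if quotes:
        # The highlighted passage becomes a block quote.
        sections.append("\n".join(f"> {quote}" for quote in quotes))
    if row.get("text"):
        # The annotator's own comment follows the quote.
        sections.append(row["text"])
    tags = " ".join(f"#{tag}" for tag in row.get("tags", []))
    if tags:
        sections.append(tags)
    sections.append(f"Source: {row.get('uri', 'unknown')}")
    return "\n\n".join(sections)


# A hand-made annotation used only for illustration.
sample = {
    "uri": "https://example.com/post",
    "text": "my comment",
    "tags": ["zettelkasten", "note-taking"],
    "target": [{"selector": [
        {"type": "TextQuoteSelector", "exact": "quoted passage"},
    ]}],
}
note = annotation_to_markdown(sample)
print(note)
```

Writing each returned string to its own file would give a folder of notes that most Markdown-based tools can open.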

Replied to a thread by Bill Seitz & Tom Critchlow (Twitter)
@BillSeitz @UseCrowdWise @peterhagen_

I get @TomCritchlow’s sentiment, but… the extra “work” it currently entails for the social part dramatically ups the signal to noise ratio for me compared to Twitter.

You’d definitely want the ability to filter by your social circle, especially on popular sites. In fact this sort of discovery mechanism would be cool if it could be more broadly built into either the web or perhaps into IndieWeb social readers which would know your social graph and could surface related details.

Perhaps expanding a browser extension like Crowdwise to include Twitter support might be a potential solution? I would worry that portions wouldn’t add much other than a lot of likes and bookmark-like data. While some[1], [2] might consider Twitter as an annotation layer (not always directly linked) on the web, the overall quality isn’t necessarily going to be built in there.

It would be cool if Crowdwise also added Hypothes.is’ API to their list of sources.

I’m also reminded of Peter Hagen’s experiments with Hypothes.is, which seem very similar but with a different UI. His version flips the discovery question on its head.

Annotated Pivoting with Hypothesis during Covid by Christine Moskell (Hypothesis | OLC Innovate 2020 | YouTube)

Kicking off OLC Innovate 2020, Hypothesis held two free workshop sessions on collaborative annotation with members of AnnotatED. The sessions each started with a "Getting on the same page" introduction from Jeremy Dean of Hypothesis, followed by "Notes from the field", where a variety of AnnotatED community members talked about what's happening with collaborative annotation at their schools and participants had the chance to discuss ideas and questions with these experienced practitioners.

In this clip Christine Moskell, Instructional Designer at Colgate University, shares examples of how instructors used Hypothesis during the Covid pivot. 

Christine Moskell talks about a professor’s final exam design prompting students to go back to annotations and add new commentary (or links to other related knowledge) that they’ve gained during the length of a course.

This is very similar to the sort of sensemaking and interlinking of information that Sönke Ahrens outlines in his book How to Take Smart Notes though his broader note taking thesis goes a few additional steps for more broadly synthesizing ideas into longer papers, articles, theses, and books.

Dr. Moskell also outlined a similar tactic at the Hypothesis Social Learning Summit: Spotlight on Social Reading & Social Annotation earlier today, though that video may not be accessible for a bit.

How can we better center and model these educational practices in our pedagogies?

Creating a commonplace book or zettelkasten index from Hypothes.is tags

I thought it might be useful to have a relatively complete list of cross linked topical headings in my digital notebook (currently Obsidian) which is a mélange of wiki, zettelkasten, journal, project management tool, notebook, and productivity tool. First, let’s be honest that mélange is too poetic based on what I see of how others use Obsidian and similar tools. My version is structured to have very clear delineations between these forms even though I’m using the same tool for various functions.

I find that indexed subject headings can be useful for creating links between my wiki-like pages as well as links between atomic ideas in my digital zettelkasten. Gradually as one’s zettelkasten becomes larger and one works with it more, it becomes easier to recall individual ideas and cross link them. Until this happens, or for smaller zettelkasten, it can be useful to cross reference subject headings from one zettel to see what those link to and use those as a way to potentially create links to other zettels. This method can also be used as a search/discovery aid for connecting edge ideas in new areas to pre-existing portions of one’s zettelkasten. Of course at massive scale with decades of work, I suspect this index will have increased value as well.

I don’t hear people talking about these types of indices for their zettelkasten in any of the influencer spaces or on social media. Are people simply skipping this valuable tool? For those enamored of Niklas Luhmann, we should mention that having and maintaining a subject index was a powerful portion of his system, even if his zettelkasten hasn’t yet been fully digitized. I haven’t seen the whole collection myself, but based on the condition of some of the cards in his index, Luhmann heavily used his subject index. (Note to self: I wonder what his whole system would look like in Obsidian?) Having a general key word/subject heading/topic heading index of all the material in one’s system can be very useful for general search and discovery as well. This is one of the reasons that John Locke wrote about a system for indexing one’s commonplace book in 1685. His work here is likely the distal reason Luhmann had one in his system.

Systems that have graphical knowledge graphs may make this process easier as one can look from one zettel out one or two levels to see where those link to.

Since such a large swath of my note taking practice starts by using Hypothes.is as my tool of choice, I’m able to leverage several years of using it to my benefit. Within it I’ve got 9,314 annotations, highlights, and bookmarks tagged with over 3,326 subject headings as of this writing.

To get all my subject heading tags, I used Jon Udell’s excellent facet tool to go to the tag editing interface. There I entered a “max” number larger than my total number of annotations and left the “tag” field empty to have it return the entire list of my tags. I was then able to edit a few of them to merge duplicates, fix misspellings, and remove some spurious tags.

An alternate way of doing this is to use a method described in this GitHub issue which shows how to get the tags out of local storage in your web browser. Your mileage may vary though if you use Hypothes.is in multiple browsers, which I do.

I moved this list from the tag editor into a spreadsheet to massage the list a bit, clean up any character encodings, and then spit out a list of [[wikilinked]] index keywords. I then cut and pasted it into my notebook and threw in some alphabetical headings so that I could more easily jump around the list.
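For anyone who would rather script this step than use a spreadsheet, here is a rough sketch of the same idea: take a plain list of tags, collapse case-only duplicates and stray whitespace, and emit [[wikilinked]] keywords under alphabetical headings. The function names and the `## A` Markdown heading style are assumptions for illustration, not the exact layout used in my notebook.

```python
from itertools import groupby


def heading_for(tag):
    """Return the alphabetical heading a tag belongs under ('#' for non-letters)."""
    first = tag[0].upper()
    return first if first.isalpha() else "#"


def build_index(tags):
    """Return Markdown text: wikilinked tags grouped under alphabetical headings."""
    # Normalize whitespace and keep the first spelling of case-only duplicates.
    unique = {}
    for tag in tags:
        cleaned = " ".join(tag.split())
        if cleaned:
            unique.setdefault(cleaned.casefold(), cleaned)
    # Sort by heading first so groupby sees each heading exactly once.
    ordered = sorted(unique.values(), key=lambda t: (heading_for(t), t.casefold()))
    sections = []
    for letter, group in groupby(ordered, key=heading_for):
        links = "\n".join(f"[[{tag}]]" for tag in group)
        sections.append(f"## {letter}\n\n{links}")
    return "\n\n".join(sections)
```

Calling `build_index` on the exported tag list and pasting the result into a single note gives the same kind of jump-around index described above. Note that it only catches duplicates that differ by case or spacing; misspellings still need a manual pass.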

Now I’ve got an excellent tool and interface for more easily searching and browsing the various areas of my multi-purpose digital notebook.

I’m sure there are other methods within various tools of doing this, including searching all files and cutting and pasting those into a page, though in my case this doesn’t capture non-existing files. One might also try a search for a regex phrase like: /(?:(?:(?:<([^ ]+)(?:.*)>)\[\[(?:<\/\1>))|(?:\[\[))(?:(?:(?:<([^ ]+)(?:.*)>)(.+?)(?:<\/\2>))|(.+?))(?:(?:(?:<([^ ]+)(?:.*)>)\]\](?:<\/\5>))|(?:\]\]))/ (found here) or something as simple as /\[\[.*\]\]/ though in my case they don’t quite return what I really want or need.
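One reason the simple pattern falls short is that `.*` is greedy: on a line with two links it matches everything from the first `[[` to the last `]]`. A small sketch of a narrower approach in Python follows, assuming Obsidian-style links where `|` introduces an alias and `#` a heading. The function name is hypothetical. Because it reads the link text itself, it also picks up links to notes that don’t exist yet.

```python
import re

# The character class excludes brackets, so one match can never span two links.
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def link_targets(text):
    """Return the sorted, de-duplicated page names linked from `text`."""
    targets = set()
    for match in WIKILINK.finditer(text):
        # Drop an alias ("page|label") and a heading reference ("page#part").
        name = match.group(1).split("|")[0].split("#")[0].strip()
        if name:
            targets.add(name)
    return sorted(targets, key=str.casefold)
```

Running this over the text of every file in a vault and merging the results would give one candidate list of index keywords.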

I’ll likely keep using more local search and discovery, but perhaps having a centralized store of subject headings will offer some more interesting affordances for search and browsing?

Have you created an index for your system? How did you do it?

Unable to search or find public replies to annotations in public stream

Filed an Issue GitHub - hypothesis/client: The Hypothesis web-based annotation client. (GitHub)

Replies (with or without tags) to primary/original annotations are unable to subsequently be found in the main public stream or via search at https://hypothes.is/search.

Steps to reproduce

  1. Make a reply to any public annotation (with or without tags)
  2. Use https://hypothes.is/search to search the username of the reply or one of the original tags
  3. The reply can’t be found

The original (more complicated) example that uncovered the issue

From https://doi.org/10.6092/issn.1971-8853/8350 which redirects to https://sociologica.unibo.it/article/view/8350, I can click on the pdf icon to get to https://sociologica.unibo.it/article/view/8350/8272 which I can download locally and then reopen in Chrome to annotate with the Hypothes.is client.

I was able to make an original public annotation: https://hypothes.is/a/Nysv1HyTEeyaC2cnv3ZCPQ

Having subscribed to my public individual user feed, this annotation (via the annotation permalink and not via the original document) was found in Ton Zijlstra‘s RSS reader, and he was able to reply to it: https://hypothes.is/a/p3uUBJc8EeyuRmfRyGEGfQ.

Oddly the URL https://sociologica.unibo.it/article/view/8350/8272 when activated for Hypothes.is doesn’t show any of the annotations though I would suspect that the .pdf fingerprint should match that of the downloaded and annotated version. Alternately visiting https://uni-bielefeld.de/soz/luhmann-archiv/pdf/jschmidt_niklas-luhmanns-card-index_-sociologica_2018_12-1.pdf shows 51 annotations in the Chrome extension, though none of them are visible and the .pdf file doesn’t load on the page which returns a 404. Ton Zijlstra, having none of these URLs, would otherwise not have been able to find or reply to annotations I’ve made other than by having the original pointer via his RSS feed.

This last part notwithstanding, after making his reply to my annotation (directly at https://hypothes.is/a/Nysv1HyTEeyaC2cnv3ZCPQ), Ton Zijlstra is now no longer able to find his original annotation in the https://hypothes.is/search online interface. It’s as if it’s completely disappeared as the main web search interface is unable to find it via username and/or tags and (likely by design) the main public thread only shows top level annotations and not replies.

I’ve tried some similar experiments on my own replies to annotations. I’m unable to search my own annotations (via https://hypothes.is/users/chrisaldrich) or use either a user-based or tag-based search to find those same annotations after they were made, thus they’re essentially lost to me and others unless I can find the original document and trace my way back to them. These replies are obviously available via feeds (RSS/ATOM) and the API (using the urn:x-pdf:471902ab75f5683c53518d14f95f0dfe key), but they are essentially lost to the vast number of users who won’t have recourse to these methods.

Similarly searching Ton Zijlstra’s user name: https://hypothes.is/users/tonz, one will see no public annotations despite his public reply to a public annotation. The reply can be found at https://hypothes.is/stream.atom?user=tonz and via API calls.
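For anyone trying to confirm this from the API side, here is a minimal sketch. It assumes that annotation records returned by the API carry a `references` list when they are replies and lack one when they are top-level annotations. The function names are hypothetical, and the sample IDs in the usage note are simply the two permalinks mentioned above.

```python
from urllib.parse import urlencode


def feed_url(username):
    """Build the Atom feed URL for a user's public stream."""
    return "https://hypothes.is/stream.atom?" + urlencode({"user": username})


def split_replies(rows):
    """Separate annotation dicts into (top_level, replies).

    Assumption: a reply carries a non-empty `references` list of the
    annotation IDs it answers; a top-level annotation does not.
    """
    top_level, replies = [], []
    for row in rows:
        if row.get("references"):
            replies.append(row)
        else:
            top_level.append(row)
    return top_level, replies
```

Given rows such as `{"id": "Nysv1HyTEeyaC2cnv3ZCPQ"}` and `{"id": "p3uUBJc8EeyuRmfRyGEGfQ", "references": ["Nysv1HyTEeyaC2cnv3ZCPQ"]}`, the second would land in `replies`. Comparing that list against what the web search returns for the same user is one way to show which annotations have gone missing.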

Expected behaviour

After having made a reply to an annotation (with or without tags), one should expect to be able to search their own annotations or specific tags and find those public replies to annotations again.

Whether or not the main web stream (https://hypothes.is/search) filters out replies, they should still be able to be found via subsequent direct search.

Actual behaviour

Searching for one’s previous replies, via user, tag, or otherwise doesn’t find them, though they certainly exist and are findable in feeds and API.

Additional details

Related, possibly helpful for the above

Browser/system information

I’ve tried on other platforms and browsers with similar results, but I’m using Windows 10 and see the same behavior in both Chrome (Version 98.0.4758.102 (Official Build) (64-bit)) and Firefox (97.0.1 (64-bit)).

I’m excited to join Dan Allosso‘s book club on How to Take Smart Notes as a means of turning my active reading, annotating, and note taking into papers, articles, and books using Obsidian.md and Hypothes.is.

Details: 

cc: Ian O’Byrne, Remi Kalir