Some ideas about tags, categories, and metadata for online commonplace books and search

Earlier this morning I was reading The Difference Between Good and Bad Tags and the discussion of topics versus objects got me thinking about semantics on my website in general.

People often ask why WordPress has both a Category and a Tag functionality, and to some extent it would seem to be just for this thing–differentiating between topics and objects–or at least it’s how I have used it and perceived others doing so as well. (Incidentally, from a functionality perspective, categories in the WordPress taxonomy also have a hierarchy while tags do not.) I find that I don’t always do a great job at differentiating between them, nor do I do so cleanly every time. Typically it’s more apparent when I go searching for something and have a difficult time finding it as a result. Usually the problem is getting back too many results instead of a smaller desired subset. In some sense I also look at categories as things which might be more interesting for others to subscribe to or follow via RSS from my site, though I also have RSS feeds for tags and for post types/kinds.

I also find that I have a subtle differentiation using singular versus plural tags which I think I’m generally using to differentiate between the idea of “mine” versus “others”. Thus the (singular) tag for “commonplace book” should be a reference to my particular commonplace book versus the (plural) tag “commonplace books” which I use to reference either the generic idea or the specific commonplace books of others. Sadly I don’t think I apply this “rule” consistently either, but hope to do so in the future.

I’ve also been playing around with some more technical tags like math.NT (standing for number theory), following the lead of arXiv.org. While I would generally have used a tag “number theory”, I’ve been toying around with the idea of using the math.XX format for more technical research-related material on my site and the more human-readable “number theory” for the more generic popular press related material. I still have some more playing around to do with the idea to see what shakes out. I’ve noticed in passing that Terence Tao uses these same designations on his site, but he does them at the category level rather than the tag level.

Now that I’m several years into such a system, I should probably spend some time going back and broadening out the topic categories. (I arbitrarily attempt to keep the list small–in part for public display/vanity reasons–but it’s relatively easy to limit what shows to the public in my category list view.) Then I ought to do a bit of clean up within the tags themselves, which have gotten unwieldy and often have spelling mistakes which cause searches to potentially fail. I also find that some of my auto-tagging processes, which import tags from the original sources’ pages, could be cleaned up as well, though those are generally stored in a different location on my website, so it’s not as big a deal to me.

Naturally I find myself also thinking about the ontogeny/phylogeny problems of how I do these things versus how others at large do them as well, so feel free to chime in with your ideas, especially if you take tags/categories for your commonplace book/website seriously. I’d like to ultimately circle back around on this with regard to the more generic tagging done from a web-standards perspective within the IndieWeb and Microformats communities. I notice almost immediately that the “tag” and “category” pages on the IndieWeb wiki redirect to the same page yet there are various microformats including u-tag-of and u-category which are related but have slightly different meanings on first blush. (There is in fact an example on the IndieWeb “tag” page which includes both of these classes neither of which seems to be counter-documented at the Microformats site.) I should also dig around to see what Kevin Marks or the crew at Technorati must surely have written a decade or more ago on the topic.


cc: Greg McVerry, Aaron Davis, Ian O’Byrne, Kathleen Fitzpatrick, Jeremy Cherfas

👓 The Difference Between Good and Bad Tags | Zettelkasten Method

Read The Difference Between Good and Bad Tags by Sascha (Zettelkasten Method)

There are two different types of tags:

  1. Tags for topics. You use tags to group notes under a topic.
  2. Tags for objects. You use tags to group notes around an object, real or conceptual.

This is an interesting concept to think more deeply about with respect to my online commonplace book and future search.

👓 Evernote lost its CTO, CFO, CPO and HR head in the last month as it eyes another fundraise | TechCrunch

Read Evernote lost its CTO, CFO, CPO and HR head in the last month as it eyes another fundraise (TechCrunch)
Evernote, the productivity app with 225 million users that lets people take notes and organise other files from their working and non-work life, has been on a mission to reset its image as the go-to service for those seeking tools to help themselves be more efficient, years after losing its place a…

👓 Why Not Blog? | Kathleen Fitzpatrick

Read Why Not Blog? by Kathleen Fitzpatrick (Kathleen Fitzpatrick)
My friend Alan Jacobs, a key inspiration in my return (such as it is, so far) to blogging and RSS and a generally pre-Twitter/Facebook outlook on the scholarly internet, is pondering the relationship between blogging and other forms of academic writing in thinking about his next project. Perhaps needless to say, this is something I’m considering as well, and I’m right there with him in most regards.

Highlights, Quotes, Annotations, & Marginalia

The blog was not just the venue in which I started putting together the ideas that became my second book, the one that made promotion and various subsequent jobs possible, but it was also the way that I was able to demonstrate that there might be a readership for that second book, without which it’s much less likely that a press would have been interested.  

This sounds like she’s used her blog as both a commonplace book and an author platform.

In fact blog posts are not the kind of thing one can detail on one’s annual review form, and even a blog in the aggregate doesn’t have a place in which it’s easy to be claimed as a site of ongoing scholarly productivity.  

Mine have gone more like (1) having some vague annoying idea with a small i; (b) writing multiple blog posts thinking about things related to that idea; (iii) giving a talk somewhere fulminating about some other thing entirely; (4) wondering if maybe there are connections among those things; (e) holy carp, if I lay the things I’ve been noodling about over the last year and a half out in this fashion, it could be argued that I am in the middle of writing a book!  

Here’s another person talking about blogs as “thought spaces” the same way that old school bloggers like Dave Winer and Om Malik amongst many others have in the past. While I’m thinking about it I believe that Colin Walker and Colin Devroe have used this sort of idea as well.

👓 Retroactive Webmentioning | Peter Rukavina

Read Retroactive Webmentioning by Peter Rukavina (ruk.ca)
By way of testing out my Webmention module for Drupal, I took the 256 posts I’ve written here this year, ferreted out all the external links, discovered their Webmention endpoints, and sent a Webmention. Those 256 posts contained 840 links in total; of those links, 149 were to a target that suppor...

There are some interesting/useful statistics here. There’s also an interesting kernel of an idea about how one links to one’s own website internally as well. I find this very intriguing with respect to owning a digital commonplace book. Perhaps there are some ways to modify IndieMap for extracting some useful metadata out of one’s own website?

👓 My College Degree as an Open Digital Humanities Project | Mark Corbett Wilson

Read My College Degree as an Open Digital Humanities Project by Mark Corbett Wilson (markcorbettwilson.com)
I’m developing a new model for adult learners so they can avoid the experience I had while trying to improve my skills at a Community College. Combining Self-Directed Learning, Computational Thinking, Digital Pedagogy, Open Education and Open Social Scholarship theories with Open Education Resourc...

This sounds to me to be a bit like an open digital commonplace book.

(I’m noticing, yet again, that Disqus is automatically marking any comments I make as spam.)

An Outline for Using Hypothesis for Owning your Annotations and Highlights

I was so taken with Ian O’Byrne’s righteous excitement in his video the other day over the realization that he could potentially own his online annotations using Hypothesis that I thought I’d take a moment to outline a few methods I’ve used.

There are certainly variations of ways for attempting to own one’s own annotations using Hypothesis and syndicating them to one’s website (via a PESOS workflow), but I thought I’d outline the quickest version I’m aware of that requires little to no programming or code, but also allows some relatively pretty results. While some of the portions below are WordPress specific, there’s certainly no reason they couldn’t be implemented for other systems.

Saving individual annotations one at a time

Here’s an easy method for taking each individual annotation you create on Hypothesis and quickly porting it to your site:

Create an IFTTT.com recipe to port your Hypothesis RSS feed into WordPress posts. Generally choose an “If RSS, then WordPress” setup and use the following data to build the recipe:

  • Input feed: https://hypothes.is/stream.atom?user=username (change username to your user name)
  • Optional title: 📑 {{EntryTitle}}
  • Body: {{EntryContent}} from {{EntryUrl}} <br />{{EntryPublished}}
  • Categories: Highlight (use whatever categories you prefer, but be aware they’ll apply to all your future posts from this feed)
  • Tags: hypothes.is
  • Post status (optional): I set mine to “Draft” so I have the option to keep it privately or to publish it publicly at a later date.

Modify any of the above fields as necessary for your needs. IFTTT.com usually polls your feed every 10-15 minutes. You can usually pretty quickly take this data and turn it into your post kind of preference–suggestions include read, bookmark, like, favorite, or even reply. Add additional categories, tags, or other metadata as necessary for easier searching at a later time.
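For those who would rather script this step than lean on IFTTT, here is a minimal sketch in Python of the same field mapping, using only the standard library. It assumes the Hypothesis feed is ordinary Atom with plain or HTML-typed content and that the first link in each entry is the one IFTTT exposes as the entry URL. The function name entries_to_drafts and the dictionary keys are my own inventions for illustration. Actually creating the WordPress draft is left out.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by standard Atom feeds
ATOM = "{http://www.w3.org/2005/Atom}"


def entries_to_drafts(atom_xml, category="Highlight", tag="hypothes.is"):
    """Map Atom entries to draft-post dicts shaped like the recipe above.

    entries_to_drafts and the dict keys are hypothetical names; they are
    not part of IFTTT, Hypothesis, or WordPress.
    """
    root = ET.fromstring(atom_xml)
    drafts = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        content = entry.findtext(ATOM + "content", default="")
        published = entry.findtext(ATOM + "published", default="")
        # Assumption: the first <link> carrying an href is the entry URL
        url = ""
        for link in entry.findall(ATOM + "link"):
            if link.get("href"):
                url = link.get("href")
                break
        drafts.append({
            "title": "📑 " + title,
            "body": f"{content} from {url} <br />{published}",
            "categories": [category],
            "tags": [tag],
            "status": "draft",  # keep it private until reviewed
        })
    return drafts
```

To try it against a live feed, one could pass in the bytes returned by urllib.request.urlopen for the feed URL above (with your own user name) and then hand each dictionary to whatever posting method your site supports.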

Here’s an example of one on my website that uses this method. I’ve obviously created a custom highlight post kind of my own for more specific presentation as well as microformats markup.

A highlight from Hypothesis posted on my own website using some customized code to create a “Highlight post” using the Post Kinds Plugin.

Aggregating lots of annotations on a single page

If you do a lot of annotations on Hypothesis and prefer to create a bookmark or read post that aggregates all of your annotations on a given post, the quickest way I’ve seen on WordPress to export your data is to use the Hypothesis Aggregator plugin [GitHub].

  • Create a tag “key” for a particular article by creating an acronym from the article title followed by the date and then the author’s initials. This will allow you to quickly conglomerate all the annotations for a particular article or web page. As an example for this article I’d use: OUHOAH062218CA. In addition to any other necessary tags, I’ll tag each of my annotations on the particular article with this somewhat random, yet specific key for which there are unlikely to be any other similar tags in my account.
  • Create a bookmark, read, reply or other post kind to which you’ll attach your annotations. I often use a bookmarklet for speed here.
  • Use the Hypothesis Aggregator’s short code for your tag and username to pull your annotations for the particular tag. It will look like this:
    [hypothesis user = 'username' tags = 'tagname']

    If you’re clever, you could include this shortcode in the body of your IFTTT recipe (if you’re using drafts) and simply change the tag name to the appropriate one to save half a step or the need to remember the shortcode format each time.
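The key-building convention in the steps above is mechanical enough to script. Here is a rough sketch in Python. It assumes the acronym skips small words like “an”, “for”, and “and” (which is what the example key suggests) and that the date portion is month, day, and two-digit year. The names make_tag_key, make_shortcode, and STOP_WORDS are hypothetical and not part of the Hypothesis Aggregator plugin.

```python
from datetime import date

# Small words skipped when building the acronym. This list is an assumption
# inferred from the example key, not a rule the plugin imposes.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to", "your"}


def make_tag_key(title, published, author):
    """Build a key: title acronym + MMDDYY date + author initials."""
    words = [w for w in title.split() if w.lower() not in STOP_WORDS]
    acronym = "".join(w[0].upper() for w in words)
    initials = "".join(part[0].upper() for part in author.split())
    return acronym + published.strftime("%m%d%y") + initials


def make_shortcode(user, tag):
    """Format the shortcode in the same shape as shown above."""
    return f"[hypothesis user = '{user}' tags = '{tag}']"


key = make_tag_key(
    "An Outline for Using Hypothesis for Owning your Annotations and Highlights",
    date(2018, 6, 22),
    "Chris Aldrich",
)
print(key)
print(make_shortcode("username", key))
```

Titles that begin words with punctuation (an opening quotation mark, say) would need a bit more cleanup than this sketch does.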

If you’re worried that Hypothes.is may eventually shut down, that the plugin may quit working (leaving you with ugly short codes in your post), or all of the above, you can add the following steps as a quick work-around.

  • Input the shortcode as above, click on the “Preview” button in WordPress’s Publish meta box which will open a new window and let you view your post.
  • Copy the preview of the annotations you’d like to keep in your post and paste them over your shortcode in the Visual editor tab on your draft post. (This will maintain the simple HTML formatting tags, which you can also edit or supplement if you like.)
  • I also strip out the additional unnecessary data from Hypothesis Aggregator about the article it’s from as well as the line about who created the annotation which isn’t necessary as my post will implicitly have that data. Depending on how you make your post (i.e. not using the Post Kinds Plugin), you may want to keep it.

As Greg McVerry kindly points out, Jon Udell has created a simple web-tool for inputting a few bits of data about a set of annotations to export them variously in HTML, CSV, or JSON format. If you’re not a developer and don’t want to fuss with Hypothesis’ API, this is also a reasonably solid method of quickly exporting subsections of your annotations and cutting and pasting them onto your website. It does export a lot more data than one might want for their site and could require some additional clean up, particularly in HTML format.

Perhaps with some elbow grease and coding skill, sometime in the future, we’ll have a simple way to implement a POSSE workflow that will allow you to post your annotations to your own website and syndicate them to services like Hypothesis. In the meantime, hopefully this will help close a little of the data gap for those using their websites as their commonplace books or digital notebooks.

📺 Open science: Michael Nielsen at TEDxWaterloo | YouTube

Watched Open science: Michael Nielsen at TEDxWaterloo by Michael Nielsen from YouTube

Michael Nielsen is one of the pioneers of quantum computation. Together with Ike Chuang of MIT, he wrote the standard text in the field, a text which is now one of the twenty most highly cited physics books of all time. He is the author of more than fifty scientific papers, including invited contributions to Nature and Scientific American. His research contributions include involvement in one of the first quantum teleportation experiments, named as one of Science Magazine's Top Ten Breakthroughs of the Year for 1998. Michael was a Fulbright Scholar at the University of New Mexico, and has worked at Los Alamos National Laboratory, as the Richard Chace Tolman Prize Fellow at Caltech, as Foundation Professor of Quantum Information Science at the University of Queensland, and as a Senior Faculty Member at the Perimeter Institute for Theoretical Physics. Michael left academia to write a book about open science, and the radical change that online tools are causing in the way scientific discoveries are made.

Sadly this area of science hasn’t opened up as much as it likely should have in the intervening years. More scientists need to be a growing part of the IndieWeb movement and owning their own data, their content, and, yes, even their own publishing platforms. With even simple content management systems like WordPress researchers can actively practice academic samizdat to a much greater extent and take a lot of the centralized power away from the major journal and textbook publishing enterprises.

I can easily see open web technology like the Webmention spec opening up online scientific communication and citations drastically even to the point of quickly replacing tools like Altmetric. If major publishing wants something to do perhaps they could work on the archiving and aggregation portions?

What if one could publish a research paper or journal article on one’s own (or one’s lab’s) website? It could receive data via webmention about others who are bookmarking it, reading it, highlighting and annotating it. It could also accept webmention replies as part of a greater peer-review process–the equivalent of the researcher hosting their own pre-print server as well as their own personal journal and open lab notebook.
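To make the mechanics a bit more concrete, here is a minimal sketch in Python of the sending side: a reader’s site finding a paper’s Webmention endpoint in the page HTML and preparing the notification. It is only a sketch under stated assumptions. It looks at link and a elements in the HTML and skips the HTTP Link header check that a fuller discovery would try first, and the names discover_endpoint and webmention_body are hypothetical ones of my own.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class _EndpointFinder(HTMLParser):
    """Remember the first <link> or <a> whose rel includes "webmention"."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "webmention" in rels and attrs.get("href") is not None:
            self.endpoint = attrs["href"]


def discover_endpoint(html, page_url):
    """Return the absolute endpoint URL found in the HTML, or None.

    Hypothetical helper: the HTTP Link header is not consulted here.
    """
    finder = _EndpointFinder()
    finder.feed(html)
    if finder.endpoint is None:
        return None
    # Endpoints may be relative, so resolve against the page's own URL
    return urljoin(page_url, finder.endpoint)


def webmention_body(source, target):
    """Form-encode the two parameters a Webmention request carries."""
    return urlencode({"source": source, "target": target})
```

Sending the notification is then a matter of POSTing that body to the discovered endpoint as application/x-www-form-urlencoded, where source is the reply, bookmark, or annotation post and target is the paper’s URL.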

We need to help empower scientists to be the center of their own writing and publishing. For those interested, this might be a useful starting point: https://indieweb.org/Indieweb_for_Education

Reply to Open Science notebooks | Ryan Barrett

Replied to a post by Ryan Barrett (snarfed.org)
Notebooks like Jupyter and Observable are great for research, data science, and really any interactive computing or documentation. I want to start using them for ops/SRE projects too. Thomas Kluyver‘s bash_kernel works, but has lots of rough edges. Anyone have any other ideas?

I’ve been watching that space for a few years. Apparently you saw the same article push them into the broader mainstream consciousness. I would mention Mathematica, but you’re certainly aware of it. There are a few other math-related platforms I’ve used, but I suspect they’re not within the realm you’re looking for.

I’ve seen one or two much smaller projects along the lines of bash_kernel, but they’re either in incredibly rough shape or have very limited scopes or very niche uses. There’s a reasonably interesting list of open science related resources on GitHub, but it’s a tad old and some of the projects on it have merged or changed drastically since it was started. Foster has some interesting material and resources on open science if you care to dig through it. One day I’ll delve into the Open Science Framework to see if they’ve got anything I haven’t seen before too.

I keep meaning to document people who are using their own websites for pieces of this type of thing, but most are doing it in a hybrid fashion. Carl Boettiger is certainly a good example[1][2] and may be aware of some additional resources including one he helps manage.

Reply to a reply to Dan Cohen tweet

Replied to Reply to Dan Cohen tweet by Chris Aldrich (BoffoSocko)
Dan, There are a lot of moving pieces in your question and a variety of ways to implement them depending on your needs and particular website set up. Fortunately there are lots of educators playing around in these spaces already who are experimenting with various means and methods as well as some of their short and long term implications.

@jbj Given the number of people I’ve seen experimenting over the past months, I’d be happy to put together a series of short pieces for @ProfHacker covering the areas of overlap between #edtech, #DoOO, #indieweb, research, academic publishing, samizdat, commonplace books, etc. Essentially tighter versions of some of https://boffosocko.com/research/indieweb/ but specifically targeting the education space using WordPress, Known, and Grav. Let me know if you’d accept submissions for the community.

reply to tkasasagi tweet

Replied to a tweet by tkasasagi (Twitter)

I hope you do blog about it, I’m sure many would find it useful. I’ve been using my own website as a commonplace book for a while now, not only for blogging as you’ve considered, but also to bookmark interesting things, to highlight and make notes of what I read, and generally use it as my online notebook/research/study space. I do post some personal tidbits, but a large amount of what I post (both research and personal) is actually private and only viewable by me. Perhaps worth considering as you continue your studies which others have interest in as well?

👓 Self-platforming, DoOO, and academic workflows | Tim Clarke

Read Self-platforming, DoOO, and academic workflows by Tim Clarke (simulacrumbly.com)
I see self-platforming as an expression of my own digital citizenship, and I also see it as my deliberate answer to the call for digital sanctuary.  The frequency and extent to which educators urge students onto extractive applications is of great concern.  Self-platforming offers opportunities to benefit from the collaborative, hyper-textual, asynchronous, and distributed qualities of the web, while diminishing the costs — often hidden to us — of working on proprietary and extractive platforms.

I love that Tim is looking closely at how the choices of tools he’s using can potentially impact his students/readers. I’ve also been in the boat he’s in–trying to wrangle some simple data in a way that makes it easy to collect, read, and disseminate content for myself, students, and other audiences.

Needing to rely on five or more outside services (Twitter, Instapaper, Pinboard, bit.ly, and finally even Canvas, where some of them are paid services) seems just painful and excessive. He mentions the amount and level of detail he’s potentially giving away to just bit.ly, but each of these is taking a bite out of the process. Of course this doesn’t take into consideration the fact that Instapaper is actually a subsidiary of Betaworks, the company that owns and controls bit.ly, so there’s even more personal detail being consumed and aggregated there than he may be aware of. All this is compounded by the fact that Instapaper is currently completely blocking its users within the EU because it hasn’t been able to comply with the privacy and personal data details/restrictions of the GDPR. Naturally, there are currently no restrictions on it in the U.S. or other parts of the world.

I (and many others) have been hacking away for the past several years in trying to tame much of our personal data in a better way to own it and control it for ourselves. And isn’t this part of the point of having a domain of one’s own? Even his solution of using Shaarli to self-host his own bookmarks, while interesting, seems painful to me in some aspects. Though he owns and controls the data, because it sits on a separate domain it’s not as tightly integrated into his primary site or as easily searched. To be even more useful, it needs additional coding and integration into his primary site which appears to run on WordPress. With the givens, it looks more like he’s spending some additional time running his own separate free-standing social media silo just for bookmarks. Why not have it as part of his primary personal hub online?

I’ve been watching a growing trend of folks within both the IndieWeb/DoOO and edtech spaces begin using their websites like a commonplace book to host a growing majority of their own online and social related data. This makes it all easier to find, reference, consume, and even create new content in the future. On their own sites, they’re conglomerating all their data about what they’re reading, highlighting, annotating, bookmarking, liking, favoriting, and watching in addition to their notes and thoughts. When appropriate, they’re sharing that content publicly (more than half my website is hidden privately on my back end, but still searchable and useful only to me) or even syndicating it out to social sites like Twitter, Facebook, Flickr, Instapaper, et al. to share it within other networks.

Some other examples of educators and researchers doing this other than myself include Aaron Davis, Greg McVerry, John Johnson, and more recently W. Ian O’Byrne and Cathie LeBlanc among many others. Some have chosen to do it on their primary site while others are experimenting using two or even more. I would hope that as Tim explores, he continues to document his process as well as the pros and cons of what he does and the resultant effects. But I also hope he discovers this growing community of scholars, teachers, programmers and experimenters who have been playing in the same space so that he knows he’s not alone and perhaps to prevent himself from going down some rabbit holes some of us have explored all too well. Or to use what may be a familiar bit of lingo to him, I hope he joins our impromptu, but growing personal learning network (PLN).

👓 Why I Love Link Blogging | BirchTree

Read Why I Love Link Blogging (BirchTree)
More often than not, I write articles for this site after reading something someone else wrote. I browse the web for articles and tweets that I find interesting, and the ones that make me think are very often the ones that inspire me to write something myself. This leads to a funny situation as a w...

How many levels deep could the link blogging on these posts go? Is it linkblogging all the way down?

For me, I’ll add it specifically to my linkblog of things I’ve read which is a subsection of my collected linkblog which also collects favorites, likes, bookmarks, and sites I’m following.

Incidentally, this seems to be another post about people who use their websites for thinking and writing, which I seem to be coming across many of lately. I ought to collect them all into a group and write a piece about them and the general phenomenon.

IndieWeb Journalism in the Wild

Some tidbits I really appreciate about John Naughton's website

I noticed a few days ago that professor and writer John Naughton not only has his own website but that he’s posting to it both his own content and (excerpted) content he’s writing for other journalistic outlets, lately in his case for The Guardian. This is awesome for so many reasons. The primary reason is that I can follow him via his own site and get his personally posted content, which informs his longer pieces, without needing to follow him in multiple locations to get the “firehose” of everything he’s writing and thinking about. While The Guardian and The Observer are great, perhaps I don’t want to filter through multiple hundreds of articles to find his particular content or potentially risk missing it? What if he were writing for 5 or more other outlets? Then I’d need to delve in deeper still and carry a multitude of subscriptions and their attendant notifications to get something that should rightly emanate from one location–him! While he may not be posting his status updates or Tweets to his own website first–as I do–I’m at least able to get the best and richest of his content in one place. Additionally, the way he’s got things set up, The Guardian and others are still getting the clicks (for advertising’s sake) while I still get the simple notifications I’d like to have so I’m not missing what he writes.

His site certainly provides an interesting example of either POSSE or PESOS in the wild, particularly from an IndieWeb for Journalism or even an IndieWeb for Education perspective. I suspect his article posts occur on the particular outlet first and he’s excerpting them with a link to that “original”. (Example: A post on his site with a link to a copy on The Guardian.) I’m not sure whether he’s (ideally) physically archiving the full post there on his site (and hiding it privately as both a personal and professional portfolio of sorts) or if they’re all there on the respective pages, but just hidden behind the “read more” button he’s providing. I will note that his WordPress install is giving a rel=”canonical” link to itself rather than the version at The Guardian, which also has a rel=”canonical” link on it. I’m curious to take a look at how Google indexes and ranks the two pages as a result.
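For anyone who wants to check the canonical situation on their own excerpt-and-link posts, here is a small sketch in Python using only the standard library. It assumes you already have the page’s HTML in hand, and the names canonical_url and is_self_canonical are hypothetical ones of my own. It says nothing about how a search engine will actually treat the two pages.

```python
from html.parser import HTMLParser


class _CanonicalFinder(HTMLParser):
    """Remember the href of the first <link> whose rel includes "canonical"."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.canonical = attrs["href"]


def canonical_url(html):
    """Return the declared canonical URL, or None if there isn't one."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def is_self_canonical(html, page_url):
    """True when the page names itself as canonical (trailing slash ignored)."""
    found = canonical_url(html)
    return found is not None and found.rstrip("/") == page_url.rstrip("/")
```

Running is_self_canonical on both the excerpt post and the outlet’s copy would show whether each claims to be the original, which is the situation described above.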

In any case, this is a generally brilliant set up for any researcher, professor, journalist, or other stripe of writer for providing online content, particularly when they may be writing for a multitude of outlets.

I’ll also note that I appreciate the ways in which it seems he’s using his website almost as a commonplace book. This provides further depth into his ideas and thoughts to see what sources are informing and underlying his other writing.

Alas, if only the rest of the world used the web this way…
