Organizing my research related reading

There’s so much great material out there to read and not nearly enough time. The question becomes: “How best to organize it all, so you can read even more?”

I just came across a tweet from Michael Nielsen about the topic, which runs far deeper than a few tweets could do justice to, so I thought I’d sketch out a few basic ideas about how I’ve been approaching it over the last decade or so. Ideally I’d like to circle back to this and document more of the individual pieces, or maybe even make a short video, but for now this will hopefully suffice to add to the conversation Michael has started.

Keep in mind that this is an evolving system which I still haven’t completely perfected (and may never), but to a great extent it works relatively well, and it remains easy for me to modify and improve.

Overall Structure

The first piece of the overarching puzzle is to have a general structure for finding, collecting, triaging, and then processing all of the data. I’ve essentially built a simple funnel system for collecting the basic data in the quickest manner possible. With the basics down, I can later skim through various portions to pick out the things I think are the most valuable and move them along to the next step. Ultimately I end up reading the best pieces, on which I make copious notes and highlights. I’m still slowly refining the system for keeping all this additional data as well.

Since I’ve seen so many apps and websites come and go over the years, and have lost lots of data to them, I far prefer to use my own personal website for doing a lot of the basic collection, particularly for online material. Toward this end, I use a variety of web services, RSS feeds, and bookmarklets to quickly accumulate the important pieces into my personal website, which I use like a modern-day commonplace book.

Collecting

In general, I’ve been using the Inoreader feed reader to track a large variety of RSS feeds, from various clearinghouse sources (including things like ProQuest custom searches) down to individual researchers’ blogs, as a means of quickly pulling in large amounts of research material. It’s one of the more flexible readers out there, with a huge number of useful features, including the ability to subscribe to OPML files, which many readers don’t support.

As a simple example, arXiv.org has an RSS feed for the topic of “information theory” at http://arxiv.org/rss/math.IT, which I subscribe to. I can browse through the feed and, based on titles and/or abstracts, quickly “star” the items I find most interesting within the reader. I have a custom recipe set up with the IFTTT.com service that pulls in all these starred articles and creates new posts for them on my WordPress blog. To these posts I can add a variety of metadata, including top-level categories and lower-level tags, along with whatever other metadata I’m interested in.
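To make the collection step concrete, here’s a minimal sketch of the same filtering done in code; my actual pipeline runs through Inoreader and IFTTT rather than a script. It assumes the third-party feedparser library, and the keyword list is a placeholder for one’s own interests.

```python
# A minimal sketch of the collection step: pull the arXiv math.IT RSS feed
# and keep items whose title or abstract matches a keyword list.
# Assumes the third-party feedparser library; KEYWORDS is a placeholder.
import feedparser

FEED_URL = "http://arxiv.org/rss/math.IT"
KEYWORDS = ["entropy", "channel capacity", "coding"]  # placeholder interests

feed = feedparser.parse(FEED_URL)
starred = [
    {"title": e.title, "link": e.link}
    for e in feed.entries
    if any(kw in (e.title + " " + e.get("summary", "")).lower() for kw in KEYWORDS)
]

for item in starred:
    print(item["title"], "->", item["link"])
```

In practice a feed reader does exactly this kind of matching for you; the value of Inoreader is doing it across hundreds of feeds at once.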

I also have similar incoming funnel entry points via many other web services. On platforms like Twitter, I have workflows that use services like IFTTT.com or Zapier to push URLs easily to my website: I can quickly “like” a tweet and a background process will suck that tweet, and any URLs within it, into my system for future processing. This type of workflow extends to a variety of sites where I might come across material I want to read and process. (Think academic social services like Mendeley, Academia.edu, and Diigo, or even less academic ones like Twitter, LinkedIn, etc.) Many of these services have storage of their own and offer simple browser bookmarklets for adding material. So with a quick click, a piece is saved to the service and then automatically ported into my website almost without friction.
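For those who’d rather skip the middleman services, the same funnel can be fed directly over WordPress’s REST API. This is only a hedged sketch of the idea, not my actual recipe; the site URL and application-password credentials are placeholders.

```python
# A sketch of pushing a bookmarked URL into a WordPress site via the core
# REST API, roughly what the IFTTT/Zapier recipes do on my behalf.
# SITE and AUTH are placeholders; AUTH uses a WordPress application password.
import requests

SITE = "https://example.com"
AUTH = ("username", "application-password")

def bookmark_to_post(url: str, note: str = "") -> int:
    """Create a draft post wrapping a bookmarked URL; returns the new post ID."""
    payload = {
        "title": f"Bookmark: {url}",
        "content": f'<a href="{url}">{url}</a><p>{note}</p>',
        "status": "draft",  # leave as a draft so it can be triaged later
    }
    resp = requests.post(f"{SITE}/wp-json/wp/v2/posts", json=payload, auth=AUTH)
    resp.raise_for_status()
    return resp.json()["id"]

print(bookmark_to_post("https://example.org/some-paper", "Looked interesting"))
```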

My WordPress-based site uses the Post Kinds Plugin, which takes incoming website URLs and does a very solid job of parsing those pages to extract much of the primary metadata I’d like to have without requiring a lot of work. For well-structured web pages, it’ll pull in the page title, authors, date published, date updated, a synopsis of the page, categories and tags, and other bits of data automatically. All these fields are also editable and searchable. Further, the plugin allows me to configure simple browser bookmarklets so that with a single click on a web page, I can pull its URL and associated metadata into my website almost instantaneously. I can then add a note or two about what made me interested in the piece and save it for later.
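As a rough illustration of the kind of parsing such a plugin performs (this is not the Post Kinds code itself), here’s a sketch that reads a page’s OpenGraph tags, assuming the requests and BeautifulSoup libraries:

```python
# A rough sketch of bookmark-style metadata extraction: fetch a page and
# read its OpenGraph/meta tags. Not the Post Kinds Plugin's actual code.
import requests
from bs4 import BeautifulSoup

def extract_metadata(url: str) -> dict:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    def og(prop: str):
        tag = soup.find("meta", property=f"og:{prop}")
        return tag["content"] if tag and tag.has_attr("content") else None

    return {
        "title": og("title") or (soup.title.string if soup.title else None),
        "synopsis": og("description"),
        "site": og("site_name"),
        "url": og("url") or url,
    }

print(extract_metadata("https://example.org/some-article"))
```

Well-structured pages (those with good OpenGraph or microformats markup) yield nearly all of these fields; poorly structured ones are exactly where hand-editing comes in.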

Note that at this stage I’m usually interested in saving material for later as quickly as I possibly can; I’m rarely interested in reading anything immediately. I’m most interested in finding it, collecting it for later, and moving on to the next thing. This is also highly useful for things I find during a busy day when I can’t spare the time to read them right then.

As an example, here’s a book I’ve bookmarked to read simply by clicking “like” on a tweet I came across late last year. You’ll notice at the bottom of the post that I’ve optionally syndicated copies of it to other platforms to “spread the wealth,” as it were. Perhaps others following me via other means may see it and find it useful as well.

Triaging

At regular intervals during the week I’ll sit down for an hour or two to triage all the papers and material I’ve been sucking into my website. This typically involves reading through lots of abstracts in a bit more detail to figure out what I want to read now and what I’d like to read at a later date. I can delete the irrelevant material if I choose, or add follow-up dates to custom fields for later reminders.

Slowly but surely I’m funneling down a tremendous amount of potential material into a smaller, more manageable amount that I’m truly interested in reading on a more in-depth basis.

Document storage

Calibre with GoodReads sync

Even for things I’ve winnowed down, there is still a relatively large amount of material, much of which I’ll want to save and personally archive. For a lot of this function I rely on the free multi-platform desktop application Calibre. It has an essentially iTunes-like interface, but it’s built specifically for e-books and other documents.

Within it I maintain a small handful of libraries: one for personal e-books, one for research-related textbooks/e-books, and another for journal articles. It has a very solid interface and is extremely flexible in terms of configuration and customization. You can create a large number of custom libraries and define your own searchable and sortable fields with a huge variety of metadata. It often does a reasonable job of importing e-books, .pdf files, and other digital media and parsing out their metadata, which saves one from doing much of that work manually. With well-maintained metadata, one can very quickly search and sort a huge number of documents as well as quickly prioritize them for action. Additionally, the system does a pretty solid job of converting files from one format to another, so things like converting an .epub file into .mobi format for Kindle are nearly automatic.
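That conversion is scriptable too, since Calibre ships command-line tools alongside the GUI. A minimal sketch, assuming ebook-convert is on your PATH and using a placeholder filename:

```python
# A minimal sketch of conversion using Calibre's ebook-convert CLI, which
# installs alongside the desktop application. The filename is a placeholder.
import subprocess
from pathlib import Path

def epub_to_mobi(epub_path: str) -> str:
    """Convert an .epub to a Kindle-friendly .mobi via Calibre's CLI."""
    out_path = str(Path(epub_path).with_suffix(".mobi"))
    subprocess.run(["ebook-convert", epub_path, out_path], check=True)
    return out_path

print(epub_to_mobi("some-book.epub"))
```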

Calibre stores the document files themselves either in local computer storage or, even better, in the cloud using any of a variety of services (Dropbox, OneDrive, etc.), so that one can keep one’s documents in the cloud and view them from a variety of locations (home, work, travel, tablet, etc.).

I’ve been a very heavy user of GoodReads.com for years to bookmark and organize my physical and e-book libraries and anti-libraries. Calibre has an exceptional plugin for GoodReads that syncs data between the two. This plugin (and a few others) is exceptionally good at pulling in missing metadata, minimizing the amount that must be entered by hand, which can be tedious.

Within Calibre I can manage my physical books, e-books, journal articles, and a huge variety of other document-related forms and formats. I can also use it to further triage the things I intend to read and order them to the nth degree. My current Calibre libraries hold over 10,000 documents, including over 2,500 textbooks, as well as records of most of my 1,000+ physical books. Calibre can also be used to keep records of documents one would ultimately like to acquire but doesn’t currently have access to.
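That triage can even be driven from scripts via the calibredb tool that ships with Calibre. A hedged sketch; the “to-read” tag and the field list are placeholders for one’s own scheme:

```python
# A sketch of querying a Calibre library from a script using the calibredb
# CLI. The search expression and fields are placeholders; --for-machine
# asks calibredb for JSON output.
import json
import subprocess

def to_read(search: str = "tags:to-read") -> list:
    """Return matching library records as a list of dicts."""
    out = subprocess.run(
        ["calibredb", "list", "--search", search,
         "--fields", "title,authors", "--for-machine"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)

for record in to_read():
    print(record["title"])
```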

BibTeX and reference management

In addition to everything else, Calibre has some well-customized pieces for dovetailing all of its metadata with reference management systems. It allows one to export data in a variety of formats for document publishing and reference management, including BibTeX among many others.
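For instance, a BibTeX file covering a library can be generated with Calibre’s catalog facility; a brief sketch with a placeholder output path (calibredb infers the catalog format from the file extension):

```python
# A sketch of exporting a Calibre library's metadata as a BibTeX catalog
# via the calibredb CLI. The output filename is a placeholder.
import subprocess

subprocess.run(["calibredb", "catalog", "references.bib"], check=True)
```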

Reading, Annotations, Highlights

Once I’ve winnowed down the material I’m interested in, it’s time to start actually reading. I’ll often use Calibre to send documents directly to my Kindle or other e-reading device, but one can also read them on the desktop with a variety of readers, or even from within Calibre itself. With a click or two, I can automatically email documents to my Kindle, and Calibre will auto-format them appropriately before doing so.

Typically I’ll send them to my Kindle, which gives me a variety of easy methods for adding highlights and marginalia. Sometimes I’ll read .pdf files on the desktop and use Adobe Acrobat to add highlights and marginalia there as well. When I’m done with a .pdf file, I’ll just resave it (with all the additions) back into my Calibre library.

Exporting highlights/marginalia to my website

For Kindle-related documents, once I’m finished, I’ll use direct text-file export or tools like clippings.io to export my highlights and marginalia for a particular text into simple HTML and import them into my website along with all my other data. I’ve briefly written about some of this before, though I ought to document it better. All of this then becomes very easily searchable and sortable for future use as well.
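Since the Kindle keeps highlights in a plain-text “My Clippings.txt” file with a simple record format, the export step can be approximated in a few lines; this is a sketch of the idea, not the clippings.io service itself:

```python
# A sketch of turning Kindle's "My Clippings.txt" into simple HTML for
# import into a website. Records are separated by a line of ten '=' signs;
# each holds a title line, a metadata line, a blank line, and the
# highlighted text.
from html import escape

def clippings_to_html(path: str = "My Clippings.txt") -> str:
    raw = open(path, encoding="utf-8-sig").read()  # utf-8-sig strips the BOM
    items = []
    for record in raw.split("=========="):
        lines = [ln.strip() for ln in record.strip().splitlines() if ln.strip()]
        if len(lines) >= 3:
            title, meta, text = lines[0], lines[1], " ".join(lines[2:])
            items.append(
                f"<li><blockquote>{escape(text)}</blockquote>"
                f"<cite>{escape(title)} ({escape(meta)})</cite></li>"
            )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

print(clippings_to_html())
```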

Here’s an example of some public notes, highlights, and other marginalia I’ve posted in the past.

Synthesis

Over time, I’ve built up a huge amount of research-related data in my personal online commonplace book that is highly searchable and sortable! I also have the option to make these posts and pages public, private, or even password protected. I can create accounts on my site for collaborators to use and view private material that isn’t publicly available. I can also share posts via social media and use standards like Webmention and tools like brid.gy so that comments and interactions with these pieces on platforms like Facebook, Twitter, Google+, and others are imported back to the relevant portions of my site as comments. (I’m doing it with this post, so feel free to try it out yourself by commenting on one of the syndicated copies.)
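The Webmention exchange underneath all of that is pleasantly small: discover the target page’s webmention endpoint, then POST the source and target URLs to it. A minimal sketch assuming the requests and BeautifulSoup libraries, with placeholder URLs:

```python
# A minimal sketch of sending a Webmention per the W3C spec: discover the
# target's endpoint (Link header, then <link>/<a> tags), then POST
# form-encoded source and target URLs. The URLs below are placeholders.
import requests
from bs4 import BeautifulSoup

def send_webmention(source: str, target: str):
    resp = requests.get(target, timeout=10)
    link = resp.links.get("webmention")  # rel="webmention" in Link header
    if link:
        endpoint = link["url"]
    else:
        soup = BeautifulSoup(resp.text, "html.parser")
        tag = soup.find(["link", "a"], rel="webmention", href=True)
        endpoint = requests.compat.urljoin(target, tag["href"]) if tag else None
    if endpoint:
        return requests.post(endpoint, data={"source": source, "target": target})

send_webmention("https://example.com/my-post", "https://example.org/their-post")
```

Services like brid.gy then make the reverse trip, turning silo interactions into webmentions aimed back at one’s own site.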

Now when I’m ready to begin writing something about what I’ve read, I’ve got all the relevant pieces, notes, and metadata in one centralized location on my website. Synthesis becomes much easier. I can even have open drafts of things as I’m reading and begin laying them out there directly if I choose. Because it’s all stored online, it’s eminently available from almost anywhere I can connect to the web. As an example, I used a few portions of this workflow to write this very post.

Continued work

Naturally, none of this is static; it continues to improve and evolve over time. In particular, I’m doing continued work on my personal website so that I’m able to own as much of the workflow and data there as possible. Ideally I’d love to have all of the Calibre-related pieces on my website as well.

Earlier this week I even had conversations about creating new post types on my website for things I want to read, to better display and document them explicitly. When I can, I try to document some of these pieces either here on my own website or in various places on the IndieWeb wiki. In fact, the IndieWeb for Education page might be a good place to start browsing for those interested.

One of the added benefits of having a lot of this data on my own website is that it not only serves as my research/data platform, but it also has the traditional ability to serve as a publishing and distribution platform!

Currently, I’m doing most of my research-related work in private or draft form on the back end of my website, so it’s not always publicly available, though I often think I should make more of it public, both for the value of the aggregation itself and for the benefit it might provide to scientific communication. Just think: if you were interested in some of the obscure topics I am, you could have a pre-curated RSS feed of all the things I’ve filtered through piped into your own system. Now multiply this across hundreds of thousands of other scientists. Michael Nielsen posts some useful things to his Twitter feed and his website, but what I wouldn’t give to see far more of who and what he’s following, bookmarking, and actually reading. While many might find these minutiae tedious, I guarantee that people in his associated fields would find serious value in them.

I’ve tried hundreds of other apps and tools over the years, but more often than not they cover only a small fraction of the necessary moving pieces within the much larger apparatus that a working researcher and writer requires. This means one often ends up using dozens of specialized tools with a huge duplication of data and effort across them. It also presumes these tools will be around for more than a few years and will allow easy import/export of one’s hard-won data and the time invested in using them.

If you’re aware of something interesting in this space that might be useful, I’m happy to take a look at it. Even if I don’t end up using the service itself, perhaps it has a piece of functionality that I can recreate in my own site and workflow somehow?

If you’d like help in building and fleshing out a system similar to the one I’ve outlined above, I’m happy to help do that too.

Published by

Chris Aldrich

I'm a biomedical and electrical engineer with interests in information theory, complexity, evolution, genetics, signal processing, IndieWeb, theoretical mathematics, and big history. I'm also a talent manager-producer-publisher in the entertainment industry with expertise in representation, distribution, finance, production, content delivery, and new media.

11 thoughts on “Organizing my research related reading”

  1. Matt Maldre says:

    I was curious where you drafted your blog posts, so I searched your site for “drafts”, and came across this wonderful post. I’m saving this to my Instapaper for further reading. Lots of great workflows here!

    (You are also slowly turning me on to Webmention; I still need to install and try it out.) I have a feeling you will be revolutionizing how I use my websites.

    1. Matt, it sounds just like the early days of blogging where someone would come up with a feature and folks would iterate on the idea and share code to improve the way the web works. I hope you do join the revolution. If you do go that route, there’s lots of great material (and support) at IndieWeb.org and I’ve documented a lot of my own pieces which are WordPress-centric at IndieWeb Collection. It’s kind of funny that you searched for “drafts” to find a post I made last night in the middle of the night!

      A lot of my posts originate right in the traditional WordPress admin UI, but there are several kinds that use various bookmarklets or functionality like that of IFTTT.com or similar services. And then, for the most fun, I’ve got a micropub endpoint on my website that also allows me to use a variety of micropub clients to post to my site. (I know a lot of people using Instagram in conjunction with OwnYourGram to micropub photos to their website as a particular example.) I suspect that in the coming years, micropub will become much more widely adopted and function much the way that early Twitter clients allowed people multiple ways of posting to Twitter. The difference is that micropub clients can be used to post to almost any platform because it’s an open standard.
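      For the curious, the core of a Micropub request really is tiny. Here’s a hedged sketch per the W3C spec, with a placeholder endpoint and token (a real client would first obtain the token via IndieAuth):

      ```python
      # A minimal sketch of a Micropub client request: a form-encoded POST
      # of an h-entry with a bearer token. The endpoint URL and token are
      # placeholders; real clients obtain the token via IndieAuth.
      import requests

      MICROPUB_ENDPOINT = "https://example.com/micropub"  # placeholder
      TOKEN = "xxx"  # placeholder IndieAuth access token

      resp = requests.post(
          MICROPUB_ENDPOINT,
          headers={"Authorization": f"Bearer {TOKEN}"},
          data={"h": "entry", "content": "Hello from a Micropub client!"},
      )
      print(resp.status_code, resp.headers.get("Location"))  # 201 + new post URL
      ```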

  2. I love reading about how other people organise their reading and writing, although it seldom impacts my own system (which I hesitate to call a system). Chris Aldrich’s post is no exception, with lots of great ideas about how to find, filter and act on the firehose of stuff that’s out there. I can’t help but wonder whether Chris and I have discussed Zettelkasten methods in the past. https://zettelkasten.de

    1. Jeremy, we haven’t, but we probably should. Looks like my kind of rathole to go down though… 🙂

  3. There’s an XKCD cartoon beloved of geeky nerds and nerdy geeks looking to make things happen automatically. Actually, there are two: Automation and Is It Worth the Time?. Both have exercised me all weekend, and now, after only 13.5 hours, I might be able to save myself literally minutes every day.
    Two triggers set me off. One was Chris Aldrich’s post Organizing my research related reading and the other the release of an update to Late Night Software’s Script Debugger. Chris made me realise again that I’m still far too scattershot in my online reading. There are bits and pieces all over the place, and at the very least I ought to be able to bring them back to my own domain. And Script Debugger reminded me that once upon a time, Dearly Beloved, I was able to persuade my computer to do some nifty things, things that currently frustrate me no end.[1]
    One of the bits & pieces I would like to bring home is the passages I have highlighted in things I read in Instapaper.2 It is easy enough to get IFTTT to create a new file in, say, DropBox when you highlight something new, but that file cannot contain any useful HTML tags.3 It is possible that an email from IFTTT could contain tags, but though I tried a few times, the emails never arrived, so the point is moot. Anyway, playing about with the text in the file to create a new post seemed like something I could actually do with Applescript, and so it proved.
    There were a couple of big pitfalls along the way. One was getting from IFTTT’s idea of a time to an actual useful timestamp. I’m certain there are easier ways, but it was fun thinking through my own approach and making it work. Another was reading the input file in a format that AppleScript wouldn’t completely mess up. That had me tearing my hair out for a while until my IndieWeb chum sknebel reminded me of something that I had seen but not paid sufficient attention to.
    A cute problem is that as far as I know, IFTTT creates one file for every highlight. One blog post for every highlight seems awfully silly, so having done the hard work, I added another little script that appends additional highlights to the first post. That has some nice little things I’m proud of too, like keeping the citation at the end of the post and adjusting its wording according to the number of highlights.
    All in all, I’m happy with what I’ve achieved. Sure, I could probably have done it in half the code and a quarter of the time using PHP or Python or whatever. But to do that, I’d have to understand those languages much better than I do. And the great thing about AppleScript and, especially, Script Debugger is that you can so easily keep an eye on what is happening. I may even try something else in a little while.
    For now, there’s still a lot of manual labour, but I know it will be relatively easy to set things up so that when a new file appears I can choose which script to run automatically. I also need to work a little on the tags and styling of the posts, but that’s a task for another day. In the meantime you can take a look at the highlight.scpt and new-highlight.scpt and, if you’re feeling very charitable, show me how I could have done all that in half the code and a quarter of the time.

    [1] There’s a meta-aspect to XKCD’s Automation, especially. You spend time learning a programming language in order to do things, especially routine things, more easily and more quickly. Newer languages might be much better at doing those things even more effectively. But it takes an awfully long time to learn those new languages, time that could be spent doing things. This is especially true for me because programming is not much more than an occasional hobby. So I stick with the old ways that I know. AppleScript is one I used to know.

    [2] Triggered by How to set up a robust web reading environment by Chris Bowler.

    [3] It occurs to me — now and belatedly — that it could contain structured JSON, just one of the new things I’m not learning properly.

