Notes, Highlights, and Marginalia: From E-books to Online

For several years now, I’ve been meaning to do something more interesting with the notes, highlights, and marginalia from the various books I read. In particular, I’ve been meaning to do it for the non-fiction I read for research, and even more so for e-books, which tend to have more easily extractable notes given their electronic nature. This fits into the way in which I use this site as a commonplace book, as well as the IndieWeb philosophy of owning all of one’s own data.[1]

Over the past month or so, I’ve been experimenting with some fiction to see what works and what doesn’t in terms of a workflow for status updates around reading books, writing book reviews, and then extracting and depositing notes, highlights, and marginalia online. I’ve now got a relatively quick and painless workflow for exporting the book-related data from my Amazon Kindle and importing it into the site with some modest markup and CSS for display. I’m sure the workflow will continue to evolve (and become more automated) over the coming months, but I’m reasonably happy with where things stand.

The fact that the Amazon Kindle allows for relatively easy highlighting and annotation in e-books is excellent, but the ability to sync to a laptop and do a one-click export of all of that data is incredibly helpful. Adding some simple CSS to the pre-formatted output gives me a reasonable base upon which to build for future writing and thinking about the material. In experimenting, I’m also coming to realize that simply owning the data isn’t enough; I’m now driven to make that data more directly useful to me and potentially to others.
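The one-click export above comes from the Kindle desktop app. As a rough illustration of the same extraction step, here is a minimal Python sketch that instead reads the `My Clippings.txt` file Kindle devices keep. The record layout it assumes (a title line, a metadata line, a blank line, the clipped text, then a row of `=` signs) is an assumption about that file, not something described in this post.

```python
from collections import defaultdict

# Assumption: the delimiter Kindle devices write between clippings.
SEPARATOR = "=========="


def parse_clippings(text):
    """Group highlight/note text by book title from a My Clippings.txt dump."""
    books = defaultdict(list)
    for entry in text.split(SEPARATOR):
        lines = [line.strip() for line in entry.strip().splitlines()]
        if len(lines) < 3:
            continue  # skip empty or malformed entries
        title = lines[0].lstrip("\ufeff")  # the file may begin with a byte-order mark
        meta = lines[1]  # e.g. "- Your Highlight on page 12 | Location 180-182 | ..."
        body = "\n".join(line for line in lines[2:] if line)
        if not body:
            continue  # bookmarks carry no text
        kind = "note" if "Your Note" in meta else "highlight"
        books[title].append({"kind": kind, "meta": meta, "text": body})
    return dict(books)
```

Each book's list of entries can then be rendered into whatever markup the site uses.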

As part of my experimenting, I’ve just uploaded some notes, highlights, and annotations for David Christian’s excellent text Maps of Time: An Introduction to Big History[2], which I read back in 2011/12. While I’ve read several of the references which I marked up in that text, I’ll have to continue evolving a workflow for doing all the related follow-up (and further thinking and writing) on the reading I’ve done in the past.

I’m still reminded of Rick Kurtzman’s sage advice to me when I was a young pisher at CAA in 1999: “If you read a script and don’t tell anyone about it, you shouldn’t have wasted the time having read it in the first place.” His point was that if you don’t try to pass along the knowledge you found by reading, you may as well give up. Even if the thing was terrible, at least say that at a minimum. In a digitally connected era, we no longer need to rely on nearly illegible scrawl in the margins to pollinate the world at a snail’s pace.[4] Take those notes, marginalia, highlights, and metadata and release them into the world. The fact that this dovetails perfectly with Cesar Hidalgo’s thesis in Why Information Grows: The Evolution of Order, from Atoms to Economies[3] furthers my belief in having a better process for what I’m attempting here.

Hopefully in the coming months, I’ll be able to add similar data to several other books I’ve read and reviewed here on the site.

If anyone has any thoughts, tips, or tricks for creating or automating this type of workflow or presentation, I’d love to hear them in the comments!

Footnotes

[1]

“Own your data,” IndieWeb. [Online]. Available: http://indieweb.org/own_your_data. [Accessed: 24-Oct-2016]

[2]

D. Christian and W. H. McNeill, Maps of Time: An Introduction to Big History, 2nd ed. University of California Press, 2011.

[3]

C. Hidalgo, Why Information Grows: The Evolution of Order, from Atoms to Economies, 1st ed. Basic Books, 2015.

[4]

O. Gingerich, The Book Nobody Read: Chasing the Revolutions of Nicolaus Copernicus. Bloomsbury Publishing USA, 2004.

Published by

Chris Aldrich

I'm a biomedical and electrical engineer with interests in information theory, complexity, evolution, genetics, signal processing, IndieWeb, theoretical mathematics, and big history. I'm also a talent manager-producer-publisher in the entertainment industry with expertise in representation, distribution, finance, production, content delivery, and new media.

15 thoughts on “Notes, Highlights, and Marginalia: From E-books to Online”

Sawyer Hollenshead says:

October 24, 2016 at 12:26 pm

Shameless plug: I wrote a bit about how I’m exporting and sharing my highlights: https://medium.com/@sawyerh/how-i-m-exporting-my-highlights-from-the-grasps-of-ibooks-and-kindle-ce6a6031b298

It’s a bit over-engineered, but I do plan on making a more user-friendly way of doing this on your own in the future, in case you’re interested.

—via Medium.com

1. Chris Aldrich says:
  
  October 25, 2016 at 12:24 pm
  Yes, surely there has to be an easier way to do all of this, and surely more than two of us see the problem.
  
  One thing that I can think of may be to integrate something like the new Micropub spec from the W3C into a more far-reaching solution: you could keep the email portion of your pipeline and, instead of the other triggers, have Lambda send a Micropub request to your site. With just a small amount of code you could add a Micropub endpoint to your site and authorize Lambda to publish to it.
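As a rough sketch of what such a push might look like, the Python below builds (without sending) a form-encoded Micropub create request carrying `h=entry`, the post content, an optional category, and a bearer token. The endpoint URL, token, and category are hypothetical placeholders; a real deployment would first obtain the token through the site's authorization flow (typically IndieAuth).

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_micropub_request(endpoint, token, content, categories=()):
    """Build (but do not send) a form-encoded Micropub create request."""
    fields = [("h", "entry"), ("content", content)]
    # Multi-valued properties use the "name[]" form-encoding convention.
    fields += [("category[]", c) for c in categories]
    data = urlencode(fields).encode("utf-8")
    return Request(
        endpoint,
        data=data,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` should, per the Micropub spec, typically yield `201 Created` with a `Location` header pointing at the new post.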
  
  The potential benefit of doing this is that you could create an inexpensive service to do this for multiple platforms. I believe there are Micropub plugins written for several CMSs (notably including WordPress), which would allow a broader “readership” (pun intended). If a user can send an email to a service, install a plugin, and authenticate, it doesn’t get much easier. Certainly more interesting than https://www.clippings.io/, which seems to be one of the very few apps left in this area.
  
  As an aside, I know there’s a fairly big contingent in the IndieWeb community who are interested in static sites and Siteleaf sounds like an excellent CMS for that kind of use.
  
  IndieWeb also has some interesting collections of post types, UI/UX, information, and code relating to some of these highlighting and note/annotation types for the web. There’s certainly room for a better bridge between books (especially ebooks) and the web.
2. Chris Aldrich says:
  
  September 20, 2018 at 3:00 pm
  I’m curious if anything simpler or more modular ultimately evolved out of this?
  
Chris Aldrich says:

October 24, 2016 at 1:13 pm

I’ve uploaded my notes, highlights, & annotations of “Maps of Time” by @davidgchristian http://boffosocko.com/2012/06/17/big-history #BigHistory #mustread

Jose Afonso Furtado says:

October 24, 2016 at 3:06 pm

“Notes, Highlights, and Marginalia: From E-books to Online” by Chris Aldrich @ChrisAldrich /Medium medium.com/boffo-socko/no…

informant says:

October 25, 2016 at 12:57 am

Notes, Highlights, and Marginalia: From E-books to Online by @ChrisAldrich on @Medium: medium.com/boffo-socko/no…

Nate Hoffelder says:

October 25, 2016 at 12:36 pm

Mentioned this article in Morning Coffee – 25 October 2016.

informant says:

November 2, 2016 at 8:29 am

Notes, Highlights, and Marginalia: From E-books to Online by @ChrisAldrich on @Medium: medium.com/boffo-socko/no…

Chris Aldrich says:

December 31, 2016 at 5:26 am

This article was mentioned on “PressForward as a WordPress RSS Feed Reader & Pocket/Instapaper Replacement” | boffosocko.com

Chris Aldrich says:

December 31, 2016 at 1:26 pm
As many know, for the past 6 months or so, I’ve been slowly improving some of the IndieWeb tools and workflow I use to own what I’m reading both online and in physical print as well as status updates indicating those things. [1][2][3]
Since just before IndieWebCamp LA, I’ve been working on better ways to own the articles I’ve been reading and syndicate/share them out to other social platforms. The concept initially started out as a simple linkblog idea and has continually been growing, particularly under the influence of my attendance at the Dodging the Memory Hole 2016: Saving Online News conference at UCLA in October. Around that same time, it was announced that Pinterest was purchasing Instapaper and that some of Instapaper’s development and functionality would be shut down. I’ve been primarily using Pocket for several years now and have desperately wanted to bring that functionality into my own site. I had also been looking at the self-hostable alternative Wallabag, which is under heavy active development, but since most of my site is built on WordPress, I really preferred a solution that integrated better into that as a workflow.
Enter PressForward
I’ve been looking closely at PressForward for the past week and change as a self-contained replacement for third party services like Pocket and Instapaper. I’ve been looking around for this type of self-hosted functionality for a while.
PressForward was originally intended for journalists and news organizations to aggregate new content, add it to their newsroom workflow, and then use it to publish new content. From what I can see it’s also got a nice following in academia as a tool for aggregating content for researchers focused on a particular area.
It only took a minute or two of looking at PressForward to realize that it had another off-label use case: as a spectacular replacement for read-later type apps!
In an IndieWeb fashion, this fantastic WordPress plugin allows me to easily own private bookmarks of things I’d like to read (PressForward calls these “Nominations” in keeping with its original use case). I can then later read them on my own website (with Mercury, f.k.a. Readability, functionality built in), add commentary, and publish them as a read post. [Note: To my knowledge the creators of PressForward are unaware of the IndieWeb concept or philosophies.]
After playing around for a bit and contemplating several variations, configurations, and options, I thought I’d share some thoughts about it for others considering using it in such an off-label manner. Hopefully these may also spur the developers to open up their initial concept to a broader audience, as it seems very well designed and logically laid out.
Examples
The developers obviously know the value of dogfooding, as at least two of them are using it in a Pocket-like fashion (as they may not have other direct use cases).

Aram Zucker-Scharff
James Digioia

Pros
PressForward includes a beautiful, full built-in RSS Feed Reader!
This feature alone is enough to recommend using it even without any other feature. I’ve tried Orbit Reader and WhisperFollow (among others), which are both interesting in their own right but are somewhat limited and have relatively clunky interfaces. The best part of WhisperFollow’s premise is that it has webactions built in, but I suspect these could easily be added onto PressForward.
In fact, just hours before I discovered PressForward, I’d made this comment on the WordPress Reader Refresh post announcing the refresh of WordPress.com’s own (separate) reader:

Some nice visual changes in this iteration. Makes it one of the most visually pretty feed readers out there now while still remaining relatively lightweight.
I still wish there were more functionality pieces built into it like the indie-reader Woodwind.xyz or even Feedly. While WordPress in some sense is more creator oriented than consumption oriented, I still think that not having a more closely integrated reader built into it is still a drawback to the overall WordPress platform.

Additionally,

It’s IndieWeb and POSSE friendly
It does automatic link forwarding in a flexible/responsible manner with canonical URLs
Allows for proper attributions for the original author and content source/news outlet
Keeps lots of metadata for analyzing reading behavior
Taggable and categorizable
Allows for comments/commenting
Could be used for creating a linkblog on steroids
Archives the original article on the day it was read.
Is searchable
Could be used for collaboration and curation
Has Mercury (formerly known as Readability) integrated for a cleaner reading interface
Has a pre-configured browser bookmarklet
Is open source and incredibly well documented
One can count clicks to one’s own site as the referrer while still pushing the reader to the original
Along with other plugins like JetPack’s Publicize or Social Networks Auto-Poster, one can automatically share their reads to Twitter, Facebook, or other social media silos. In this case, you own the link, but the original publisher also gets the traffic.

Cons
No clear path for nominating articles on mobile.
This can be a dealbreaker for some, so I’ve outlined a pretty quick and simple solution below.
No direct statistics
Statistics for gauging one’s reading aren’t built in directly (yet?), but some scripts are available. [4][5][6]
No larger data aggregation
Services like Pocket are able to aggregate the data of thousands of users to recommend and reveal articles I might also like. Sadly, this self-hosted concept makes it difficult (or impossible) to have this type of functionality. However, I usually have far too much good stuff to read anyway, so maybe this isn’t such a loss.
Suggested Improvements
Adding the ability to do webactions directly from the “Nominated” screen would be fantastic, particularly for the RSS reader portion.
Default to an unread view of the current “All Content” page. I find that I have to filter the view every time I visit the page to make it usable. I suspect this would be a better default for most newsrooms too.
It would be nice to have a pre-configured archive template page in a simple linkblog format that filters posts that were nominated/drafted/published via the plugin. This would save users from needing to create one that’s compatible with their current theme. Something with the date read, the title linked to the original, the author, and source attribution could be useful for many users.
A PressForward Nomination “Bookmarklet” for Mobile
One of the big issues I came up against immediately with PressForward is ease of use on mobile. A lot of the content I read is on mobile, so being able to bookmark (nominate) articles via mobile or apps like Nuzzel or Twitter is very important. I suspect this may also be the case for much of their current user base.
Earlier this year I came across a great little Android mobile app called URL Forwarder which can be used to share things with the ubiquitous mobile sharing icons. Essentially one can use it to share the URL of the mobile page one is on to a mobile Nomination form within PressForward.
I’d suspect that there’s also a similar app for iOS, but I haven’t checked. If not, URL Forwarder is open source on GitHub and could potentially be ported. There’s also a similar Android app called Bookmarklet Free which could be used instead of URL Forwarder.
PressForward’s built-in bookmarklet kindly has a pre-configured URL for creating nominations, so it’s a simple matter of configuring URL Forwarder to use it. These details follow below for those interested.
Configuring URL Forwarder for PressForward

Open URL Forwarder
Click the “+” icon to create a filter.
Give the filter a name, “Nominate This” is a reasonable suggestion. (See photo below.)
Use the following entry for the “Filter URL”, replacing example.com with your site’s domain name: http://example.com/wp-content/plugins/pressforward/includes/nomthis/nominate-this.php?u=@url
Leave the “Replaceable text” as “@url”
Finish by clicking on the checkmark in the top right corner.

Simple, right?
[Screenshots: configuring URL Forwarder; sharing from a web page to URL Forwarder; choosing “Nominate” to share to PressForward]
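The “Filter URL” above works by text substitution: the app swaps the shared page's address in for `@url`. The Python sketch below mimics that substitution with the same example.com placeholder domain. Percent-encoding the shared URL is an assumption (whether URL Forwarder does this itself isn't covered here); without it, a shared link's own `?` or `&` could be misread as part of the nomination URL's query string.

```python
from urllib.parse import quote

# Same placeholder domain as the configuration steps above.
FILTER_URL = ("http://example.com/wp-content/plugins/pressforward/includes/"
              "nomthis/nominate-this.php?u=@url")


def nomination_url(filter_url, shared_url, placeholder="@url"):
    """Substitute a shared page URL into the filter URL's placeholder."""
    # Assumption: encode every reserved character (safe="") so the shared
    # URL travels as a single value of the "u" query parameter.
    return filter_url.replace(placeholder, quote(shared_url, safe=""))
```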
Nominating a post via mobile
With the configuration above set up, do the following:

On the mobile page one wants to nominate, click the ubiquitous “share this” mobile icon (or share via a pull down menu, depending on your mobile browser or other app.)
Choose to share through URL Forwarder
Click on the “Nominate” option just created above.
Change/modify any data within your website administrative interface and either nominate or post as a draft. (This part is the same as one would experience using the desktop bookmarklet.)

What’s next?
Given the data intensity of both the feed reader and what promises to be years of article data, I’m left with the question of whether to host it within my primary site or put it on a subdomain.
I desperately want to keep it on the main site, but perhaps hosting it on a subdomain, similar to how both Aram Zucker-Scharff and James Digioia do it may be better advised?
I’ve also run across an issue with the automatic redirect which needs some troubleshooting as well. Hopefully this will be cleared up quickly and we’ll be off to the races.

References

[1]
C. Aldrich, “A New Reading Post-type for Bookmarking and Reading Workflow,” BoffoSocko | Musings of a Modern Day Cyberneticist, 22-Aug-2016. [Online]. Available: http://boffosocko.com/2016/08/22/a-new-reading-post-type-for-bookmarking-and-reading-workflow/. [Accessed: 31-Dec-2016]

[2]
C. Aldrich, “Owning my Online Reading Status Updates,” BoffoSocko | Musings of a Modern Day Cyberneticist, 20-Nov-2016. [Online]. Available: http://boffosocko.com/2016/11/20/owning-my-online-reading-status-updates/. [Accessed: 31-Dec-2016]

[3]
C. Aldrich, “Notes, Highlights, and Marginalia: From E-books to Online,” BoffoSocko | Musings of a Modern Day Cyberneticist, 24-Oct-2016. [Online]. Available: http://boffosocko.com/2016/10/24/notes-highlights-and-marginalia/. [Accessed: 31-Dec-2016]

[4]
A. Zucker-Scharff, “Personal Statistics from 3 Months of Internet Reading,” Medium, 05-Sep-2015. [Online]. Available: https://medium.com/@aramzs/3-month-internet-reading-stats-f41fa15d63f0#.dez80up7y. [Accessed: 31-Dec-2016]

[5]
A. Zucker-Scharff, “Test functions based on PF stats for collecting data,” Gist. [Online]. Available: https://gist.github.com/AramZS/d10fe64dc33fc9ffc2d8. [Accessed: 31-Dec-2016]

[6]
A. Zucker-Scharff, “PressForward/pf_stats,” GitHub. [Online]. Available: https://github.com/PressForward/pf_stats. [Accessed: 31-Dec-2016]

Chris Aldrich says:

February 10, 2017 at 12:44 pm

I also wrote some thoughts and had at least one interesting comment with some ideas/code here: http://boffosocko.com/2016/10/24/notes-highlights-and-marginalia/

Chris Aldrich says:

March 20, 2017 at 4:31 pm

@mattscomments No, not a plugin (yet?). I exported them via the Amazon Kindle Desktop app and added some CSS to improve the markup a tad. It doesn’t take too long, though. I greatly prefer to own this type of content on my own site first and then syndicate it to places like GoodReads after the fact.
I’ve written some details here: http://boffosocko.com/2016/10/24/notes-highlights-and-marginalia/
Searching my site for “marginalia” will uncover other resources: http://boffosocko.com/?s=marginalia
I also recently helped guide Jeremy Cherfas through some of the process, and he has documented a similar (non-WordPress) workflow well here: https://www.jeremycherfas.net/blog/setting-my-marginalia-free
Given your Twitter background at @mattmaldre:
There are some in the indieweb movement working on notes, highlights, marginalia, fragmentions, etc. You can search/find wiki pages for those topics here: https://indieweb.org. In particular, Kartik Prabhu has a fantastic set up which he describes (and you can see implemented along with sample code) at: https://kartikprabhu.com/articles/marginalia
I presume you already know about https://www.w3.org/blog/news/archives/6156? As well as work by groups like http://hypothes.is.

Chris Aldrich says:

March 8, 2018 at 3:20 am
There’s so much great material out there to read and not nearly enough time. The question becomes: “How to best organize it all, so you can read even more?”
I just came across a tweet from Michael Nielsen about the topic, which is far deeper than even a few tweets could do justice to, so I thought I’d sketch out a few basic ideas about how I’ve been approaching it over the last decade or so. Ideally I’d like to circle back around to this and better document more of the individual aspects or maybe even make a short video, but for now this will hopefully suffice to add to the conversation Michael has started.

Lots of good insights in the responses. One thing stands out: this is a real pain point for many, & I don’t think anyone feels like they’ve nailed it (or how they organize information in general). It’d be great to have more ideas added to the thread! https://t.co/6KfhO5aVU3
— michael_nielsen (@michael_nielsen) March 8, 2018

How do people organize their reading? Perennially frustrated by this. I want one system that lets me trivially add books, papers, webpages, etc, re-organize very easily, search & filter. What works for you?
— michael_nielsen (@michael_nielsen) March 8, 2018

Keep in mind that this is an evolving system which I still haven’t completely perfected (and may never), but to a great extent it works relatively well and I still easily have the ability to modify and improve it.
Overall Structure
The first piece of the overarching puzzle is to have a general structure for finding, collecting, triaging, and then processing all of the data. I’ve essentially built a simple funnel system for collecting all the basic data in the quickest manner possible. With the basics down, I can later skim through various portions to pick out the things I think are the most valuable and move them along to the next step. Ultimately I end up reading the best pieces on which I make copious notes and highlights. I’m still slowly trying to perfect the system for best keeping all this additional data as well.
Since I’ve seen so many apps and websites come and go over the years and lost lots of data to them, I far prefer to use my own personal website for doing a lot of the basic collection, particularly for online material. Toward this end, I use a variety of web services, RSS feeds, and bookmarklets to quickly accumulate the important pieces into my personal website which I use like a modern day commonplace book.
Collecting
In general, I’ve been using the Inoreader feed reader to track a large variety of RSS feeds, from clearinghouse sources (including things like ProQuest custom searches) down to individual researchers’ blogs, as a means of quickly pulling in large amounts of research material. It’s one of the more flexible readers out there, with a huge number of useful features, including the ability to subscribe to OPML files, which many readers don’t support.
As a simple example, arXiv.org has an RSS feed for the topic of “information theory” at http://arxiv.org/rss/math.IT, which I subscribe to. I can browse through the feed and, based on titles and/or abstracts, quickly “star” the items I find most interesting within the reader. I have a custom recipe set up for the IFTTT.com service that pulls in all these starred articles and creates new posts for them on my WordPress blog. To these posts I can add a variety of metadata, including top-level categories and lower-level tags, in addition to any other metadata I’m interested in.
I also have similar incoming funnel entry points via many other web services. On platforms like Twitter, I have workflows that use services like IFTTT.com or Zapier to push URLs easily to my website. I can quickly “like” a tweet and a background process will suck that tweet and any URLs within it into my system for future processing. This type of workflow extends to a variety of sites where I might consume potential material I want to read and process. (Think academic social services like Mendeley, Academia.edu, Diigo, or even less academic ones like Twitter, LinkedIn, etc.) Many of these services have storage ability as well as simple browser bookmarklets that allow me to add material to them. So with a quick click, it’s saved to the service and then automatically ported into my website almost without friction.
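The starring itself happens by hand in Inoreader, but the underlying step of pulling titles and links out of a feed can be sketched in a few lines of Python. This assumes a plain RSS 2.0 feed (other RSS and Atom variants use different element names), and the keyword filter is a hypothetical pre-filter, not part of the workflow described here.

```python
import xml.etree.ElementTree as ET


def feed_items(rss_xml):
    """Return (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items


def keep_interesting(items, keywords):
    """Hypothetical pre-filter: keep items whose titles mention any keyword."""
    lowered = [k.lower() for k in keywords]
    return [(t, l) for t, l in items if any(k in t.lower() for k in lowered)]
```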
My WordPress-based site uses the Post Kinds Plugin which takes incoming website URLs and does a very solid job of parsing those pages to extract much of the primary metadata I’d like to have without requiring a lot of work. For well structured web pages, it’ll pull in the page title, authors, date published, date updated, synopsis of the page, categories and tags, and other bits of data automatically. All these fields are also editable and searchable. Further, the plugin allows me to configure simple browser bookmarklets so that with a simple click on a web page, I can pull its URL and associated metadata into my website almost instantaneously. I can then add a note or two about what made me interested in the piece and save it for later.
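Post Kinds' own parser is more thorough than this, but the basic idea of pulling metadata out of a page can be sketched with Python's standard `html.parser`. The sketch only looks at `<title>` and a few common `<meta>` conventions (`author`, `description`, and the Open Graph `og:title`/`article:published_time` properties); which of these a given page actually provides is an assumption.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <title> text and <meta name|property=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data  # title text may arrive in several chunks


def page_metadata(html):
    """Return a small dict of commonly available page metadata."""
    parser = MetaExtractor()
    parser.feed(html)
    return {
        "title": parser.meta.get("og:title") or parser.title.strip(),
        "author": parser.meta.get("author"),
        "published": parser.meta.get("article:published_time"),
        "summary": parser.meta.get("description") or parser.meta.get("og:description"),
    }
```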
Note that I’m usually most interested in saving material for later as quickly as I possibly can. In this part of the process, I’m rarely interested in reading anything immediately. I’m most interested in finding it, collecting it for later, and moving on to the next thing. This is also highly useful for things I find during my busy day that I can’t find time for at the moment.
As an example, here’s a book I’ve bookmarked to read simply by clicking “like” on a tweet I came across late last year. You’ll notice at the bottom of the post that I’ve optionally syndicated copies of the post to other platforms to “spread the wealth” as it were. Perhaps others following me via other means may see it and find it useful as well?
Triaging
At regular intervals during the week I’ll sit down for an hour or two to triage all the papers and material I’ve been sucking into my website. This typically involves reading through lots of abstracts in a bit more detail to better figure out what I want to read now and what I’d like to read at a later date. I can delete the irrelevant material if I choose, or I can add follow-up dates to custom fields for later reminders.
Slowly but surely I’m funneling down a tremendous amount of potential material into a smaller, more manageable amount that I’m truly interested in reading on a more in-depth basis.
Document storage
Calibre with GoodReads sync
Even for things I’ve winnowed down, there is still a relatively large amount of material, much of which I’ll want to save and personally archive. For a lot of this function I rely on the free multi-platform desktop application Calibre. It’s essentially an iTunes-like interface, but it’s built specifically for e-books and other documents.
Within it I maintain a small handful of libraries: one for personal e-books, one for research-related textbooks/e-books, and another for journal articles. It has a very solid interface and is extremely flexible in terms of configuration and customization. You can create a large number of custom libraries and create your own searchable and sortable fields with a huge variety of metadata. It often does a reasonable job of importing e-books, .pdf files, and other digital media and parsing out their metadata, which saves one from needing to do some of that work manually. With some well-maintained metadata, one can very quickly search and sort a huge number of documents as well as quickly prioritize them for action. Additionally, the system does a pretty solid job of converting files from one format to another, so that things like converting an .epub file into .mobi format for Kindle are automatic.
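Calibre's conversions are also available from the command line through its `ebook-convert` tool, which takes an input file and an output file and infers the formats from their extensions. A small Python wrapper, assuming the Calibre command-line tools are installed and on `PATH` (the file name in the usage example is hypothetical):

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(src, target_ext=".mobi"):
    """Build the argv for Calibre's ebook-convert: input file, then output file."""
    src = Path(src)
    return ["ebook-convert", str(src), str(src.with_suffix(target_ext))]


def convert(src, target_ext=".mobi"):
    """Run the conversion, failing clearly if Calibre's CLI tools are missing."""
    cmd = convert_command(src, target_ext)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("ebook-convert not found; install Calibre's command-line tools")
    subprocess.run(cmd, check=True)
    return Path(cmd[2])
```

For example, `convert("maps-of-time.epub")` would write `maps-of-time.mobi` next to the source file.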
Calibre stores the physical documents either in local computer storage, or even better, in the cloud using any of a variety of services including Dropbox, OneDrive, etc. so that one can keep one’s documents in the cloud and view them from a variety of locations (home, work, travel, tablet, etc.)
I’ve been a very heavy user of GoodReads.com for years to bookmark and organize my physical and e-book library and anti-libraries. Calibre has an exceptional plugin for GoodReads that syncs data across the two. This plugin (and a few others) is exceptionally good at pulling in missing metadata to minimize the amount that must be entered by hand, which can be tedious.
Within Calibre I can manage my physical books, e-books, journal articles, and a huge variety of other document-related forms and formats. I can also use it to further triage the things I intend to read and order them to the nth degree. My current Calibre libraries have over 10,000 documents in them, including over 2,500 textbooks, as well as records of most of my 1,000+ physical books. Calibre can also be used to keep records of documents one would ultimately like to acquire but doesn’t currently have access to.
BibTeX and reference management
In addition to everything else, Calibre also has some well-customized pieces for dovetailing all its metadata into a reference management system. It’ll allow one to export data in a variety of formats for document publishing and reference management, including BibTeX formats amongst many others.
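Calibre handles the export itself; purely to show what a BibTeX record looks like, here is a minimal Python sketch that formats a metadata dictionary as an entry, using the Maps of Time citation from the footnotes above. The citation key is a hypothetical example, and the sketch does not escape LaTeX special characters such as `&` or `%`.

```python
def bibtex_entry(key, fields, entry_type="book"):
    """Format a metadata dict as a BibTeX entry (fields written in sorted order)."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in sorted(fields.items()))
    return f"@{entry_type}{{{key},\n{body}\n}}"


print(bibtex_entry("christian2011maps", {
    "author": "Christian, David",
    "title": "Maps of Time: An Introduction to Big History",
    "publisher": "University of California Press",
    "year": "2011",
}))
```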
Reading, Annotations, Highlights
Once I’ve winnowed down the material I’m interested in it’s time to start actually reading. I’ll often use Calibre to directly send my documents to my Kindle or other e-reading device, but one can also read them on one’s desktop with a variety of readers, or even from within Calibre itself. With a click or two, I can automatically email documents to my Kindle and Calibre will also auto-format them appropriately before doing so.
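The email route can be sketched with Python's standard `email` package: build a message with the document attached, then hand it to an SMTP server. The sender and Kindle addresses below are placeholders, and the sketch stops short of actually sending.

```python
from email.message import EmailMessage
from pathlib import Path


def kindle_email(path, sender, kindle_address, data=None):
    """Build (but do not send) a message carrying one document as an attachment."""
    path = Path(path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = path.name
    msg.set_content("Document attached.")
    # Read the file unless the bytes were supplied directly.
    payload = data if data is not None else path.read_bytes()
    msg.add_attachment(payload, maintype="application",
                       subtype="octet-stream", filename=path.name)
    return msg
```

Sending would then be something like `smtplib.SMTP(host).send_message(msg)` with one's own mail server settings.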
Typically I’ll send them to my Kindle which allows me a variety of easy methods for adding highlights and marginalia. Sometimes I’ll read .pdf files via desktop and use Adobe to add highlights and marginalia as well. When I’m done with a .pdf file, I’ll just resave it (with all the additions) back into my Calibre library.
Exporting highlights/marginalia to my website
For Kindle-related documents, once I’m finished, I’ll use direct text file export or tools like clippings.io to export my highlights and marginalia for a particular text into simple HTML and import it into my website along with all my other data. I’ve briefly written about some of this before, though I ought to better document it. All of this then becomes very easily searchable and sortable for future use as well.
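For the import step, here is a minimal sketch of turning exported highlights and notes into a simple HTML fragment that CSS can then style. The `marginalia`, `highlight`, and `note` class names are hypothetical, and the input is assumed to be a list of dicts with `kind` and `text` keys.

```python
from html import escape


def highlights_to_html(title, entries):
    """Render highlight/note dicts as an HTML fragment with CSS class hooks."""
    parts = ['<section class="marginalia">', f"<h2>{escape(title)}</h2>"]
    for e in entries:
        if e["kind"] == "note":
            parts.append(f'<p class="note">{escape(e["text"])}</p>')
        else:
            parts.append(f'<blockquote class="highlight">{escape(e["text"])}</blockquote>')
    parts.append("</section>")
    return "\n".join(parts)
```

Escaping the text keeps stray `<` or `&` characters in a passage from breaking the page markup.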
Here’s an example of some public notes, highlights, and other marginalia I’ve posted in the past.
Synthesis
Eventually, over time, I’ve built up a huge amount of research-related data in my personal online commonplace book that is highly searchable and sortable! I also have the option to make these posts and pages public, private, or even password protected. I can create accounts on my site for collaborators to use and view private material that isn’t publicly available. I can also share posts via social media and use standards like webmention and tools like brid.gy so that comments and interactions with these pieces on platforms like Facebook, Twitter, Google+, and others are imported back to the relevant portions of my site as comments. (I’m doing it with this post, so feel free to try it out yourself by commenting on one of the syndicated copies.)
Now when I’m ready to begin writing something about what I’ve read, I’ve got all the relevant pieces, notes, and metadata in one centralized location on my website. Synthesis becomes much easier. I can even have open drafts of things as I’m reading and begin laying things out there directly if I choose. Because it’s all stored online, it’s eminently available from almost anywhere I can connect to the web. As an example, I used a few portions of this workflow to actually write this post.
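Sending a webmention is a two-step exchange: discover the target page's endpoint, then POST `source` and `target` to it. The sketch below covers only discovery from `<link>`/`<a>` elements with `rel="webmention"`; the Webmention spec also has senders check the HTTP `Link` header first, which is omitted here. All URLs are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request


class WebmentionLinkFinder(HTMLParser):
    """Find the first <link> or <a> element whose rel includes 'webmention'."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if self.endpoint is None and tag in ("link", "a") and "webmention" in rels:
            if attrs.get("href") is not None:
                self.endpoint = attrs["href"]


def discover_endpoint(target_url, html):
    """Resolve the advertised endpoint against the target page's URL."""
    finder = WebmentionLinkFinder()
    finder.feed(html)
    if finder.endpoint is None:
        return None
    # Relative hrefs resolve against the target; an empty href means the page itself.
    return urljoin(target_url, finder.endpoint)


def webmention_request(endpoint, source, target):
    """Build (but do not send) the form-encoded source/target notification."""
    data = urlencode({"source": source, "target": target}).encode("utf-8")
    return Request(endpoint, data=data, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```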
Continued work
Naturally, not all of this is static, and it continues to improve and evolve over time. In particular, I’m doing continued work on my personal website so that I’m able to own as much of the workflow and data there as possible. Ideally I’d love to have all of the Calibre-related pieces on my website as well.
Earlier this week I even had conversations about creating new post types on my website related to things that I want to read to potentially better display and document them explicitly. When I can I try to document some of these pieces either here on my own website or on various places on the IndieWeb wiki. In fact, the IndieWeb for Education page might be a good place to start browsing for those interested.
One of the added benefits of having a lot of this data on my own website is that it not only serves as my research/data platform, but it also has the traditional ability to serve as a publishing and distribution platform!
Currently, I’m doing most of my research-related work in private or draft form on the back end of my website, so it’s not always publicly available. I often think I should make more of it public, both for the value of the aggregation itself and for the benefit it might bring to scientific communication. Just think: if you were interested in some of the same obscure topics I am, you could have a pre-curated RSS feed of everything I’ve filtered through piped into your own system. Now multiply this across hundreds of thousands of other scientists! Michael Nielsen posts some useful things to his Twitter feed and his website, but what I wouldn’t give to see far more of who and what he’s following, bookmarking, and actually reading. While many might find these minutiae tedious, I guarantee that people in his associated fields would find serious value in it.
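Publishing such a feed is the easy part, since RSS 2.0 is a small XML vocabulary. WordPress already publishes feeds for posts and categories on its own. For illustration, here is a minimal sketch that assumes each curated item is a (title, URL, note, published datetime) tuple and builds the feed with Python's standard library. The feed title and item are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 feed from (title, url, note, published) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, url, note, published in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "description").text = note
        # RSS 2.0 uses RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
        # The item URL doubles as a stable unique identifier.
        ET.SubElement(item, "guid").text = url
    return ET.tostring(rss, encoding="unicode")


# Hypothetical curated item for illustration.
feed = build_rss(
    "Curated reading",
    "https://example.com/reading",
    "Papers I have filtered and bookmarked",
    [("An interesting paper", "https://example.com/paper", "Why I saved it",
      datetime(2018, 3, 8, tzinfo=timezone.utc))],
)
print(feed)
```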
I’ve tried hundreds of other apps and tools over the years, but more often than not, they cover only a small fraction of the moving pieces within the much larger apparatus that a working researcher and writer requires. As a result, one often ends up using dozens of specialized tools, with a huge duplication of data-entry effort across them. It also presumes these tools will be around for more than a few years and will allow easy import and export of one’s hard-fought data and the time invested in using them.
If you’re aware of something interesting in this space that might be useful, I’m happy to take a look at it. Even if I don’t use the service itself, perhaps it has a piece of functionality that I can recreate in my own site and workflow?
If you’d like help in building and fleshing out a system similar to the one I’ve outlined above, I’m happy to help do that too.
Related posts

Notes, Highlights, and Marginalia: From E-books to Online
A New Reading Post-type for Bookmarking and Reading Workflow
PressForward as an IndieWeb WordPress-based RSS Feed Reader & Pocket/Instapaper Replacement


Author: Chris Aldrich

I’m a biomedical and electrical engineer with interests in information theory, complexity, evolution, genetics, signal processing, theoretical mathematics, and big history.

I’m also a talent manager-producer-publisher in the entertainment industry with expertise in representation, distribution, finance, production, content delivery, and new media.

Chris Aldrich says:

March 8, 2018 at 3:20 am
There’s so much great material out there to read and not nearly enough time. The question becomes: “How to best organize it all, so you can read even more?”
I just came across a tweet from Michael Nielsen about the topic, which is far deeper than even a few tweets could do justice to, so I thought I’d sketch out a few basic ideas about how I’ve been approaching it over the last decade or so. Ideally I’d like to circle back around to this and better document more of the individual aspects or maybe even make a short video, but for now this will hopefully suffice to add to the conversation Michael has started.

Lots of good insights in the responses. One thing stands out: this is a real pain point for many, & I don’t think anyone feels like they’ve nailed it (or how they organize information in general). It’d be great to have more ideas added to the thread! https://t.co/6KfhO5aVU3
— michael_nielsen (@michael_nielsen) March 8, 2018

How do people organize their reading? Perennially frustrated by this. I want one system that lets me trivially add books, papers, webpages, etc, re-organize very easily, search & filter. What works for you?
— michael_nielsen (@michael_nielsen) March 8, 2018

Keep in mind that this is an evolving system which I still haven’t completely perfected (and may never), but to a great extent it works relatively well and I still easily have the ability to modify and improve it.
Overall Structure
The first piece of the overarching puzzle is to have a general structure for finding, collecting, triaging, and then processing all of the data. I’ve essentially built a simple funnel system for collecting all the basic data as quickly as possible. With the basics down, I can later skim through various portions to pick out the things I think are most valuable and move them along to the next step. Ultimately I end up reading the best pieces, on which I make copious notes and highlights. I’m still slowly trying to perfect the system for preserving all this additional data as well.
Since I’ve seen so many apps and websites come and go over the years and lost lots of data to them, I far prefer to use my own personal website for doing a lot of the basic collection, particularly for online material. Toward this end, I use a variety of web services, RSS feeds, and bookmarklets to quickly accumulate the important pieces into my personal website which I use like a modern day commonplace book.
Collecting
In general, I’ve been using the Inoreader feed reader to track a large variety of RSS feeds, from clearinghouse sources (including things like ProQuest custom searches) down to individual researchers’ blogs, as a means of quickly pulling in large amounts of research material. It’s one of the more flexible readers out there, with a huge number of useful features including the ability to subscribe to OPML files, which many readers don’t support.
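OPML files are XML outlines in which each subscribed feed is an `<outline>` element carrying an `xmlUrl` attribute, often nested inside folder outlines. The sketch below is not Inoreader's implementation. It only shows how little it takes to read such a file with Python's standard library. The folder name and the second feed are hypothetical; the arXiv feed URL is the one from this post.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text):
    """Return (title, feed_url) pairs for every <outline> that carries an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks nested outlines too, so feeds grouped into folders are found.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            title = outline.get("title") or outline.get("text") or url
            feeds.append((title, url))
    return feeds


# A tiny OPML file; the folder name and second feed are hypothetical.
sample = """<opml version="2.0">
  <head><title>Research feeds</title></head>
  <body>
    <outline text="arXiv">
      <outline type="rss" text="Information Theory" xmlUrl="http://arxiv.org/rss/math.IT"/>
    </outline>
    <outline type="rss" text="A researcher's blog" xmlUrl="https://example.com/feed"/>
  </body>
</opml>"""

for title, url in feeds_from_opml(sample):
    print(title, url)
```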
As a simple example, arXiv.org has an RSS feed for the topic of “information theory” at http://arxiv.org/rss/math.IT which I subscribe to. I can browse through the feed and, based on titles and/or abstracts, quickly “star” the items I find most interesting within the reader. I have a custom recipe set up on the IFTTT.com service that pulls in all these starred articles and creates new posts for them on my WordPress blog. To these posts I can add a variety of metadata, including top-level categories, lower-level tags, and any other details I’m interested in.
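IFTTT handles this step without any code, but the logic is easy to sketch. The Python below is a hypothetical stand-in, not the IFTTT recipe. Keyword matching takes the place of manual starring, and each selected item becomes a JSON body for the WordPress REST API's `POST /wp-json/wp/v2/posts` endpoint, saved as a draft. The feed contents and keywords are made up, and no request is sent.

```python
import html
import json
import xml.etree.ElementTree as ET


def rss_items(rss_text):
    """Yield (title, link, summary) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        yield (
            item.findtext("title", default="").strip(),
            item.findtext("link", default="").strip(),
            item.findtext("description", default="").strip(),
        )


def starred(items, keywords):
    """Keep items whose title or summary mentions a keyword (a stand-in for manual starring)."""
    lowered = [k.lower() for k in keywords]
    for title, link, summary in items:
        text = f"{title} {summary}".lower()
        if any(k in text for k in lowered):
            yield title, link, summary


def wordpress_draft(title, link, summary):
    """JSON body for POST /wp-json/wp/v2/posts, saved as a draft to triage later."""
    content = (f'<p><a href="{html.escape(link)}">{html.escape(link)}</a></p>\n'
               f"<blockquote>{html.escape(summary)}</blockquote>")
    return json.dumps({"title": title, "content": content, "status": "draft"})


# Hypothetical feed contents for illustration.
feed = """<rss version="2.0"><channel>
  <item><title>Entropy bounds for codes</title><link>https://example.com/1</link>
        <description>We bound the entropy of...</description></item>
  <item><title>Unrelated topic</title><link>https://example.com/2</link>
        <description>Nothing relevant here.</description></item>
</channel></rss>"""

for title, link, summary in starred(rss_items(feed), ["entropy"]):
    print(wordpress_draft(title, link, summary))
```

Posting the body would require an authenticated request, for example with a WordPress application password. Categories and tags are passed to that endpoint as term IDs rather than names, which is why this sketch leaves them out.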
I also have similar incoming funnel entry points via many other web services. On platforms like Twitter, I have workflows that use services like IFTTT.com or Zapier to push URLs easily to my website. I can quickly “like” a tweet and a background process will suck that tweet and any URLs within it into my system for future processing. This type of workflow extends to a variety of sites where I might consume potential material I want to read and process. (Think academic social services like Mendeley, Academia.edu, or Diigo, or even less academic ones like Twitter, LinkedIn, etc.) Many of these services have storage ability as well as simple browser bookmarklets that allow me to add material to them. So with a quick click, it’s saved to the service and then automatically ported into my website almost without friction.
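In the Twitter case, the useful part of a liked tweet is usually its links. This is a minimal sketch, not what IFTTT or Zapier do internally. It assumes the tweet arrives as a dictionary shaped like a Twitter API v1.1 tweet object, whose `entities.urls` entries carry an `expanded_url` alongside the t.co short link, and it falls back to a regex over the plain text. The sample tweets are hypothetical.

```python
import re

# Fallback pattern; it may capture trailing punctuation, so entity data is preferred.
URL_PATTERN = re.compile(r"https?://\S+")


def urls_from_tweet(tweet):
    """Return the links in a liked tweet, preferring expanded URLs to t.co short links."""
    entities = tweet.get("entities", {}).get("urls", [])
    urls = [u.get("expanded_url") or u.get("url") for u in entities]
    if not urls:
        # No entity data: scan the raw text instead.
        urls = URL_PATTERN.findall(tweet.get("text", ""))
    # Drop empty values and duplicates while preserving order.
    return list(dict.fromkeys(u for u in urls if u))


# Hypothetical liked tweet for illustration.
liked = {
    "text": "Worth reading https://t.co/abc",
    "entities": {"urls": [{"url": "https://t.co/abc",
                           "expanded_url": "https://example.com/book"}]},
}
print(urls_from_tweet(liked))
```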
My WordPress-based site uses the Post Kinds Plugin which takes incoming website URLs and does a very solid job of parsing those pages to extract much of the primary metadata I’d like to have without requiring a lot of work. For well structured web pages, it’ll pull in the page title, authors, date published, date updated, synopsis of the page, categories and tags, and other bits of data automatically. All these fields are also editable and searchable. Further, the plugin allows me to configure simple browser bookmarklets so that with a simple click on a web page, I can pull its URL and associated metadata into my website almost instantaneously. I can then add a note or two about what made me interested in the piece and save it for later.
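The Post Kinds plugin does its own parsing, and the sketch below is not its code. It only illustrates the general idea by reading a page's `<title>` and a few common `<meta>` tags (Open Graph properties and the standard `author` and `description` names) with Python's built-in `html.parser`. The field mapping is an illustrative assumption, and real pages often expose richer data through microformats2 or JSON-LD, which this sketch skips.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect the <title> text and selected <meta> tags from an HTML page."""

    # Illustrative subset of meta names/properties mapped to result fields.
    FIELDS = {
        "og:title": "title",
        "author": "author",
        "article:published_time": "published",
        "article:modified_time": "updated",
        "description": "summary",
        "og:description": "summary",
        "article:tag": "tags",
    }

    def __init__(self):
        super().__init__()
        self.data = {"tags": []}
        self._in_title = False
        self._title_text = ""

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = (attr_map.get("property") or attr_map.get("name") or "").lower()
            field = self.FIELDS.get(key)
            content = attr_map.get("content")
            if not field or not content:
                return
            if field == "tags":
                self.data["tags"].append(content)
            else:
                # The first value seen for a field wins.
                self.data.setdefault(field, content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_text += data

    def result(self):
        """Return the fields, falling back to <title> when og:title is absent."""
        if "title" not in self.data and self._title_text.strip():
            self.data["title"] = self._title_text.strip()
        return self.data


def extract_metadata(page_html):
    parser = MetadataParser()
    parser.feed(page_html)
    parser.close()
    return parser.result()


# Hypothetical page for illustration.
page = """<html><head><title>Fallback</title>
<meta property="og:title" content="Example Article">
<meta name="author" content="Jane Doe">
<meta property="article:published_time" content="2018-03-08T10:00:00Z">
<meta property="article:tag" content="information theory">
<meta property="article:tag" content="reading">
</head><body></body></html>"""
print(extract_metadata(page))
```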
Note that I’m usually most interested in saving material for later as quickly as I possibly can. In this part of the process, I’m rarely interested in reading anything immediately. I’m most interested in finding it, collecting it for later, and moving on to the next thing. This is also highly useful for things I find during my busy day that I can’t make time for at the moment.
As an example, here’s a book I’ve bookmarked to read simply by clicking “like” on a tweet I came across late last year. You’ll notice at the bottom of the post that I’ve optionally syndicated copies to other platforms to “spread the wealth,” as it were. Perhaps others following me via other means may see it and find it useful as well.
Triaging
At regular intervals during the week I’ll sit down for an hour or two to triage all the papers and material I’ve been sucking into my website. This typically involves reading through lots of abstracts in more detail to figure out what I want to read now and what I’d like to read at a later date. I can delete the irrelevant material if I choose, or I can add follow-up dates to custom fields for later reminders.
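Custom reminder fields make this pass largely mechanical. The following minimal sketch uses a hypothetical `Bookmark` record with a hypothetical `follow_up` date field, not the actual custom fields used here. Irrelevant items are dropped, items without a reminder are queued to read now, and reminders falling on or before today are surfaced.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Bookmark:
    title: str
    url: str
    relevant: bool = True
    follow_up: Optional[date] = None  # hypothetical custom field holding a reminder date


def triage(bookmarks, today):
    """Split bookmarks into (read_now, due_reminders, later), dropping irrelevant ones."""
    read_now, due, later = [], [], []
    for b in bookmarks:
        if not b.relevant:
            continue  # the equivalent of deleting it from the queue
        if b.follow_up is None:
            read_now.append(b)
        elif b.follow_up <= today:
            due.append(b)
        else:
            later.append(b)
    return read_now, due, later


# Hypothetical queue for illustration.
items = [
    Bookmark("A", "https://example.com/a"),
    Bookmark("B", "https://example.com/b", follow_up=date(2018, 3, 1)),
    Bookmark("C", "https://example.com/c", follow_up=date(2018, 4, 1)),
    Bookmark("D", "https://example.com/d", relevant=False),
]
now, due, later = triage(items, date(2018, 3, 8))
print([b.title for b in now], [b.title for b in due], [b.title for b in later])
```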
Slowly but surely I’m funneling down a tremendous amount of potential material into a smaller, more manageable amount that I’m truly interested in reading on a more in-depth basis.
Document storage
Calibre with GoodReads sync
Even after winnowing, there is still a relatively large amount of material, much of which I’ll want to save and personally archive. For a lot of this function I rely on the free, multi-platform desktop application Calibre. It’s essentially an iTunes-like interface, but built specifically for e-books and other documents.
Within it I maintain a small handful of libraries: one for personal e-books, one for research-related textbooks/e-books, and another for journal articles. It has a very solid interface and is extremely flexible in terms of configuration and customization. You can create a large number of custom libraries and create your own searchable and sortable fields with a huge variety of metadata. It often does a reasonable job of importing e-books, .pdf files, and other digital media and parsing out their metadata, which saves one from doing some of that work manually. With well-maintained metadata, one can very quickly search and sort a huge number of documents as well as quickly prioritize them for action. Additionally, the system does a pretty solid job of converting files from one format to another, so that things like converting an .epub file into .mobi format for the Kindle are automatic.
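Calibre has its own search syntax and custom-column machinery. The sketch below does not use either. It only illustrates, in plain Python, what searchable and sortable metadata makes possible. The record layout and the `library`, `tags`, and `priority` fields are hypothetical.

```python
def find_documents(records, library=None, tag=None):
    """Filter document records by library and tag, then order by a custom priority field."""
    hits = [
        r for r in records
        if (library is None or r["library"] == library)
        and (tag is None or tag in r.get("tags", ()))
    ]
    # Lowest priority number first; records without a priority sort last, then by title.
    return sorted(hits, key=lambda r: (r.get("priority") is None,
                                       r.get("priority") or 0,
                                       r["title"].lower()))


# Hypothetical records for illustration.
records = [
    {"title": "Paper B", "library": "articles", "tags": ["entropy"], "priority": 2},
    {"title": "Paper A", "library": "articles", "tags": ["entropy"]},
    {"title": "Paper C", "library": "articles", "tags": ["entropy"], "priority": 1},
    {"title": "Textbook", "library": "textbooks", "tags": ["entropy"], "priority": 1},
]
for r in find_documents(records, library="articles", tag="entropy"):
    print(r["title"])
```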
Calibre stores the physical documents either in local computer storage or, even better, in the cloud using any of a variety of services including Dropbox, OneDrive, etc., so that one can view them from a variety of locations (home, work, travel, tablet, etc.).
I’ve been a very heavy user of GoodReads.com for years to bookmark and organize my physical and e-book library and anti-libraries. Calibre has an exceptional plugin for GoodReads that syncs data across the two. This plugin (and a few others) is exceptionally good at pulling in missing metadata, minimizing the amount that must be entered by hand, which can be tedious.
Within Calibre I can manage my physical books, e-books, journal articles, and a huge variety of other document-related forms and formats. I can also use it to further triage the things I intend to read and order them to the nth degree. My current Calibre libraries have over 10,000 documents in them, including over 2,500 textbooks, as well as records of most of my 1,000+ physical books. Calibre can also hold records for documents one would ultimately like to acquire but doesn’t yet have access to.
BibTeX and reference management
In addition to everything else, Calibre has some well-customized pieces for dovetailing all its metadata into a reference management system. It’ll allow one to export data in a variety of formats for document publishing and reference management, including BibTeX formats amongst many others.
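A BibTeX record is simple enough to generate by hand, which gives a feel for what such an export produces for each document. This is a minimal sketch, not Calibre's exporter. The citation key is hypothetical, values are wrapped in braces, and special characters such as `&` or `%` are not escaped, so a real exporter would need to handle them. The example uses David Christian's Maps of Time.

```python
def bibtex_entry(key, entry_type="book", **fields):
    """Render one BibTeX entry, wrapping each field value in braces (no escaping)."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


print(bibtex_entry(
    "christian2004maps",  # hypothetical citation key
    author="Christian, David",
    title="Maps of Time: An Introduction to Big History",
    publisher="University of California Press",
    year=2004,
))
```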
Reading, Annotations, Highlights
Once I’ve winnowed down the material I’m interested in, it’s time to start actually reading. I’ll often use Calibre to send my documents directly to my Kindle or other e-reading device, but one can also read them on one’s desktop with a variety of readers, or even from within Calibre itself. With a click or two, I can automatically email documents to my Kindle, and Calibre will also auto-format them appropriately before doing so.
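Emailing a document to a Kindle amounts to sending an ordinary email, with the file attached, to the device's @kindle.com address from a sender on the account's approved list. This minimal Python sketch shows how to build such a message without sending it. It is not Calibre's code, and the addresses and file name are hypothetical.

```python
from email.message import EmailMessage
from pathlib import Path


def kindle_email(sender, kindle_address, path, data):
    """Build (but do not send) an email carrying one document as an attachment."""
    name = Path(path).name
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = name
    msg.set_content("Document sent from my library.")
    # A generic binary type; a real tool would pick the document's actual MIME type.
    msg.add_attachment(data, maintype="application", subtype="octet-stream",
                       filename=name)
    return msg


# Hypothetical addresses and file contents for illustration.
msg = kindle_email("me@example.com", "reader@kindle.com",
                   "/library/paper.pdf", b"%PDF-1.4 sample")
print(msg["To"], msg["Subject"])
```

To send it, you would pass the message to `smtplib.SMTP_SSL(...).send_message()` after logging in to a real mail server.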
Typically I’ll send them to my Kindle which allows me a variety of easy methods for adding highlights and marginalia. Sometimes I’ll read .pdf files via desktop and use Adobe to add highlights and marginalia as well. When I’m done with a .pdf file, I’ll just resave it (with all the additions) back into my Calibre library.
Exporting highlights/marginalia to my website
For Kindle-related documents, once I’m finished, I’ll use direct text file export or tools like clippings.io to export my highlights and marginalia for a particular text into simple HTML, then import it into my website along with all my other data. I’ve briefly written about some of this before, though I ought to document it better. All of this then becomes easily searchable and sortable for future use as well.
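The direct text export is the Kindle's My Clippings.txt file. Each entry typically has a title line ending in "(Author)", a metadata line such as "- Your Highlight on Location 1234-1236 | Added on ...", a blank line, the text, and a separator of ten "=" signs. The exact layout varies with firmware and language, so treat it as an assumption. This minimal sketch, which is not what clippings.io does, groups entries by book and renders simple HTML. The sample export, the heading level, and the class names are hypothetical.

```python
import html
from collections import defaultdict


def parse_clippings(text):
    """Group highlights and notes from a Kindle 'My Clippings.txt' export by book."""
    books = defaultdict(list)
    for block in text.lstrip("\ufeff").split("=========="):
        lines = [line.strip() for line in block.strip().splitlines()]
        if len(lines) < 3:
            continue  # empty blocks and bookmarks carry no text
        title_line, meta = lines[0], lines[1]
        body = " ".join(line for line in lines[2:] if line)
        # Title lines usually end in "(Author)"; keep the whole line if not.
        title, _, author = title_line.rpartition(" (")
        if not title:
            title, author = title_line, ""
        author = author.rstrip(")")
        kind = "note" if "Note" in meta else ("highlight" if "Highlight" in meta else None)
        location = next((part.split("Location", 1)[1].strip()
                         for part in meta.split("|") if "Location" in part), "")
        if kind and body:
            books[(title, author)].append((kind, location, body))
    return books


def clippings_to_html(books):
    """Render parsed clippings as simple HTML, one section per book."""
    parts = []
    for (title, author), clips in books.items():
        heading = html.escape(title) + (f" by {html.escape(author)}" if author else "")
        parts.append(f"<h3>{heading}</h3>")
        for kind, location, body in clips:
            loc = (f' <span class="location">(Location {html.escape(location)})</span>'
                   if location else "")
            tag = "blockquote" if kind == "highlight" else "p"
            parts.append(f'<{tag} class="{kind}">{html.escape(body)}{loc}</{tag}>')
    return "\n".join(parts)


# Hypothetical export in the common My Clippings.txt layout.
sample = (
    "Example Book (Jane Doe)\n"
    "- Your Highlight on Location 1234-1236 | Added on Monday, January 2, 2012 10:00:00 AM\n"
    "\n"
    "A highlighted passage.\n"
    "==========\n"
    "Example Book (Jane Doe)\n"
    "- Your Note on Location 1236 | Added on Monday, January 2, 2012 10:01:00 AM\n"
    "\n"
    "My marginal note.\n"
    "==========\n"
)
print(clippings_to_html(parse_clippings(sample)))
```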

Kimberly Hirsh says:

March 28, 2019 at 4:10 am

This article was mentioned on kimberlyhirsh.com


Footnotes

Published by Chris Aldrich

15 thoughts on “Notes, Highlights, and Marginalia: From E-books to Online”
