An exclusive look at data from the controversial web site Sci-Hub reveals that the whole world, both poor and rich, is reading pirated research papers.

Sci Hub has been in the news quite a bit over the past half a year and the bookmarked article here gives some interesting statistics. I’ll preface some of the following editorial critique with the fact that I love John Bohannon’s work; I’m glad he’s spent the time to do the research he has. Most of the rest of the critique is aimed at the publishing industry itself.

From a journalistic standpoint, I find it disingenuous that the article didn’t actually hyperlink to Sci Hub. Neither did it link out (or provide a full quote) to Alicia Wise’s Twitter post(s) nor link to her rebuttal list of 20 ways to access their content freely or inexpensively. Of course both of these are editorial related, and perhaps the rebuttal was so flimsy as to be unworthy of a link from such an esteemed publication anyway.

Sadly, Elsevier’s list of 20 ways of free/inexpensive access doesn’t really provide any simple coverage for graduate students or researchers in poorer countries, who are the likeliest group of people using Sci Hub, unless they’re going to fraudulently claim they’re part of a class to which they don’t belong. And is that morally any better than the original theft method? The list is almost assuredly never used by patients, who seem to be covered under one of the options, as the option to do so is painfully undiscoverable past the typical $30/paper paywalls.

Their patchwork hodgepodge of free access is not only difficult to discern; one must also keep in mind that this is just one of dozens of publishers a researcher must navigate to find the one thing they’re looking for right now (not to mention the thousands of times they need to do this throughout a year, much less a career).

Consider this experiment, which could be a good follow-up to the article: is it easier to find and download a paper by title/author/DOI via Sci Hub (a minute) versus through any of the other publishers’ platforms with a university subscription (several minutes) or without a subscription (an hour or more to days)? Just consider the time it would take to dig up every one of 30 references in an average journal article: maybe just half an hour via Sci Hub versus the days and/or weeks it would take to jump through the multiple hoops to first discover, read about, gain access to, and then download them from the over 14 providers (and this presumes the others provide some type of “access” like Elsevier).

Those who lived through the Napster revolution in music will realize that the dead simplicity of its system is primarily what helped kill the music business, compared to the ecosystem that exists now with easy access through the multiple streaming sites (Spotify, Pandora, etc.) or inexpensive paid options like iTunes. If the publishing business doesn’t want to get completely killed, it’s going to need to create the iTunes of academia. I suspect they’ll have internal bean-counters watching the percentage of the total (now apparently 5%) and will probably only do something before it passes a much larger threshold, though I imagine that they’re really hoping that the number stays stable, which signals that they’re not really concerned. They’re far more likely to continue to maintain their status quo practices.

Some of this ease-of-access argument is truly borne out by the statistics of open access papers which are downloaded by Sci Hub: it’s simply easier to both find and download them that way compared to traditional methods; there’s one simple pathway for both discovery and download. Surely the publishers, without colluding, could come up with a standardized method or protocol for finding and accessing their material cheaply and easily?

“Hart-Davidson obtained more than 100 years of biology papers the hard way – legally with the help of the publishers. ‘It took an entire year just to get permission,’ says Thomas Padilla, the MSU librarian who did the negotiating.” (John Bohannon in “Who’s downloading pirated papers? Everyone”)

Personally, I use relatively advanced tools like LibX, which happens to be offered by my institution and which I feel isn’t very well known, and it still takes me longer to find and download a paper than it would via Sci Hub. God forbid some enterprising hacker were to create a LibX community version for Sci Hub. Come to think of it, why haven’t any of the dozens of publishers built and supported simple tools like LibX which make their content easy to access? If we consider the analogy of academic papers to the introduction of machine guns in World War I, why should modern researchers still be using single-load rifles against an enemy that has access to nuclear weaponry?

My last thought here comes on the heels of the two tweets from Alicia Wise mentioned, but not shown, in the article: she mentions that the New York Times charges more than Elsevier does for a full subscription. This is tremendously disingenuous, as Elsevier is but one of dozens of publishers to which one would have to subscribe to have access to the full panoply of material researchers are typically looking for. Further, neither Elsevier nor its competitors are making their material as easy to find and access as the New York Times does. Nor do they discount access in an attempt to find the subscription price that their users find financially acceptable. Case in point: while I often read the New York Times, I rarely go over their monthly article limit, so I rarely need any type of paid subscription. Solely because they made me an interesting offer to subscribe for 8 weeks for 99 cents, I took them up on it and renewed that deal for another 8 weeks. Not finding it worth the full $35/month price point, I attempted to cancel. I had to cancel the subscription via phone, but why? The NYT customer rep made me no fewer than 5 different offers at ever-decreasing price points (including the 99 cents for 8 weeks which I had been getting!!) to try to keep my subscription. Neither Elsevier nor any of its competitors has ever tried (much less so hard) to earn my business. (I’ll further posit that it’s because it’s easier to fleece at the institutional level with bulk negotiation, a model not too dissimilar to the textbook business pressuring professors on textbook adoption rather than trying to sell directly to the end consumer, the student, which I’ve written about before.)

(Trigger alert: Apophasis to come) And none of this is to mention the quality control that is (or isn’t) put into the journals or papers themselves. Fortunately one needn’t even go further than Bohannon’s other writings like Who’s Afraid of Peer Review? Then there are the hordes of articles on poor research design, misuse of statistical analysis, and inability to repeat experiments. Not to give them any ideas, but lately it seems like Elsevier buying the Enquirer and charging $30 per article might not be a bad business decision. Maybe they just don’t want to play second-banana to TMZ?

Interestingly, there’s a survey at the end of the article which indicates some additional sources of academic copyright infringement. I do have to wonder how the data for the survey will be used. There’s always the possibility that logged-in users will be indicating they’re circumventing copyright and opening themselves up to litigation.

I also found the concept of using the massive data store as a means of applied corpus linguistics for science an entertaining proposition. This type of research could mean great things for science communication in general. I have heard of people attempting to do such meta-analysis to guide the purchase of potential intellectual property for patent trolling as well.

Finally, for those who haven’t done it (ever or recently), I’ll recommend that it’s certainly well worth their time and energy to attend one or more of the many 30-60 minute sessions most academic libraries offer at the beginning of their academic terms to train library users on research tools and methods. You’ll save yourself a huge amount of time.

## Thoughts on “Some academics remain skeptical of Academia.edu” | University Affairs

Replied to Some academics remain skeptical of Academia.edu (University Affairs)
They warn scholars to think twice before sharing their work on the popular social network.

This morning I ran across a tweet from colleague Andrew Eckford: His response was probably innocuous enough, but I thought the article should be taken to task a bit more.

“35 million academics, independent scholars and graduate students as users, who collectively have uploaded some eight million texts”

35 million users is an okay number, but their engagement must be spectacularly bad if only 8 million texts are available. How many researchers do you know who’ve published only a quarter of an article anywhere, much less gotten tenure?

“the platform essentially bans access for academics who, for whatever reason, don’t have an Academia.edu account. It also shuts out non-academics.”

They must have changed this, as pretty much anyone with an email address (including non-academics) can create a free account and use the system. I’m fairly certain that the platform was always open to the public from the start, but the article doesn’t seem to question the statement at all. If we want to argue about shutting out non-academics or even academics in poorer countries, let’s instead take a look at “big publishing” and their $30+/paper paywalls and publishing models, shall we?

Given his following discussion, I can only imagine what he thinks of big publishers in academia and that debate.

“McGill’s Dr. Sterne calls it “the gamification of research,”

Most research is too expensive to really gamify in such a simple manner. Many researchers are publishing to either get or keep their jobs and don’t have much time, information, or knowledge to try to game their reach in these ways. And if anything, the institutionalization of “publish or perish” has already accomplished far more “gamification”; Academia.edu is just helping to increase the reach of the publication. Given that research shows that most published research isn’t even read, much less cited, how bad can Academia.edu really be? [Cross reference: Reframing What Academic Freedom Means in the Digital Age]

If we look at Twitter and the blogging world as an analogy with Academia.edu and researchers, Twitter had a huge ramp up starting in 2008 and helped bloggers obtain eyeballs/readers, but where is it now? Twitter, even with a reasonable business plan, is stagnant, with growing grumblings that it may be failing. I suspect that without significant changes Academia.edu (which has a much smaller niche audience than Twitter) will also eventually fall by the wayside.

The article rails against not knowing what the business model is or what’s happening with the data. I suspect that the platform itself doesn’t have a very solid business plan and they don’t know what to do with the data themselves except tout the numbers. I’d suspect they’re trying to build “critical mass” so that they can cash out by selling to one of the big publishers like Elsevier, who might actually be able to use such data. But this presupposes that they’re generating enough data; my guess is that they’re not. And on that subject, from a journalistic viewpoint, where’s the comparison to the rest of the competition including ResearchGate.net or Mendeley.com, which in fact was purchased by Elsevier? As it stands, this simply looks like a “hit piece” on Academia.edu, and sadly not a very well researched or reasoned one.

In sum, the article sounds to me like a bunch of Luddites running around yelling “fire”, particularly when I’d imagine that most of those referred to in the piece feed into the more corporate side of publishing in major journals rather than publishing their work themselves on their own websites. I’d further suspect they’re probably not even practicing academic samizdat. It feels to me like the author and some of those quoted aren’t participating actively enough in the social media space to be able to comment on it intelligently. If the paper wants to pick at the academy in this manner, why don’t they write an exposé on the fact that most academics still have websites that look like they’re from 1995 (if, in fact, they have anything beyond their University’s mandated business card placeholder) when there is a wealth of free and simple tools they could use? Let’s at least build a cart before we start whipping the horse.

For academics who really want to spend some time and thought on a potential solution to all of this, I’ll suggest that they start out by owning their own domain and owning their own data and work. The IndieWeb movement certainly has an interesting philosophy that’s a great start in fixing the problem; it can be found at http://www.indiewebcamp.com.

## Moneyball for Book Publishers: A Detailed Look at How We Read

Read Moneyball for Book Publishers: A Detailed Look at How We Read (The New York Times)
A reader analytics company in London wants to use data on our reading habits to transform how publishers acquire, edit and market books.

## Marginalia and Revision Control

At the end of April, I read an article entitled “In the Margins” in the Johns Hopkins University Arts & Sciences magazine.  I was particularly struck by the comments of eminent scholar Jacques Neefs on page thirteen (or paragraph 20) about computers making marginalia a thing of the past:

I actually think that he may be completely wrong and that current technology actually allows us to keep far more marginalia! (Has anyone heard of digital exhaust?) The bigger issue may be that many writers just don’t know how to keep a better running log of their work to maintain all the relevant marginalia they’re actually producing. (Of course there’s also the subsequent broader librarian’s “digital dilemma” of maintaining formats for the future. As an example, think about how easy or hard it might be for you to read that ubiquitous 3.5 inch floppy disk you used in 1995.)

As a technologist who has spent many years in the entertainment industry, I feel compelled to point everyone towards the concept of revision control (or version control) within the realm of computer science. Though it’s primarily used in tracking changes in computer programs and is often a tool used by large teams of programmers, it can very easily be used for tracking changes in almost any type of writing: novels, short stories, screenplays, legal contracts, or textual documentation of nearly any sort.

## Example Use Cases for Revision Control

### Publishing

As a direct example, I’m using what is known as a Git repository to track every change I make in a textbook I’m currently writing. I can literally go back and view every change I’ve made since beginning the project, so though I’m directly revising one (or more) text files, all of my “marginalia” and revisions are saved and available. Currently I’m only doing it for my own reference and for additional backup, not supposing that anyone other than myself or possibly an editor may ever want to peruse it. If I were working in conjunction with others, there are ways for me to track the changes, edits, or notes that others (perhaps an editor or collaborator) might make.

In addition to the general back-up of the project (in case of catastrophic computer failure), I also have the ability to go back and find that paragraph (or multiple pages) I deleted last week in haste, but now realize I desperately want back, instead of having to recreate it de novo.
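To make the “find that deleted paragraph” idea concrete, here is a minimal sketch in Python. It is not Git itself: `DraftHistory`, `commit`, and `deleted_since` are hypothetical names invented for this illustration, and the sample paragraphs are made up. It only shows the underlying idea that every saved revision remains available to search later.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DraftHistory:
    """A toy, in-memory stand-in for a repository (hypothetical, not Git)."""

    snapshots: list[tuple[str, list[str]]] = field(default_factory=list)

    def commit(self, message: str, paragraphs: list[str]) -> int:
        """Store a full copy of the draft and return its revision number."""
        self.snapshots.append((message, list(paragraphs)))
        return len(self.snapshots) - 1

    def deleted_since(self, revision: int) -> list[str]:
        """Return paragraphs present at `revision` but missing from the latest draft."""
        old = self.snapshots[revision][1]
        latest = self.snapshots[-1][1]
        return [paragraph for paragraph in old if paragraph not in latest]


history = DraftHistory()
first = history.commit(
    "first draft",
    ["Intro.", "A long aside on marginalia.", "Conclusion."],
)
history.commit("trimmed in haste", ["Intro.", "Conclusion."])

# Ask the history what was thrown away since the first draft.
recovered = history.deleted_since(first)
print(recovered)
```

A real revision control system stores changes far more efficiently than full copies, but the experience for the writer is much the same: nothing committed is ever truly lost.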

Because it’s all digital, future scholars also won’t have problems parsing my handwriting issues as has occasionally come up in differentiating Mary Shelley’s writing from that of her husband in digital projects like the Shelley Godwin Archive. The fact that all changes are tracked and placed in a tree-like structure will indicate who wrote what and when and will indicate which changes were ultimately accepted and merged into the final version.
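The “who wrote what” bookkeeping can be sketched in a few lines of Python using the standard library’s `difflib`. This is a toy illustration under stated assumptions, not how Git implements its own attribution: the function name `attribute_lines`, the author labels, and the sample lines are all invented for the example.

```python
from difflib import SequenceMatcher


def attribute_lines(revisions):
    """Given [(author, lines), ...] in order, return [(author, line)] for the final draft.

    Each surviving line is credited to the author of the revision that introduced it.
    """
    attributed = []  # (author, line) pairs describing the previous revision
    for author, lines in revisions:
        previous_lines = [line for _, line in attributed]
        matcher = SequenceMatcher(a=previous_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whoever wrote them originally.
                updated.extend(attributed[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines are credited to the current author.
                updated.extend((author, line) for line in lines[j1:j2])
            # "delete" needs no action: those lines simply drop out.
        attributed = updated
    return attributed


revisions = [
    ("author_a", ["Opening sentence.", "A rough second sentence."]),
    ("editor_b", ["Opening sentence.", "A polished second sentence."]),
]
for author, line in attribute_lines(revisions):
    print(f"{author}: {line}")
```

With a real repository the same question is answered by the history itself, with no handwriting analysis required.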

### Screenplays in Hollywood

One particular use case I can easily see for such technology is tracking changes in screenplays over time. I’m honestly shocked that production companies, or even more likely studios, don’t all use such technology to follow changes in drafts over time. In the end, doing such tracking will certainly make Writers Guild of America (WGA) arbitrations much easier, as literally every contribution to a script can be tracked to give screenwriters appropriate credit. The end result, with the easy ability to time-machine one’s way back into older drafts, is truly lovely, and the outputs give so much more information about changes in the script compared to the traditional and all-too-simple asterisk (*) which screenwriters use to indicate that something/anything changed on a specific line, or the different colored pages which are used on scripts during production.
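As a rough illustration of how much more a diff says than a lone asterisk, the sketch below compares two invented screenplay drafts with Python’s standard `difflib.unified_diff`. The scene text and the draft names are made up for the example; revision control systems produce output in a very similar format.

```python
from difflib import unified_diff

# Two hypothetical drafts of the same scene, one list entry per script line.
draft_1 = [
    "INT. LIBRARY - NIGHT",
    "ANNA studies a stack of journals.",
    "ANNA: I can't find the paper.",
]
draft_2 = [
    "INT. LIBRARY - NIGHT",
    "ANNA studies a stack of journals.",
    "ANNA: The paper is behind a paywall.",
]

# Lines starting with "-" were removed and lines starting with "+" were added.
diff = list(
    unified_diff(draft_1, draft_2, fromfile="draft_1", tofile="draft_2", lineterm="")
)
print("\n".join(diff))
```

Where an asterisk only says that a line changed, the diff shows exactly what it changed from and to.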

I can also picture future screenwriters using services like GitHub as platforms for storing and distributing their screenplays to potential agents, managers, and producers.

### Redlining Legal Documents

Having seen thousands of legal agreements go back and forth over the years, I think revision control is a natural tool for tracking the redlining and changes of legal documents as they evolve before they are finally (or even never) executed. I have to imagine that being able to abstract out the appropriate metadata in the long run may actually help attorneys, agents, etc. to become better negotiators, but something like this is a project for another day.

In addition to direct research for projects being undertaken by academics like Neefs, academics should look into using revision control in their own daily work and writings. While writing a book, paper, journal article, essay, monograph, etc. (or, for graduate students, a thesis), one could use their own Git repository not only to save and back up all of their work for themselves, but also to preserve it for future scholars who would not otherwise have access to the “marginalia” created while manufacturing written thoughts in digital form.

I can easily picture Git as a very simple “next step” in furthering the concept of the digital humanities as well as in helping to bridge the gap between C.P. Snow’s “two cultures.” (I’d also suggest that revision control is a relatively simple step one could take before learning a particular programming language, which I think should be a mandatory tool in everyone’s daily toolbox regardless of their field(s) of interest.)

## Start Using Revision Control

“But how do I get started?” you ask.

Know going in that it may take parts of a day to get things set up and running, but once you’ve started with the basics, things are actually pretty easy and you can continue to learn the more advanced subtleties as you progress. Once things are working smoothly, the additional overhead you’ll be expending won’t be too much more than the old method of hitting Ctrl-S to save one of your old Word documents in the time before auto-save became ubiquitous.

First, one should start by choosing one of the myriad revision control systems that exist. For the sake of brevity in this short introductory post, I’ll simply suggest that users take a very close look at Git because of its ubiquity and popularity in the computer science world and the tremendous amount of free information and support available for it from a variety of sites on the internet. Git also has versions for all major operating systems (Windows, macOS, and Linux), and its relatively long and robust life within the computer science community means that it’s very stable and has many resources for the uninitiated to draw upon.

Once one has Git installed on their computer and has begun using it, I’d then recommend linking one’s local copy of the repository to a cloud storage solution like either GitHub or Bitbucket. While GitHub is certainly one of the most popular Git-related services out there (because it acts, in part, as the hub for a large portion of the open internet and thus promotes sharing), I often recommend using Bitbucket as it allows free unlimited private but still shareable repositories, while GitHub requires a small subscription fee for keeping one’s work private. Having a repository in the cloud will help tremendously in that your work will be available and downloadable from almost anywhere and because it also serves as a de facto back-up solution for your work.
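For those curious what “linking a local copy to the cloud” involves, the sketch below lists a typical sequence of Git commands for a brand-new writing project. The commands themselves are standard Git; the helper names `first_push_commands` and `run_all`, the commit message, the branch name `main`, and the `example.com` URL are placeholders assumed for illustration. Your hosting service will give you the real repository URL.

```python
from __future__ import annotations

import subprocess


def first_push_commands(remote_url: str, branch: str = "main") -> list[list[str]]:
    """Return the Git commands for a first commit and a first push, in order."""
    return [
        ["git", "init"],                                  # start tracking this folder
        ["git", "add", "."],                              # stage every file in it
        ["git", "commit", "-m", "First draft"],           # record the first revision
        ["git", "branch", "-M", branch],                  # name the current branch
        ["git", "remote", "add", "origin", remote_url],   # point at the hosted copy
        ["git", "push", "-u", "origin", branch],          # upload and link the branch
    ]


def run_all(commands: list[list[str]], cwd: str) -> None:
    """Run each command inside the project folder, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


# Placeholder URL: substitute the one shown by GitHub or Bitbucket.
commands = first_push_commands("https://example.com/your-name/your-book.git")
for command in commands:
    print(" ".join(command))
```

The script only prints the commands; calling `run_all(commands, cwd="path/to/your/manuscript")` would execute them, which assumes Git is installed and the remote repository already exists. After this one-time setup, the day-to-day routine shrinks to staging, committing, and pushing.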

I’ve recently been playing around with version control to help streamline the writing/editing process for a book I’ve been writing. Though Git and its variants probably seem more daunting than they should to the everyday user, they really represent a very powerful tool. I’ve spent less than two days learning the basics of both Git and hosted repositories (GitHub and Bitbucket), and it has been more than well worth the minor effort.

There is a huge wealth of information on revision control in general and on installing and using Git available on the internet, including full textbooks. For complete beginners, I’d recommend starting with The Chronicle’s “A Gentle Introduction to Version Control.” Keep in mind that though some of these resources look highly technical, it’s because many are trying to enumerate every function one could potentially desire, when even just the basic core functionality is more than enough to begin with. (I could analogize it to learning to drive a car versus actually reading the full manual so that you know how to take the engine apart and put it back together from scratch. To start with revision control, you only need to learn to “drive.”) Professors might also avail themselves of their local institutional libraries, which may host small sessions on learning such tools, or of the help of their colleagues or students in the computer science department. For others, I’d recommend taking a look at Git’s primary website. Bitbucket has an excellent step-by-step tutorial (and troubleshooting) for setting up the requisite software and using it.

## What do you use for revision control?

I’ll welcome any thoughts, experiences, or additional resources one might want to share with others in the comments.

## Rap Genius, a Textual Annotation Browser for Education, Digital Humanities, Science, and Publishing

Since the beginning of January, I’ve come back to regularly browsing and using the website Rap Genius. I’m sure that some of the education uses, including poetry and annotations of classics, had existed the last time I had visited, but I was very interested in seeing some of the scientific journal article uses which I hadn’t seen before. Very quickly browsing around opened up a wealth of ideas for using the platform within the digital humanities as well as for a variety of educational uses.

## Overview of Rap Genius

Briefly, the Rap Genius website was originally set up as an innovative lyrics service to allow users not only to upload song lyrics, but to mark them up with annotations as to the meanings of words and phrases, and to provide information about the pop-culture references within the lyrics themselves. (It’s not too terribly different from Google’s now-defunct Sidewiki or the impressive Highbrow textual annotation browser, but has some subtle differences as well as improvements.)

Users can use not only text, but photos, video, and even audio to supplement the listings. Built-in functionality includes the ability to link the works to the popular audio services SoundCloud and Spotify as well as to YouTube. Alternately one might think of it as VH1’s “Pop-up Video”, but for text on the Internet. Ultimately the site expanded to include the topics of rock, poetry, and news. The rock section is fairly straightforward, following the format of the rap section, while the poetry section includes not only works of poetry (from The Rime of the Ancient Mariner to the King James version of The Bible), but also plays (the works of William Shakespeare) and complete novels (like F. Scott Fitzgerald’s The Great Gatsby). News includes articles as well as cultural touchstones like the 2013 White House Correspondents’ Dinner Speech and the recent State of the Union. Ultimately all of the channels within the Rap Genius platform share the same types of functionality, but are applied to slightly different categories to help differentiate the content and make things easier to find. Eventually there may be a specific “Education Genius” (or other) landing page(s) to split out the content, depending on user needs.

Even at first blush, I can see this type of website functionality being used in a variety of educational settings including Open Access Journals, classroom use, close readings, MOOCs, publishing in general, and even for maintaining simple-to-use websites for classes. The best part is that the ecosystem is very actively growing and expanding with a recent release of an iPhone app and an announcement of a major deal with Universal to license music lyrics.

## General Education Use

To begin with, Rap Genius’ YouTube channel includes an excellent short video on how Poetry Genius might be used in a classroom setting for facilitating close-readings. In addition to the ability to make annotations, the site can be used to maintain a class specific website (no need to use other blogging platforms like WordPress or Blogger for things like this anymore) along with nice additions like maintaining a class roster built right in.  Once material begins to be posted, students and teachers alike are given a broad set of tools to add content, make annotations, ask questions, and provide answers in an almost real-time setting.

## MOOC Use Cases

Given the rapid growth of the MOOC revolution (massive open online courses) over the past several years, one of the remaining difficulties in administering such a class can hinge not only on being able to easily provide audiovisual content to students, but also on giving them a means of easily interacting with it and each other in the learning process. Poetry Genius (aka Education Genius) has a very interesting view into solving both of these problems, and, in fact, I can easily see the current version of the platform being used to replace competing platforms like Coursera, edX, Udacity and others in a whole cloth fashion.

Currently most MOOCs provide some type of simple topic-based threaded fora in which students post comments and questions as well as answers. In many MOOCs this format becomes ungainly because of the size of the class (10,000+ students) and the quality of the content which is being placed into it. Many students simply eschew the fora because the time commitment per amount of knowledge/value gained is simply not worth their while. Within the Poetry Genius platform, students can comment directly on the material, ask questions, or even propose improvements, and the administrators (the professor or teaching assistants in this case) can accept, reject, or send feedback requests to students to amend their work and add it to the larger annotated work. Fellow classmates can also vote up or down individual comments.

As I was noticing the interesting educational-related functionality of the Rap Genius platform, I ran across what is presumably the first MOOC attempting to integrate the platform into its pedagogical structure. Dr. Laura Nasrallah’s HarvardX course “Early Christianity: The Letters of Paul,” which started in January, asks students to also create Poetry Genius accounts to read and comment on the biblical texts which are a part of the course. The difficult portion of attempting to use Poetry Genius for this course is the thousands of “me-too” posters who are simply making what one might consider to be “throw-away” commentary rather than the intended “close reading” commentary for a more academic environment. (This type of posting is also seen in many of the fora-based online courses.) Not enough students are contributing substantial material, and when they are, it needs to be better and more quickly edited and curated into the main post to provide greater value to students as they’re reading along. Thus when 20,000 students jump into the fray, there’s too much initial chaos and the value that is being extracted out of it upon initial use is fairly limited – particularly if one is browsing through dozens of useless comments. It’s not until after-the-fact – once comments have been accepted/curated – that the real value will emerge. The course staff is going to have to spend more time doing this function in real time to provide greater value to the students in the class, particularly given the high number of people without intense scholarly training just jumping into the system and filling it with generally useless commentary. In internet parlance, the Poetry Genius site is experiencing the “Robert Scoble Effect” which changes the experience on it. (By way of explanation, Robert Scoble is a technology journalist/pundit/early-adopter with a massive follower base.  
His power-user approach and his large following can drastically change his experience with web-based technology compared to the  common everyday user. It can also often bring down new services as was common in the early days of the social media movement.)

Typically with the average poem or rap song, the commentary grows slowly/organically and is edited along the way. In a MOOC setting with potentially hundreds of thousands of students, the commentary is like a massive fire-hose which makes it seemingly useless without immediate real-time editing. Poetry Genius may need a slightly different model for using their platform in larger MOOC-style courses versus the smaller classroom settings seen in high school or college (10-100 students). In the particular case for “The Letters of Paul,” if the course staff had gone into the platform first and seeded some of the readings with their own sample commentary to act as a model of what is expected, then the students would be a bit more accepting of what is expected. I understand Dr. Nasrallah and her teaching assistants are in the system and annotating as well, but it should also be more obvious which annotations are hers (or those of teaching assistants) to help better guide the “discussion” and act as a model. Certainly the materials generated on Poetry Genius will be much more useful for future students who take the course in future iterations. Naturally, Poetry Genius exists for the primary use of annotation, while I’m sure that the creators will be tweaking classroom-specific use as the platform grows and user needs/requirements change.

As a contrast to the HarvardX class, and for an additional example, one can also take a peek at Cathy Davidson’s Rap Genius presence for her Coursera class “The History and Future (Mostly) of Higher Education.”

## Open Access Journal Use

In my mind, this type of platform can easily and usefully be used for publishing open access journal articles. In fact, one could use the platform to self-publish journal articles and leave them open to ongoing peer review. Sadly at present, there seems to be only a small handful of examples on the site, including a PLOS ONE article, which will give a reasonable example of some of the functionality which is possible.  Any author could annotate and footnote their own article as well as include a wealth of photos, graphs, and tables giving a much more multimedia view into their own work.  Following this any academic with an account could also annotate the text with questions, problems, suggestions and all of these can be voted up or down as well as be remedied within the text itself. Other articles can also have the ability to directly cross-reference specific sections of previously posted articles.

Individual labs or groups with “journal clubs” could certainly join in the larger public commentary and annotation on a particular article, but higher level administrative accounts within the system can also create a proverbial clean slate on an article and allow members to privately post up their thoughts and commentaries which are then closed to the group and not visible to the broader public. (This type of functionality can be useful for Mrs. Smith’s 10th grade class annotating The Great Gatsby so that they’re not too heavily influenced by the hundreds or possibly thousands of prior comments within a given text as they do their own personal close readings.) One may note that some of this type of functionality can already be seen in competitive services like Mendeley, but the Rap Genius platform seems to take the presentation and annotation functionalities to the next level. For those with an interest in these types of uses, I recommend Mendeley’s own group: Reinventing the Scientific Paper.

A Rap Genius representative indicated they were pursuing opportunities with JSTOR that might expand on these types of uses.

## Publishing

Like many social media related sites including platforms like WordPress, Tumblr, and Twitter, Rap Genius gives its users the ability to self-publish almost any type of content. I can see some excellent cross-promotional opportunities with large MOOC-type classes and the site. For example, professors/teachers who have written their own custom textbooks for MOOCs (e.g. Keith Devlin’s Introduction to Mathematical Thinking course at Stanford via Coursera) could post up the entire text on the Poetry Genius site and use it not only to correct mistakes/typos and make improvements over time, but also to discover things which aren’t clear to students, who can make comments, ask questions, etc. There’s also the possibility that advanced students can actively help make portions clear themselves when there are 10,000+ students and just 1-2 professors along with 1-2 teaching assistants. Certainly either within or without the MOOC movement, this type of annotation setup may work well to allow authors to tentatively publish, edit, and modify their textbooks, novels, articles, journal articles, monographs, or even Ph.D. theses. I’m particularly reminded of Kathleen Fitzpatrick’s open writing/editing of her book Planned Obsolescence via Media Commons. Academics could certainly look at the Rap Genius platform as a simpler, more user-friendly version of this type of process.

## Other Uses

I’m personally interested in being able to annotate science and math related articles and have passed along some tips for the Rap Genius team to include functionality like MathJax to be able to utilize TeX/LaTeX related functionality for typesetting mathematics via the web in the future.

Naturally, there are a myriad of other functionalities that can be built into this type of platform – I’m personally waiting for a way to annotate episodes of “The Simpsons”, so I can explain all of the film references and in-jokes to friends who laugh at their jokes, but never seem to know why – but I can’t write all of them here myself.

Interested users can easily sign up for a general Rap Genius account and dig right into the interface.  Those interested in education-specific functionality can request to be granted an “Educator Account” within the Rap Genius system to play around with the additional functionality available to educators. Every page in the system has an “Education” link at the top for further information and details. There’s also an Educator’s Forum [requires free login] for discussions relating specifically to educational use of the site.

## How to Sidestep Mathematical Equations in Popular Science Books

In the publishing industry there is a general rule-of-thumb that every mathematical equation included in a book will cut the audience of science books written for a popular audience in half – presumably in a geometric progression. This typically means that including even a handful of equations will give you an effective readership of zero – something no author and certainly no editor or publisher wants.
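To make the rule of thumb concrete, here is a minimal Python sketch of the halving. The starting audience of 100,000 readers is a hypothetical figure chosen purely for illustration, as is the function name; the only thing taken from the rule itself is the factor of one half per equation.

```python
def expected_readers(base_audience: int, num_equations: int) -> float:
    """Apply the rule of thumb: each equation halves the audience."""
    return base_audience * 0.5 ** num_equations


# Hypothetical starting audience, assumed only for illustration.
BASE_AUDIENCE = 100_000

for n in (0, 1, 5, 10, 17):
    readers = expected_readers(BASE_AUDIENCE, n)
    print(f"{n:2d} equations -> about {readers:,.0f} readers")
```

Under that assumed starting point, the expected readership drops below a single reader at the seventeenth equation, which is roughly what "a handful of equations gives you zero readers" looks like when taken literally.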

I suspect that there is a corollary to this: every picture included in the text will help to increase your readership, though possibly not by proportionally as large an amount.

In any case, while reading Melanie Mitchell’s text Complexity: A Guided Tour [Oxford University Press, 2009] this weekend, I noticed that, in what appears to be a concerted effort to include an equation without technically writing it into the text and to simultaneously increase readership by including a picture, she cleverly used a picture of Boltzmann’s tombstone in Vienna! Most fans of thermodynamics will immediately recognize Boltzmann’s equation for entropy, $S = k \log W$, which appears engraved on the tombstone over his bust.

I hope that future mathematicians, scientists, and engineers will keep this in mind and have their tombstones engraved with key formulae to assist future authors in doing the same – hopefully this will help to increase the amount of mathematics that is deemed “acceptable” by the general public.

## Academy of Motion Picture Arts & Sciences study on The Digital Dilemma

With a slight nod toward the Academy’s announcement of the Oscar nominees this morning, there’s something more interesting which they’ve recently released which hasn’t gotten nearly as much press, but which promises to be much more vital in the long run.

As books enter the digital age and we watch the continued convergence of rich media like video and audio enter into e-book formats with announcements last week like Apple’s foray into digital publishing, the ability to catalog, maintain and store many types of digital media is becoming an increasing problem.  Last week the Academy released part two of their study on strategic issues in archiving and accessing digital motion picture materials in their report entitled The Digital Dilemma 2. Many of you will find it interesting/useful, particularly in light of the Academy’s description

Clicking on the image of the report below provides some additional information as well as the ability (with a simple login) to download a .pdf copy of their entire report.

There is also a recent Variety article which gives a more fully fleshed out overview of many of the issues at hand.

In the meanwhile, if you’re going to make a bet in this year’s Oscar pool, perhaps putting your money on the “Digital Dilemma” might be more useful than on Brad Pitt for Best Actor in “Moneyball”?

## Darwin Library, Now Online, Reveals Mind of 19th-Century Naturalist | The Chronicle

Bookmarked Darwin Library, Now Online, Reveals Mind of 19th-Century Naturalist by Jie Jenny Zou (The Chronicle of Higher Education)

A portion of Charles Darwin’s vast scientific library—including handwritten notes that the 19th-century English naturalist scribbled in the margins of his books—has been digitized and is available online. Readers can now get a firsthand look into the mind of the man behind the theory of evolution.

The project to digitize Darwin’s extensive library, which includes 1,480 scientific books, was a joint effort with the University of Cambridge, the Darwin Manuscripts Project at the American Museum of Natural History, the Natural History Museum in Britain, and the Biodiversity Heritage Library.

The digital library, which includes 330 of the most heavily annotated books in the collection, is fully indexed—allowing readers to search through transcriptions of the naturalist’s handwritten notes that were compiled by the Darwin scholars Mario A. Di Gregorio and Nick Gill in 1990.

Charles Darwin’s Library from the Biodiversity Heritage Library

## Barnes & Noble Board Would Face Tough Choices in a Buyout Vote | Dealbook

Read Barnes & Noble Faces Tough Choices in a Buyout Vote by Steven Davidoff Solomon (DealBook)
If Leonard Riggio, Barnes & Noble's chairman, joins Liberty Media's proposed buyout of his company, the board needs to decide how to handle his 30 percent stake before shareholders vote on the deal.
This story from the New York Times’ Dealbook is a good quick read on some of the details and machinations of the Barnes & Noble buyout. Perhaps additional analysis on it from a game theoretical viewpoint would yield new insight?

## Failings and Opportunities of the Publishing Industry in the Digital Age

On Sunday, the Los Angeles Times printed a story about the future of reading entitled “Book publishers see their role as gatekeepers shrink.”

The article covers most of the story fairly well, but leaves out some fundamental pieces of the business picture. It discusses a few particular cases of some very well known authors in the publishing world, including the likes of Stephen King, Seth Godin, Paulo Coelho, Greg Bear, and Neal Stephenson, and how new digital publishing platforms are slowly changing the publishing business.

Indeed, many authors are bypassing traditional publishing routes and self-publishing their works directly online, and many are taking a much larger slice of the financial rewards in doing so.

The article, however, completely fails to mention or address how new online methods will handle editorial and publicity functions differently than they’re handled now, and the publishing business, both now and in the future, relies significantly on both.

It is interesting, and somewhat ironic, to note that the newspaper business, in which this particular article finds its outlet, has changed possibly more drastically than the book publishing business. If reading the article online, one is forced to click through four different pages on which a minimum of five different (and in my opinion, terrifically intrusive) ads appear per page. Without getting into the details of the subject of advertising, even more interesting is that many of these ads are served up by Google Ads based on keywords, so three on the first page alone were specifically publishing related.

Two of the ads were soliciting people to self-publish their own work. One touts how easy it is to publish, while the other glosses over the publicity portion with a glib statement offering an additional “555 Book Promotion Tips”! (I’m personally wondering if there can possibly be so many book promotion tips?)

Following the link in the third ad on the first page to its advertised site one discovers it states:

Although I find the portion about “baby steps” particularly entertaining, the first thing I’ll note is that the typical person is likely more readily equipped with the ability to distribute and market a children’s book than they might be at crafting one. Sadly however, there are very few who are capable of any of these tasks at a particularly high level, which is why there are relatively few new children’s books on the market each year and the majority of sales are older tried-and-true titles.

I hope the average reader sees the above come-on as the twenty-first century equivalent of the snake oil salesman who is tempting the typical wanna-be-author to call about their so-called “Free” Children’s Book Publishing Guide. I’m sure recipients of the guide end up paying the publisher to get their book out the door and, more likely than not, it doesn’t end up in mainstream brick-and-mortar establishments like Barnes & Noble or Borders, but only sells a handful of copies in easy to reach online venues like Amazon. I might suggest that the majority of sales will come directly from the author and his or her friends and family. I would further argue that neither now nor in the immediate or even distant future will many aspiring authors be self-publishing much of anything and managing to make even a modest living by doing so.

Now of course all of the above raises the question: why exactly is it that people need/want a traditional publisher? What role or function do publishers actually perform for the business and why might they be around in the coming future?

The typical publishing houses perform three primary functions: filtering/editing material, distributing material, and promoting material. The current significant threat to the publishing business from online retailers like Amazon.com, Barnes & Noble, Borders, and even the recently launched Google Books comes from the distribution platforms themselves. It certainly doesn’t take much to strike low cost deals with online retailers to distribute books, and even less so when they’re distributing them as e-books, which cuts out the most significant cost in the business: that of the paper to print them on. This leaves traditional publishing houses with two remaining functions: filtering/editing material and the promotion/publicity function.

The Los Angeles Times article certainly doesn’t state it, but everyone you meet on the street could tell you that writers like Stephen King don’t really need any more publicity than what they’ve got already. Their fan followings are so significantly large that they only need to tell two people online that they’ve got a new book and they’ll sell thousands of copies of any book they release. In fact, I might wager that Stephen King could release ten horrific (don’t mistake this for horror) novels before their low quality would likely begin to significantly erode his sales numbers.  If he’s releasing them on Amazon.com and keeping 70% of the income compared to the average 6-18% most writers are receiving, he’s in phenomenally good shape. (I’m sure given his status and track record in the publishing business, he’s receiving a much larger portion of his book sales from his publisher than 18% by the way; I’d also be willing to bet if he approached Amazon directly, he could get a better distribution deal than the currently offered 70/30 split.)
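As a back-of-the-envelope illustration of that gap, the sketch below compares author income at the 6%, 18%, and 70% rates mentioned above. The list price and the number of copies sold are hypothetical figures I have assumed for illustration, and the comparison naively holds both constant across print and e-book, which real sales would not.

```python
def author_income(list_price: float, copies_sold: int, royalty_rate: float) -> float:
    """Gross author income for a royalty rate given as a fraction (e.g. 0.70)."""
    return list_price * copies_sold * royalty_rate


# Hypothetical figures, assumed only for illustration.
LIST_PRICE = 10.00
COPIES_SOLD = 10_000

scenarios = [
    ("traditional, low end", 0.06),
    ("traditional, high end", 0.18),
    ("direct e-book platform", 0.70),
]

for label, rate in scenarios:
    income = author_income(LIST_PRICE, COPIES_SOLD, rate)
    print(f"{label:>24}: ${income:,.0f}")
```

Whatever price and volume one plugs in, the ratio is what matters: at equal sales, a 70% share pays roughly four to twelve times what a 6-18% royalty does.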

What will eventually sway the majority of the industry is when completely unknown new writers can publish into these electronic platforms and receive the marketing push they need to become the next Stephen King or Neal Stephenson. At the moment, none of the major e-book publishing platforms are giving much, if any, of this type of publicity to any of their new authors, and many aren’t even giving it to the major writers. Thus, currently, even the major writers are relying primarily on their traditional publishers for publicity to push their sales.

I will admit that when 80% of all readers are online, consuming their reading material in e-book format and utilizing the full support of social media and cross-collateralization of the best portion of their word-of-mouth, perhaps authors won’t need as much PR help. But until that day, platforms will need to ramp up their publicity efforts significantly. Financially one wonders what a platform like Amazon.com will charge for a front and center advertisement for a new best-seller to push sales. Will they be looking for a 50/50 split on those sales? Exclusivity in their channel? This is where the business will become even more dicey. Suddenly authors who think they’re shedding the chains of their current publishers will be shackling themselves with newer and more significant manacles and leg irons.

The last piece of the business that needs to be subsumed is the editorial portion of the manufacturing process.  Agents and editors serve a significant role in that they filter out thousands and thousands of terrifically unreadable books. In fact, one might argue that even now they’re letting far too many marginal books through the system and into the market.

If we consider the millions of books housed in the Library of Congress and their general circulation, one might realize that only one tenth of a percent or less of books are receiving all the attention. Certainly classics like William Shakespeare and Charles Dickens are more widely read than the millions of nearly unknown writers who take up just as much shelf space in that esteemed library.

Most houses publish on the order of ten to a hundred titles per year, but they rely heavily on only one or two of them being major hits, not only to cover the cost of the total failures, but also to provide the company with some semblance of profit. (This model is not unlike the way the feature film business works in Hollywood; if you throw enough spaghetti, something is bound to stick.)

The question then becomes: “how does the e-publishing business accomplish this editing and publicity in a better and less expensive way?” This question needs to be looked at from a pre-publication as well as a post-publication perspective.

From the pre-publication viewpoint, the Los Angeles Times article interestingly mentions that many authors appreciate having a “conversation” with their readers and allowing it to inform their work. However, creators of the stature of Stephen King cannot possibly take in and consume criticism from their thousands of fans in any reasonable way, not to mention the detriment to their output if they were forced to read and deal with all that criticism and feedback. Even smaller stature authors often find it overwhelming to take in criticism from their agents, editors, and even a small handful of close friends, family, and colleagues. Taking a quick look at the acknowledgement portions of a few dozen books generally reveals fewer than 10 people being thanked, much less hundreds of names from their general reading public – people they neither know well nor trust implicitly.

From the post-publication perspective, both printing on demand and e-book formats excise one of the largest costs of the supply chain management portions of the publishing world, but staff costs and salary are certainly very close in line after them.  One might argue that social media is the answer here and we can rely on services like LibraryThing, GoodReads, and others to supply this editorial/publicity process and eventually broad sampling and positive and negative reviews will win the day to cross good, but unknown writers into the popular consciousness. This may sound reasonable on the surface, but take a look at similar large recommendation services in the social media space like Yelp. These services already have hundreds of thousands of users, but they’re not nearly as useful as they need to be from a recommendation perspective and they’re not terrifically reliable in that they’re very often easily gamed. (Consider the number of positive reviews that appear on Yelp that are most likely written by the proprietors of the establishments themselves.) This outlet for editorial certainly has the potential to improve in the coming years, but it will still be quite some time before it has the possibility of totally ousting the current editorial and filtering regime.

From a mathematical and game theoretical perspective one must also consider how many people are going to subject themselves (willingly and for free) to some really bad reading material and then bother to write either a good or bad review of their experience. This is particularly true when the vast majority of readers are more than content to ride the coattails of the “suckers” who do the majority of the review work.

There are certainly a number of other factors at play in the publishing business as it changes form, but those discussed above are certainly significant in its continuing evolution. Given the state of technology and its speed, if people feel that the traditional publishing world will collapse, then we should take its evolution to the nth degree. By this argument, even platforms like Amazon and Google Books will eventually need to narrow their financial split with authors down to infinitesimal margins, as authors should be able to control every portion of their work without any interlopers taking any portion of their proceeds. We’ll leave the discussion of whether all of this might fit into the concept of the tragedy of the commons for a future date.

## Is the Los Angeles Times Simply Publishing Press Releases for Companies Like Barnes & Noble?

The Los Angeles Times published an online article entitled “Barnes & Noble says e-books outsell physical books online.” While I understand that this is a quiet holiday week, the Times should be doing better work than simply republishing press releases from corporations trying to garner post-holiday sales.  Some of the thoughts they might have included: