I’ve been going through a number of broken links on my website and slowly but surely bringing many of them back to life. Thanks, Broken Link Checker! Apparently there were 429 broken links, but I’m able to quickly fix many of them because, as I made my posts, I backed up the original links automatically using Post Archival in the Internet Archive. (Somehow this plugin has violated one of WordPress’ guidelines and can’t be downloaded, though I haven’t seen any details about why or been notified about it.)
I’ve only come across one or two which archive.org didn’t crawl or didn’t have. For many of the broken links, I’m able to link directly to archive copies made on the same day as my original posts, and in several cases my archive snapshots were the only ones ever made.
It was the biggest disaster in the history of the music business — and almost nobody knew. This is the story of the 2008 Universal fire.
This brings back some memories of when I worked for several months for Iron Mountain at their Hollywood facility right next to Anawalt Lumber. They had quite a large repository of music masters stored there as well as a custom nitrate film vault. At the time I remember thinking many of the same things mentioned here. I suspect that there’s an even bigger issue in film preservation, though this particular article makes it seem otherwise.
I’m surprised that the author doesn’t whip out any references to the burning of the Library at Alexandria, which may have been roughly on par in terms of cultural loss to society. It’s painfully sad that UMG covered up the devastating loss.
The artwork for the piece is really brilliant. Some great art direction here.
This talk will present innovative uses of Docker containers, emulators and web archives to allow anyone to experience old web sites using old web browsers, as demonstrated by the Webrecorder and oldweb.today projects. Combining containerization with emulation can provide new techniques in preserving both scholarly and artistic interactive works, and enable obsolete technologies like Flash and Java applets to be accessible today and in the future. The talk will briefly cover the technology and how it can be deployed both locally and in the cloud. Latest research in this area, such as automated preservation of education publishing platforms like Scalar will also be presented. The presentation will include live demos and users will also be invited to try the latest version of oldweb.today and interact with old browsers directly in their browser. The Q&A will help serve to foster a discussion on the potential opportunities and challenges of containerization technology in ‘future-proofing’ interactive web content and software.
I’ve got a new piece over at The Atlantic on Barack Obama’s prospective presidential library, which will be digital rather than physical. This has caused some consternation. We need to realize, however, that the Obama library is already largely digital:
The vast majority of the record his presid...
I love the perspective given here, and in the article, of how important a digital library might be.
The means and methods of digital preservation also become an interesting test case for this particular presidency because so much of it was born digitally. I’m curious what the overlaps are for those working in the archival research space. In fact, I know that groups like the Reynolds Journalism Institute have been hosting conferences like Dodging the Memory Hole which are working at preserving born-digital news, and I suspect there’s a huge overlap with what digital libraries like this one are doing. I have to think Dan would make an interesting keynote speaker if there were another Dodging the Memory Hole conference in the near future.
Given my technological background, I’m less reticent than some detractors of digital libraries, but this article reminds me of some of the structural differences in this particular library from an executive and curatorial perspective. Some of these were well laid out in an episode of On the Media which I listened to recently. I’d be curious to hear what Dan thinks of this aspect of the curatorial design, particularly given the differences a primarily digital archive might have. For example, who builds the search interface? Who builds the API for such an archive and how might it be designed to potentially limit access of some portions of the data? Design choices may potentially make it easier for researchers, but given the current and some past administrations, what could happen if curators were less than ideal? What happens with changes in technology? What about digital rot or even link rot? Who chooses formats? Will they be standardized somehow? What prevents pieces from being digitally tampered with? When those who win get to write the history, what prevents those in the future from digitally rewriting the narrative? There’s lots to consider here.
What do you do with 11,000 blogs on a platform that is over a decade old? That is the question that the Division of Teaching and Learning Technologies (DTLT) and the UMW Libraries are trying to answer. This essay outlines the challenges of maintaining a large WordPress multisite installation and offers potential solutions for preserving institutional digital history. Using a combination of data mining, personal outreach, and available web archiving tools, we show the importance of a systematic, collaborative approach to the challenges we didn’t expect to face in 2007 when UMW Blogs launched. Complicating matters is the increased awareness of digital privacy and the importance of maintaining ownership and control over one’s data online; the collaborative nature of a multisite and the life cycle of a student or even faculty member within an institution blurs the lines of who owns or controls the data found on one of these sites. The answers may seem obvious, but as each test case emerges, the situation becomes more and more complex. As an increasing number of institutions are dealing with legacy digital platforms that are housing intellectual property and scholarship, we believe that this essay will outline one potential path forward for the long-term sustainability and preservation.
Some interesting things to consider for a DoOO project in terms of longevity and archiving.
When it comes to their stuff, people often have a hard time letting go. When the objects of their obsession are rooms full of old clothes or newspapers, it can be unhealthy—even dangerous. But what about a stash that fits on 10 5-inch hard drives?
This is an important topic and something which should be tended to on an ongoing basis.
Ben Welsh of the LA Times data desk has built Savemy.News, which leverages Twitter in combination with archive.is, webcitation.org, and archive.org to allow journalists to quickly create multiple archives of their work by simply inputting the URLs of their related pages. It’s got some useful download functionality too.
Those with heavier digital journalism backgrounds and portfolios may find some useful information and research coming out of Reynolds Journalism Institute’s Dodging the Memory Hole series of conferences. I can direct those interested to a variety of archivists, librarians, researchers, and technologists should they need heavier lifting than simpler solutions like archive.org et al. can provide.
Additional ideas for archiving and saving online work can be found on the IndieWeb wiki page archival copy. There are some additional useful ideas and articles on the IndieWeb for Journalism page as well. I’d welcome anyone with additional ideas or input to feel free to add to any of these pages for others’ benefit as well. If you’re unfamiliar with wiki notation or editing, feel free to reply to this post; I’m happy to make additions on your behalf or help you log in and navigate the system directly.
If you don’t have a website where you keep your personal archive and/or portfolio online already, now might be a good time to put one together. The IndieWeb page mentioned above has some useful ideas, real world examples, and even links to tutorials.
As an added bonus for those who clicked through, if you’re temporarily unemployed and don’t have your own website/portfolio already, I’m happy to help build an IndieWeb-friendly website (gratis) to make it easier to store and display your past and future articles.
I’ve recently outlined how ideas like a Domain of One’s Own and IndieWeb philosophies could be used to allow researchers and academics to practice academic samizdat on the open web to own and maintain their own open academic research and writing. A part of this process is the need to have useful and worthwhile back up and archiving ability as one thing we have come to know in the history of the web is that link rot is not our friend.
Toward that end, for those in the space I’ll point out some useful resources including the IndieWeb wiki pages for archival copies. Please contribute to it if you can. Another brilliant resource is the annual Dodging the Memory Hole conference which is run by the Reynolds Journalism Institute.
While Dodging the Memory Hole is geared toward saving online news in particular, many of the conversations are nearly identical to those in the broader archival space and also involve larger institutional resources and constituencies like the Internet Archive, the Library of Congress, and university libraries as well. The conference is typically in the fall of each year and is usually announced in August/September sometime, so keep an eye out for its announcement. In the meantime, they’ve recorded past sessions and have archive copies of much of their prior work, in addition to creating a network of academics, technologists, and journalists around these ideas and related work. I’ve got a Twitter list of prior DtMH participants and stakeholders for those interested.
I’ll also note briefly, that as I self-publish on my own self-hosted domain, I use a simple plugin so that both my content and the content to which I link are being sent to the Internet Archive to create copies there. In addition to semi-regular back ups I make locally, this hopefully helps to mitigate potential future loss and link rot.
As a side note, major bonus points to Robin DeRosa (@actualham) for the use of the IndieWeb hashtag in her post!!
Dave Winer has a great post today on the closing of blogs.harvard.edu. These are sites run by Berkman, some dating back to 2003, which are being shut down. My galaxy brain goes towards the idea of …
An interesting take on self-hosting and DoOO ideas with regard to archiving and maintaining web presences. I’ll try to write a bit more on this myself shortly as it’s an important area that needs to be expanded for all on the open web.
I got an email in the middle of the night asking if I had seen an announcement from Berkman Center at Harvard that they will stop hosting blogs.harvard.edu. It's not clear what will happen to the archives. Let's have a discussion about this. That was the first academic blog hosting system anywhere. It was where we planned and reported on our Berkman Thursday meetups, and BloggerCon. It's where the first podcasts were hosted. When we tried to figure out what makes a weblog a weblog, that's where the result was posted. There's a lot of history there. I can understand turning off the creation of new posts, making the old blogs read-only, but as a university it seems to me that Harvard should have a strong interest in maintaining the archive, in case anyone in the future wants to study the role we played in starting up these (as it turns out) important human activities.
This is some earthshaking news. Large research institutions like this should be maintaining archives of these types of things in a de facto manner. Will have to think about some implications for others in the DoOO and IndieWeb spaces.
The researcher’s post can webmention an aggregating website similar to the way they would pre-print their research on a server like arXiv.org. The aggregating website can then parse the original and display the title, author(s), publication date, revision date(s), abstract, and even the full paper itself. This aggregator can act as a subscription hub (with WebSub technology) which other researchers can use to find, discover, and read the original research.
Readers of the original research can then write about, highlight, annotate, and even reply to it on their own websites to effectuate peer-review which then gets sent to the original by way of Webmention technology as well. The work of the peer-reviewers stands in the public as potential work which could be used for possible evaluation for promotion and tenure.
Readers of original research can post metadata relating to it on their own websites, including bookmarks, reads, likes, replies, annotations, etc., and send webmentions not only to the original but to the aggregation sites, which could collect these responses. Responses could also be given point values based on interaction/engagement levels (e.g. bookmarking something as “want to read” is 1 point, whereas indicating one has read something is 2 points, replying to it is 4 points, and an official citation in another publication is 5 points). Such a scoring system could be used to provide a better citation measure of the overall value of a research article in a networked world. In general, Webmention could be used to provide a two-way auditable trail for citations, and the citation trail can be used in combination with something like the Vouch protocol to prevent gaming the system with spam.
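As a quick illustration of the scoring idea above, a tally over a stream of interaction types might look like the following sketch (the function name and the exact weights are my own illustrative choices, not part of any spec):

```python
# Illustrative point values per interaction type, following the sketch above
POINTS = {"bookmark": 1, "read": 2, "reply": 4, "citation": 5}

def engagement_score(interactions):
    """Sum engagement points for a list of interaction types;
    unknown types score zero."""
    return sum(POINTS.get(kind, 0) for kind in interactions)

# One bookmark, two reads, and one formal citation:
print(engagement_score(["bookmark", "read", "read", "citation"]))  # prints 10
```

Real-world weights would of course need calibration, and the Vouch-style anti-spam check would happen before any interaction was counted at all.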
Government institutions (like Library of Congress), universities, academic institutions, libraries, and non-profits (like the Internet Archive) can also create and maintain an archival copy of digital and/or printed copies of research for future generations. This would be necessary to guard against the death of researchers and their sites disappearing from the internet so as to provide better longevity.
How many more times do people have to get stiffed by a free web service that just bites the dust and leaves you bubkas?
A monster post: some ranting on companies like Storify who offer free services that leverage our effort until they’re worth enough to get sold, and who then just yank our content; an approach for locally archiving your dying Storify content; a new home-spun tool for extracting all embeddable content links; and how to use it to create your own archives in WordPress.
Storify Is Nuking, for no credible reason, All Your Content
Okay there are two kinds of people or organizations that create things for the web. One is looking to make money or fame and cares not what happens once they get either (or none and go back to flipping burgers). The other has an understanding and care for the history and future of the web, and makes every effort to make archived content live on, to not leave trails of dead links.
I like Alan Levine’s take on type one and type two silo services. Adobe/Storify definitely seems to be doing things the wrong way in shutting down a service. He does a great job of laying out some thoughts on how to create collection posts, particularly on WordPress, though I suspect the user interface could easily be recreated on other platforms.
I would add some caution to some of his methods as he suggests using WordPress’s embed capabilities by using raw URLs to services like Twitter. While this can be a reasonable short term solution and the output looks nice, if the original tweet or content at that URL is deleted (or Twitter shuts down and 86s it the same way Storify has just done), then you’re out of luck again!
Better than relying on the auto-embed handled by WordPress, actually copy the entire embed from Twitter to capture the text and content from the original.
There’s a big difference between the following two pieces of data. The first is the bare URL of the tweet:

https://twitter.com/judell/status/940973536675471360

The second is the full embed code copied from Twitter:

<blockquote class="twitter-tweet" data-lang="en">
<p dir="ltr" lang="en">I hope <a href="https://twitter.com/Storify?ref_src=twsrc%5Etfw">@storify</a> will follow the example set by <a href="https://twitter.com/dougkaye?ref_src=twsrc%5Etfw">@dougkaye</a> when he shut down ITConversations: <a href="https://t.co/oBTWmR5M3A">https://t.co/oBTWmR5M3A</a>.</p>
My shows there are now preserved (<a href="https://t.co/IuIUMvMXi3">https://t.co/IuIUMvMXi3</a>) in a way that none of my magazine writing was.
— Jon Udell (@judell) <a href="https://twitter.com/judell/status/940973536675471360?ref_src=twsrc%5Etfw">December 13, 2017</a>
</blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
While WordPress ostensibly displays them the same, one will work as long as Twitter lives, and the other lives as long as your own site lives and actually maintains the original content.
Now there are certainly bigger issues for saving video content this way from places like YouTube given copyright issues as well as bandwidth and other technical concerns. In these cases, perhaps embedding the URLs only within WordPress is the way to go. But keep in mind what it is you’re actually copying/archiving when you use the method he discusses.
Side note: I prefer the closer Yiddish spelling of bupkis. It is however a great term for what you often end up receiving from social silos that provide you with services that you can usually pretty easily maintain yourself.
Introduction to what one would consider basic web communication
A few days ago I had written a post on my website and a colleague had written a reply on his own website. Because we were both using the W3C Webmention specification on our websites, my site received the notification of his response and displayed it in the comments section of my website. (This in and of itself is really magic enough: cross-website @mentions!)
To reply back to him, I previously would have had to write a separate second post on my own site, thereby fragmenting the conversation across multiple posts and making it harder to follow. (This is somewhat similar to what Medium.com does with their commenting system, as each reply/comment is its own standalone page.)
Instead, I’ve now been able to configure my website to allow me to write a reply directly to a response within my comments section admin UI (or even in the comments section of the original page itself), publish it, and have the comment be sent to his reply and display it there. Two copies for the price of one!
This means that now, WordPress-based websites (at least self-hosted versions running the WordPress.org code) can easily and simply allow multiple parties to write posts on their own sites and participate in multi-sided conversations back and forth while all parties maintain copies of all sides of the conversation on their own websites in a way that maintains all of the context. As a result, if one site should be shut down or disappear, the remaining websites will still have a fully archived copy of the entire conversation thread. (Let’s hear it for the resilience of the web!)
What is happening?
This functionality is seemingly so simple that one is left wondering:
“Why wasn’t this baked into WordPress (and the rest of the web) from the start?”
“Why wasn’t this built after the rise of Twitter, Facebook, or other websites which do this as a basic function?”
“How can I get it tout de suite?!” (aka gimme, gimme, gimme, and right now!!!)
While it seems simple, the technical hurdles aren’t trivial: there had previously never been a universal protocol for the web to allow it. (The Webmention spec now makes it possible.) Sites like Facebook, Twitter, and others enable it because they’ve got highly closed and highly customized environments that make it a simpler problem to solve. In fact, even old-school web-based bulletin boards allowed this!
But even within social media one will immediately notice that you can’t use your Facebook account to reply to a Twitter account. And why not?! (While the web would be far better if one website or page could talk to another, these sites don’t for the simple economic reason that they want you using only their site and not others, and not enabling this functionality keeps you locked into what they’re selling.)
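For the curious, the core of the Webmention protocol really is small: the sender fetches the target page, discovers the endpoint it advertises, and POSTs two form-encoded URLs to that endpoint. Here is a minimal, simplified sketch in Python; it is my own illustration, not the plugin’s actual code, and the full W3C spec also checks HTTP Link headers and handles more markup variations than this regex does:

```python
import re
import urllib.parse
import urllib.request

def discover_endpoint(html, base_url):
    """Find a webmention endpoint advertised in a page's HTML.
    Simplified: assumes the rel attribute comes before href, as
    WordPress emits it; the spec allows either order."""
    m = re.search(
        r'<(?:link|a)\s[^>]*rel=["\'][^"\']*webmention[^"\']*["\'][^>]*'
        r'href=["\']([^"\']*)["\']',
        html)
    if m is None:
        return None
    # Endpoint URLs may be relative, so resolve against the page's URL
    return urllib.parse.urljoin(base_url, m.group(1))

def send_webmention(source, target):
    """Notify `target` that `source` links to it, per the W3C Webmention spec."""
    html = urllib.request.urlopen(target).read().decode("utf-8", "replace")
    endpoint = discover_endpoint(html, target)
    if endpoint is None:
        raise ValueError("target does not advertise a webmention endpoint")
    data = urllib.parse.urlencode({"source": source, "target": target}).encode()
    # A 2xx response from the endpoint means the mention was accepted/queued
    return urllib.request.urlopen(endpoint, data)
```

That’s the whole handshake; everything else (verifying the source actually links to the target, parsing out the reply content, displaying it as a comment) happens on the receiving end.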
I’ll detail the basic set up below, but thought that it would be highly illustrative to have a diagram of what’s physically happening in case the description above seems a bit confusing to picture properly. I’ll depict two websites, each in their own column and color-coded so that content from site A is one color while content from site B is another color.
It really seems nearly incomprehensible to me that this hasn’t been built into the core functionality of the web since at least the beginning of the blogosphere. Yet here we are, and somehow I’m demonstrating how to do this from one WordPress site to another via the open web in 2017. To me this is the entire difference between a true Internet and just using someone else’s intranet.
While this general functionality is doable on any website, I’ll stick to enabling it specifically on WordPress, a content management system that is powering roughly 30% of all websites on the internet. You’ll naturally need your own self-hosted WordPress-based website with a few custom plugins and a modern semantic-based theme. (Those interested in setting it up on other platforms are more than welcome to explore the resources of the IndieWeb wiki and their chat which has a wealth of resources.)
As a minimum set you’ll want to have the following plugins enabled and configured: the Webmention plugin (which handles sending and receiving the webmention notifications) and the Semantic Linkbacks plugin (which handles displaying incoming mentions and replies as native comments).
Other instructions and help for setting these up and configuring them can be found on the IndieWeb wiki, though not all of the steps there are necessarily required for this functionality.
Ideally this all should function regardless of the theme you have chosen, but WordPress only provides the most basic support for microformats version 1 and doesn’t support the more modern version 2 out of the box. As a result, the display of comments from site to site may be a bit wonky depending on how supportive your particular theme is of the microformats standards. As you can see I’m using a relatively standard version of the TwentySixteen theme without a lot of customization and getting some reasonable results. If you have a choice, I’d recommend a theme with solid semantic markup; SemPress and Independent Publisher are two oft-recommended options within the IndieWeb community.
The final plugin that enables sending comments from one comment section to another is the WordPress Webmention for Comments plugin. As it is still somewhat experimental and is not available in the WordPress repository, you’ll need to download it from GitHub and activate it. That’s it! There aren’t any settings or anything else to configure.
With the plugin installed, you should now be able to send comments and replies to replies directly within your comments admin UI (or directly within the comments section of individual pages, though this can require additional clicks to get there, and you don’t have the benefit of the admin editor either).
There is one current caveat, however. For the plugin to actually send the webmention properly, it will need to have a URL in your reply that includes the microformats u-in-reply-to class. Currently you’ll need to add this manually until the plugin can parse and target the fragmentions for the comments on its own. I hope this functionality can be added to the plugin to make the experience seamless in the future.
So what does this u-in-reply-to part actually look like? Here’s the general shape of the one I used to send my reply (the URL and link text below are placeholders; you’d substitute the URL of the comment you’re replying to):

<a class="u-in-reply-to" href="https://example.com/original-post/#comment-123">this reply</a>
The class tells the receiving site that the webmention is a reply and to display it as such and the URL is necessary for your webmention plugin to know where to send the notification. You’d simply need to change the URL and the word (or words) that appear between the anchor tags.
If you want to have a hidden link and still send a webmention, you could potentially add your link to a zero-width space as well. This would look like the following (again with a placeholder URL; &#8203; is the HTML entity for a zero-width space, so nothing visible is rendered):

<a class="u-in-reply-to" href="https://example.com/original-post/">&#8203;</a>
Based on my experiments, using a <link> via HTML will work, but it will send it as a plain webmention to the site and it won’t show up natively as a reply.
Sadly, a plain text reply doesn’t work (yet), but hopefully some simple changes could be made to force it to using the common fragmentions pattern that WordPress uses for replies.
Interestingly, this capability has been around for a while; it just hasn’t been well documented or described. I hope that those with WordPress sites that already support Webmentions will now have a better idea of what this plugin is doing and how it works.
Eventually one might expect that all the bugs in the system get worked out and the sub-plugin for sending comment Webmentions will be rolled up into the main Webmentions plugin, which incidentally handles fragmentions already.
In addition to the notes above, I will say that this is still technically experimental code not running on many websites, so its functionality may not be exact or perfect in actual use, though in experimenting with it I have found it to be very stable. I would recommend checking that your replies actually post to the receiving site, which incidentally must be able to accept webmentions. If the receiving website doesn’t have webmention support, one will need to manually cut and paste the content there (and likely subscribe to email notifications of replies, so you can stay apprised of future responses).
You can check the receiving site’s webmention support in most browsers by right clicking and viewing the page’s source. Within the source one should see code in the <head> section of the page which indicates there is a webmention endpoint. Here is an example of the sort of code typically injected into WordPress websites that you’d be looking for (the domain here is a placeholder):

<link rel="webmention" href="https://example.com/wp-json/webmention/1.0/endpoint" />
Also keep in mind that some users moderate their comments, so that even though your mention was sent, they may need to approve it prior to it displaying on the page.
If you do notice problems, issues, or quirks, please file an issue with as full a description of what you did and what resulted as you can, so that it can be debugged and made to work not only for you, but hopefully better for everyone else.
Give it a try
So you’ve implemented everything above? Go ahead and write a reply on your own WordPress website and send me a webmention! I’ll do my best to reply directly to you so you can send another reply to make sure you’ve got things working properly.
Once you’re set, go forward and continue helping to make the web a better place.
I wanted to take a moment to give special thanks to Aaron Parecki, Matthias Pfefferle, and David Shanske who have done most of the Herculean work to get this and related functionality working. And thanks also to all who make up the IndieWeb community that are pushing the boundaries of what the web is and what it can accomplish. And finally, thanks to Khürt Williams who became the unwitting guinea pig for my first attempt at this. Thank you all!