Is it possible to annotate links in Hypothes.is that are in the Internet Archive? My browser bookmarklet for it doesn’t work on such archived pages. I can imagine that there are several javascript or iframe related technical reasons for it. An information related reason may be that bringing togeth...
The ability to annotate archived material on the Internet Archive with Hypothes.is is definitely possible, and I do it from time to time. I’m not sure which browser or annotation tool (via, browser extensions, other) you’re using, but it’s possible that some combinations have issues doing so. The standard browser extension on Chrome has worked well for me in the past.
Hypothes.is has methods for establishing document equivalency, to which archive.org apparently conforms. I did an academic experiment a few years back with an NYT article about books where you’ll see equivalent annotations on the original, the archived version, and a copy on my own site that has a rel="canonical" link back to the original as well.
I don’t recommend doing the rel-canonical trick on your own site frequently as I have noticed a bug, which I don’t think has been fixed.
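For those unfamiliar with the mechanism, here’s a rough sketch of how a tool might read the rel="canonical" declaration that makes the copies equivalent. The regex-based parsing is my own simplified illustration, not how Hypothes.is actually does it:

```python
# A simplified sketch of reading a page's rel="canonical" declaration, the
# mechanism that lets tools treat copies of a document as equivalent.
# The regex parsing is my own illustration, not Hypothes.is's actual approach.
import re
from urllib.request import urlopen

def canonical_url(page_url: str) -> str | None:
    html = urlopen(page_url).read().decode("utf-8", errors="replace")
    match = re.search(r'<link[^>]+rel="canonical"[^>]+href="([^"]+)"', html)
    return match.group(1) if match else None
```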
The careful technologist, with one tool or another, will see that I and a couple of others have occasionally been delving into the archive and annotating Manfred Kuehn’s work. (I see at least one annotation from 2016, which was probably made natively on his original site before it was shut down in 2018.) I’ve found some great gems and leads into some historical work from his old site. In particular, he’s got some translations from German texts that I haven’t seen elsewhere.
<Dash> I didn't see any mention of the header in the github repo so I feel it's helpful to mention this: in case anyone gets ratelimited, Parler will honor an arbitrary x-forwarded-for header with any IP and, well, not ratelimit you according to my unscientific test
[...]
<kallsyms> Dash: lmao really?
[...]
<andrew> Dash: okay, that's actually pretty huge cc kiska arkiver Fusl
<Kaz> heh, good to know Dash
<ave> this is amazing
[...]
<NotNite> yep
<Kaz> does that also apply to the API Dash?
<andrew> so, who wants to modify the thing to generate new IPs for X-Forwarded-For for each job and see what difference that makes?
[...]
<Dash> Not sure, but i'm credential stuffing them with 300 threads without getting ratelimited
[...]
<Dash> Someone else should probably test
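For anyone wondering what the suggestion in the log would look like in practice, here’s a minimal sketch. The fetch helper and per-job rotation are my own illustration; only the header behavior comes from the chat, and it only matters if the server actually trusts the header:

```python
# A minimal sketch of rotating X-Forwarded-For per job, as suggested above.
# Hypothetical illustration; only the header trick comes from the chat log.
import random
import requests

def random_ip() -> str:
    """Return an arbitrary IPv4 address to place in the header."""
    return ".".join(str(random.randint(1, 254)) for _ in range(4))

def fetch(url: str) -> requests.Response:
    # The server reportedly rate limits by the IP in this header rather
    # than the connecting IP, so rotating it per job sidesteps the limit.
    return requests.get(url, headers={"X-Forwarded-For": random_ip()}, timeout=30)
```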
When you visit web archives to go back in time and look at a web page, you naturally expect it to display the content exactly as it appeared on the live web at that particular datetime. That is, of course, assuming that all of the resources on the page were captured at or near the time of the datetime displayed in the banner for the root HTML page. However, we noticed that this is not always the case, and problems with archiving Twitter's new UI can result in replaying Twitter profile pages that never existed on the live web. In our previous blog post, we talked about how difficult it is to archive Twitter's new UI, and in this blog post, we uncover how the new Twitter UI mementos in the Internet Archive are vulnerable to temporal violations.
An interesting quirk of archiving pages on the modern internet.
We recently highlighted opportunities for partners and peers to learn more about web archiving technology and practices through the Archive-It Advanced Training webinar series–all recorded and available on-demand. As more organizations and communities find web archiving needs of their own, though, Internet Archive staff are also introducing new and extended training materials to get them crawling for the first time.
A collection of books that supports emergency remote teaching, research activities, independent scholarship, and intellectual stimulation while universities, schools, training centers, and libraries are closed.
If you want to Internet Archive a tweet, copy the long number (tweet ID) and stick it on the end of this: https://web.archive.org/save/https://twitter.com/intent/tweet?in_reply_to=
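As a quick sketch of that recipe (the tweet ID below is made up for illustration):

```python
# Build a Wayback Machine "save" URL for a tweet from its ID, per the tip above.
SAVE_PREFIX = "https://web.archive.org/save/https://twitter.com/intent/tweet?in_reply_to="

def archive_url_for_tweet(tweet_id: str) -> str:
    return SAVE_PREFIX + tweet_id

# The ID here is a made-up example.
print(archive_url_for_tweet("1234567890123456789"))
```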
So, I spend a long time trying to set up PESOS for individual silos on IFTTT, specifically Facebook and Instagram, because they are terrible. I’ve got it currently set up to publish my initial post, but no back feed support yet. Also, this is going to wordpress, but it shouldn’t matter (in theor...
This is some brilliant work. Thanks for puzzling it all out.
I do have a few questions/clarifications, though, since there are a few pieces you’ve left out.
For the IndieAuth token, which is created at /wp-admin/users.php?page=indieauth_user_token one only needs to give it a title and the “create” scope?
For the “then” portion that uses IFTTT.com’s Webhooks service are the following correct?
The URL is (when used with WordPress) of the form: https://example.com/wp-json/micropub/1.0/endpoint
The Method is: POST
The Content Type I’m guessing based on the Body field you’ve included is: application/x-www-form-urlencoded
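If I’ve got those pieces right, then outside of IFTTT the equivalent request might look something like this sketch, where the domain, token, and body fields are placeholders and guesses on my part:

```python
# A rough equivalent of the guessed webhook settings above, outside of IFTTT.
import requests

MICROPUB_ENDPOINT = "https://example.com/wp-json/micropub/1.0/endpoint"  # placeholder domain
TOKEN = "YOUR-INDIEAUTH-TOKEN"  # created at /wp-admin/users.php?page=indieauth_user_token

response = requests.post(
    MICROPUB_ENDPOINT,
    headers={"Authorization": f"Bearer {TOKEN}"},
    # Sent as application/x-www-form-urlencoded, matching the guess above.
    data={"h": "entry", "content": "Test post from a webhook-style request"},
    timeout=30,
)
print(response.status_code, response.text)
```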
For your Pocket example, it looks like you’re using the Post Kinds plugin, so I’m guessing you could have gotten away without the {{Excerpt}} and {{Title}} portions and just sent the URL, which Post Kinds picks up and parses to give you your context portion with a title and an excerpt anyway?
It looks like part of the trouble with this PESOS setup is that, in the long run, it relies on Pocket (or other services) being around. If Pocket disappears, then really, so does most of your bookmark, which ideally should point to the canonical URL of the content you’re bookmarking. Of course, IFTTT may not give you that URL in many cases. It looks to me like the URL you’re bookmarking would make a more appropriate syndication link URL.
For most of my bookmarks, likes, reads, etc. I use a plugin that scrapes my post and saves a copy of the contents of all the URLs on my page to the Internet Archive so that even in the event of a site death, a copy of the content is saved for me for a later date.
In any case, I do like this method if one can get it working. For some PESOS sources I’ve used IFTTT before, though typically with RSS feeds if the silo provides them. Even then, I’m often saving them directly to WordPress as drafts for later modification if the data IFTTT provides is less than ideal. Worse, using RSS doesn’t allow one to use Post Kinds’ URL field and parsing functionality the way your webhook method does.
10 days ago I was sitting in a room in Los Angeles with 12 other folks listening to Marie Selvanadin, Sundi Richard, and Adam Croom talk about work they’re doing with Domains, and it was good! That session was followed by Peter Sentz providing insight on how BYU Domains provides and supports top-level domains and hosting for over 10,000 users on their campus. And first thing that Friday morning, Lauren and I kicked the day off by highlighting Tim Clarke’s awesome work with the Berg Builds community directory as well as Coventry Domains’s full-blown framework for a curriculum around Domains with Coventry Learn. In fact, the first 3 hours of Day 2 were a powerful reminder of just how much amazing work is happening at the various schools that are providing the good old world wide web as a platform to their academic communities.
https://roadshow.reclaimhosting.com/LA/
I’m still bummed I couldn’t make it to this event…
One of the questions that came up during the SPLOT workshop is if there’s a SPLOT for podcasting, which reminded me of this post Adam Croom wrote a while back about his podcasting workflow: “My Podcasting Workflow with Amazon S3.” We’re always on the look-out for new SPLOTs to bring to the Reclaim masses, and it would be cool to have an example that moves beyond WordPress just to make the point a SPLOT is not limited to WordPress (as much as we love it) —so maybe Adam and I can get the band back together.❧
I wonder if this could be used to create a SPLOT that isn’t WordPress-based, potentially using APIs from the Internet Archive and Huffduffer? Certainly WordPress-based infrastructure could be used to create it, and aggregation could be done around tags. It looks like the Huffduffer username SPLOT is available.
–annotated December 17, 2019 at 10:46AM
I’ve been going through a number of broken links on my website and slowly, but surely, bringing many of them back to life. Thanks Broken Link Checker! Apparently there were 429 broken links, but I’m able to quickly fix many of them because as I made my posts, I backed up the original links automatically using Post Archival in the Internet Archive. (Somehow this plugin has violated one of WordPress’ guidelines and can’t be downloaded, though I haven’t seen any details about why or been notified about it.)
I’ve only come across one or two links which archive.org didn’t crawl or didn’t have. For many of the broken links, I’m able to link directly to archived copies made the same day as my original posts, and my archive snapshots were the only ones ever made.
Browse the collection, and look for a title that represents how you feel about Ontario Extend, or a colleague’s work. Tweet it out so we can all listen to the digital record spin.
While looking forward to IndieWeb Summit this weekend, I take a listen back at the past courtesy of Ontario Extend and the Great 78 Project at the Internet Archive.
Yesterday I was contemplating calendar heatmaps, which are probably best known from the user interface of GitHub, where they show how relatively active someone is on the site. I’ve discovered that Jetpack for WordPress provides similar functionality on the back end (in blue instead of green), but sadly doesn’t make it available for display on the front end of websites. I’ve filed a feature request to see if it’s something they’d work on in the future, so if having something like this seems useful to you, please click through and give the post a +1.
Circular Widthmaps
Today I saw a note that led me to the Internet Archive which I know has recently had a redesign. I’m not sure if the functionality I saw was part of this redesign, but it’s pretty awesome. I’m not sure quite what to call this sort of circular bar chart given what it does, but circular widthmap seems vaguely appropriate. Here’s a link to the archive.org page for my website that shows this cool UI, screencaptures of which also appear below: http://web.archive.org/web/sitemap/https://www.boffosocko.com/
Instead of using color gradations to indicate a relative number of posts, the UI measures things via width in ever-increasing concentric circles. The innermost circle indicates the root domain, and successive levels outward add additional paths from my site. Because I’m using dated archive paths, there’s a level of circle by year (2019, 2018, 2017, etc.), then another level outside that by month (April 2019, March 2019, etc.), and finally the outermost circle, which indicates individual posts. As a result, the width of a particular year or month indicates relatively how active that time frame was on my website (or at least how active archive.org’s crawler thinks it was).
Of course the segments on the circles also measure things like categories and tags on my site, along with the date-based archives. Thus I can gauge, for example, how often I use particular categories.
I’ll also note that in the 2018 portion of the circle, for July 11th, I had a post that slashdotted my website when it took off on Hacker News. That individual day is represented as really wide on its ring because it has an additional concentric circle outside of it representing the hundreds of comment URL fragments for that post. So one must keep in mind that the widths in some of the internal rings aren’t strictly comparable, because they may be heavily inflated by content further out on the ring.
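Out of curiosity, the underlying numbers a chart like this presumably visualizes can be pulled from the Wayback Machine’s CDX API. Here’s a sketch that tallies captures of my site by year; the aggregation is my own illustration, not whatever archive.org’s UI actually does:

```python
# Tally Wayback Machine captures of a domain by year via the CDX API.
import json
from collections import Counter
from urllib.request import urlopen

url = (
    "https://web.archive.org/cdx/search/cdx"
    "?url=boffosocko.com&matchType=domain"
    "&output=json&fl=timestamp,original&collapse=urlkey&limit=5000"
)
rows = json.load(urlopen(url))[1:]  # the first row is the field header

by_year = Counter(timestamp[:4] for timestamp, _ in rows)
for year, count in sorted(by_year.items()):
    print(year, count)
```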
How awesome would it be if this were embed-able and usable on my own website?
According to Pocket’s account I read 766,000 words or the equivalent of about 10 books. My most saved topics were current events, science, technology, health, and education.
The most popular things I apparently saved this year:
I’ll have to work at getting better at creating my own end-of-year statistics, since my own website has a better accounting of what I’ve actually read (it isn’t all public) and bookmarked. I do like that their service does some aggregate comparison of my data versus all the other user data (anonymized from my perspective).
Pocket also does a relatively good job of surfacing good things to read based on aggregate user data in categories like “Best of” and “Popular”. They also send me weekly email updates of things I’ve bookmarked there as reminders to go back and read them, which I find useful and which they haven’t over-gamified. Presently my own closest functionality to this is being subscribed to the RSS feed of my own public bookmarks in a feed reader (which I find generally useful), regularly checking on my private bookmarks on my website’s back end (something as easy as clicking on a browser bookmark), and even looking at my “on this day” functionality to review things from years past.
I’ll note that I currently rely more on Nuzzle for real-time discovery on a daily basis however.
Greg McVerry might appreciate that they’re gamifying reading by presenting me with a badge.
As an aside while I’m thinking of it, it might be a cool thing if the IndieWeb wiki received webmentions, so that self-documentation I do on my own website automatically appeared on the appropriate linked pages either in a webmention section or perhaps the “See Also” section. If wikis did this generally, it would be a cool means of potentially building communities and fuelling discovery on the broader web. Imagine if adding to a wiki via Webmention were as easy as syndicating content to a site like IndieNews or IndieWeb.XYZ? It could also function as a useful method of archiving web content from original pages to places like the Internet Archive in a simple way, much like how I currently auto-archive my individual pages automatically on the day they’re published.
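For those who haven’t seen it, the Webmention protocol itself is delightfully simple: discover the target’s advertised endpoint, then POST the source and target URLs as form-encoded data. Here’s a rough sketch of a sender; the string-based endpoint discovery is naively simplified (a real client would parse the HTML properly and also check Link headers):

```python
# A naive sketch of sending a Webmention per the W3C spec.
import requests
from urllib.parse import urljoin

def send_webmention(source: str, target: str) -> requests.Response:
    html = requests.get(target, timeout=30).text
    # Naive discovery: find the advertised endpoint in the target's HTML.
    marker = 'rel="webmention" href="'
    start = html.index(marker) + len(marker)
    endpoint = urljoin(target, html[start:html.index('"', start)])
    # The notification itself is just two form-encoded URLs.
    return requests.post(endpoint, data={"source": source, "target": target}, timeout=30)
```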
Yesterday, Quora announced that 100 million user accounts were compromised, including private activity like downvotes and direct messages, by a “malicious third party.”
Data breaches are a frustrating part of the lifecycle of every online service — as they grow in popularity, they become a big...