👓 Opinion: Silicon Valley Can’t Be Trusted With Our History | BuzzFeed

Read Opinion: Silicon Valley Can't Be Trusted With Our History by Evan Hill (BuzzFeed)

We create almost everything on the internet, but we control almost none of it.

As time passes, I fear that more and more of what happened in those days will live only in memory. The internet has slowly unraveled since 2011: Image-hosting sites went out of business, link shorteners shut down, tweets got deleted, and YouTube accounts were shuttered. One broken link at a time, one of the most heavily documented historical events of the social media era could fade away before our eyes.

If Edward McCain (t) hasn’t come across this article yet, it might make an interesting case study for this year’s Dodging the Memory Hole conference. Definitely an interesting case of people archiving their online content.

cc: Journalism Digital News Archive (t); Donald W. Reynolds Journalism Institute (t)

🔖 Webrecorder: Create high-fidelity, interactive web archives of any web site you browse

Bookmarked Webrecorder (webrecorder.io)
Create high-fidelity, interactive web archives of any web site you browse.

This looks like a cool archiving tool!

h/t: Dodging the Memory Hole 2017

In honor of Dodging the Memory Hole 2017 this week, for free (hosting and domain registration not included) I’ll offer to build one journalist or academic a basic IndieWeb-capable WordPress-based portfolio website to display and archive their personal work.

Preference will be given to those in attendance at the conference, but any who need an “author platform” for their work are welcome. Comment or reply below by 11/25/17 to enter.

 

👓 Creating an archive of my online writing, from 2002-2017 | Richard MacManus

Read Creating an archive of my online writing, from 2002-2017 by Richard MacManus (richardmacmanus.com)
I’ve just spent an inordinate amount of time creating an archive of all my past online writing work, in particular of the tech blog I founded ReadWriteWeb. I thought I’d outline my reasons for doing this, and why I ended up relying heavily on the Internet Archive instead of the original website sources.

Journalists, take note of how Richard MacManus created an online archive of his writing work!

I’m sure it took a tremendous amount of work given his long history of writing, but he’s now got a great archive as well as a nearly complete online portfolio of his work. If you haven’t done this or have just started out, here are some potentially useful resources to guide your thoughts.

I’m curious how others are doing this type of online archive. Feel free to share your methods.

Dodging the Memory Hole 2017 Conference at the Internet Archive November 15-16, 2017

RSVPed Interested in Attending https://www.rjionline.org/events/dodging-the-memory-hole-2017
Please join us at Dodging the Memory Hole 2017: Saving Online News on Nov. 15-16 at the Internet Archive headquarters in San Francisco. Speakers, panelists and attendees will explore solutions to the most urgent threat to cultural memory today — the loss of online news content. The forum will focus on progress made in and successful models of long-term preservation of born-digital news content. Journalistic content published on websites and through social media channels is ephemeral and easily lost in a tsunami of digital content. Join professional journalists, librarians, archivists, technologists and entrepreneurs in addressing the urgent need to save the first rough draft of history in digital form. The two-day forum — funded by the Donald W. Reynolds Journalism Institute and an Institute of Museum and Library Services grant awarded to the Journalism Digital News Archive, UCLA Library and the Educopia Institute — will feature thought leaders, stakeholders and digital preservation practitioners who are passionate about preserving born-digital news. Sessions will include speakers, multi-member panels, lightning round speakers and poster presenters examining existing initiatives and novel practices for protecting and preserving online journalism.

I attended this conference at UCLA in Fall 2016; it was fantastic! I highly recommend it to journalists, coders, Indieweb enthusiasts, publishers, and others interested in the related topics covered.

App.net archive

Bookmarked App.net archive by Manton Reece (manton.org)
Linkrot and the lack of permanence on the web is a recurring theme for this blog. In the final days as App.net was winding down, I wanted to put my money where my mouth was. I spun up a couple new servers and wrote a set of scripts to essentially download every post on App.net. It feels like a fragile archive, put together hastily, but I believe it’s mostly complete. I’ve also downloaded thumbnail versions of some of the public photos hosted on App.net.

Interesting to see that Manton Reece created an impromptu archive of all of App.net before it shut down.

This has to be the most awesome Indieweb pull request I’ve seen this year.

WithKnown is a fantastic, free, and open source content management service that supports some of the most bleeding-edge technology on the internet. I’ve been playing with it for over two years and love it!

And today, there’s another reason to love it even more…

This is also a great reminder that developers can have a lasting and useful impact on the world around them–even in the political arena.

GitHub have published some guidance on persistence and archiving of repositories for academics #openscience

Reposted GitHub have published some guidance on persistence and archiving of repositories for academics #openscience by Arfon Smith (Twitter)
GitHub have published some guidance on persistence and archiving of repositories for academics https://help.github.com/articles/about-archiving-content-and-data-on-github/ #openscience

The crowd from Dodging the Memory Hole are sure to find this interesting!

#DTMH2016: Saving Online News | NPR RAD recap

Liked #DTMH2016: Saving Online News (RAD recap) (with images, tweets) by NPR Research, Archives, & Data Strategy (Storify)
Dodging The Memory Hole is an action-oriented conference and event series that brings together journalists, technologists, and information specialists to strategize solutions for organizing and preserving born-digital news.

Web Science and Digital Libraries Research Group: 2016-10-13: Dodging The Memory Hole 2016 Trip Report (#dtmh2016)

Liked 2016-10-13: Dodging The Memory Hole 2016 Trip Report (#dtmh2016) by John Berlin (Web Science and Digital Libraries Research Group: ws-dl.blogspot.com)
A summary/recap of the Dodging the Memory Hole 2016 conference held at UCLA's Charles Young Research Library in Los Angeles, California over two days in October to discuss and highlight potential solutions to the issue of preserving born-digital news.

Photo Gallery from Dodging the Memory Hole 2016

Images from a conference at UCLA concerned with saving born-digital news

Details for the conference can be found at Dodging the Memory Hole 2016.

The Journalism Digital News Archive has posted a nice bunch of photos as well.

My previous posts and notes about the conference:

🔖 Want to read: Personal Archiving: Preserving Our Digital Heritage by Donald T. Hawkins

H/T to Sawyer Hollenshead.

This may also be of interest to those who’ve attended Dodging the Digital Memory Hole related events as well as those in the IndieWeb who may be concerned about their data living beyond them.

Personal Archiving: Preserving Our Digital Heritage by Donald T. Hawkins

Notes from Day 2 of Dodging the Memory Hole: Saving Online News | Friday, October 14, 2016

Some quick thoughts and an archive of the audio and my Twitter notes during the day

If you missed the notes from Day 1, see this post.

It may take me a week or so to finish putting some general thoughts and additional resources together based on the two day conference so that I might give a more thorough accounting of my opinions as well as next steps. Until then, I hope that the details and mini-archive of content below may help others who attended, or provide a resource for those who couldn’t make the conference.

Overall, it was an incredibly well programmed and run conference, so kudos to all those involved who kept things moving along. I’m now certainly much more aware of the gaping memory hole the internet is facing despite the heroic efforts of a small handful of people and institutions attempting to improve the situation. I’ll try to go into more detail later about a handful of specific topics and next steps as well as a listing of resources I came across which may prove to be useful tools for both those in the archiving/preserving and IndieWeb communities.

Archive of materials for Day 2

Audio Files

Below are the recorded audio files embedded in .m4a format (using a Livescribe Pulse Pen) for several sessions held throughout the day. To my knowledge, none of the breakout sessions were recorded except for the one which appears below.

Summarizing archival collections using storytelling techniques


Presentation: Summarizing archival collections using storytelling techniques by Michael Nelson, Ph.D., Old Dominion University

Saving the first draft of history


Special guest speaker: Saving the first draft of history: The unlikely rescue of the AP’s Vietnam War files by Peter Arnett, winner of the Pulitzer Prize for journalism
Peter Arnett talking about news reporting in Vietnam in the ’60s.

Kiss your app goodbye: the fragility of data journalism


Panel: Kiss your app goodbye: the fragility of data journalism
Featuring Meredith Broussard, New York University; Regina Lee Roberts, Stanford University; Ben Welsh, The Los Angeles Times; moderator Martin Klein, Ph.D., Los Alamos National Laboratory

The future of the past: modernizing The New York Times archive


Panel: The future of the past: modernizing The New York Times archive
Featuring The New York Times Technology Team: Evan Sandhaus, Jane Cotler and Sophia Van Valkenburg; moderated by Edward McCain, RJI and MU Libraries

Lightning Rounds: Six Presenters



Lightning rounds (in two parts)
Six + one presenters: Jefferson Bailey, Terry Britt, Katherine Boss (and team), Cynthia Joyce, Mark Graham, Jennifer Younger and Kalev Leetaru
1: Jefferson Bailey, Internet Archive, “Supporting Data-Driven Research using News-Related Web Archives”
2: Terry Britt, University of Missouri, “News archives as cornerstones of collective memory”
3: Katherine Boss, Meredith Broussard and Eva Revear, New York University, “Challenges facing preservation of born-digital news applications”
4: Cynthia Joyce, University of Mississippi, “Keyword ‘Katrina’: Re-collecting the unsearchable past”
5: Mark Graham, Internet Archive/The Wayback Machine, “Archiving news at the Internet Archive”
6: Jennifer Younger, Catholic Research Resources Alliance, “Digital Preservation, Aggregated, Collaborative, Catholic”
7: Kalev Leetaru, senior fellow, The George Washington University and founder of the GDELT Project, “A Look Inside The World’s Largest Initiative To Understand And Archive The World’s News”

Technology and Community


Presentation: Technology and community: Why we need partners, collaborators, and friends by Kate Zwaard, Library of Congress

Breakout: Working with CMS


Working with CMS, led by Eric Weig, University of Kentucky

Alignment and reciprocity


Alignment & reciprocity by Katherine Skinner, Ph.D., executive director, the Educopia Institute

Closing remarks


Closing remarks by Edward McCain, RJI and MU Libraries and Todd Grappone, associate university librarian, UCLA

Live Tweet Archive

Reminder: In many cases my tweets don’t reflect direct quotes of the attributed speaker, but are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many original words of the participant as possible. Typically, for speed, there wasn’t much editing of these notes. Below I’ve changed the attribution of one or two tweets to reflect the proper person(s). For convenience, I’ve also added a few hyperlinks to useful resources after the fact that I didn’t have time to add to the original tweets. I’ve attached .m4a audio files of most of the audio for the day (apologies for shaky quality as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it. Presumably they will release the video on their website for a more immersive experience.

Peter Arnett:

Condoms were required issue in Vietnam–we used them to waterproof film containers in the field.

Do not stay close to the head of a column, medics, or radiomen. #warreportingadvice

I told the AP I would undertake the task of destroying all the reporters’ files from the war.

Instead the AP files moved around with me.

Eventually the 10 trunks of material went back to the AP when they hired a brilliant archivist.

“The negatives can outweigh the positives when you’re in trouble.”

Edward McCain:

Our first panel: Kiss your app goodbye: the fragility of data journalism

Meredith Broussard:

I teach data journalism at NYU

A news app is not what you’d install on your phone

Dollars for Docs is a good example of a news app

A news app is something that allows the user to put themself into the story.

Often there are three CMSs: web, print, and video.

News apps don’t live in any of the CMSs. They’re bespoke and live on a separate data server.

This has implications for crawlers which can’t handle them well.

Then how do we save news apps? We’re looking at examples and then generalizing.

Everyblock.com was a good example based on chicagocrime and later bought by NBC and shut down.

What?! The internet isn’t forever? Databases need to be saved differently than web pages.

Reprozip was developed by NYU Center for Data and we’re using it to save the code, data, and environment.

Ben Welsh:

My slides will be at http://bit.ly/frameworkfix. I work on the data desk @LATimes

We make apps that serve our audience.

We also make internal tools that empower the newsroom.

We also use our nerdy skills to do cool things.

Most of us aren’t good programmers, we “cheat” by using frameworks.

Frameworks do a lot of basic things for you, so you don’t have to know how to do it yourself.

Archiving tools often aren’t built into these frameworks.

Instagram, Pinterest, Mozilla, and the LA Times use Django as our framework.

Memento for WordPress is a great way to archive pages.

We must do more. We need archiving baked into the systems from the start.

Slides at http://bit.ly/frameworkfix

Regina Roberts:

Got data? I’m a librarian at Stanford University.

I’ll mention Christine Borgman’s book Big Data, Little Data, No Data.

Journalists are great data liberators: FOIA requests, cleaning data, visualizing, getting stories out of data.

But what happens to the data once the story is published?

BLDR: Big Local Digital Repository, an open repository for sharing open data.

Solutions that exist: Hydra at http://projecthydra.org or Open ICPSR www.openicpsr.org

For metadata: www.ddialliance.org, RDF, International Image Interoperability Framework (iiif) and MODS

Martin Klein:

We’ll open up for questions.

Audience Question:

What’s more important: obeying copyright laws or preserving the content?

Regina Roberts:

The new creative commons licenses are very helpful, but we have to be attentive to many issues.

Perhaps archiving it and embargoing for later?

Ben Welsh:

Saving the published work is more important to me, and the rest of the byproduct is gravy.

Evan Sandhaus:

I work for the New York Times, you may have heard of it…

Doing a quick demo of Times Machine from @NYTimes

Sophia van Valkenburg:

Talking about modernizing the born-digital legacy content.

Our problem was how to make an article from 2004 look like it had been published today.

There were hundreds of thousands of articles missing.

There was no one definitive list of missing articles.

Outlining the workflow for reconciling the archive XML and the definitive list of URLs for conversion.

It’s important to use more than one source for building an archive.

Jane Cotler:

I’m going to talk about all of “the little things” that came up along the way.

Article Matching: Fusion – How to convert print XML with web HTML that was scraped.

Primarily, we looked at common phrases between the corpus of the two different data sets.

We prioritized the print data over the digital data.

We maintain a system called switchboard that redirects from old URLs to the new ones to prevent link rot.

The case of the missing sections: some sections of the content were blank and not transcribed.

We made the decision to take out some data we had in favor of making a better user experience for missing sections.

In the future, we’d also like to put photos back into the articles.
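
For anyone curious what matching print and web versions on common phrases might look like in practice, here is a minimal sketch. It is only an illustration of the general idea and not the Times’ actual code: the function names, the three-word phrase size, and the 0.5 threshold are all assumptions of mine.

```python
import re


def shingles(text, size=3):
    """Return the set of lowercase word n-grams ("phrases") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of phrases two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_articles(print_articles, web_articles, threshold=0.5):
    """Pair each print article id with its most similar web article id.

    Both arguments are dicts of id -> plain text. Pairs scoring below
    the (arbitrary, assumed) threshold are left unmatched.
    """
    web_shingles = {wid: shingles(text) for wid, text in web_articles.items()}
    matches = {}
    for pid, text in print_articles.items():
        p = shingles(text)
        best_id, best_score = None, 0.0
        for wid, w in web_shingles.items():
            score = jaccard(p, w)
            if score > best_score:
                best_id, best_score = wid, score
        if best_score >= threshold:
            matches[pid] = best_id
    return matches
```

Calling `match_articles` with one dict of print text and one of scraped web text returns a dict of print id to best web id. Anything left out of the result would need a human look, which seems in keeping with the advice to use more than one source when building an archive.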

Evan Sandhaus:

Modernizing and archiving the @NYTimes archives is an ongoing challenge.

Edward McCain:

Can you discuss the decision to go with a more modern interface rather than a traditional archive of how it looked?

Evan Sandhaus:

Some of the decision was to get the data into an accessible format for modern users.

We do need to continue work on preserving the original experience.

Edward McCain:

Is there a way to distinguish between the print version and the online versions in the archive?

Audience Question:

Could a researcher do work on the entire corpus? Is it available for subscription?

Edward McCain:

We do have a sub-section of data available, but don’t have it prior to 1960.

Audience Question:

Have you documented the process you’ve used on this preservation project?

Sophia van Valkenburg:

We did save all of the code for the project within GitHub.

Jane Cotler:

We do have meeting notes which provide some documentation, though they’re not thorough.

ChrisAldrich:

Oh dear. Of roughly 1,155 tweets I counted about #DtMH2016 in the last week, roughly 25% came from me. #noisy

Open source tool I had mentioned to several: @wallabagapp A self-hostable application for saving web pages https://www.wallabag.org
