🔖 Want to read: Personal Archiving: Preserving Our Digital Heritage by Donald T. Hawkins

H/T to Sawyer Hollenshead.

This may also be of interest to those who’ve attended Dodging the Memory Hole events, as well as to those in the IndieWeb community who are concerned about their data living beyond them.

Personal Archiving: Preserving Our Digital Heritage by Donald T. Hawkins

Notes, Highlights, and Marginalia: From E-books to Online

For several years now, I’ve been meaning to do something more interesting with the notes, highlights, and marginalia from the various books I read. In particular, I’ve been meaning to do it for the non-fiction I read for research, and even more so for e-books, whose notes tend to be more easily extractable given their electronic nature. This fits into the way I use this site as a commonplace book as well as the IndieWeb philosophy of owning all of one’s own data.[1]

Over the past month or so, I’ve been experimenting with some fiction to see what works and what doesn’t in terms of a workflow for status updates around reading books, writing book reviews, and then extracting and depositing notes, highlights, and marginalia online. I’ve now got a relatively quick and painless workflow for exporting the book-related data from my Amazon Kindle and importing it into the site with some modest markup and CSS for display. I’m sure the workflow will continue to evolve (and further automate) over the coming months, but I’m reasonably happy with where things stand.

The fact that the Amazon Kindle allows for relatively easy highlighting and annotation in e-books is excellent, but having the ability to sync to a laptop and do a one-click export of all of that data is incredibly helpful. Adding some simple CSS to the pre-formatted output gives me a reasonable base upon which to build for future writing and thinking about the material. In experimenting, I’m also coming to realize that simply owning the data isn’t enough; now I’m driven to help make that data more directly useful to me and potentially to others.
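For the curious, the heart of such an import step can be sketched in a few lines. The snippet below is a minimal, hedged sketch (not my exact script) that parses the `My Clippings.txt` file a Kindle exposes over USB into structured records; the exact file format varies somewhat by firmware, so the layout assumed here (title line, metadata line, blank line, highlight text, `==========` separator) is the common pattern rather than a guarantee.

```python
def parse_clippings(text):
    """Parse a Kindle 'My Clippings.txt' dump into a list of note records."""
    entries = []
    for chunk in text.split("=========="):
        lines = [line.strip() for line in chunk.strip().splitlines()]
        if len(lines) < 2:
            continue  # skip empty leftovers between separators
        entries.append({
            "title": lines[0].lstrip("\ufeff"),   # first entry may carry a BOM
            "meta": lines[1].lstrip("- "),        # page/location/date line
            "text": "\n".join(lines[3:]),         # highlight or note body
        })
    return entries

# A tiny example in the assumed format:
sample = """Maps of Time (David Christian)
- Your Highlight on page 12 | Location 180-184 | Added on Monday, October 24, 2016

Big history surveys the past at all possible scales.
==========
"""
notes = parse_clippings(sample)
print(notes[0]["title"])
print(notes[0]["text"])
```

From records like these it’s then straightforward to emit whatever markup and CSS hooks one’s site uses for display.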

As part of my experimenting, I’ve just uploaded some notes, highlights, and annotations for David Christian’s excellent text Maps of Time: An Introduction to Big History,[2] which I read back in 2011/12. While I’ve read several of the references which I marked up in that text, I’ll have to continue evolving a workflow for doing all the related follow-up (and further thinking and writing) on the reading I’ve done in the past.

I’m still reminded of Rick Kurtzman’s sage advice to me when I was a young pisher at CAA in 1999: “If you read a script and don’t tell anyone about it, you shouldn’t have wasted the time having read it in the first place.” His point was that if you don’t try to pass along the knowledge you gained by reading, you may as well give up. Even if the thing was terrible, at least say that much. In a digitally connected era, we no longer need to rely on nearly illegible scrawl in the margins to pollinate the world at a snail’s pace.[4] Take those notes, marginalia, highlights, and metadata and release them into the world. The fact that this dovetails perfectly with Cesar Hidalgo’s thesis in Why Information Grows: The Evolution of Order, from Atoms to Economies[3] furthers my belief in having a better process for what I’m attempting here.

Hopefully in the coming months, I’ll be able to add similar data to several other books I’ve read and reviewed here on the site.

If anyone has any thoughts, tips, tricks for creating/automating this type of workflow/presentation, I’d love to hear them in the comments!

Footnotes

[1]
“Own your data,” IndieWeb. [Online]. Available: http://indieweb.org/own_your_data. [Accessed: 24-Oct-2016]
[2]
D. Christian and W. H. McNeill, Maps of Time: An Introduction to Big History, 2nd ed. University of California Press, 2011.
[3]
C. Hidalgo, Why Information Grows: The Evolution of Order, from Atoms to Economies, 1st ed. Basic Books, 2015.
[4]
O. Gingerich, The Book Nobody Read: Chasing the Revolutions of Nicolaus Copernicus. Bloomsbury Publishing USA, 2004.

📖 On page 49 of 448 of Dealing with China by Henry M. Paulson, Jr.

Former head of Goldman Sachs and U.S. Treasury Secretary Henry M. Paulson, Jr. and the cover of his 2015 book Dealing with China

📖 On page 24 of 274 of Complex Analysis with Applications by Richard A. Silverman

I enjoyed his treatment of inversion, but it seems like there’s a better way of laying the idea out, particularly for applications. Straightforward coverage of nested intervals and rectangles, limit points, convergent sequences, and the Cauchy convergence criterion. Given the level, I would have preferred some additional review of basic analysis and topology; he seems to do the bare minimum here.
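For readers who want the statement being referenced, the Cauchy convergence criterion (here stated in the complex-plane setting of Silverman’s text) says that convergence can be verified without knowing the limit:

```latex
% Cauchy convergence criterion for a sequence (z_n) in the complex plane:
% the sequence converges if and only if it is a Cauchy sequence.
\[
\lim_{n \to \infty} z_n \ \text{exists}
\iff
\forall \varepsilon > 0 \ \exists N \in \mathbb{N} :\
|z_m - z_n| < \varepsilon \quad \text{for all } m, n \ge N .
\]
```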

A Case for Why Disqus Should Implement Webmentions

Internet-wide @Mentions

There is a relatively new candidate recommendation from the W3C for a game-changing social web specification called Webmention, which essentially makes it possible to do Twitter-style @mentions (or Medium-style mentions) across the internet from site to site (as opposed to only within a siloed site/walled garden like Twitter).

Webmentions would allow me to write a comment on someone else’s post from my own Tumblr site, for example. By including in my post the URL of the post I’m replying to, which serves as the @mention, the other site (which could be on WordPress, Drupal, Tumblr, or anything really), provided it also supports Webmention, could receive my comment and display it in its comment section.

Given the tremendous number of sites (and multi-platform sites) on which Disqus operates, it would be an excellent candidate to support the Webmention spec and thereby enable a huge amount of inter-site activity on the internet. First, it could include the snippet of code that allows the site on which a comment is originally written to send Webmentions; second, it could include the snippet of code that allows sites to receive them. The current Disqus infrastructure could also serve to reduce spam and display those comments in a pretty way. Naturally, Disqus could continue to serve the same social functionality it has in the past.
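For those curious about the mechanics, the spec is pleasantly small: a sender discovers the receiver’s Webmention endpoint (advertised via an HTTP `Link` header or a `<link>`/`<a>` element with `rel="webmention"`) and then POSTs two form-encoded parameters, `source` and `target`, to that endpoint. Here’s a hedged sketch of the HTML-discovery half using only the Python standard library; the URLs are hypothetical, and a real sender must also check the `Link` header and actually perform the POST.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

class EndpointFinder(HTMLParser):
    """Capture the first <link> or <a> element carrying rel="webmention"."""
    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is None and tag in ("link", "a"):
            attr = dict(attrs)
            if "webmention" in (attr.get("rel") or "").split():
                self.endpoint = attr.get("href")

def discover_endpoint(page_html, page_url):
    """Return the absolute Webmention endpoint advertised in page_html, or None."""
    finder = EndpointFinder()
    finder.feed(page_html)
    if finder.endpoint is None:
        return None
    return urljoin(page_url, finder.endpoint)  # resolve relative hrefs

# Hypothetical reply target advertising an endpoint in its <head>:
target = "https://example.com/post/123"
page = '<html><head><link rel="webmention" href="/webmention"></head></html>'

endpoint = discover_endpoint(page, target)
# The notification itself is just a form-encoded POST of source and target;
# a real sender would now send `body` to `endpoint` over HTTP.
body = urlencode({"source": "https://mysite.example/reply/1", "target": target}).encode()
print(endpoint)
```

Because the whole exchange is two URLs in a POST body, a service like Disqus could bolt it onto its existing comment infrastructure without changing its data model much.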

Aggregating the conversation across the Internet into one place

Making things even more useful, there’s currently a free third-party service called Brid.gy which uses the open APIs of Twitter, Facebook, Instagram, Google+, and Flickr to bootstrap them into sending these Webmentions, or inter-site @mentions. What does this mean? After signing up at Brid.gy, I could create a post on my Disqus-enabled Tumblr (or WordPress, or other site), share that post’s URL to Facebook, and any comments or likes made on the Facebook copy would be sent as Webmentions to the comments section of my Tumblr site as if they’d been made there natively. (Disqus could add the metadata to indicate the permalink and location where each comment originated.) This means I could receive comments on my blog/site from Twitter, Facebook, Instagram, G+, etc. without a huge amount of overhead. Even better, instead of being spread out over multiple different places, the conversation around my original piece of content could be consolidated with the original!

Comments could be displayed inline naturally, and likes could be implemented as a facepile either above or below the typical comment section. By enabling the sending and receiving of Webmentions, Disqus could further corner the market on comments. Even easier for Disqus, a lot of the code has already been written and is open source.

Web 3.0?

I believe that Webmention, when widely implemented, is going to cause a major sea change in the way people use the web. Dare I say Web 3.0?!

The neighborhood coyote is nothing if not punctual
Every day for the past several days, our local coyote has come sauntering down the street at about the same time.

Instagram filter used: Gingham

Photo taken at: Little Free Library

📖 On page 16 of 448 of Dealing with China by Henry M. Paulson, Jr.

A simple preface followed by an anecdote about the beginning of a deal relating to telecom. The style is quick moving and history, details, and philosophy are liberally injected into the story as it moves along. This seems both interesting as well as instructive.

Highlights, Quotes, & Marginalia

“There are some who believe that an immutable law of history holds that conflict is inevitable when a rising power begins to bump up against an established one. But no law is immutable. Choices matter. Lessons can be learned.”

—page XIV

“Prescriptions, after all, are easier to make than predictions.”

—page XIV

“Note taking allows Party and government officials to get quick reads on what went on at meetings they didn’t attend. […] Private meetings with senior government officials without recording devices or note takers are rare and highly sought after.”

—page 10

“…the so-called iron rice bowl, the cradle-to-grave care and support guaranteed by the government through the big companies people worked for.”

—page 11

“The Party had made a simple bargain with the people: economic growth in return for political stability. That in turn meant Party control. Prosperity was the source of Party legitimacy.”

—page 11

“Messages in China are sent in ways that aren’t always direct; you have to read the signs.”

—page 14

“It was the nature of dealing with China: nothing was done until it was done.”

—page 14

📗 Started reading Dealing with China by Henry M. Paulson, Jr.

Former head of Goldman Sachs and U.S. Treasury Secretary Henry M. Paulson, Jr. and the cover of his 2015 book Dealing with China

🔖 Want to read Dealing with China: An Insider Unmasks the New Economic Superpower by Henry M. Paulson, Jr.

Picked up a copy at Little Free Library #21797 at 8:29 am
ISBN: 978-1-4555-0421-3 First Edition Hardcover

Former head of Goldman Sachs and U.S. Treasury Secretary Henry M. Paulson, Jr. and the cover of his 2015 book Dealing with China

How many social media related accounts can one person have on the web?!

Over the years, I almost feel like I’ve tried to max out the number of web services one can sign up for. I was always on the lookout for that new killer app or social service, so I’ve tried almost all of them at one point or another. As best I can remember, I’ve had at least 179 accounts, and there are likely very many more that I’m simply forgetting. Research indicates it is difficult enough to keep track of 150 people, much less that many people spread across that many websites.

As an exercise, I’ve made an attempt to list all of the social media and user accounts I’ve had on the web since the early/mid-2000s. They’re listed at the bottom of this post, broken up somewhat by usage area and subject for ease of use. I’ll maintain an official list of them here.

This partial list may give many others the opportunity to see how fragmented their own identities can be on the web. Who are you, and to which communities do you belong, when you live in multiple different places? I feel the list also shows the immense value inherent in the IndieWeb philosophy of owning one’s own domain and data. That value is even more apparent when I think of all the defunct, abandoned, shut down, or bought-out web services I’ve used, which I’ve done my best to list at the bottom.

When I think of all the hours of content that I and others have created and shared on some of these defunct sites for which we’ll never recover the data, I almost want to sob. Instead, I’ve promised only to cry, “Never again!” People interested in more of the vast volumes of data lost are invited to look at this list of site deaths, which is itself far from comprehensive.

No more digital sharecropping

Over time, I’ll make an attempt, where possible, to own the data from each of the services listed below and port it here to my own domain. More importantly, I refuse to do any more digital sharecropping. I’m not creating new posts, status updates, photos, or other content that doesn’t live on my own site first. Sure, I’ll take advantage of the network effects of popular services like Twitter, Facebook, and Instagram to engage my family, friends, and community who choose to live in those places, but only by syndicating data I already own to those services after the fact.

What about the interactive parts? The comments and interactions on those social services?

Through the magic of new web standards like Webmention (essentially an internet-wide @mention, similar to the functionality on Twitter, Medium, and even Facebook) and a fantastic service called Brid.gy, the likes and comments on my syndicated material from Twitter, Facebook, Google+, Instagram, and others come back directly to my own website as comments on the original posts, and I get direct notifications of them. Those whose websites support Webmention natively can write their comments to my posts directly on their own sites and rely on the standard to automatically notify me of their responses.

Isn’t this beginning to sound to you like the way the internet should work?

One URL to rule them all

When I think back on setting up these hundreds of digital services, I nearly wince at all the time and effort I’ve spent inputting my name, my photo, or even just including URL links to my Facebook and Twitter accounts.

Now I have one and only one URL that I can care about and pay attention to: my own!

Join me for IndieWebCamp Los Angeles

I’ve written in bits about my involvement with the IndieWeb in the past, and I’ve actually had incoming calls over the past several weeks from people interested in setting up their own websites. Many have asked: What is it exactly? How can I do something similar? Is it hard?

My answer is that it isn’t nearly as hard as you might have thought. If you can manage to sign up and maintain your Facebook account, you can put together all the moving parts to have your own IndieWeb enabled website.

“But, Chris, I’m still a little hesitant…”

Okay, how about I (and many others) offer to help you out? I’m going to be hosting IndieWebCamp Los Angeles over the weekend of November 5th and 6th in Santa Monica. I’m inviting you all to attend with the hope that by the time the weekend is over, you’ll not only have a significant start, but you’ll also have the tools, resources, and confidence to continue making improvements over time.

IndieWebCamp Los Angeles

Where
Pivotal
1333 2nd Street, Suite 200
Santa Monica, CA 90401
United States

When
  • Saturday: November 5, 2016
  • Sunday: November 6, 2016
R.S.V.P.

We’ve set up a variety of places for people to easily R.S.V.P. for the two-day event, choose the one that’s convenient for you:
* Eventbrite: https://www.eventbrite.com/e/indiewebcamp-la-2016-tickets-24335345674
* Lanyrd: http://lanyrd.com/2016/indiewebcamp-la
* Facebook: https://www.facebook.com/events/1701240643421269
* Meetup: https://www.meetup.com/IndieWeb-Homebrew-Website-Club-Los-Angeles/events/233698594/
If you’ve already got an IndieWeb enabled website and are able to R.S.V.P. by using your own site, try one of the following two R.S.V.P. locations:
* Indie Event: http://veganstraightedge.com/events/2016/04/01/indiewebcamp-la-2016
* IndieWeb Wiki: https://indieweb.org/2016/LA/Guest_List

I hope to see you there!

 

Now for that unwieldy list of sites I’ve spent untold hours setting up and maintaining…

Editor’s note:
A regularly updated version of this list is maintained here.

Primary Internet Presences

Chris Aldrich | BoffoSocko

Chris Aldrich Social Stream

Content from the above two sites is syndicated, primarily though neither exclusively nor evenly, to the following silo-based profiles

Facebook
Twitter
Google+
Tumblr
LinkedIn
Medium
GoodReads
Foursquare
YouTube
Reddit
Flickr
WordPress.com

Contributor to

WithKnown (Dormant)
IndieWeb.org (Wiki)
Little Free Library #8424 Blog
Mendeley ITBio References
Chris Aldrich Radio3 (Link Blog)
Category Theory Summer Study Group
JHU AEME
Johns Hopkins Twitter Feed (Previous)
JHU Facebook Fan Page (Previous)

 Identity

Gravatar
Keybase
About.Me
DandyID
Vizify

Other Social Profiles

Yelp
Findery
Periscope
Pinterest
Storify
MeetUp
500px
Skitch
KickStarter
Patreon
TwitPic
StumbleUpon
del.icio.us
MySpace
Klout

Academia / Research Related

Mendeley
Academia.edu
Research Gate
IEEE Information Theory Society (ITSOC)
Quora
ORCID
Hypothes.is
Genius (fka Rap Genius, aka News Genius, etc)
Diigo
FigShare – Research Data
Zotero
Worldcat
OdySci – Engineering Research
CiteULike
Open Study
StackExchange
Math-Stackexchange
MathOverflow
TeX-StackExchange
Theoretical Physics-StackExchange
Linguistics-StackExchange
Digital Signal Processing-StackExchange
Cooking-StackExchange
Physics Forums
Sciencescape

MOOC Related

Coursera
Khan Academy
Degreed

Reading Related

GoodReads
Pocket
Flipboard
Book Crossing
Digg
Readlist
MobileRead
Read Fold
ReadingPack
SlideShare
Wordnik
Milq
Disqus (Comments)
Intense Debate (Comments)
Wattpad
BookVibe
Reading.am (Bookmarking)
Amazon Profile
Wishlist: Evolutionary Theory
Wishlist: Information Theory
Wishlist: Mathematics
Camp NaNoWriMo
NaNoWriMo

Programming Related

GitHub
BitBucket
GitLab – URL doesn’t resolve to account
Free Code Camp
Code School
Codepen

Audio / Video

Huffduffer
Last.fm
Spotify
Pandora (Radio)
Soundcloud
Vimeo
Rdio
IMDb
Telfie (TV Checkin)
Soundtracking
Hulu
UStream
Livestream
MixCloud
Spreaker
Audioboo (Audio)
Bambuser (Video)
Orfium
The Session (Irish Music)

Food / Travel / Meetings

Nosh
FoodSpotting
Tripit (Travel)
Lanyrd (Conference)
Conferize (Conference)

Miscellaneous

RebelMouse (unused)
Peach (app only)
Kinja (commenting system/pseudo-blog)
Mnemotechniques (Memory Forum)
WordPress.org
Ask.fm
AppBrain Android Phone Apps
BlogCatalog
MySpace (Old School)
Identi.ca (Status)
Plurk (Status)
TinyLetter
Plaxo
YCombinator
Tsu
NewGov.US
Venmo
Quitter.se (Status)
Quitter.no (Status)
ColoUrLovers
Beeminder

Defunct Social Sites

Picasa (Redirects to G+)
Eat.ly (Food Blog)
Google Sidewiki (Annotation)
Wakoopa (Software usage)
Seesmic (Video, Status)
Jaiku (Status)
Friendster (Social Media)
Flipzu
Mixx
GetGlue (Video checkin)
FootFeed (Location)
Google Reader (Reader)
CinchCast (Audio)
Backtype (Commenting)
Tungle.me (Calendar)
Chime.In (Status)
MyBigCampus (College related)
Pownce (Status) – closed 02/09
Cliqset (Status) –  closed 11/22/10
Brightkite (Location/Status) – closed 12/10/10
Buzz (Status) – closed 12/15/11
Gowalla (Location) – closed 3/11/12
Picplz (Photo)- closed 9/2/12
Posterous (Blog) – closed 4/30/13 [all content from this site has been recovered and ported]
Upcoming (Calendar) – closed 4/30/13
ClaimID (Identity) – closed 12/12/13
Qik (Video) – closed 4/30/14
Readmill (Reading)- closed 7/1/14
Orkut (Status) – closed 9/1/14
Plinky – closed 9/1/14
FriendFeed (Social Networking)- closed 4/10/15
Plancast (Calendar) – closed 1/21/16
Symantec Personal Identity Program (Identity) – closing 9/11/16
Shelfari (Reading) – closed 3/16/16

How many social media identities do YOU have?

Reading Katumuwa

Watched Reading Katumuwa from YouTube
Video published on Apr 18, 2014.

This video by Travis Saul features a digital rendering of the Stele of Katumuwa. The ancient stele was discovered by University of Chicago archaeologists at Zincirli, Turkey in 2008. The inscription on the stele, written in a local dialect of Aramaic, is dated to around 735 BC. In word and image, Katumuwa asks his descendants to remember and honor him in his mortuary chapel at an annual sacrificial feast for his soul, which inhabited not his bodily remains, but the stone itself.

This reading of the Aramaic inscription and its English translation is kindly provided by Dennis Pardee, Henry Crown Professor of Hebrew Studies, Department of Near Eastern Languages and Civilizations at the University of Chicago. For more detailed information about the inscription, read his chapter featured in this Oriental Institute Museum Publication:

Pardee, Dennis. “The Katumuwa Inscription” in, In Remembrance of Me: Feasting with the Dead in the Ancient Middle East, edited by V.R. Hermann and J.D. Schloen, pp.45-48. Oriental Institute Museum Publication 37. 2014. Chicago: The Oriental Institute.

http://oi.uchicago.edu/research/pubs/catalog/oimp/oimp37.html

The reading is also featured in the video “Remembering Katumuwa” featured in the Special Exhibit “In Remembrance of Me: Feasting with the Dead in the Ancient Middle East” at the Oriental Institute Museum, University of Chicago, April 8 2014–January 4 2015.
https://oi.uchicago.edu/museum/special/remembrance/

In Remembrance of Me: Feasting with the Dead in the Ancient Middle East.

Virginia Rimmer Herrmann and J. David Schloen, eds., In Remembrance of Me: Feasting with the Dead in the Ancient Middle East. Chicago: The Oriental Institute of the University of Chicago, 2014.
Download the free e-book: http://oi.uchicago.edu/sites/oi.uchicago.edu/files/uploads/shared/docs/oimp37.pdf

Remembering Katumuwa

Additional context for the stele

h/t to my friend Dave Harris for sending this along to me.

Notes from Day 2 of Dodging the Memory Hole: Saving Online News | Friday, October 14, 2016

If you missed the notes from Day 1, see this post.

It may take me a week or so to finish putting some general thoughts and additional resources together based on the two day conference so that I might give a more thorough accounting of my opinions as well as next steps. Until then, I hope that the details and mini-archive of content below may help others who attended, or provide a resource for those who couldn’t make the conference.

Overall, it was an incredibly well-programmed and well-run conference, so kudos to all those involved who kept things moving along. I’m now certainly much more aware of the gaping memory hole the internet is facing despite the heroic efforts of a small handful of people and institutions attempting to improve the situation. I’ll try to go into more detail later about a handful of specific topics and next steps, as well as a listing of resources I came across which may prove to be useful tools for both the archiving/preservation and IndieWeb communities.

Archive of materials for Day 2

Audio Files

Below are the recorded audio files embedded in .m4a format (using a Livescribe Pulse Pen) for several sessions held throughout the day. To my knowledge, none of the breakout sessions were recorded except for the one which appears below.

Summarizing archival collections using storytelling techniques


Presentation: Summarizing archival collections using storytelling techniques by Michael Nelson, Ph.D., Old Dominion University

Saving the first draft of history


Special guest speaker: Saving the first draft of history: The unlikely rescue of the AP’s Vietnam War files by Peter Arnett, winner of the Pulitzer Prize for journalism
Peter Arnett talking about news reporting in Vietnam in the ’60s.

Kiss your app goodbye: the fragility of data journalism


Panel: Kiss your app goodbye: the fragility of data journalism
Featuring Meredith Broussard, New York University; Regina Lee Roberts, Stanford University; Ben Welsh, The Los Angeles Times; moderator Martin Klein, Ph.D., Los Alamos National Laboratory

The future of the past: modernizing The New York Times archive


Panel: The future of the past: modernizing The New York Times archive
Featuring The New York Times Technology Team: Evan Sandhaus, Jane Cotler and Sophia Van Valkenburg; moderated by Edward McCain, RJI and MU Libraries

Lightning Rounds: Six Presenters



Lightning rounds (in two parts)
Six + one presenters: Jefferson Bailey, Terry Britt, Katherine Boss (and team), Cynthia Joyce, Mark Graham, Jennifer Younger and Kalev Leetaru
1. Jefferson Bailey, Internet Archive: “Supporting Data-Driven Research using News-Related Web Archives”
2. Terry Britt, University of Missouri: “News archives as cornerstones of collective memory”
3. Katherine Boss, Meredith Broussard and Eva Revear, New York University: “Challenges facing preservation of born-digital news applications”
4. Cynthia Joyce, University of Mississippi: “Keyword ‘Katrina’: Re-collecting the unsearchable past”
5. Mark Graham, Internet Archive/The Wayback Machine: “Archiving news at the Internet Archive”
6. Jennifer Younger, Catholic Research Resources Alliance: “Digital Preservation, Aggregated, Collaborative, Catholic”
7. Kalev Leetaru, senior fellow, The George Washington University and founder of the GDELT Project: “A Look Inside The World’s Largest Initiative To Understand And Archive The World’s News”

Technology and Community


Presentation: Technology and community: Why we need partners, collaborators, and friends by Kate Zwaard, Library of Congress

Breakout: Working with CMS


Working with CMS, led by Eric Weig, University of Kentucky

Alignment and reciprocity


Alignment & reciprocity by Katherine Skinner, Ph.D., executive director, the Educopia Institute

Closing remarks


Closing remarks by Edward McCain, RJI and MU Libraries and Todd Grappone, associate university librarian, UCLA

Live Tweet Archive

Reminder: In many cases my tweets don’t reflect direct quotes of the attributed speaker; they are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many of the participant’s original words as possible. Typically, for speed, there wasn’t much editing of these notes. Below I’ve changed the attribution of one or two tweets to reflect the proper person(s). For convenience, I’ve also added a few hyperlinks to useful resources that I didn’t have time to include in the original tweets. I’ve attached .m4a audio files of most of the day’s audio (apologies for shaky quality, as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it; presumably they will release the video on their website for a more immersive experience.

Peter Arnett:

Condoms were required issue in Vietnam–we used them to waterproof film containers in the field.

Do not stay close to the head of a column, medics, or radiomen. #warreportingadvice

I told the AP I would undertake the task of destroying all the reporters’ files from the war.

Instead the AP files moved around with me.

Eventually the 10 trunks of material went back to the AP when they hired a brilliant archivist.

“The negatives can outweigh the positives when you’re in trouble.”

Edward McCain:

Our first panel: Kiss your app goodbye: the fragility of data journalism

Meredith Broussard:

I teach data journalism at NYU

A news app is not what you’d install on your phone

Dollars for Docs is a good example of a news app

A news app is something that allows users to put themselves into the story.

Often there are three CMSs: web, print, and video.

News apps don’t live in any of the CMSs. They’re bespoke and live on a separate data server.

This has implications for crawlers which can’t handle them well.

Then how do we save news apps? We’re looking at examples and then generalizing.

Everyblock.com was a good example based on chicagocrime and later bought by NBC and shut down.

What?! The internet isn’t forever? Databases need to be saved differently than web pages.

Reprozip was developed by NYU Center for Data and we’re using it to save the code, data, and environment.

Ben Welsh:

My slides will be at http://bit.ly/frameworkfix. I work on the data desk @LATimes

We make apps that serve our audience.

We also make internal tools that empower the newsroom.

We also use our nerdy skills to do cool things.

Most of us aren’t good programmers, we “cheat” by using frameworks.

Frameworks do a lot of basic things for you, so you don’t have to know how to do it yourself.

Archiving tools often aren’t built into these frameworks.

Instagram, Pinterest, Mozilla, and the LA Times use django as our framework.

Memento for WordPress is a great way to archive pages.

We must do more. We need archiving baked into the systems from the start.

Slides at http://bit.ly/frameworkfix

Regina Roberts:

Got data? I’m a librarian at Stanford University.

I’ll mention Christine Borgman’s book Big Data, Little Data, No Data.

Journalists are great data liberators: FOIA requests, cleaning data, visualizing, getting stories out of data.

But what happens to the data once the story is published?

BLDR: Big Local Digital Repository, an open repository for sharing open data.

Solutions that exist: Hydra at http://projecthydra.org or Open ICPSR www.openicpsr.org

For metadata: www.ddialliance.org, RDF, International Image Interoperability Framework (iiif) and MODS

Martin Klein:

We’ll open up for questions.

Audience Question:

What’s more important: obey copyright laws or preserving the content?

Regina Roberts:

The new creative commons licenses are very helpful, but we have to be attentive to many issues.

Perhaps archiving it and embargoing for later?

Ben Welsh:

Saving the published work is more important to me, and the rest of the byproduct is gravy.

Evan Sandhaus:

I work for the New York Times, you may have heard of it…

Doing a quick demo of Times Machine from @NYTimes

Sophia van Valkenburg:

Talking about modernizing the born-digital legacy content.

Our problem was how to make an article from 2004 look like it had been published today.

There were 100’s of thousands of articles missing.

There was no one definitive list of missing articles.

Outlining the workflow for reconciling the archive XML and the definitive list of URLs for conversion.

It’s important to use more than one source for building an archive.

Jane Cotler:

I’m going to talk about all of “the little things” that came up along the way..

Article Matching: Fusion – how to reconcile print XML with scraped web HTML.

Primarily, we looked at common phrases between the corpus of the two different data sets.

We prioritized the print data over the digital data.

We maintain a system called switchboard that redirects from old URLs to the new ones to prevent link rot.

The case of the missing sections: some sections of the content were blank and not transcribed.

We made the decision to take out the data we had in favor of a better user experience for the missing sections.

In the future, we’d also like to put photos back into the articles.

Evan Sandhaus:

Modernizing and archiving the @NYTimes archives is an ongoing challenge.

Edward McCain:

Can you discuss the decision to go with a more modern interface rather than a traditional archive of how it looked?

Evan Sandhaus:

Some of the decision was to get the data into an accessible format for modern users.

We do need to continue work on preserving the original experience.

Edward McCain:

Is there a way to distinguish between the print version and the online versions in the archive?

Audience Question:

Could a researcher do work on the entire corpora? Is it available for subscription?

Edward McCain:

We do have a sub-section of data available, but we don’t have it prior to 1960.

Audience Question:

Have you documented the process you’ve used on this preservation project?

Sophia van Valkenburg:

We did save all of the code for the project within GitHub.

Jane Cotler:

We do have meeting notes which provide some documentation, though they’re not thorough.

Chris Aldrich:

Oh dear. Of roughly 1,155 tweets I counted about #DtMH2016 in the last week, roughly 25% came from me. #noisy

Open-source tool I had mentioned to several people: @wallabagapp, a self-hostable application for saving web pages. https://www.wallabag.org