With platforms such as WordPress or Tumblr, starting a blog has never been easier, but in most cases, you still have to go through a simple registration process before you can start publishing. Now, messaging platform Telegram has made it even simpler with Telegraph, the blogging platform that doesn't require any kind of registration.
This past summer, I wrote The Essential Meta Tags for Social Media about how developers can prepare web pages to optimize their appearance when shared on s
Twitter’s ceaseless search for someone to tell the social network where to go and how to get there has come to a momentary pause. The company announced today, on Twitter of course, that it has hired startup founder Keith Coleman as vice president of product. Coleman, according to his Twitter bio, is the CEO of Yes Inc., a relatively unknown Bay Area startup responsible for two social apps called Frenzy and WZD. Frenzy offers a way to make quick plans with friends, while WZD is a blend of Facebook and Snapchat that lets you share what you’re doing with friends by posting photos and videos layered with emoji and text. Because Yes is joining Twitter alongside Coleman, both apps are being shut down, according to a note posted to Yes’ website. Prior to Yes, Coleman was a product lead at Google overseeing services like Gmail and its chat companion.
A look at some of the best apps, hacks and mashups available for music streaming and scrobbling service Last.fm.
Curious about alternatives to Last.fm’s broken RSS feeds and what people are doing with their listening data. Some relatively interesting ideas in here, but nothing earth shattering. One or two were focused on visualization, but otherwise nothing I felt I could use.
I love that by following certain people, my timeline has become a stream of interesting and entertaining information. I love that sometimes I am able to fit my little publication just so into the 140 characters given to me.
As Facebook attempted to capture the fast-moving energy of the news cycle from Twitter, and shied away from policing political content, it created a system that played to confirmation bias and set ...
Interview with Greg Leppert. Founder of Reading.am. Co-founder of Svpply. Reads a lot.
Today, we’re excited to announce that Instapaper is joining Pinterest. In the three years since betaworks acquired Instapaper from Marco Arment, we’ve completely rewritten our backend, overhauled our mobile and web clients, improved parsing and search, and introduced tons of great features like highlights, text-to-speech, and speed reading to the product.
There is a relatively new candidate recommendation from the W3C for a game changing social web specification called Webmention which essentially makes it possible to do Twitter-like @mentions (or Medium-style) across the internet from site to site (as opposed to simply within a siloed site/walled garden like Twitter).
Webmentions would allow me, for example, to write a comment on someone else’s post on my own Tumblr site. The URL of the post I’m replying to, included in my post, serves as the @mention, and the other site (which could be on WordPress, Drupal, Tumblr, or anything really), if it also supports Webmentions, could receive my comment and display it in its comment section.
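To make the exchange described above concrete, here is a minimal sketch of the sending side in Python. It assumes the target page advertises its Webmention endpoint in a `<link>` or `<a>` element with `rel="webmention"`, and it skips the HTTP `Link` header check that a complete sender would try first. The function names and example URLs are hypothetical, and nothing is actually sent over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class _EndpointFinder(HTMLParser):
    """Collects the first <link> or <a> whose rel includes "webmention"."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").split()
        if "webmention" in rels and attributes.get("href") is not None:
            self.endpoint = attributes["href"]


def discover_endpoint(target_url, target_html):
    """Return the absolute endpoint URL advertised in the HTML, or None."""
    finder = _EndpointFinder()
    finder.feed(target_html)
    if finder.endpoint is None:
        return None
    # Endpoints may be relative, so resolve them against the target URL.
    return urljoin(target_url, finder.endpoint)


def build_webmention_body(source_url, target_url):
    """Form-encode the two parameters a Webmention request carries."""
    return urlencode({"source": source_url, "target": target_url})
```

The sender would then POST that body to the discovered endpoint as `application/x-www-form-urlencoded`; the receiving site is expected to fetch the source URL and confirm it really links to the target before displaying anything.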
Given the tremendous number of sites (and multi-platform sites) on which Disqus operates, it would be an excellent candidate to support the Webmention spec to allow a huge amount of inter-site activity on the internet. First it could include the snippet of code for allowing the site on which a comment is originally written to send Webmentions and secondly, it could allow for the snippet of code which allows for receiving Webmentions. The current Disqus infrastructure could also serve to reduce spam and display those comments in a pretty way. Naturally Disqus could continue to serve the same social functionality it has in the past.
Aggregating the conversation across the Internet into one place
Making things even more useful, there’s currently a third party free service called Brid.gy which uses open APIs of Twitter, Facebook, Instagram, Google+, and Flickr to bootstrap them to send these Webmentions or inter-site @mentions. What does this mean? After signing up at Bridgy, it means I could potentially create a post on my Disqus-enabled Tumblr (WordPress, or other powered site), share that post with its URL to Facebook, and any comments or likes made on the Facebook post will be sent as Webmentions to the comments section on my Tumblr site as if they’d been made there natively. (Disqus could add the metadata to indicate the permalink and location of where the comment originated.) This means I can receive comments on my blog/site from Twitter, Facebook, Instagram, G+, etc. without a huge amount of overhead, and even better, instead of being spread out in multiple different places, the conversation around my original piece of content could be conglomerated with the original!
Comments could be displayed inline naturally, and likes could be implemented as a UI facepile either above or below the typical comment section. By enabling the sending/receiving of Webmentions, Disqus could further corner the market on comments. Even easier for Disqus, a lot of the code has already been written and is open source.
I believe that Webmention, when implemented, is going to cause a major sea-change in the way people use the web. Dare I say Web3.0?!
Over the years I almost feel like I’ve tried to max out the number of web services I could sign up for. I was always on the lookout for that new killer app or social service, so I’ve tried almost all of them at one point or another. That I can remember, I’ve had at least 179, and likely there are very many more that I’m simply forgetting. Research indicates it is difficult enough to keep track of 150 people, much less that many people through that many websites.
As an exercise, I’ve made an attempt to list all of the social media and user accounts I’ve had on the web since the early/mid-2000s. They’re listed below at the bottom of this post and broken up somewhat by usage area and subject for ease of use. I’ll maintain an official list of them here.
This partial list may give many others the opportunity to see how fragmented their own identities can be on the web. Who are you, and to which communities do you belong, when you live in multiple different places? I feel the list also shows the immense value inherent in the IndieWeb philosophy to own one’s own domain and data. The value of the IndieWeb is even more apparent when I think of all the defunct, abandoned, shut down, or bought out web services I’ve used which I’ve done my best to list at the bottom.
When I think of all the hours of content that I and others have created and shared on some of these defunct sites for which we’ll never recover the data, I almost want to sob. Instead, I’ve promised only to cry, “Never again!” People interested in more of the vast volumes of data lost are invited to look at this list of site-deaths, which itself is far from comprehensive.
No more digital sharecropping
Over time, I’ll make an attempt, where possible, to own the data from each of the services listed below and port it here to my own domain. More importantly, I refuse to do any more digital sharecropping. I’m not creating new posts, status updates, photos, or other content that doesn’t live on my own site first. Sure I’ll take advantage of the network effects of popular services like Twitter, Facebook, and Instagram to engage my family, friends, and community who choose to live in those places, but it will only happen by syndicating data that I already own to those services after-the-fact.
What about the interactive parts? The comments and interactions on those social services?
Through the magic of new web standards like WebMention, essentially an internet-wide @mention functionality similar to that on Twitter, Medium, and even Facebook, and a fantastic service called brid.gy, I get direct notifications of all the likes and comments on my syndicated material from Twitter, Facebook, Google+, Instagram, and others, which come back directly to my own website as comments on the original posts. Those with websites that support WebMention natively can write their comments to my posts directly on their own site and rely on it to automatically notify me of their response.
Isn’t this beginning to sound to you like the way the internet should work?
One URL to rule them all
When I think back on setting up these hundreds of digital services, I nearly wince at all the time and effort I’ve spent inputting my name, my photo, or even just including URL links to my Facebook and Twitter accounts.
Now I have one and only one URL that I can care about and pay attention to: my own!
Join me for IndieWebCamp Los Angeles
I’ve written in bits about my involvement with the IndieWeb in the past, but I’ve actually had incoming calls over the past several weeks from people interested in setting up their own websites. Many have asked: what is it exactly? how can they do something similar? is it hard?
My answer is that it isn’t nearly as hard as you might have thought. If you can manage to sign up and maintain your Facebook account, you can put together all the moving parts to have your own IndieWeb enabled website.
“But, Chris, I’m still a little hesitant…”
Okay, how about I (and many others) offer to help you out? I’m going to be hosting IndieWebCamp Los Angeles over the weekend of November 5th and 6th in Santa Monica. I’m inviting you all to attend with the hope that by the time the weekend is over, you’ll have not only a significant start, but you’ll have the tools, resources, and confidence to continue building in improvements over time.
IndieWebCamp Los Angeles
1333 2nd Street,
Santa Monica, CA,
We’ve set up a variety of places for people to easily R.S.V.P. for the two-day event; choose the one that’s convenient for you:
* Eventbrite: https://www.eventbrite.com/e/indiewebcamp-la-2016-tickets-24335345674
* Lanyrd: http://lanyrd.com/2016/indiewebcamp-la
* Facebook: https://www.facebook.com/events/1701240643421269
* Meetup: https://www.meetup.com/IndieWeb-Homebrew-Website-Club-Los-Angeles/events/233698594/
If you’ve already got an IndieWeb enabled website and are able to R.S.V.P. by using your own site, try one of the following two R.S.V.P. locations:
* Indie Event: http://veganstraightedge.com/events/2016/04/01/indiewebcamp-la-2016
* IndieWeb Wiki: https://indieweb.org/2016/LA/Guest_List
I hope to see you there!
Now for that unwieldy list of sites I’ve spent untold hours setting up and maintaining…
Primary Internet Presences
Content from the above two sites is syndicated primarily, but not exclusively or evenly, to the following silo-based profiles
Little Free Library #8424 Blog
Mendeley ITBio References
Chris Aldrich Radio3 (Link Blog)
Category Theory Summer Study Group
Johns Hopkins Twitter Feed (Previous)
JHU Facebook Fan Page (Previous)
Other Social Profiles
Academia / Research Related
IEEE Information Theory Society (ITSOC)
Genius (fka Rap Genius, aka News Genius, etc)
FigShare – Research Data
OdySci – Engineering Research
Digital Signal Processing-StackExchange
Intense Debate (Comments)
Wishlist: Evolutionary Theory
Wishlist: Information Theory
Audio / Video
Food / Travel / Meetings
Peach (app only)
Kinja (commenting system/pseudo-blog)
Mnemotechniques (Memory Forum)
AppBrain Android Phone Apps
Defunct Social Sites
(Redirects to G+)
Seesmic (Video, Status)
GetGlue (Video checkin)
Google Reader (Reader)
(Status) – closed 02/09
(Status) – closed 11/22/10
Brightkite (Location/Status) – closed 12/10/10
Buzz (Status) – closed 12/15/11
(Location) – closed 3/11/12
(Photo)- closed 9/2/12
Posterous (Blog) – closed 4/30/13 [all content from this site has been recovered and ported]
Upcoming (Calendar) – closed 4/30/13
(Identity) – closed 12/12/13
Qik (Video) – closed 4/30/14
(Reading)- closed 7/1/14
(Status) – closed 9/1/14
– closed 9/1/14
FriendFeed (Social Networking)- closed 4/10/15
(Calendar) – closed 1/21/16
(Identity) – closing 9/11/16
Shelfari (Reading) – closed 3/16/16
How many social media identities do YOU have?
If you missed the notes from Day 1, see this post.
It may take me a week or so to finish putting some general thoughts and additional resources together based on the two day conference so that I might give a more thorough accounting of my opinions as well as next steps. Until then, I hope that the details and mini-archive of content below may help others who attended, or provide a resource for those who couldn’t make the conference.
Overall, it was an incredibly well programmed and run conference, so kudos to all those involved who kept things moving along. I’m now certainly much more aware of the gaping memory hole the internet is facing despite the heroic efforts of a small handful of people and institutions attempting to improve the situation. I’ll try to go into more detail later about a handful of specific topics and next steps as well as a listing of resources I came across which may prove to be useful tools for both those in the archiving/preserving and IndieWeb communities.
Archive of materials for Day 2
Below are the recorded audio files embedded in .m4a format (using a Livescribe Pulse Pen) for several sessions held throughout the day. To my knowledge, none of the breakout sessions were recorded except for the one which appears below.
Summarizing archival collections using storytelling techniques
Presentation: Summarizing archival collections using storytelling techniques by Michael Nelson, Ph.D., Old Dominion University
Saving the first draft of history
Special guest speaker: Saving the first draft of history: The unlikely rescue of the AP’s Vietnam War files by Peter Arnett, winner of the Pulitzer Prize for journalism
Kiss your app goodbye: the fragility of data journalism
Panel: Kiss your app goodbye: the fragility of data journalism
Featuring Meredith Broussard, New York University; Regina Lee Roberts, Stanford University; Ben Welsh, The Los Angeles Times; moderator Martin Klein, Ph.D., Los Alamos National Laboratory
The future of the past: modernizing The New York Times archive
Panel: The future of the past: modernizing The New York Times archive
Featuring The New York Times Technology Team: Evan Sandhaus, Jane Cotler and Sophia Van Valkenburg; moderated by Edward McCain, RJI and MU Libraries
Lightning Rounds: Six Presenters
Lightning rounds (in two parts)
Six + one presenters: Jefferson Bailey, Terry Britt, Katherine Boss (and team), Cynthia Joyce, Mark Graham, Jennifer Younger and Kalev Leetaru
1: Jefferson Bailey, Internet Archive, “Supporting Data-Driven Research using News-Related Web Archives” 2: Terry Britt, University of Missouri, “News archives as cornerstones of collective memory” 3: Katherine Boss, Meredith Broussard and Eva Revear, New York University: “Challenges facing preservation of born-digital news applications” 4: Cynthia Joyce, University of Mississippi, “Keyword ‘Katrina’: Re-collecting the unsearchable past” 5: Mark Graham, Internet Archive/The Wayback Machine, “Archiving news at the Internet Archive” 6: Jennifer Younger, Catholic Research Resources Alliance: “Digital Preservation, Aggregated, Collaborative, Catholic” 7. Kalev Leetaru, senior fellow, The George Washington University and founder of the GDELT Project: A Look Inside The World’s Largest Initiative To Understand And Archive The World’s News
Technology and Community
Presentation: Technology and community: Why we need partners, collaborators, and friends by Kate Zwaard, Library of Congress
Breakout: Working with CMS
Working with CMS, led by Eric Weig, University of Kentucky
Alignment and reciprocity
Alignment & reciprocity by Katherine Skinner, Ph.D., executive director, the Educopia Institute
Closing remarks by Edward McCain, RJI and MU Libraries and Todd Grappone, associate university librarian, UCLA
Live Tweet Archive
Reminder: In many cases my tweets don’t reflect direct quotes of the attributed speaker, but are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many original words of the participant as possible. Typically, for speed, there wasn’t much editing of these notes. Below I’ve changed the attribution of one or two tweets to reflect the proper person(s). For convenience, I’ve also added a few hyperlinks to useful resources after the fact that I didn’t have time to include in the original tweets. I’ve attached .m4a audio files of most of the audio for the day (apologies for shaky quality as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it. Presumably they will release the video on their website for a more immersive experience.
Condoms were required issue in Vietnam–we used them to waterproof film containers in the field.
Do not stay close to the head of a column, medics, or radiomen. #warreportingadvice
I told the AP I would undertake the task of destroying all the reporters’ files from the war.
Instead the AP files moved around with me.
Eventually the 10 trunks of material went back to the AP when they hired a brilliant archivist.
“The negatives can outweigh the positives when you’re in trouble.”
Our first panel: Kiss your app goodbye: the fragility of data journalism
I teach data journalism at NYU
A news app is not what you’d install on your phone
Dollars for Docs is a good example of a news app
A news app is something that allows the user to put themself into the story.
Often there are three CMSs: web, print, and video.
News apps don’t live in any of the CMSs. They’re bespoke and live on a separate data server.
This has implications for crawlers which can’t handle them well.
Then how do we save news apps? We’re looking at examples and then generalizing.
Everyblock.com was a good example based on chicagocrime and later bought by NBC and shut down.
What?! The internet isn’t forever? Databases need to be saved differently than web pages.
Reprozip was developed by NYU Center for Data and we’re using it to save the code, data, and environment.
We make apps that serve our audience.
We also make internal tools that empower the newsroom.
We also use our nerdy skills to do cool things.
Most of us aren’t good programmers, we “cheat” by using frameworks.
Frameworks do a lot of basic things for you, so you don’t have to know how to do it yourself.
Archiving tools often aren’t built into these frameworks.
Instagram, Pinterest, Mozilla, and the LA Times use Django as our framework.
Memento for WordPress is a great way to archive pages.
We must do more. We need archiving baked into the systems from the start.
Slides at http://bit.ly/frameworkfix
Got data? I’m a librarian at Stanford University.
I’ll mention Christine Borgman’s book Big Data, Little Data, No Data.
Journalists are great data liberators: FOIA requests, cleaning data, visualizing, getting stories out of data.
But what happens to the data once the story is published?
BLDR: Big Local Digital Repository, an open repository for sharing open data.
For metadata: www.ddialliance.org, RDF, International Image Interoperability Framework (iiif) and MODS
We’ll open up for questions.
What’s more important: obey copyright laws or preserving the content?
The new creative commons licenses are very helpful, but we have to be attentive to many issues.
Perhaps archiving it and embargoing for later?
Saving the published work is more important to me, and the rest of the byproduct is gravy.
I work for the New York Times, you may have heard of it…
Talking about modernizing the born-digital legacy content.
Our problem was how to make an article from 2004 look like it had been published today.
There were hundreds of thousands of articles missing.
There was no one definitive list of missing articles.
Outlining the workflow for reconciling the archive XML and the definitive list of URLs for conversion.
It’s important to use more than one source for building an archive.
I’m going to talk about all of “the little things” that came up along the way.
Article Matching: Fusion – How to convert print XML with web HTML that was scraped.
Primarily, we looked at common phrases between the corpus of the two different data sets.
We prioritized the print data over the digital data.
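The notes above say only that articles were matched on "common phrases" between the two data sets. As a rough illustration of one way such matching can work (an assumed, generic technique, not a description of the Times team's actual code), the sketch below breaks each text into overlapping three-word phrases and scores a pair by how many phrases they share. The function names, the threshold, and the sample data are all hypothetical.

```python
import re


def shingles(text, size=3):
    """Return the set of lowercase word n-grams ("phrases") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def phrase_overlap(a, b, size=3):
    """Jaccard similarity of the two texts' phrase sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def best_match(web_text, print_articles, threshold=0.5):
    """Return the id of the print article sharing the most phrases, or None."""
    best_id, best_score = None, 0.0
    for article_id, print_text in print_articles.items():
        score = phrase_overlap(web_text, print_text)
        if score > best_score:
            best_id, best_score = article_id, score
    # Below the (assumed) threshold, treat the web article as unmatched.
    return best_id if best_score >= threshold else None
```

A scraped web article would be passed to `best_match` along with a dictionary of print texts keyed by id; anything that comes back as `None` would need manual review.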
We maintain a system called switchboard that redirects from old URLs to the new ones to prevent link rot.
The case of the missing sections: some sections of the content were blank and not transcribed.
We made the decision of taking out data we had in lieu of making a better user experience for missing sections.
In the future, we’d also like to put photos back into the articles.
Can you discuss the decision to go with a more modern interface rather than a traditional archive of how it looked?
Some of the decision was to get the data into an accessible format for modern users.
We do need to continue work on preserving the original experience.
Is there a way to distinguish between the print version and the online versions in the archive?
Could a researcher do work on the entire corpora? Is it available for subscription?
We do have a sub-section of data available, but don’t have it prior to 1960.
Have you documented the process you’ve used on this preservation project?
We did save all of the code for the project within GitHub.
We do have meeting notes which provide some documentation, though they’re not thorough.
Today I spent the majority of the day attending the first day of a two-day conference at UCLA’s Charles Young Research Library entitled “Dodging the Memory Hole: Saving Online News.” While I knew mostly what I was getting into, it hadn’t really occurred to me how much of what is on the web is not backed up or archived in any meaningful way. As a part of human nature, people neglect to back up any of their data, but huge swaths of really important data with newsworthy and historic value are being heavily neglected. Fortunately it’s an interesting enough problem to draw the 100 or so scholars, researchers, technologists, and journalists who showed up for the start of an interesting group being conglomerated through the Reynolds Journalism Institute and several sponsors of the event.
What particularly strikes me is how many of the philosophies of the IndieWeb movement and tools developed by it are applicable to some of the problems that online news faces. I suspect that if more journalists were practicing members of the IndieWeb and used their sites not only for collecting and storing the underlying data upon which they base their stories, but to publish them as well, then some of the (future) archival process may be easier to accomplish. I’ve got so many disparate thoughts running around my mind after the first day that it’ll take a bit of time to process before I write out some more detailed thoughts.
Twitter List for the Conference
As a reminder to those attending, I’ve accumulated a list of everyone who’s tweeted with the hashtag #DtMH2016, so that attendees can more easily follow each other as well as communicate online following our few days together in Los Angeles. Twitter also allows subscribing to entire lists if that’s something in which people have interest.
Archiving the day
It seems only fitting that an attendee of a conference about saving and archiving digital news would make a reasonable attempt to archive some of his experience, right?! Toward that end, below is an archive of my tweetstorm during the day marked up with microformats and including hovercards for the speakers with appropriate available metadata. For those interested, I used a fantastic web app called Noter Live to capture, tweet, and more easily archive the stream.
Note that in many cases my tweets don’t reflect direct quotes of the attributed speaker, but are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many original words of the participant as possible. Typically, for speed, there wasn’t much editing of these notes. I’m also attaching .m4a audio files of most of the audio for the day (apologies for shaky quality as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it. Presumably they will release the video on their website for a more immersive experience.
If you prefer to read the stream of notes in the original Twitter format, so that you can like/retweet/comment on individual pieces, this link should give you the entire stream. Naturally, comments are also welcome below.
Below are the audio files for several sessions held throughout the day.
Greetings and Keynote
Greetings: Edward McCain, digital curator of journalism, Donald W. Reynolds Journalism Institute (RJI) and University of Missouri Libraries and Ginny Steel, university librarian, UCLA
Keynote: Digital salvage operations — what’s worth saving? given by Hjalmar Gislason, vice president of data, Qlik
Why save online news? and NewsScape
Panel: “Why save online news?” featuring Chris Freeland, Washington University; Matt Weber, Ph.D., Rutgers, The State University of New Jersey; Laura Wrubel, The George Washington University; moderator Ana Krahmer, Ph.D., University of North Texas
Presentation: “NewsScape: preserving TV news” given by Tim Groeling, Ph.D., UCLA Communication Studies Department
Born-digital news preservation in perspective
Speaker: Clifford Lynch, Ph.D., executive director, Coalition for Networked Information on “Born-digital news preservation in perspective”
Live Tweet Archive
Getting Noter Live fired up for Dodging the Memory Hole 2016: Saving Online News https://www.rjionline.org/dtmh2016
I’m glad I’m not at NBC trying to figure out the details for releasing THE APPRENTICE tapes.
Let’s thank @UCLA and the library for hosting us all.
While you’re here, don’t forget to vote/provide feedback throughout the day for IMLS
Someone once pulled up behind me and said “Hi Tiiiigeeerrr!” #Mizzou
A server at the Missourian crashed as the system was obsolete and running on baling wire. We lost 15 years of archives
The dean & head of Libraries created a position to save born digital news.
We’d like to help define stake-holder roles in relation to the problem.
Newspaper is really an outmoded term now.
I’d like to celebrate that we have 14 student scholars here today.
We’d like to have you identify specific projects that we can take to funding sources to begin work after the conference
We’ll be going to our first speaker who will be introduced by Martin Klein from Los Alamos.
Hjalmar Gislason is a self-described digital nerd. He’s the Vice President of Data.
I wonder how one becomes the President of Data?
My Icelandic name may be the most complicated part of my talk this morning.
Speaking on “Digital Salvage Operations: What’s Worth Saving?”
My father in law accidentally threw away my wife’s favorite stuffed animal. #DeafTeddy
Some people just throw everything away because they’re not being used. Others keep everything and don’t throw it away.
The fundamental question: Do you want to save everything or do you want to get rid of everything?
I joined @qlik two years ago and moved to Boston.
Before that I was with spurl.net which was about saving copies of webpages users had previously visited.
I had also previously invested in kjarninn which is translated as core.
We used to have little data, now we’re with gigantic data and moving to gargantuan data soon.
One of my goals today is to broaden our perspective about what data needs saving.
There’s the Web, the “Deep” Web, then there’s “Other” data which is at the bottom of the pyramid.
I got to see into the process of #panamapapers but I’d like to discuss the consequences from April 3rd.
The number of meetings was almost more than could have been covered in real time in Iceland.
The #panamapapers were a soap opera, much like US politics.
Looking back at the process is highly interesting, but it’s difficult to look at all the data as it unfolded.
How can we capture all the media minute by minute as a story unfolds?
You can’t trust that you can go back to a story at a certain time and know that it hasn’t been changed. #1984 #Orwell
There was a relatively pro-HRC piece earlier this year @NYTimes that was changed.
Newsdiffs tracks changes in news over time. The HRC article had changed a lot.
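For readers curious what tracking such changes involves at its simplest, the sketch below compares two saved snapshots of an article using Python's standard `difflib` module. It is an illustrative assumption about the general approach, not NewsDiffs' own code, and the function name is hypothetical.

```python
import difflib


def diff_versions(old_text, new_text):
    """Return unified-diff lines describing how an article changed."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="earlier",
        tofile="later",
        lineterm="",
    ))
```

Lines prefixed with `-` appear only in the earlier snapshot and lines prefixed with `+` only in the later one; an empty result means the two snapshots are identical. The hard part in practice is capturing the snapshots on a schedule in the first place.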
Let’s say you referenced @CNN 10 years ago, likely now, the CMS and the story have both changed.
8 years ago, I asked, wouldn’t we like to have the social media from Iceland’s only Nobel Laureate as a teenager?
What is private/public, ethical/unethical when dealing with data?
Much data is hidden behind passwords or on systems which are not easily accessed from a database perspective.
Most of the content published on Facebook isn’t public. It’s hard to archive in addition to being big.
We as archivists have no claim on the hidden data within Facebook.
Then there’s “other” data: 500 hours of video is uploaded to YouTube per minute.
No organization can go around watching all of this video data. Which parts are newsworthy?
Content could surface much later or could surface through later research.
Hornbjargsviti lighthouse recorded the weather every three hours for years creating lots of data.
And that was just one of hundreds of sites that recorded this type of data in Iceland.
Lots of this data is lost. Much that has been found was by coincidence. It was never thought to archive it.
This type of weather data could be very valuable to researchers later on.
There was also a large archive of Icelandic data that was found.
Showing a timelapse of Icelandic earthquakes https://vimeo.com/24442762
You can watch the magma working its way through the ground before it makes its way up through the land.
National Geographic featured this video in a documentary.
Sometimes context is important when it comes to data. What is archived today may be more important later.
As the economic crisis unfolded in Greece, it turned out the data that was used to allow them into EU was wrong.
The data was published at the time of the crisis, but there was no record of what the data looked like 5 years earlier.
Only way to recreate the data was to take prior printed sources. This is usually only done in extraordinary circumstances.
We captured 150k+ data sets with more than 8 billion “facts” which was just a tiny fraction of what exists.
How can we delve deeper into large data sets, all with different configurations and proprietary systems?
“There’s a story in every piece of data.”
Once a year energy consumption seems to dip because February has fewer days than other months. Plotting it matters.
Year over year comparisons can be difficult because of things like 3 day weekends which shift over time.
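The February dip described above is an artifact of month length, and it disappears once monthly totals are divided by the number of days in each month. The short sketch below shows that arithmetic using Python's standard `calendar` module; the function name and the sample figures are made up for illustration.

```python
import calendar


def daily_average(monthly_totals, year):
    """Convert {month_number: total} into {month_number: total per day}."""
    return {
        # monthrange returns (weekday of first day, number of days in month).
        month: total / calendar.monthrange(year, month)[1]
        for month, total in monthly_totals.items()
    }
```

With hypothetical totals of 3100, 2800, and 3100 units for January through March of a non-leap year, the raw numbers show a dip in February while the per-day averages come out flat at 100 units a day.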
Here’s a graph of the population of Iceland. We’ve had our fair share of diseases and volcanic eruptions.
To compare, here’s a graph of the population of sheep. They outnumber us by an order(s) of magnitude.
In the 1780’s there was an event that killed off lots of sheep, so people had the upper hand.
Do we learn more from reading today’s “newspaper” or one from 30, 50, or 100 years ago?
There was a letter to the editor about an eruption and people had to move into the city.
letter: “We can’t have all these people come here, we need to build for our own people first.”
This isn’t too different from our problems today with respect to Syria. In that case, the people actually lived closer.
In the born-digital age, what will the experience look like trying to capture today 40 years hence?
Will it even be possible?
Machine data connections will outnumber “people” data connections by a factor of 10 or more very quickly.
With data, we need to analyze, store, and discard data. How do we decide in a split-second what to keep & discard?
We’re back to the father-in-law and mother-in-law question: What to get rid of and what to save?
Computing is continually beating humans at tasks: chess, Go, driving a car. They build on lots more experience based on data.
Whoever has the most data on driving cars and landscape will be the ultimate winner in that particular space.
Data is valuable; sometimes we just don’t know which data yet.
Hoarding is not a strategy.
You can only guess at what will be important.
“Commercial use in Doubt”: the third sub-headline in a newspaper about an early test of television.
There’s more to it than just the web.
“Hoarding isn’t a strategy” really resonates with librarians. What could that relationship look like?
One should bring in data science; industry may be ahead of libraries.
Cross-disciplinary approaches may be best. How can you get a data scientist to look at your problem? Get their attention?
There are 60K+ books about the Vietnam War. How do we learn to integrate what we learn after an event (like that)?
Perspective always comes with time, as additional information arrives.
Scientific papers are archived in a good way, but the underlying data is a problem.
In the future you may have the ability to add supplementary data as a supplement to what appears in a book (in a better way).
Archives can give the ability to have much greater depth on many topics.
Are there any centers of excellence on the topics we’re discussing today? This conference may be IT.
We need more people that come from the technical side of things to be watching this online news problem.
Hacks/Hackers is a meetup group that takes place all over the world.
It brings the journalists and computer scientists together regularly for beers. It’s some of the outreach we need.
If you’re not interested in money, this is a good area to explore. 10 minute break.
Don’t forget to leave your thoughts on the questions at the back of the room.
We’re going to get started with our first panel. Why is it important to save online news?
I’m Matt Weber from Rutgers University, and I’m in communications.
I’ll talk about web archives and news media and how they interact.
I worked at Tribune Corp. for several years and covered politics in DC.
I wanted to study the way in which the news media is changing.
We’re increasingly seeing digital-only media with no offline surrogate.
It’s becoming increasingly difficult to do anything but look at it now as it exists.
There was no large scale online repository of online news to do research.
#OccupyWallStreet is one of the first examples of stories that exist online in occurrence and reportage.
There’s a growing need to archive content around local news, particularly politics and democracy.
When there is a rich and vibrant local news environment, people are more likely to become engaged.
Local news is one of the least thought about from an archive perspective.
I’m at GWU Libraries in the scholarly technology group.
I’m involved in Social Feed Manager, which allows archivists to put together archives from social services.
Kimberly Gross, a faculty member, studies tweets of news outlets and journalists.
We created a prototype tool to allow them to collect data from social media.
In 2011, journalists were primarily using their Twitter presences to direct people to articles rather than for conversation.
We collect data of political candidates.
I’m an associate librarian and representing “Documenting the Now” with WashU, UCRiverside, & UofMd.
Documenting the Now revolves around Twitter documentation.
It started with the Ferguson story and documenting media, videos during the protests in the community.
What can we as memory institutions do to capture the data?
We gathered 14 million tweets relating to Ferguson within two weeks.
We tried to build a platform that others could use in the future for similar data capture relating to social.
Ethics is important in archiving this type of news data.
Digitally preserving PDFs from news organizations and hyper-local news in Texas.
We’re approaching 5 million pages of archived local news.
What is news that needs to be archived, and why?
First, what is news? The definition is unique to each individual.
We need to capture as much of the social news and social representation of news which is fragmented.
It’s an important part of society today.
We no longer produce hard copies like we did a decade ago. We need to capture the online portion.
We’d like to get the perspective of journalists, and don’t have one on the panel today.
We looked at how midterm election candidates used Twitter. Is that news itself? What tools do we use to archive it?
What does it mean to archive news by private citizens?
Twitter was THE place to find information in St. Louis during the Ferguson protests.
Local news outlets weren’t as good as Twitter during the protests.
I could hear the protest from 5 blocks away and only found news about it on Twitter.
The story was being covered very differently on Twitter than in the local (mainstream) news.
Alternate voices in the mix were very interesting and important.
Twitter was in the moment; it wasn’t being edited, which would have caused a delay.
What can we learn from this massive number of Ferguson tweets?
It gives us information about organizing, and what language was being used.
I think about the archival portion of this question. By whom does it need to be archived?
What do we archive next?
How are we representing the current population now?
Who is going to take on the burden of archiving? Should it be corporate? Cultural memory institution?
Someone needs to curate it. Who does that?
Our next question: What do you view as primary barriers to news archiving?
How do we organize and staff? There’s no shortage of work.
Tools and software can help the process, but libraries are usually staffed very thinly.
No single institution can do this type of work alone. Collaboration is important.
Two barriers we deal with: terms of service are an issue with archiving. We don’t own it, but can use it.
Libraries want to own the data in perpetuity. We don’t own our data.
There’s a disconnect in some of the business models for commercialization and archiving.
Issues with accessing data.
People were worried about becoming targets or losing jobs because of participation.
What is the role of ethics in archiving this type of data? Allowing opting out?
What about redacting portions? Anonymizing the contributions?
Publishers have a responsibility for archiving their product. Permission from publishers can be difficult.
We have a lot of underserved communities. What do we do with comments on stories?
Corporations may not continue to exist in the future and data will be lost.
There’s a balance to be struck between the business side and the public good.
It’s hard to convince for-profit companies of the value of archiving for the social good.
Next Q: What opportunities have revealed themselves in preserving news?
Finding commonalities and differences in projects is important.
What does it mean to us to archive different media types? (think diversity)
What’s happening in my community? in the nation? across the world?
The long history in our archives will help us learn about each other.
We can only do so much with the resources we have.
We’ve worked on a cyber cemetery product in the past.
Someone else can use the tools we create within their initiatives.
Repeating the question: What are the issues in archiving longer-form video data with regard to stories on Periscope?
How do you channel the energy around news archiving?
Research in the area is all so new.
Does anyone have any experience with legal wrangling with social services?
The ACLU is waging a lawsuit against Twitter about archived tweets.
Outreach to community papers is very rhizomic.
How do you take local examples and make them a national model?
We’re teenagers now in the evolution of what we’re doing.
Peter Arnett just said “This is all more interesting than I thought it would be.”
Next Presentation: NewsScape: preserving TV news
I’ll be talking about the NewsScape project of Francis Steen, Director, Communication Studies Archive
I’m leading the archiving of the analog portion of the collection.
The oldest of our collection dates from the 1950s. We’ve hosted them on YouTube, which has created some traction.
Commenters have been an issue with posting to YouTube as well as copyright.
NewsScape is the largest collection of TV news and public affairs programs (local & national).
Prior to 2006, we don’t know what we’ve got.
Paul said “I’ll record everything I can and someone in the future can deal with it.”
We have 50K hours of Betamax.
VHS tapes are actually the most threatened, despite being the newest tapes.
Our budget was seriously strapped.
Maintaining closed captioning is important to our archiving efforts.
We’ve done 36k hours of encoding this year.
We use a layer of dead VCRs over our good VCRs to prevent RF interference and audio buzzing. 🙂
Post-2006, we’re now going straight to digital.
Preservation is the first step, but we need to be more than the world’s best DVR.
Searching the news is important too.
Showing a data visualization of news analysis with regard to the Healthcare Reform movement.
We’re doing facial analysis as well.
We have interactive tools at viz2016.com.
We’ve tracked how often candidates have smiled in election 2016. Hillary > Trump
We want to share details within our collection, but don’t have tools yet.
Having a good VCR repairman has helped us a lot.
Breaking for lunch…
Talk “Born-digital news preservation in perspective”
There’s a shared consensus that preserving scholarly publications is important.
While delivery models have shifted, there must be some fall back to allow content to survive publisher failure.
Preservation was a joint investment between memory institutions and publishers.
Keepers register their coverage of journals for redundancy.
In studying coverage, we’ve discovered Elsevier is REALLY well covered, but they’re not what we’re worried about.
It’s the small journals as edge cases that really need more coverage.
Smaller journals don’t have resources to get into the keeper services and it’s more expensive.
Many Open Access Journals are passion projects and heavily underfunded and they are poorly covered.
Being mindful of these business dynamics is key when thinking about archiving news.
There are a handful of large news outlets that are “too big to fail.”
There are huge numbers of small outlets like subject verticals, foreign diasporas, etc. that need to be watched.
Different strategies should be used for different outlets.
The material on lots of links (as sources) disappears after a short period of time.
While Archive.org is a great resource, it can’t do everything.
Preserving underlying evidence is really important.
How we deal with massive databases and queries against them is a difficult problem.
I’m not aware of studies of link rot in relation to online news.
Who steps up to preserve major data dumps like Snowden, PanamaPapers, or email breaches?
Social media is a collection of observations and small facts without necessarily being journalism.
Journalism is a deliberate act and is meant to be public while social media is not.
We need to come up with a consensus about what parts of social media should be preserved as news.
News does often delve into social media as part of its evidence base now.
Responsible journalism should include archival storage, but it doesn’t yet.
Under current law, we can’t protect a lot of this material without the permission of the creator(s).
The Library of Congress can demand deposit, but doesn’t.
With funding issues, I’m not wild about the Library of Congress being the only entity [for storage.]
In the UK, there are multiple repositories.
testing to see if I’m still live
What happens if you livetweet too much in one day.
I run across notices on the web like this regularly and it used to aggravate me to no end:
Infuriatingly it usually involved having just spent 5 minutes reading something and then spending 10 minutes to hours writing a reasoned and thoughtful response. (Because every troll knows that’s what the internet was designed to encourage, right?)
After pressing the reply button (even scarier than hitting the “Publish” button because you don’t have the ability to edit it after-the-fact and someone else now “owns” your content), you see the dreaded notice that your comment is “AWAITING MODERATION…”
Will they approve it? Will they delete it? Is it gone forever? Did they really get it, or did it disappear into the ether? Oh #%@$!, I wish I’d made a back up copy because that took a bit of work, and I might like to refer to it again later. Are they going to censor my thoughts? Silence my voice?
I Get It: The Need for Moderation
I completely get the need for moderation on the web, particularly as almost no one is as kind, considerate, courteous, or civil as my friend P.M. Forni. (And who could be — he literally wrote the book(s) on the subject!)
On a daily basis, I’m spammed by sites desperate to sell or promote FIFA coins, Ray Bans, Christian Louboutin shoes, or even worse types of hateful blather, so I too gently moderate. I try to save my own readers from having to see such drivel, and don’t want to provide a platform or audience for them to shout from or at, respectively.
I won’t be silenced anymore
No longer can I be silenced by random moderators that I often don’t know.
Why, you ask?
Because now, thanks to philosophies from the Indieweb movement and technologies like webmention, which growing numbers of websites are beginning to support, I now post everything I write online onto a site I own first. There it can be read in perpetuity by anyone who chooses to come read it, or from where I can syndicate it out to the myriad of social media sites for others to read en masse. (And maybe my voice has more reach than the site I’m posting to?)
Functionality like webmention (a more modern version of pingback or trackback) then allows my content to be sent to the website I was replying to in an elegant way for (eventual?) display. Or I can copy and paste it directly if they don’t support modern protocols.
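For the curious, here is a rough sketch of what sending a webmention involves, written with only the Python standard library. It is not how my site or any particular plugin does it; the helper names `discover_endpoint` and `build_webmention_body` and the example URLs are made up for illustration. It also simplifies the protocol: it only looks for the endpoint in the page's HTML, whereas a full sender would check the HTTP `Link` header first.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class _EndpointFinder(HTMLParser):
    """Record the href of the first <link> or <a> whose rel includes "webmention"."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if self.endpoint is not None or tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "webmention" in rel_values and attributes.get("href") is not None:
            self.endpoint = attributes["href"]


def discover_endpoint(target_url, html):
    """Return the absolute webmention endpoint advertised in html, or None."""
    finder = _EndpointFinder()
    finder.feed(html)
    if finder.endpoint is None:
        return None
    # Endpoints may be relative, so resolve them against the target page URL.
    return urljoin(target_url, finder.endpoint)


def build_webmention_body(source_url, target_url):
    """Form-encode the two parameters a webmention carries."""
    return urlencode({"source": source_url, "target": target_url})


# Hypothetical pages: my reply lives at source, the post I reply to is target.
target = "https://example.com/post/1"
source = "https://me.example/reply"
page = '<html><head><link rel="webmention" href="/webmention"></head></html>'

print(discover_endpoint(target, page))
print(build_webmention_body(source, target))
```

The actual send is then a plain HTTP POST of that body to the discovered endpoint with the content type `application/x-www-form-urlencoded`. The receiving site fetches my page, confirms it really links to theirs, and decides whether and how to display it.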
Sure, they can choose to moderate me or choose not to feature my viewpoint on their own site if they wish, but at least I still own the work I put into those thoughts. I don’t have to worry about where they went or how I might be able to find them in the future. They will always be mine, and that is empowering.
Would you like to own your own data? Own your own domain? Free yourself from the restrictions of the social media silos like Facebook, Instagram, and Twitter? Visit Indieweb.org to see how you can do these things. Chat with like-minded individuals who can also help you out. Attend an upcoming IndieWebCamp or a local Homebrew Website Club in your area, or start one of your own!
Does blogging need to be different than it was?
I agree with John that blogs seemingly occupy a different space in online life today than they did a decade ago, but I won’t concede that, for me at least, most of it has moved to the social media silos.
I think the role of the blog is different than it was even just a couple of years ago. It’s not the sole outpost of an online life, although it can be an anchor, holding it in place. — John Scalzi
Why? About two years ago I began delving into the evolving movement known as IndieWeb, which has re-empowered me to take back my web presence and use my own blog/website as my primary online hub and identity. The tools I’ve found there allow me to not only post everything to my own site first and then syndicate it out to the social circles and sites I feel it might resonate with, but best of all, the majority of the activity (comments, likes, shares, etc.) on those sites boomerangs back to the comments on my own site! This gives me a better grasp on where others are interacting with my content, and I can interact along with them on the platforms that they choose to use.
Some of the benefit is certainly a data ownership question — for who is left holding the bag if a major site like Twitter or Facebook is bought out or shut down? This has happened to me in dozens of cases over the past decade where I’ve put lots of content and thought into a site only to see it shuttered and have all of my data and community disappear with it.
Other benefits include: cutting down on notification clutter, more enriching interactions, and less time wasted scrolling through social sites.
Reply from my own site
Now I’m able to use my own site to write a comment on John’s post (where the comments are currently technically closed), and keep it for myself, even if his blog should go down one day. I can alternately ping his presence on other social media (say, by means of Twitter) so he’ll be aware of the continued conversational ripples he’s caused.
Social media has become ubiquitous in large part because those corporate sites are dead simple for Harry and Mary Beercan to use. Even my own mother’s primary online presence begins with http://facebook.com/. But not so for me. I’ve taken the reins of my online life back.
My Own Hub
My blog remains my primary online hub, and some very simple IndieWeb tools enable it by bringing all the conversation back to me. I joined Facebook over a decade ago, and you’ll notice by the date on the photo that it didn’t take me long to complain about the growing and overwhelming social media problem I had.
I’m glad I can finally be at the center of my own social graph, and it was everything I thought it could be.