Notes from Day 1 of Dodging the Memory Hole: Saving Online News | Thursday, October 13, 2016

Today I spent the majority of the day attending the first of a two-day conference at UCLA's Charles Young Research Library entitled "Dodging the Memory Hole: Saving Online News." While I mostly knew what I was getting into, it hadn't really occurred to me how much of what is on the web is not backed up or archived in any meaningful way. It's human nature to neglect backing up one's data, but huge swaths of really important data with newsworthy and historic value are being heavily neglected. Fortunately it's an interesting enough problem to draw the 100 or so scholars, researchers, technologists, and journalists who showed up for the start of a group being convened through the Reynolds Journalism Institute and several sponsors of the event.

What particularly strikes me is how many of the philosophies of the IndieWeb movement and tools developed by it are applicable to some of the problems that online news faces. I suspect that if more journalists were practicing members of the IndieWeb and used their sites not only for collecting and storing the underlying data upon which they base their stories, but to publish them as well, then some of the (future) archival process may be easier to accomplish. I’ve got so many disparate thoughts running around my mind after the first day that it’ll take a bit of time to process before I write out some more detailed thoughts.

Twitter List for the Conference

As a reminder to those attending, I've accumulated a Twitter list of everyone who's tweeted with the hashtag #DtMH2016, so that attendees can more easily follow each other as well as communicate online following our few days together in Los Angeles. Twitter also allows subscribing to entire lists, if that's something in which people have an interest.

Archiving the day

It seems only fitting that an attendee of a conference about saving and archiving digital news would make a reasonable attempt to archive some of his own experience, right?! Toward that end, below is an archive of my tweetstorm during the day, marked up with microformats and including hovercards for the speakers with the appropriate available metadata. For those interested, I used a fantastic web app called Noter Live to capture, tweet, and more easily archive the stream.

Note that in many cases my tweets don’t reflect direct quotes of the attributed speaker, but are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many original words of the participant as possible. Typically, for speed, there wasn’t much editing of these notes. I’m also attaching .m4a audio files of most of the audio for the day (apologies for shaky quality as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it. Presumably they will release the video on their website for a more immersive experience.

If you prefer to read the stream of notes in the original Twitter format, so that you can like/retweet/comment on individual pieces, this link should give you the entire stream. Naturally, comments are also welcome below.

Audio Files

Below are the audio files for several sessions held throughout the day.

Greetings and Keynote


Greetings: Edward McCain, digital curator of journalism, Donald W. Reynolds Journalism Institute (RJI) and University of Missouri Libraries and Ginny Steel, university librarian, UCLA
Keynote: Digital salvage operations — what’s worth saving? given by Hjalmar Gislason, vice president of data, Qlik

Why save online news? and NewsScape


Panel: “Why save online news?” featuring Chris Freeland, Washington University; Matt Weber, Ph.D., Rutgers, The State University of New Jersey; Laura Wrubel, The George Washington University; moderator Ana Krahmer, Ph.D., University of North Texas
Presentation: “NewsScape: preserving TV news” given by Tim Groeling, Ph.D., UCLA Communication Studies Department

Born-digital news preservation in perspective


Speaker: Clifford Lynch, Ph.D., executive director, Coalition for Networked Information on “Born-digital news preservation in perspective”

Live Tweet Archive

ChrisAldrich:

Getting Noter Live fired up for Dodging the Memory Hole 2016: Saving Online News https://www.rjionline.org/dtmh2016

Ginny Steel:

I’m glad I’m not at NBC trying to figure out the details for releasing THE APPRENTICE tapes.

Edward McCain:

Let’s thank @UCLA and the library for hosting us all.

While you’re here, don’t forget to vote/provide feedback throughout the day for IMLS

Someone once pulled up behind me and said “Hi Tiiiigeeerrr!” #Mizzou

A server at the Missourian crashed as the system was obsolete and running on baling wire. We lost 15 years of archives

The dean & head of Libraries created a position to save born digital news.

We’d like to help define stake-holder roles in relation to the problem.

Newspaper is really an outmoded term now.

I’d like to celebrate that we have 14 student scholars here today.

We’d like to have you identify specific projects that we can take to funding sources to begin work after the conference

We’ll be going to our first speaker who will be introduced by Martin Klein from Los Alamos.

Martin Klein:

Hjalmar Gislason is a self-described digital nerd. He’s the Vice President of Data.

I wonder how one becomes the President of Data?

Hjalmar Gislason:

My Icelandic name may be the most complicated part of my talk this morning.

Speaking on "Digital Salvage Operations: What's Worth Saving?"

My father in law accidentally threw away my wife’s favorite stuffed animal. #DeafTeddy

Some people just throw everything away because it's not being used. Others keep everything and don't throw anything away.

The fundamental question: Do you want to save everything or do you want to get rid of everything?

I joined @qlik two years ago and moved to Boston.

Before that I was with spurl.net, which let people save copies of webpages they'd previously visited.

I had also previously invested in kjarninn which is translated as core.

We used to have little data; now we have gigantic data and we're moving to gargantuan data soon.

One of my goals today is to broaden our perspective about what data needs saving.

There’s the Web, the “Deep” Web, then there’s “Other” data which is at the bottom of the pyramid.

I got to see into the process of #panamapapers but I’d like to discuss the consequences from April 3rd.

The number of meetings was almost more than could have been covered in real time in Iceland.

The #panamapapers were a soap opera, much like US politics.

Looking back at the process is highly interesting, but it's difficult to look at all the data as they unfolded.

How can we capture all the media minute by minute as a story unfolds?

You can’t trust that you can go back to a story at a certain time and know that it hasn’t been changed. #1984 #Orwell

There was a relatively pro-HRC piece earlier this year @NYTimes that was changed.

Newsdiffs tracks changes in news over time. The HRC article had changed a lot.

Let’s say you referenced @CNN 10 years ago, likely now, the CMS and the story have both changed.

8 years ago, I asked, wouldn’t we like to have the social media from Iceland’s only Nobel Laureate as a teenager?

What is private/public, ethical/unethical when dealing with data?

Much data is hidden behind passwords or on systems which are not easily accessed from a database perspective.

Most of the content published on Facebook isn’t public. It’s hard to archive in addition to being big.

We as archivists have no claim on the hidden data within Facebook.

ChrisAldrich:

They could help archivists in the future in accessing more personal data.

Hjalmar Gislason:

Then there's "other" data: 500 hours of video is uploaded to YouTube per minute.

No organization can go around watching all of this video data. Which parts are newsworthy?

Content could surface much later or could surface through later research.

Hornbjargsviti lighthouse recorded the weather every three hours for years creating lots of data.

And that was just one of hundreds of sites that recorded this type of data in Iceland.

Lots of this data is lost. Much that has been found was by coincidence. It was never thought to archive it.

This type of weather data could be very valuable to researchers later on.

There was also a large archive of Icelandic data that was found.

Showing a timelapse of Icelandic earthquakes https://vimeo.com/24442762

You can watch the magma working its way through the ground before it makes its way up through the land.

National Geographic featured this video in a documentary.

Sometimes context is important when it comes to data. What is archived today may be more important later.

As the economic crisis unfolded in Greece, it turned out the data that was used to allow them into the EU was wrong.

The data was published at the time of the crisis, but there was no record of what the data looked like 5 years earlier.

The only way to recreate the data was to take prior printed sources. This is usually only done in extraordinary circumstances.

We captured 150k+ data sets with more than 8 billion “facts” which was just a tiny fraction of what exists.

How can we delve deeper into large data sets, all with different configurations and proprietary systems?

“There’s a story in every piece of data.”

Once a year energy consumption seems to dip because February has fewer days than other months. Plotting it matters.

Year over year comparisons can be difficult because of things like 3 day weekends which shift over time.

Here’s a graph of the population of Iceland. We’ve had our fair share of diseases and volcanic eruptions.

To compare, here’s a graph of the population of sheep. They outnumber us by an order(s) of magnitude.

In the 1780’s there was an event that killed off lots of sheep, so people had the upper hand.

Do we learn more from reading today’s “newspaper” or one from 30, 50, or 100 years ago?

There was a letter to the editor about an eruption and people had to move into the city.

letter: “We can’t have all these people come here, we need to build for our own people first.”

This isn’t too different from our problems today with respect to Syria. In that case, the people actually lived closer.

In the born-digital age, what will the experience look like trying to capture today 40 years hence?

Will it even be possible?

Machine data connections will outnumber “people” data connections by a factor of 10 or more very quickly.

With data, we need to analyze, store, and discard data. How do we decide in a split-second what to keep & discard?

We’re back to the father-in-law and mother-in-law question: What to get rid of and what to save?

Computers are continually beating humans at tasks: chess, Go, driving a car. They build on lots more experience based on data.

Whoever has the most data on driving cars and landscape will be the ultimate winner in that particular space.

Data is valuable, sometimes we just don’t know which yet.

Hoarding is not a strategy.

You can only guess at what will be important.

“Commercial use in Doubt” The third sub-headline in a newspaper about an early test of television.

There’s more to it than just the web.

Kate Zwaard:

"Hoarding isn't a strategy" really resonates with librarians; what could that relationship look like?

Hjalmar Gislason:

One should bring in data science; industry may be ahead of libraries.

Cross-disciplinary approaches may be best. How can you get a data scientist to look at your problem? Get their attention?

Peter Arnett:

There are 60K+ books about the Vietnam War. How do we learn to integrate what we learn after an event (like that)?

Hjalmar Gislason:

Perspective always comes with time, as additional information arrives.

Scientific papers are archived in a good way, but the underlying data is a problem.

In the future you may have the ability to add data to supplement what appears in a book (in a better way).

Archives can give the ability to have much greater depth on many topics.

Are there any centers of excellence on the topics we’re discussing today? This conference may be IT.

We need more people that come from the technical side of things to be watching this online news problem.

Hacks/Hackers is a meetup group that takes place all over the world.

It brings the journalists and computer scientists together regularly for beers. It’s some of the outreach we need.

Edward McCain:

If you’re not interested in money, this is a good area to explore. 10 minute break.

Don’t forget to leave your thoughts on the questions at the back of the room.

We’re going to get started with our first panel. Why is it important to save online news?

Matthew Weber:

I'm Matt Weber from Rutgers University and in communications.

I’ll talk about web archives and news media and how they interact.

I worked at Tribune Corp. for several years and covered politics in DC.

I wanted to study the way in which the news media is changing.

We're increasingly seeing digital-only media with no offline surrogate.

It's becoming increasingly difficult to do anything but look at it now as it exists.

There was no large scale online repository of online news to do research.

#OccupyWallStreet is one of the first examples of stories that exist online in occurrence and reportage.

There’s a growing need to archive content around local news particularly politics and democracy.

When there is a rich and vibrant local news environment, people are more likely to become engaged.

Local news is one of the least thought about from an archive perspective.

Laura Wrubel:

I'm at GWU Libraries in the scholarly technology group.

I’m involved in social feed manager which allows archivists to put together archives from social services.

Kimberly Gross, a faculty member, studies tweets of news outlets and journalists.

We created a prototype tool to allow them to collect data from social media.

In 2011, journalists were primarily using their Twitter presences to direct people to articles rather than for conversation.

We collect data of political candidates.

Chris Freeland:

I'm an associate librarian, representing "Documenting the Now" with WashU, UCRiverside, & UofMd.

Documenting the Now revolves around Twitter documentation.

It started with the Ferguson story and documenting media, videos during the protests in the community.

What can we as memory institutions do to capture the data?

We gathered 14 million tweets relating to Ferguson within two weeks.

We tried to build a platform that others could use in the future for similar data capture relating to social.

Ethics is important in archiving this type of news data.

Ana Krahmer:

Digitally preserving pdfs from news organizations and hyper-local news in Texas.

We're approaching 5 million pages of archived local news.

What is news that needs to be archived, and why?

Matthew Weber:

First, what is news? The definition is unique to each individual.

We need to capture as much of the social news and social representation of news which is fragmented.

It’s an important part of society today.

We no longer produce hard copies like we did a decade ago. We need to capture the online portion.

Laura Wrubel:

We’d like to get the perspective of journalists, and don’t have one on the panel today.

We looked at how midterm election candidates used Twitter. Is that news itself? What tools do we use to archive it?

What does it mean to archive news by private citizens?

Chris Freeland:

Twitter was THE place to find information in St. Louis during the Ferguson protests.

Local news outlets weren’t as good as Twitter during the protests.

I could hear the protest from 5 blocks away and only found news about it on Twitter.

The story was being covered very differently on Twitter than on the local (mainstream) news.

Alternate voices in the mix were very interesting and important.

Twitter was in the moment and wasn’t being edited and causing a delay.

What can we learn from this massive number of Ferguson tweets?

It gives us information about organizing, and what language was being used.

Ana Krahmer:

I think about the archival portion of this question. By whom does it need to be archived?

What do we archive next?

How are we representing the current population now?

Who is going to take on the burden of archiving? Should it be corporate? Cultural memory institution?

Someone needs to curate it; who does that?

Our next question: What do you view as the primary barriers to news archiving?

Laura Wrubel:

How do we organize and staff? There’s no shortage of work.

Tools and software can help the process, but libraries are usually staffed very thinly.

No single institution can do this type of work alone. Collaboration is important.

Chris Freeland:

Two barriers we deal with: terms of service are an issue with archiving. We don’t own it, but can use it.

Libraries want to own the data in perpetuity. We don’t own our data.

There’s a disconnect in some of the business models for commercialization and archiving.

Issues with accessing data.

People were worried about becoming targets or losing jobs because of participation.

What is the role of ethics in archiving this type of data? Allowing opting out?

What about redacting portions? anonymizing the contributions?

Ana Krahmer:

Publishers have a responsibility for archiving their product. Permission from publishers can be difficult.

We have a lot of underserved communities. What do we do with comments on stories?

Corporations may not continue to exist in the future and data will be lost.

Matthew Weber:

There’s a balance to be struck between the business side and the public good.

It's hard to convince for-profits of the value of archiving for the social good.

Chris Freeland:

Next Q: What opportunities have revealed themselves in preserving news?

Finding commonalities and differences in projects is important.

What does it mean to us to archive different media types? (think diversity)

What’s happening in my community? in the nation? across the world?

The long history in our archives will help us learn about each other.

Ana Krahmer:

We can only do so much with the resources we have.

We’ve worked on a cyber cemetery product in the past.

Someone else can use the tools we create within their initiatives.

Chris Freeland:

Repeating an audience question: What are the issues in archiving longer-form video data with regard to stories on Periscope?

Audience Question:

How do you channel the energy around news archiving?

Matthew Weber:

Research in the area is all so new.

Audience Question:

Does anyone have any experience with legal wrangling with social services?

Chris Freeland:

The ACLU is waging a lawsuit against Twitter about archived tweets.

Ana Krahmer:

Outreach to community papers is very rhizomic.

Audience Question:

How do you take local examples and make them a national model?

Ana Krahmer:

We’re teenagers now in the evolution of what we’re doing.

Edward McCain:

Peter Arnett just said "This is all more interesting than I thought it would be."

Next Presentation: NewsScape: preserving TV news

Tim Groeling:

I’ll be talking about the NewsScape project of Francis Steen, Director, Communication Studies Archive

I’m leading the archiving of the analog portion of the collection.

The oldest of our collection dates from the 1950’s. We’ve hosted them on YouTube which has created some traction.

Commenters have been an issue with posting to YouTube as well as copyright.

NewsScape is the largest collection of TV news and public affairs programs (local & national).

Prior to 2006, we don’t know what we’ve got.

Paul said "I'll record everything I can and someone in the future can deal with it."

We have 50K hours of Betamax.

VHS are actually most threatened, despite being newest tapes.

Our budget was seriously strapped.

Maintaining closed captioning is important to our archiving efforts.

We’ve done 36k hours of encoding this year.

We use a layer of dead VCRs over our good VCRs to prevent RF interference and audio buzzing. 🙂

Post-2006, we're now going straight to digital.

Preservation is the first step, but we need to be more than the world’s best DVR.

Searching the news is important too.

Showing a data visualization of news analysis with regard to the Healthcare Reform movement.

We’re doing facial analysis as well.

We have interactive tools at viz2016.com.

We’ve tracked how often candidates have smiled in election 2016. Hillary > Trump

We want to share details within our collection, but don’t have tools yet.

Having a good VCR repairman has helped us a lot.

Edward McCain:

Breaking for lunch…

Clifford Lynch:

Talk “Born-digital news preservation in perspective”

There’s a shared consensus that preserving scholarly publications is important.

While delivery models have shifted, there must be some fall back to allow content to survive publisher failure.

Preservation was a joint investment between memory institutions and publishers.

Keepers register their coverage of journals for redundancy.

In studying coverage, we’ve discovered Elsevier is REALLY well covered, but they’re not what we’re worried about.

It’s the small journals as edge cases that really need more coverage.

Smaller journals don’t have resources to get into the keeper services and it’s more expensive.

Many Open Access Journals are passion projects and heavily underfunded and they are poorly covered.

Being mindful of these business dynamics is key when thinking about archiving news.

There are a handful of large news outlets that are “too big to fail.”

There are huge numbers of small outlets like subject verticals, foreign diasporas, etc. that need to be watched

Different strategies should be used for different outlets.

The material on lots of links (as sources) disappears after a short period of time.

While Archive.org is a great resource, it can’t do everything.

Preserving underlying evidence is really important.

How we deal with massive databases and queries against them are a difficult problem.

I’m not aware of studies of link rot with relationship to online news.

Who steps up to preserve major data dumps like Snowden, PanamaPapers, or email breaches?

Social media is a collection of observations and small facts without necessarily being journalism.

Journalism is a deliberate act and is meant to be public while social media is not.

We need to come up with a consensus about what parts of social media should be preserved as news.

News does often delve into social media as part of its evidence base now.

Responsible journalism should include archival storage, but it doesn’t yet.

Under current law, we can’t protect a lot of this material without the permission of the creator(s).

The Library of Congress can demand deposit, but doesn’t.

With funding issues, I’m not wild about the Library of Congress being the only entity [for storage.]

In the UK, there are multiple repositories.

ChrisAldrich:

testing to see if I’m still live

What happens if you livetweet too much in one day.
password-change-required

Homebrew Website Club — Los Angeles

In an effort to provide easier commuting access for a broader cross-section of Homebrew members, we met last night at Yahoo's primary offices at 11995 W. Bluff Creek Drive, Playa Vista, CA 90094. We hope to alternate meetings of the Homebrew Website Club between the East and West sides of Los Angeles as we go forward. If anyone has additional potential meeting locations, we're always open to suggestions as well as assistance.

We had our largest RSVP list to date, though some had last minute issues pop up and one sadly had trouble finding the location (likely due to a Google map glitch).

Angelo and Chris met before the quiet writing hour to discuss some general planning for future meetings as well as the upcoming IndieWebCamp in LA in November. Details and help for arrangements for out of town attendees should be posted shortly.

Notes from the “broadcast” portion of the meetup

Chris Aldrich (co-organizer)

Angelo Gladding (co-organizer)

  • Work is proceeding nicely on the overall build of Canopy
  • Discussed an issue with expanding data for social network in relation to events and potentially expanding contacts based on event attendees

Srikanth Bangalore (our host at Yahoo!)

  • Discussed some of his background in coding and work with Drupal and WordPress.
  • His personal site is https://srib.us/

Notes from the “working” portion of the meetup

We sketched out a way to help Srikanth IndieWeb-ify not only his own site, but potentially Katie Couric's Yahoo!-based news site as well, along with the pros/cons of workflows for journalists in general. We also considered some potential pathways for bolting webmentions onto websites (like Tumblr/WordPress) which utilize Disqus for their commenting system. We worked through the details of webmentions and a bit of micropub for his benefit.

Srikanth discussed some of the history and philosophy behind why Tumblr didn't have a more "traditional" native commenting system. The point was generally to socially discourage negativity, spamming, and abuse by forcing people to post their comments front and center on their own site (and not just in the "comments" of the receiving site), thereby making any negativity redound to their own reputation rather than just to the receiving page of the target. Most social media related sites hide (or make hard to search/find) the abusive nature of most users, while allowing them to appear better/nicer on their easier-to-find public-facing persona.

Before closing out the meeting officially, we stopped by the front lobby where two wonderful and personable security guards (one a budding photographer) not only helped us with a group photo, but managed to help us escape the parking lot!

I think it’s agreed we all had a great time and look forward to more progress on projects, more good discussion, and more interested folks at the next meeting. Srikanth was so amazed at some of the concepts, it’s possible that all of Yahoo! may be IndieWeb-ified by the end of the week. 🙂

We hope you’ll join us next month on 10/05! (Details forthcoming…)

Live Tweets Archive


Ever with grand aspirations to do as good a job as the illustrious Kevin Marks, we tried some livetweeting with Noterlive. Alas, the discussion quickly became so consuming that the effort was abandoned in favor of both passion and fun. Hopefully some of the salient points were captured above in better form anyway.

Srikanth Bangalore:

I only use @drupal when I want to make money. (Replying to why his personal site was on @wordpress.) #

(This CMS comment may have been the biggest laugh of the night, though the tone captured here (and the lack of context) doesn't do the comment any justice at all.)

Angelo Gladding:

I'm a hobbyist programmer, but I also write code to make money. #

I’m into python which is my language of choice. #

Chris Aldrich:

Thanks again @themarketeng for hosting Homebrew Website Club at Yahoo tonight! We really appreciate the hospitality. #

My first pull request

Replied to My first pull request by Clint Lalonde (ClintLalonde.net)
Crazy to think that, even though I have had a GitHub account for 5 years and have poked, played and forked things, I have never made a pull request and contributed something to another project unti…
Clint, first, congratulations on your first PR!

Oddly, I had seen the VERY same post/repo a few weeks back and meant to add a readme too! (You’ll notice I got too wrapped up in reading through the code and creating some usability issues after installing the plugin instead.)

Given that you've got your own domain and website (and are playing in ed/tech like many of us), and you're syndicating your blog posts out to Medium for additional reach, I feel compelled to mention some interesting web tech and philosophy from the IndieWeb movement. You can find some great resources and tools at their website.

In particular, you might take a look at their WordPress pages, which include some plugins and resources you'll be sure to appreciate. One of their sets of resources allows you not only to syndicate your WP posts (what they call POSSE), but also, by using the new W3C Webmention spec, to connect many of your social media accounts to brid.gy and have services like Twitter, Facebook, G+, Instagram, and others send the comments and likes on your posts there back to your blog directly, thereby allowing you to own all of your data (as well as the commentary that occurs elsewhere). I can see a lot of use for education in some of the infrastructure they're building and aggregating there. (If you're familiar with Known, they bake a lot of Indieweb goodness into their system from the start, but there's no reason you shouldn't have it for your WordPress site as well.)

If you need any help/guidance in following/installing anything there, I’m happy to help.

Congratulations again. Keep on pullin’!

Instagram Single Photo Bookmarklet

Ever wanted a simple and quick way to extract the primary details from an Instagram photo to put it on your own website?

The following javascript-based bookmarklet is courtesy of Tantek Çelik as an Indieweb tool he built at IndieWebCamp NYC2:

If you view a single photo permalink page, the following bookmarklet will extract the permalink (trimmed), photo jpg URL, and photo caption and copy them into a text note, suitable for posting as a photo that’s auto-linked:

javascript:n=document.images.length-1;s=document.images[n].src;s=s.split('?');s=s[0];u=document.location.toString().substring(0,39);prompt('Choose "Copy ⌘C" to copy photo post:',s+' '+u+'\n'+document.images[n].alt.toString().replace(RegExp(/\.\n(\.\n)+/),'\n'))

Any questions, let me know! –Tantek
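For those who'd rather read the logic than the minified one-liner, below is an unminified sketch of what the bookmarklet above does; the only assumption carried over from the original is that the last image on an Instagram single-photo page is the photo itself.

// Unminified sketch of the bookmarklet above (same logic, laid out for readability).
(function () {
  var imgs = document.images;
  var photo = imgs[imgs.length - 1];                               // assume the last image is the photo
  var src = photo.src.split('?')[0];                               // photo JPG URL with the query string stripped
  var permalink = document.location.toString().substring(0, 39);   // trimmed photo permalink
  var caption = photo.alt.toString().replace(/\.\n(\.\n)+/, '\n'); // tidy runs of ".\n" in the alt-text caption
  prompt('Choose "Copy ⌘C" to copy photo post:', src + ' ' + permalink + '\n' + caption);
})();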

If you want an easy drag-and-drop version, just drag the button below into your browser’s bookmark bar.

✁ Instagram

Editor's note: Though we'll try to keep the code in this bookmarklet updated, the most recent version can be found on the Indieweb wiki through the link above.

Reply to Scott Kingery about Wallabag and Reading

Replied to a post by Scott Kingery (TechLifeWeb)
Chris, as a kind of sidebar to this, we talk about hosting things on our own site. I’ve always kind of thought this should be 1 piece of software we use for everything. I think that way becau…
Scott, as someone who’s studied evolutionary biology, I know that specialists in particular areas are almost always exponentially better at what they do than non-specialists.  This doesn’t mean that we don’t need alternate projects or new ideas which may result in new “Cambrian explosions,” and even better products.

I also feel that one needs the right tool for the right job. While I like WordPress for many things, it’s not always the best thing to solve the problem. In some cases Drupal or even lowly Wix may be the best solution. The key is to find the right balance of time, knowledge, capability and other variables to find the optimal solution for the moment, while maintaining the ability to change in the future if necessary. By a similar analogy there are hundreds of programming languages and all have their pros and cons.  Often the one you know is better than nothing, but if you heard about one that did everything better and faster, it would be a shame not to check it out.

This said, I often prefer to go with specialist software, though I do usually have a few requirements which overlap or align with Indieweb principles, including, but not limited to:

  • It should be open, so I can modify/change/share it with others
  • I should be able to own all the related/resultant data
  • I should be able to self-host it (if I want)
  • It should fit into my workflow and solve a problem I have while not creating too many new problems

In this case, I suspect that Wallabag is far better than anything I might have time to build and maintain myself. If there are bits of functionality that are missing, I can potentially request them or build/add them myself and contribute back to the larger good.

Naturally I do also worry about usability and maintenance, so if the general workflow and overhead don't dovetail with my other use cases, all bets may be off. If large pieces of my data, functionality, and workflow are housed in WordPress, for example, and something like this isn't easily integrated or is very difficult to keep updated and maintained, then I'll pass and look for (or build) a different solution. (Not every tool is right for just any job.) On larger projects like this, there's also the happy serendipity that they're big enough that WordPress (Drupal, Jekyll, other) developers can better shoehorn the functionality into a bigger project or create a simple API, thereby making the whole more valuable than the sum of the parts.

In this particular situation, it appears to be a 1-1 replacement for a closed silo version of something I’ve been using regularly, but which provides more of the benefits above than the silo does, so it seems like a no-brainer to switch.

 
To reply to this comment, preferably do so on the original at: A New Reading Post-type for Bookmarking and Reading Workflow

Homebrew Website Club Meetup Pasadena/Los Angeles Notes from 8-24-16

Last night, shy a few regulars at the tail end of a slow August and almost on the eve of IndieWebCamp NY2, Angelo Gladding and I continued our biweekly Homebrew Website Club meetings.

We met at Charlie’s Coffee House, 266 Monterey Road, South Pasadena, CA, where we stayed until closing at 8:00. Deciding that we hadn’t had enough, we moved the party (South Pasadena rolls up their sidewalks early) over to the local Starbucks, 454 Fair Oaks Ave, South Pasadena, CA where we stayed until they closed at 11:00pm.

Quiet Writing Hour

Angelo manned the fort alone with aplomb while building intently. If I’m not mistaken, he did use my h-card to track down my phone number to see what was holding me up, so as they say in IRC: h-card++!

Introductions and Demonstrations

Participants included:

Needing no introductions this week, Angelo launched us off with a relatively thorough demo of his Canopy platform which he’s built from the ground up in python! Starting from an empty folder on a host with a domain name, he downloaded and installed his code directly from Github and spun up a completely new version of his site in under 2 minutes. In under 20 minutes of some simple additional downloads and configuration of a few files, he also had locations, events, people and about modules up and running. Despite the currently facile appearance of his website, there’s really a lot of untapped power in what he’s built so far. It’s all available on Github for those interested in playing around; I’m sure he’d appreciate pull requests.

Along the way, I briefly demoed some of the functionality of Kevin Marks’ deceptively powerful Noterlive web app for not only live tweeting, but also owning those tweets on one’s own site in a simple way after the fact (while also automatically including proper markup and microformats)! I also ran through some of the overall functionality of my Known install with a large number of additional plugins to compare and contrast UX/UI with respect to Canopy.

We also discussed a bit of Angelo’s recent Indieweb Graph network crawling project, and I took the opportunity to fix a bit of the representative h-card on my site. (Angelo, does a new crawl appear properly on lahacker.net now?)

Before leaving Charlie's we did manage to remember to take a group photo this time around. Not having spent enough time chatting over the past few weeks, we decamped to a local Starbucks and continued our conversation along with some additional brief demos and discussion of other itches for future building.

We also spent a few minutes discussing the upcoming IndieWebCamp LA logistics for November as well as outreach to the broader Los Angeles area dev communities. If you’re interested in attending, please RSVP. If you’d like to volunteer or help sponsor the camp, please don’t hesitate to contact either of us. I’m personally hoping to attend DrupalCamp LA this weekend while wearing a stylish IndieWebCamp t-shirt that’s already on its way to me.

IndieWebCamp T-shirt

Next Meeting

In keeping with the schedule of the broader Homebrew movement, we're already committed to our next meeting on September 7. It's tentatively at the same location unless a more suitable one comes along prior to then. Details will be posted to the wiki in the next few days.

Thanks for coming everyone! We’ll see you next time.

Live Tweets Archive


Though these aren't as great as the notes that Kevin Marks manages to put together, we did manage to make good use of noterlive for a few supplementary thoughts:

Chris Aldrich:

On my way to Homebrew Website Club Los Angeles in moments. http://stream.boffosocko.com/2016/homebrew-website-club-la-2016-08-24 #

Angelo Gladding:

I’ve torn some things down, but slowly rebuilding. I’m just minutes away from rel-me to be able to log into wiki #

ChrisAldrich:

Explaining briefly how @kevinmarks' noterlive.com works for live tweeting events… #

Angelo Gladding:

My github was receiving some autodumps from a short-lived indieweb experiment. #

is describing his canopy system used to build his site #

Canopy builds in a minute and 52 secs… inside are folders roots and trunk w/ internals #

Describing how he builds in locations to Canopy #

Apparently @t has a broken certificate for https, so my parser gracefully falls back to http instead. #

 

Reply to: Getting started owning your digital home by Chris Hardie

Replied to Getting started owning your digital home by Chris Hardie (Chris Hardie)
My recent post about owning our digital homes prompted some good feedback and discussion. When I talk about this topic with the people in my life who don't work daily in the world of websites, domain names and content management, the most common reaction I get is, "that's sounds good in theory, I'm not sure … Continue reading Getting started owning your digital home
Chris, I came across your post today by way of Bob Waldron’s post WordPress: Default Personal Digital Home (PDH).

Both his concept and that of your own post fit right into the broader themes and goals of the Indieweb community. If you weren’t aware of the movement, I highly recommend you take a look at its philosophies and goals.

There’s already a pretty strong beachhead established for WordPress within the Indieweb community including a suite of plugins for helping to improve your personal web presence, but we’d certainly welcome your additional help as the idea seems right at home with your own philosophy.

I'm happy to chat with you about the group via website, phone, email, IRC, or social media at your leisure if you're interested in more information. I'm eminently findable via the details on my homepage.


A New Reading Post-type for Bookmarking and Reading Workflow

This morning while breezing through my Woodwind feed reader, I ran across a post by Rick Mendes whose hashtags put me down a temporary rabbit hole of thought about reading-related post types on the internet.

I’m obviously a huge fan of reading and have accounts on GoodReads, Amazon, Pocket, Instapaper, Readability, and literally dozens of other services that support or assist the reading endeavor. (My affliction got so bad I started my own publishing company last year.)

READ LATER is an indication on (or relating to) a website that one wants to save the URL to come back and read the content at a future time.

I started a page on the IndieWeb wiki to define read later where I began writing some philosophical thoughts. I decided it would be better to post them on my own site instead and simply link back to them. As a member of the Indieweb my general goal over time is to preferentially quit using these web silos (many of which are listed on the referenced page) and, instead, post my reading related work and progress here on my own site. Naturally, the question becomes, how does one do this in a simple and usable manner with pretty and reasonable UX/UI for both myself and others?

Current Use

Currently I primarily use a Pocket bookmarklet to save things (mostly newspaper articles, magazine pieces, and blog posts) for reading later, and/or the like/favorite functionality in Twitter in combination with an IFTTT recipe to save the URL in the tweet to my Pocket account. I then regularly visit Pocket to speed read through articles. While Pocket allows downloading of (some of) one's data in this regard, I'm exploring options to bring ownership of this workflow into my own site.
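As a rough sketch of what owning that workflow might look like, Pocket's v3 API has a "retrieve" endpoint that can pull one's saved items (URLs, titles, read state) out for re-posting on one's own site. The keys below are placeholders, and the parameters should be double-checked against Pocket's current developer documentation:

// Rough sketch: pull saved items out of Pocket so they can be re-posted/owned locally.
// consumerKey and accessToken are placeholders obtained via Pocket's developer/OAuth flow.
async function fetchPocketItems(consumerKey, accessToken) {
  const res = await fetch('https://getpocket.com/v3/get', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Accept': 'application/json' },
    body: JSON.stringify({
      consumer_key: consumerKey,
      access_token: accessToken,
      state: 'all',          // both read and unread items
      detailType: 'simple',
    }),
  });
  const data = await res.json();
  return Object.values(data.list);   // each item includes resolved_url, resolved_title, status, etc.
}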

For more academic-leaning content (read: journal articles), I tend to rely on an alternate Mendeley-based workflow which also starts with an easy-to-use bookmarklet.

I’ve also experimented with bookmarking a journal article and using hypothes.is to import my highlights from that article, though that workflow has a way to go to meet my personal needs in a robust way while still allowing me to own all of my own data. The benefit is that fixing it can help more than just myself while still fitting into a larger personal workflow.

Brainstorming

A Broader Reading (Parent) Post-type

Philosophically, a read later post-type could be considered similar to a (possibly) unshared or private bookmark, with possible additional metadata like progress, date read, notes, and annotations to be added after the fact, which then technically makes it a read post type.

A potential workflow viewed over time might be: read later >> bookmark >> notes/annotations/marginalia >> read >> review. This kind of continuum of workflow might be able to support a slightly more complex overall UI for a more simplified reading post-type in which these others are all sub-types. One could then make a single UI for a reading post type with fields and details for all of the sub-cases. Being updatable, the single post could carry all the details of one’s progress.

Indieweb encourages simplicity (DRY) and having the fewest post-types possible, which I generally agree with, but perhaps there’s a better way of thinking of these several types. Concatenating them into one reading type with various data fields (and the ability of them to be public/private) could allow all of the subcategories to be included or not on one larger and more comprehensive post-type.
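As a purely hypothetical sketch of what such a combined post-type's data might look like, each sub-type below is just an optional field that can be filled in over time and marked public or private; the field names are illustrative only, not an existing spec:

// Hypothetical data shape for a single "reading" post with optional sub-type fields.
const readingPost = {
  type: 'reading',
  url: 'https://example.com/some-article',                  // the thing being read
  name: 'Some Article',
  readLater: { date: '2016-08-24', visibility: 'public' },
  bookmark:  { date: '2016-08-24', visibility: 'public' },
  read:      { date: '2016-08-27', progress: '100%', visibility: 'public' },
  notes:     { visibility: 'private', items: ['marginalia kept off the public page'] },
  review:    null,                                           // leave a sub-type empty to omit it entirely
};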

Examples
  1. Not including one subsection (or making it private), would simply prevent it from showing, thus one could have a traditional bookmark post by leaving off the read later, read, and review sub-types and/or data.
  2. As another example, I could include the data for read later, bookmark, and read, but leave off data about what I highlighted and/or sub-sections of notes I prefer to remain private.

A Primary Post with Webmention Updates

Alternately, one could create a primary post (potentially a bookmark) for the thing one is reading, and then use additional posts that send webmentions back to the original, thereby adding details to the original post about the ongoing progress. In some sense, this isn't too far from the functionality provided by GoodReads with individual updates on progress with brief notes and their page that lists the overall view of progress. Each individual post could be made public/private to allow different viewerships, though private webmentions may be a hairier issue. I know some are also experimenting with pushing updates to posts via micropub and other methods, which could be appealing as well.
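For the micropub route, a minimal sketch of such an update using the W3C Micropub JSON update syntax might look like the following; the endpoint URL and token are placeholders for whatever one's own site uses:

// Hedged sketch: append a reading-progress note to an existing post via a Micropub update.
// The endpoint URL and token are placeholders for your own micropub endpoint and IndieAuth token.
async function addProgressNote(postUrl, note, token) {
  const res = await fetch('https://example.com/micropub', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer ' + token,
    },
    body: JSON.stringify({
      action: 'update',
      url: postUrl,                 // the primary bookmark/reading post
      add: { content: [note] },     // append the new progress note to its content
    }),
  });
  return res.ok;                    // a 2xx response means the server accepted the update
}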

This may be cumbersome over time, but could potentially be made to look something like the GoodReads UI below, which seems very intuitive. (Note that it’s missing any review text as I’m currently writing it, and it’s not public yet.)

Overview of reading progress

Other Thoughts

Ideally, I'd like to better distinguish between something that has merely been bookmarked and something that has been read/unread, with dates for both the bookmarking and the reading, as well as the ability to add notes and highlights relating to the article. Something akin to Devon Zuegel's "Notes" tab (built on a custom script for Evernote and Tumblr) seems somewhat promising as a cross between a simple reading list (or linkblog) and a commonplace book for academic work, but doesn't necessarily leave room for longer book reviews.

I’ll also need to consider the publishing workflow, in some sense as it relates to the reverse chronological posting of updates on typical blogs. Perhaps a hybrid approach of the two methods mentioned would work best?

Having an interface that bolts together the GoodReads UI (pictured above) and Amazon's notes/highlights would be excellent. I recently noticed (and updated an old post) that they're already beta testing such a beast.

Kindle Notes and Highlights are now showing up as a beta feature in GoodReads

Comments

I’ll keep thinking about the architecture for what I’d ultimately like to have, but I’m always open to hearing what other (heavy) readers have to say about the subject and the usability of such a UI.

Please feel free to comment below, or write something on your own site (which includes the URL of this post) and submit your URL in the field provided below to create a webmention in which your post will appear as a comment.

 

I now proudly own all of the data from my Tumblr posts on my own domain. #Indieweb #ownyourdata #PESOS

Reply to Something the NIH can learn from NASA

Replied to Something the NIH can learn from NASA by Lior Pachter (& Comments by Donald Forsdyke) (Bits of DNA)
Pubmed Commons provides a forum, independent of a journal, where comments on articles in that journal can be posted. Why not air your displeasure there? The article is easily found (see PMID: 27467019) and, so far, there are no comments.
I'm hoping that one day (in the very near future) scientific journals and other science communications on the web will support the W3C's Webmention candidate specification, so that when commentators [like Lior, in this case, above] post something about an article on their own site, the full comment is sent to the original article to appear there automatically. This means that one needn't go to the site directly to comment (and if the comment isn't approved, then at least it still lives somewhere searchable on the web).
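For context, the Webmention flow itself is quite small: the commenter's site discovers the article's webmention endpoint and then POSTs two URLs to it. A minimal sketch (Node 18+, with endpoint discovery simplified to an in-page rel="webmention" link; a complete sender would also check the HTTP Link header per the spec) might look like this:

// Minimal sketch of sending a webmention from a reply post (source) to the article it mentions (target).
async function sendWebmention(source, target) {
  // 1. Fetch the target article and look for its webmention endpoint (simplified discovery).
  const html = await (await fetch(target)).text();
  const match = html.match(/rel=["'][^"']*webmention[^"']*["'][^>]*href=["']([^"']+)["']/i);
  if (!match) throw new Error('No webmention endpoint found on ' + target);
  const endpoint = new URL(match[1], target).toString();

  // 2. POST the source and target URLs to that endpoint, form-encoded.
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ source, target }).toString(),
  });
  return res.status;   // 201 or 202 means the receiver accepted it for (possibly asynchronous) processing
}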

Some journals already count tweets and blog mentions (generally for PR reasons), but they typically don't provide a way to find them on the web to see whether they indicate positive or negative sentiment or to further the scientific conversation.

I've also run into cases in which scientific journals that are "moderating" comments won't approve reasoned thought, but will simultaneously allow (pre-approved?) accounts to flame every comment that is approved [example on Sciencemag.org: http://boffosocko.com/2016/04/29/some-thoughts-on-academic-publishing/; see also the comments there], so having the original comment live elsewhere may be useful and/or necessary depending on whether the publisher is a good or bad actor, or potentially just lazy.

I've also seen people use commenting layers like hypothes.is or genius.com to add commentary directly on journals, but these layers are often hidden from most readers. The community certainly needs a more robust commenting interface. I would hope that a decentralized version using web standards like Webmentions might be a worthwhile and robust solution.

Homebrew Website Club Meetup Pasadena/Los Angeles 8/10/16

Last night we continued meeting with the blossoming group of indiewebbers on the East side of the Los Angeles area, leading up to IndieWeb Camp Los Angeles in November.

We met at Charlie’s Coffee House, 266 Monterey Road, Pasadena, CA.

Quiet Writing Hour

The quiet writing hour started off quiet with Angelo holding down the fort while others were stuck in interminable traffic, but if the IRC channel is any indication, he got some productive work done.

Introductions and Quick Demonstrations

Participants included:

Following introductions, I did a demo of the browser-based push notifications I enabled on this site about a week ago and discussed some pathways to help others explore options for doing so on theirs. Coincidentally, WordPress.com just unveiled some functionality like this yesterday that is more site-owner oriented than user oriented, so I’ll be looking into that functionality shortly.

Angelo showed off some impressive python code which he's preparing to open source, but just before the meeting he had managed to completely bork his site, so everyone got a stunning example of a "502 Bad Gateway" notice.

We were so engaged that we all completely forgot to either take a break or do the usual group photo. My one-minute sketch gives a reasonable facsimile of what a photo would have looked like.

Peer-to-Peer Building and Help

With a new group, we spent some time discussing some general Indieweb principles, outlining ideas, and example projects.

Since Michael was very new to the group, we helped him install the WordPress IndieWeb plugin and configure a few of the sub-plugins to get him started. We discussed some basic next steps and pointers to the WordPress documentation to provide him some direction for building until we meet again.

We spent a few minutes discussing the upcoming IndieWebCamp logistics as well as outreach to the broader Los Angeles area community.

Next Meeting

For a new group, there’s enough enthusiasm to do at least two meetings a month, in keeping with the broader Homebrew movement, so we’re already committed to our next meeting on August 24. It’s tentatively at the same location unless a more suitable one comes along prior to then.

Thanks for coming everyone! We’ll see you next time.