The Facebook Algorithm Mom Problem

How I temporarily cut my mom out of my social media life to reach a larger audience.

POSSE

For quite a while now, I’ve been publishing most of my content to my personal website first and syndicating copies of it to social media silos like Twitter, Instagram, Google+, and Facebook. Within the IndieWeb community this process is known as POSSE, an acronym for Post on your Own Site, Syndicate Elsewhere.

The Facebook Algorithm

Anecdotally, most people in social media have long known that this type of workflow causes your content to be treated like a second-class citizen, particularly on Facebook, which greatly prefers that users post to it manually or using one of its own apps rather than via API. [1][2][3][4] This means that the Facebook algorithm, which decides how big an audience a piece of content receives, dings posts which aren’t posted manually within its system. Simply put, if you don’t post it manually within Facebook, not as many people are going to see it.

Generally I don’t care too much about this posting “tax” and happily use a plugin called Social Media Network Auto Poster (aka SNAP) to syndicate my content from my WordPress site to up to half a dozen social silos.

What I have been noticing over the past six or more months is an even more insidious tax being paid for posting to Facebook. I call it “The Facebook Algorithm Mom Problem”.

Here’s what’s happening

I write my content on my own personal site. I automatically syndicate it to Facebook. My mom, who seems to be on Facebook 24/7, immediately clicks “like” on the post. The Facebook algorithm immediately thinks that because my mom liked it, it must be a family-related piece of content–even if it’s obviously about theoretical math, a subject in which my mom has no interest or knowledge. (My mom has about 180 friends on Facebook; 45 of them overlap with mine and the vast majority of those are close family members.)

The algorithm narrows the presentation of the content down to very close family. Then my mom’s sister sees it and clicks “like” moments later. Now Facebook’s algorithm has created a self-fulfilling prophecy and further narrows the audience of my post. As a result, my post gets no further exposure on Facebook other than perhaps five people–the circle of family that overlaps in all three of our social graphs. Naturally, none of these people love me enough to click “like” on random technical things I think are cool. I certainly couldn’t blame them for not liking these arcane topics, but shame on Facebook for torturing them with the exposure when I was originally targeting maybe 10 other colleagues to begin with.

This would all be okay if the actual content were what Facebook predicted it was, but 99% of the time it’s not. In general I tend to post about math, science, and other random technical subjects. I rarely post about the deeply personal things which are of great interest to my close family members; those are things I would relay to them by phone or in person rather than post about publicly.

Posts only a mother could love

I can post about arcane areas like Lie algebras or statistical thermodynamics, and my mom, because she’s my mom, will like all of it–whether or not she understands what I’m talking about. And isn’t this what moms do?! What they’re supposed to do? Of course it is!

mom-autolike (n.)–When a mother automatically clicks “like” on a piece of content posted to social media by one of her children, not because it has any inherent value, but simply because the content came from her child.

She’s my mom, she’s supposed to love me unconditionally this way!

The problem is: Facebook, despite the fact that it knows she’s my mom, doesn’t take this relationship into account in its algorithm.

What does this mean? It means either I quit posting to Facebook, or I game the system to prevent these mom-autolikes.

Preventing mom-autolikes

I’ve been experimenting. But how?

Facebook allows users to specifically target their audience in a highly granular fashion, from the entire public to one’s circle of “friends” all the way down to even one or two specific people. Even better, they’ll let you target pre-defined circles of friends and even exclude specific people. So this is typically what I’ve been doing to do an end-around on my Facebook Algorithm Mom problem. I have my site set up to post to either “Friends except mom” or “Public except mom”. (Sometimes I exclude my aunt just for good measure.) This means that my mom now can’t see my posts when I publish them!

Facebook will let you carefully and almost surgically define who can see your posts.

What a horrible son

Don’t jump the gun too quickly there, Bubbe! I come back at the end of the day, after the algorithm has run its course and my post has foreseeably reached all of the audience it’s likely to get. At that point, I change the audience of the post to completely “Public”.

You’ll never guess what happens next…

Yup. My mom “likes” it!

I love you mom. Thanks for all your unconditional love and support!!

Even better, I’m happy to report that the audience I originally intended to see the post generally does see it. Mom just gets to see it a bit later.

Dear Facebook Engineering

Could you fix this algorithm problem please? I’m sure I’m not the only son or daughter to suffer from it.

Have you noticed this problem yourself? I’d love to hear from others who’ve seen a similar effect and love their mothers (or other close loved ones) enough to not cut them out of their Facebook lives.

References

[1]
R. Tippens, “Drop the Autobot: Manual Posting to Facebook Outperforms Automated,” ReadWrite, 01-Aug-2011. [Online]. Available: https://readwrite.com/2011/08/01/manually_posting_to_facebook_significantly_outperf/. [Accessed: 11-Jul-2017]
[2]
“How to Increase Your Traffic from Facebook by 650% in 5 Seconds,” WPMUDEV, 02-Aug-2011. [Online]. Available: https://premium.wpmudev.org/blog/how-to-increase-your-traffic-from-facebook-by-650-in-5-seconds/. [Accessed: 11-Jul-2017]
[3]
J. D. Lasica, “Demystifying how Facebook’s news feeds work,” SocialMedia.biz, 11-Feb-2011. [Online]. Available: http://socialmedia.biz/2011/02/07/how-facebook-news-feeds-work/. [Accessed: 11-Jul-2017]
[4]
D. Hay, “Will auto-posting stunt the reach of your Facebook posts?,” SocialMedia.biz, 26-Jul-2011. [Online]. Available: http://socialmedia.biz/2011/07/26/will-auto-posting-stunt-the-reach-of-your-facebook-posts/. [Accessed: 11-Jul-2017]

👓 The IndieWeb Movement Will Help People Control Their Own Web Presence? | Future Hosting

The IndieWeb Movement Will Help People Control Their Own Web Presence? by Matthew Davis (Future Hosting)
The early vision of the web was one of a decentralized and somewhat anarchic community where we each had control over our own content and our own online presence — that’s a vision that Tim Berners-Lee still endorses, but it’s one that’s put in jeopardy by the relentless centralizing tendency of big companies. And that’s why I find the Indie Web movement so interesting — not as a rejection of the corporate influence, but as a much needed counterbalance that provides the technology for people, should they so choose, to build an online presence of their own devising without giving up the communities and the connections that they have built on existing networks.

A short and succinct definition of the movement and just a few of the positive pieces. I think the movement is further along than the author gives it credit for though.


Twitter List from #Domains17

Twitter List from #Domains17 by Chris Aldrich (Twitter)
Teachers, educators, researchers, technologists using open technologies in education #openEd, #edTech, #DoOO, #indieweb

I’ve compiled a twitter list of people related to #openEd, #edTech, #DoOO, #indieweb, and related topics who tweeted about #domains17 in the past week. The list has multiple views including members and by tweets.

Feel free either to subscribe to the list (useful when adding streams to things like Tweetdeck) or to quickly scan down the list and follow people on a particular topic en masse. Hopefully it will help people remain connected following the conference. I’ve written about some other ideas about staying in touch here.

If you or someone you know is conspicuously missing, please let me know and I’m happy to add them. Hopefully this list will free others from spending the inordinate amount of time it would take to create similar bulk lists from the week.


My reply to Micro.blog Project Surges Past $65K on Kickstarter, Gains Backing from DreamHost | WordPress Tavern

Micro.blog Project Surges Past $65K on Kickstarter, Gains Backing from DreamHost by Sarah Gooding (WordPress Tavern)
With one week remaining on its Kickstarter campaign, the Micro.blog indie microblogging project has surged past its original $10K funding goal with $66,710 pledged by 2,381 backers. This puts proje…

I love that Micro.blog is doing so well on Kickstarter! I’m even more impressed that DreamHost is backing this and doubling down in this area.

I coincidentally happened to have a great conversation yesterday with Jonathan LaCour before I saw the article and we spoke about what DreamHost is doing in the realm of IndieWeb and WordPress. I love their approach and can’t wait to see what comes out of their work and infectious enthusiasm.

I’m really surprised that WordPress hasn’t more aggressively taken up technologies like Webmention, which is now a W3C recommendation, or Micropub and put them directly into core. For the uninitiated, Webmention works much like an @mention on Twitter, Medium, Facebook, and others, but is platform independent, which means you can use it to ping any website on the internet that supports it. Imagine if you could reply to someone on Twitter from your WordPress site? Or if you could use Facebook to reply to a post on Medium? (And I mean directly and immediately, in the type-the-@mention-and-hit-publish sense, not the laborious cut-and-paste from one platform to another that one is forced to do now because the social silos/walled gardens don’t inter-operate nicely, if at all.) Webmention can make all that a reality. Micropub is a platform-independent spec that allows one to write standalone web or mobile apps as publishing interfaces for almost any type of content on any platform–think about the hundreds of apps that could publish to Twitter in its early days, and now imagine expanding that so the same kinds of apps could publish to any platform anywhere.
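
For a sense of just how lightweight Webmention is under the hood, here’s a rough sketch in Python (using the requests and beautifulsoup4 packages; the URLs and function names are mine for illustration, not any official reference implementation): discover the target page’s webmention endpoint, then send a single form-encoded POST carrying the source and target URLs.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def discover_webmention_endpoint(target):
    """Find the webmention endpoint advertised by the target page, if any."""
    resp = requests.get(target, timeout=10)
    # 1. Check the HTTP Link header, e.g. Link: <https://example.com/wm>; rel="webmention"
    link = resp.links.get("webmention")
    if link:
        return urljoin(target, link["url"])
    # 2. Fall back to a <link> or <a> element with rel="webmention" in the HTML body
    soup = BeautifulSoup(resp.text, "html.parser")
    for el in soup.find_all(["link", "a"], href=True):
        if "webmention" in (el.get("rel") or []):
            return urljoin(target, el["href"])
    return None

def send_webmention(source, target):
    """Notify the target URL that the source URL links to (e.g. replies to) it."""
    endpoint = discover_webmention_endpoint(target)
    if endpoint is None:
        return None  # the target doesn't accept webmentions
    return requests.post(endpoint, data={"source": source, "target": target}, timeout=10)

# Hypothetical example: a reply on my own site pinging the post it replies to
send_webmention("https://mysite.example/2017/my-reply", "https://someblog.example/original-post")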

While Twitter has been floundering for a while, WordPress has the structure, ecosystem, and a huge community to completely eat Twitter’s (and even Facebook’s, Instagram’s, and Medium’s) lunch, not only in the microblog space but in the larger space which includes blogging, photos, music, video, audio, and social media in general. The one piece they’re missing is a best-in-class integrated feed reader, which, to be honest, is the centerpiece of both Twitter’s and Facebook’s services. They seem to be 98% reader and 2% dead-simple posting interface, while WordPress is 98% posting interface (both more sophisticated/flexible and more complicated) with a nearly non-existent (and unbundled) reader.

WordPress already has one of the best and most ubiquitous publishing platforms out there (25+% of the web at last count). Slimming down its interface a tad to make it dead simple for my mom to post, or delegating that job to UX/UI developers via Micropub the way Twitter’s open API in the early days allowed a proliferation of apps and interfaces for posting to Twitter, could, combined with Webmentions, create a sea change in the social space. Quill is a good yet simple example of an alternate posting interface which I use for posting to WordPress. Another is actually Instagram itself, which I use in conjunction with OwnYourGram, which has Micropub baked in, for posting photos to my site with Instagram’s best-in-class mobile interface. Imagine just a handful of simple mobile apps that could be customized for dead-simple, straightforward publishing to one’s WordPress site for specific post types or content types…
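
Similarly, the Micropub side boils down to a single authenticated POST. Below is a minimal, hypothetical sketch of the kind of “create” request a client like Quill sends; the endpoint URL and token are placeholders (a real client discovers the endpoint from a link rel="micropub" on the user’s homepage and obtains the token via IndieAuth).

import requests

MICROPUB_ENDPOINT = "https://mysite.example/micropub"  # placeholder; discovered via rel="micropub"
ACCESS_TOKEN = "XXXX"                                  # placeholder; obtained via IndieAuth

response = requests.post(
    MICROPUB_ENDPOINT,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    data={
        "h": "entry",                                  # create an h-entry (a simple note/post)
        "content": "Hello world from a tiny Micropub client!",
        "category[]": ["indieweb", "micropub"],        # repeated form keys carry multiple values
    },
)

# A successful create returns 201 Created with the new post's URL in the Location header
print(response.status_code, response.headers.get("Location"))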

With extant WordPress plugins, a lot of this is already here; it’s just not evenly distributed yet, to borrow the sentiment from William Gibson.

For just a few dollars a year, everyday people could more easily truly own all their content and have greater control over their data and their privacy.

I will note that it has been interesting and exciting seeing the Drupal community stepping on the gas on the Webmention spec (in two different plugins) since the W3C gave it recommendation status earlier this month. This portends great things for the independent web.

I haven’t been this excited about what the web can bring to the world in a long, long time.


A WordPress plugin for posting to IndieNews

WordPress IndieNews by Matthias Pfefferle (github.com)
Automatically send webmentions to IndieNews

I just noticed that Matthias Pfefferle has kindly created a little WordPress plugin for posting to IndieNews.


Reply to Manton Reece: This morning I launched the Kickstarter project for Micro.blog. Really happy with the response. Thank you, everyone!

Manton Reece by Manton Reece (manton.org)
This morning I launched the Kickstarter project for Micro.blog. Really happy with the response. Thank you, everyone!

Manton, I’ve been following your blog and your indieweb efforts for creating a microblogging platform for a while. I’m excited to see your Kickstarter effort doing so well this afternoon!

As a fellow IndieWeb proponent, and since I know how much work such an undertaking can be, I’m happy to help you with the e-book and physical book portions of your project on a voluntary basis if you’d like. I’ve got a small publishing company set up to handle the machinery of such an effort, as well as to provide services that go above and beyond what most self-publishing platforms offer. Let me know if/how I can help.


Chris Aldrich is reading “Maybe the Internet Isn’t a Fantastic Tool for Democracy After All”

Maybe the Internet Isn’t a Fantastic Tool for Democracy After All by Max Read (Select All)
Fake news is the easiest of the problems to fix.

…a new set of ways to report and share news could arise: a social network where the sources of articles were highlighted rather than the users sharing them. A platform that makes it easier to read a full story than to share one unread. A news feed that provides alternative sources and analysis beneath every shared article.

This sounds like the kind of platforms I’d like to have. Reminiscent of some of the discussion at the beginning of This Week in Google: episode 379 Ixnay on the Eet-tway.

I suspect that some of the recent coverage of “fake news” and how it’s being shared on social media has prompted me to begin using Reading.am, a bookmarking-esque service that commands its users to:

Share what you’re reading. Not what you like. Not what you find interesting. Just what you’re reading.

Naturally, in IndieWeb fashion, I’m also posting these read articles to my site. While bookmarks are things that I would implicitly like to read in the near future (rather than “Christmas ornaments” I want to impress people with on my “social media Christmas tree”), there’s a big difference between them and things that I’ve actually read through and thought about.

I always feel like many of my family, friends, and the general public click “like” or “share” on articles in social media without actually having read them from top to bottom. Research would generally suggest that I’m not wrong. [1] [2] Some argue that the research needs to be more subtle too. [3] I generally refuse to participate in this type of behavior if I can avoid it.

Some portion of what I physically read isn’t shared, but at least those things marked as “read” here on my site are things that I’ve actually gone through the trouble to read from start to finish. When I can, I try to post a few highlights I found interesting along with any notes/marginalia (lately I’m loving the service Hypothes.is for doing this) on the piece to give some indication of its interest. I’ll also often try to post some of my thoughts on it, as I’m doing here.

Gauging Intent of Social Signals

I feel compelled to mention here that on some platforms like Twitter, I don’t generally use the “like” functionality to indicate that I’ve actually liked a tweet itself or any content that’s linked to in it. In fact, I’ve often not read anything related to the tweet but the simple headline presented in the tweet itself.

The majority of the time I’m liking/favoriting something on Twitter, it’s because I’m using an IFTTT.com applet which takes the tweets I “like” and saves them to my Pocket account, where I come back to read them later. It’s not the case that I actually read everything in my Pocket queue, but those that I do read will generally appear on my site.

There are, however, some extreme cases in which pieces of content are a bit beyond the pale for indicating a like, and in those cases I won’t do so, but will manually add them to my reading queue. For some this may create a grey area about my intent when viewing things like my Twitter likes. Generally I’d recommend people view that feed as a generic linkblog of sorts. On Twitter, I far preferred the nebulous star indicator over the current heart for indicating how I used and continue to use that bit of functionality.

I’ll also mention that I sometimes use the like/favorite functionality on some platforms to indicate to respondents that I’ve seen their post/reply. This type of usage could also be viewed as a digital “Thank You”, “hello”, or even “read receipt” of sorts, since I know that the “like” intent is pushed into their notifications feed. I suspect that most recipients receive these intents as I intend them, though the Twitter platform isn’t designed for this specifically.

I wish there were a better way for platforms and their readers to know exactly what a user’s intent was rather than trying to intuit it. It would be great if Twitter gave users multiple options under each tweet to better indicate whether their intent was to bookmark, like, or favorite it, or to indicate that they actually read/watched the content on the other end of the link in the tweet.

In true IndieWeb fashion, because I can put these posts on my own site, I can directly control not only what I post, but I can be far more clear about why I’m posting it and give a better idea about what it means to me. I can also provide footnotes to allow readers to better see my underlying sources and judge for themselves their authenticity and actual gravitas. As a result, hopefully you’ll find no fake news here.

Of course, part of the ensuing question is: “How does one scale this type of behaviour up?”

References

[1]
M. Gabielkov, A. Ramachandran, A. Chaintreau, and A. Legout, “Social Clicks: What and Who Gets Read on Twitter?,” SIGMETRICS Perform. Eval. Rev., vol. 44, no. 1, pp. 179–192, Jun. 2016 [Online]. Available: http://doi.acm.org/10.1145/2964791.2901462
[2]
C. Dewey, “6 in 10 of you will share this link without reading it, a new, depressing study says,” Washington Post, 16-Jun-2016. [Online]. Available: https://www.washingtonpost.com/news/the-intersect/wp/2016/06/16/six-in-10-of-you-will-share-this-link-without-reading-it-according-to-a-new-and-depressing-study/. [Accessed: 06-Dec-2016]
[3]
T. Cigelske, “Why It’s OK to Share This Story Without Reading It,” MediaShift, 24-Jun-2016. [Online]. Available: http://mediashift.org/2016/06/why-its-ok-to-share-this-story-without-reading-it/. [Accessed: 06-Dec-2016]

Chris Aldrich is reading “My 2017-01-01 #IndieWeb Commitment: Own All My RSVPs To Public Events” by Tantek Çelik

My 2017-01-01 #IndieWeb Commitment: Own All My RSVPs To Public Events by Tantek Çelik (tantek.com)
My commitment for 2017 is to always, 100% of the time, post RSVPs to public events on my own site first, and only secondarily (manually if I must) RSVP to silo (social media) event URLs. What’s your 2017-01-01 #indieweb commitment?

I love the idea of making an IndieWeb resolution for the New Year. Time to put my thinking cap on and decide which of the 100s of itches it’s (they’re?) going to be?


Chris Aldrich is reading “Self-Hosting kylewm’s Woodwind Indie Reader”

Self-Hosting kylewm's Woodwind Indie Reader by Marty McGuire (martymcgui.re)
One of my favorite aspects of the IndieWeb community is that when you get things

Chris Aldrich is reading “Let’s replace Twitter with something much better.”

Let's replace Twitter with something much better. by Charl Botha (cpbotha.net)
I love that by following certain people, my timeline has become a stream of interesting and entertaining information. I love that sometimes I am able to fit my little publication just so into the 140 characters given to me.

I Voted 🇺🇸

I voted in the November 8th, 2016 Election! 🇺🇸

After having spent the weekend at IndieWebCamp Los Angeles, it somehow seems appropriate to have a “Voted post type” for the election today†. To do it I’m proposing the following microformats, an example of which can be found in the markup of the post above. This post type is somewhat similar to both a note/status update and an RSVP post type with a soupçon of checkin.

  1. Basic markup

<div class="h-entry">
<span class="p-voted">I voted</span>
in the <a href="http://example.com/election" class="u-voted-in">November 8th, 2016 Election</a>
</div>

Possible Voted values: I voted, I didn’t vote, I was disenfranchised, I was intimidated, I was apathetic, I pathetically didn’t bother to register

  2. Send a Webmention to the election post of your municipality’s Registrar/Clerk/Records office as you would for a reply to any post.
  3. You should include author information in your Voted post so the registrar knows who voted (and then send another Webmention so the voting page gets the update).

Here’s another example with explicit author name and icon, in case your site or blog does not already provide that on the page.

<div class="h-entry">
<a class="p-author h-card" href="http://mysite.example.org">
<img alt="" src="http://mysite.example.org/icon.jpg"/>
Supercool Indiewebvoter</a>:
<span class="p-voted">I voted</span>
to <a href="http://example.com/election" class="u-voted-in">IndieWeb Election </a>
</div>

You can also use the data element to express the meaning behind the literal p-voted value while providing your own visible human readable language:

<data class="p-voted" value="I voted">I voted for the first female president today!</data>
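
As a sanity check on markup like the above, one could run it through a generic microformats2 parser such as the mf2py Python package and confirm the experimental “voted” and “voted-in” properties come out as intended. A quick sketch (the values shown in the comments are what I’d expect, not anything normative):

import mf2py

html = """
<div class="h-entry">
<span class="p-voted">I voted</span>
in the <a href="http://example.com/election" class="u-voted-in">November 8th, 2016 Election</a>
</div>
"""

parsed = mf2py.parse(doc=html)
entry = parsed["items"][0]
print(entry["type"])                    # ['h-entry']
print(entry["properties"]["voted"])     # ['I voted']
print(entry["properties"]["voted-in"])  # ['http://example.com/election']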

Finally, feel free to POSSE to multiple social media networks to encourage your friends and family to vote today.


† I’m being a bit facetious and doing this in fun. But it does invite some interesting speculation…


🔖 Want to read: Personal Archiving: Preserving Our Digital Heritage by Donald T. Hawkins

🔖 Want to read: Personal Archiving: Preserving Our Digital Heritage by Donald T. Hawkins

H/T to Sawyer Hollenshead.

This may also be of interest to those who’ve attended Dodging the Memory Hole-related events as well as those in the IndieWeb who may be concerned about their data living beyond them.


Notes from Day 1 of Dodging the Memory Hole: Saving Online News | Thursday, October 13, 2016

Some quick thoughts and an archive of my Twitter notes during the day

Today I spent the majority of the day attending the first of a two-day conference at UCLA’s Charles Young Research Library entitled “Dodging the Memory Hole: Saving Online News.” While I mostly knew what I was getting into, it hadn’t really occurred to me how much of what is on the web is not backed up or archived in any meaningful way. It’s human nature for people to neglect to back up their data, but huge swaths of really important data with newsworthy and historic value are being heavily neglected. Fortunately it’s an interesting enough problem to draw the 100 or so scholars, researchers, technologists, and journalists who showed up for the start of a group being convened through the Reynolds Journalism Institute and several sponsors of the event.

What particularly strikes me is how many of the philosophies of the IndieWeb movement and the tools developed by it are applicable to some of the problems that online news faces. I suspect that if more journalists were practicing members of the IndieWeb and used their sites not only to collect and store the underlying data upon which they base their stories, but also to publish them, then some of the (future) archival process might be easier to accomplish. I’ve got so many disparate thoughts running around my mind after the first day that it’ll take a bit of time to process before I write out some more detailed thoughts.

Twitter List for the Conference

As a reminder to those attending, I’ve accumulated a list of everyone who’s tweeted with the hashtag #DtMH2016, so that attendees can more easily follow each other as well as communicate online following our few days together in Los Angeles. Twitter also allows subscribing to entire lists, if that’s something in which people have interest.

Archiving the day

It seems only fitting that an attendee of a conference about saving and archiving digital news would make a reasonable attempt to archive some of his experience, right?! Toward that end, below is an archive of my tweetstorm during the day, marked up with microformats and including hovercards for the speakers with appropriate available metadata. For those interested, I used a fantastic web app called Noter Live to capture, tweet, and more easily archive the stream.

Note that in many cases my tweets don’t reflect direct quotes of the attributed speaker, but are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many original words of the participant as possible. Typically, for speed, there wasn’t much editing of these notes. I’m also attaching .m4a audio files of most of the audio for the day (apologies for shaky quality as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it. Presumably they will release the video on their website for a more immersive experience.

If you prefer to read the stream of notes in the original Twitter format, so that you can like/retweet/comment on individual pieces, this link should give you the entire stream. Naturally, comments are also welcome below.

Audio Files

Below are the audio files for several sessions held throughout the day.

Greetings and Keynote


Greetings: Edward McCain, digital curator of journalism, Donald W. Reynolds Journalism Institute (RJI) and University of Missouri Libraries and Ginny Steel, university librarian, UCLA
Keynote: Digital salvage operations — what’s worth saving? given by Hjalmar Gislason, vice president of data, Qlik

Why save online news? and NewsScape


Panel: “Why save online news?” featuring Chris Freeland, Washington University; Matt Weber, Ph.D., Rutgers, The State University of New Jersey; Laura Wrubel, The George Washington University; moderator Ana Krahmer, Ph.D., University of North Texas
Presentation: “NewsScape: preserving TV news” given by Tim Groeling, Ph.D., UCLA Communication Studies Department

Born-digital news preservation in perspective


Speaker: Clifford Lynch, Ph.D., executive director, Coalition for Networked Information on “Born-digital news preservation in perspective”

Live Tweet Archive

ChrisAldrich:

Getting Noter Live fired up for Dodging the Memory Hole 2016: Saving Online News https://www.rjionline.org/dtmh2016

Ginny Steel:

I’m glad I’m not at NBC trying to figure out the details for releasing THE APPRENTICE tapes.

Edward McCain:

Let’s thank @UCLA and the library for hosting us all.

While you’re here, don’t forget to vote/provide feedback throughout the day for IMLS

Someone once pulled up behind me and said “Hi Tiiiigeeerrr!” #Mizzou

A server at the Missourian crashed as the system was obsolete and running on baling wire. We lost 15 years of archives

The dean & head of Libraries created a position to save born digital news.

We’d like to help define stake-holder roles in relation to the problem.

Newspaper is really an outmoded term now.

I’d like to celebrate that we have 14 student scholars here today.

We’d like to have you identify specific projects that we can take to funding sources to begin work after the conference

We’ll be going to our first speaker who will be introduced by Martin Klein from Los Alamos.

Martin Klein:

Hjalmar Gislason is a self-described digital nerd. He’s the Vice President of Data.

I wonder how one becomes the President of Data?

Hjalmar Gislason:

My Icelandic name may be the most complicated part of my talk this morning.

Speaking on “Digital Salvage Operations: What’s Worth Saving?”

My father in law accidentally threw away my wife’s favorite stuffed animal. #DeafTeddy

Some people just throw everything away because they’re not being used. Others keep everything and don’t throw it away.

The fundamental question: Do you want to save everything or do you want to get rid of everything?

I joined @qlik two years ago and moved to Boston.

Before that I was with spurl.net which was about saving copies of webpages they’d previously visited.

I had also previously invested in kjarninn which is translated as core.

We used to have little data, now we’re with gigantic data and moving to gargantuan data soon.

One of my goals today is to broaden our perspective about what data needs saving.

There’s the Web, the “Deep” Web, then there’s “Other” data which is at the bottom of the pyramid.

I got to see into the process of #panamapapers but I’d like to discuss the consequences from April 3rd.

The number of meetings was almost more than could have been covered in real time in Iceland.

The #panamapapers were a soap opera, much like US politics.

Looking back at the process is highly interesting, but it’s difficult to look at all the data as they unfolded.

How can we capture all the media minute by minute as a story unfolds?

You can’t trust that you can go back to a story at a certain time and know that it hasn’t been changed. #1984 #Orwell

There was a relatively pro-HRC piece earlier this year @NYTimes that was changed.

Newsdiffs tracks changes in news over time. The HRC article had changed a lot.

Let’s say you referenced @CNN 10 years ago, likely now, the CMS and the story have both changed.

8 years ago, I asked, wouldn’t we like to have the social media from Iceland’s only Nobel Laureate as a teenager?

What is private/public, ethical/unethical when dealing with data?

Much data is hidden behind passwords or on systems which are not easily accessed from a database perspective.

Most of the content published on Facebook isn’t public. It’s hard to archive in addition to being big.

We as archivists have no claim on the hidden data within Facebook.

ChrisAldrich:

The #indieweb could help archivists in the future in accessing more personal data.

Hjalmar Gislason:

Then there’s “other” data: 500 hours of video is uploaded to YouTube per minute.

No organization can go around watching all of this video data. Which parts are newsworthy?

Content could surface much later or could surface through later research.

Hornbjargsviti lighthouse recorded the weather every three hours for years creating lots of data.

And that was just one of hundreds of sites that recorded this type of data in Iceland.

Lots of this data is lost. Much that has been found was by coincidence. It was never thought to archive it.

This type of weather data could be very valuable to researchers later on.

There was also a large archive of Icelandic data that was found.

Showing a timelapse of Icelandic earthquakes https://vimeo.com/24442762

You can watch the magma working its way through the ground before it makes its way up through the land.

National Geographic featured this video in a documentary.

Sometimes context is important when it comes to data. What is archived today may be more important later.

As the economic crisis unfolded in Greece, it turned out the data that was used to allow them into the EU was wrong.

The data was published at the time of the crisis, but there was no record of what the data looked like 5 years earlier.

Only way to recreate the data was to take prior printed sources. This is usually only done in extraordinary circumstances.

We captured 150k+ data sets with more than 8 billion “facts” which was just a tiny fraction of what exists.

How can we delve deeper into large data sets, all with different configurations and proprietary systems?

“There’s a story in every piece of data.”

Once a year energy consumption seems to dip because February has fewer days than other months. Plotting it matters.

Year over year comparisons can be difficult because of things like 3 day weekends which shift over time.

Here’s a graph of the population of Iceland. We’ve had our fair share of diseases and volcanic eruptions.

To compare, here’s a graph of the population of sheep. They outnumber us by an order(s) of magnitude.

In the 1780’s there was an event that killed off lots of sheep, so people had the upper hand.

Do we learn more from reading today’s “newspaper” or one from 30, 50, or 100 years ago?

There was a letter to the editor about an eruption and people had to move into the city.

letter: “We can’t have all these people come here, we need to build for our own people first.”

This isn’t too different from our problems today with respect to Syria. In that case, the people actually lived closer.

In the born-digital age, what will the experience look like trying to capture today 40 years hence?

Will it even be possible?

Machine data connections will outnumber “people” data connections by a factor of 10 or more very quickly.

With data, we need to analyze, store, and discard data. How do we decide in a split-second what to keep & discard?

We’re back to the father-in-law and mother-in-law question: What to get rid of and what to save?

Computers are continually beating humans at tasks: chess, Go, driving a car. They build on lots more experience based on data.

Whoever has the most data on driving cars and landscape will be the ultimate winner in that particular space.

Data is valuable, sometimes we just don’t know which yet.

Hoarding is not a strategy.

You can only guess at what will be important.

“Commercial use in Doubt”: the third sub-headline in a newspaper article about an early test of television.

There’s more to it than just the web.

Kate Zwaard:

“Hoarding isn’t a strategy” really resonates with librarians; what could that relationship look like?

Hjalmar Gislason:

One should bring in data science; industry may be ahead of libraries.

Cross-disciplinary approaches may be best. How can you get a data scientist to look at your problem? Get their attention?

Peter Arnett:

There’s 60K+ books about the Viet Nam War. How do we learn to integrate what we learn after an event (like that)?

Hjalmar Gislason:

Perspective always comes with time, as additional information arrives.

Scientific papers are archived in a good way, but the underlying data is a problem.

In the future you may have the ability to add supplementary data to what appears in a book (in a better way).

Archives can give the ability to have much greater depth on many topics.

Are there any centers of excellence on the topics we’re discussing today? This conference may be IT.

We need more people that come from the technical side of things to be watching this online news problem.

Hacks/Hackers is a meetup group that takes place all over the world.

It brings the journalists and computer scientists together regularly for beers. It’s some of the outreach we need.

Edward McCain:

If you’re not interested in money, this is a good area to explore. 10 minute break.

Don’t forget to leave your thoughts on the questions at the back of the room.

We’re going to get started with our first panel. Why is it important to save online news?

Matthew Weber:

I’m Matt Weber from Rutgers University and in communications.

I’ll talk about web archives and news media and how they interact.

I worked at Tribune Corp. for several years and covered politics in DC.

I wanted to study the way in which the news media is changing.

We’re increasingly seeing digital-only media with no offline surrogate.

It’s becoming increasingly difficult to do anything but look at it now as it exists.

There was no large scale online repository of online news to do research.

#OccupyWallStreet is one of the first examples of stories that exist online in occurrence and reportage.

There’s a growing need to archive content around local news particularly politics and democracy.

When there is a rich and vibrant local news environment, people are more likely to become engaged.

Local news is one of the least thought about from an archive perspective.

Laura Wrubel:

I’m at GWU Libraries in the scholarly technology group.

I’m involved in social feed manager which allows archivists to put together archives from social services.

Kimberly Gross, a faculty member, studies tweets of news outlets and journalists.

We created a prototype tool to allow them to collect data from social media.

In 2011 journalists were primarily using their Twitter presences to direct people to articles rather than for conversation.

We collect data of political candidates.

Chris Freeland:

I’m an associate librarian and am representing “Documenting the Now” with WashU, UCRiverside, & UofMd.

Documenting the Now revolves around Twitter documentation.

It started with the Ferguson story and documenting media, videos during the protests in the community.

What can we as memory institutions do to capture the data?

We gathered 14 million tweets relating to Ferguson within two weeks.

We tried to build a platform that others could use in the future for similar data capture relating to social.

Ethics is important in archiving this type of news data.

Ana Krahmer:

Digitally preserving pdfs from news organizations and hyper-local news in Texas.

We’re approaching 5 million pages of archived local news.

What is news that needs to be archived, and why?

Matthew Weber:

First, what is news? The definition is unique to each individual.

We need to capture as much of the social news and social representation of news which is fragmented.

It’s an important part of society today.

We no longer produce hard copies like we did a decade ago. We need to capture the online portion.

Laura Wrubel:

We’d like to get the perspective of journalists, and don’t have one on the panel today.

We looked at how midterm election candidates used Twitter. Is that news itself? What tools do we use to archive it?

What does it mean to archive news by private citizens?

Chris Freeland:

Twitter was THE place to find information in St. Louis during the Ferguson protests.

Local news outlets weren’t as good as Twitter during the protests.

I could hear the protest from 5 blocks away and only found news about it on Twitter.

The story was being covered very differently on Twitter than on the local (mainstream) news.

Alternate voices in the mix were very interesting and important.

Twitter was in the moment and wasn’t being edited and causing a delay.

What can we learn from this massive number of Ferguson tweets?

It gives us information about organizing, and what language was being used.

Ana Krahmer:

I think about the archival portion of this question. By whom does it need to be archived?

What do we archive next?

How are we representing the current population now?

Who is going to take on the burden of archiving? Should it be corporate? Cultural memory institution?

Someone needs to curate it; who does that?

Our next question: What do you view as primary barriers to news archiving?

Laura Wrubel:

How do we organize and staff? There’s no shortage of work.

Tools and software can help the process, but libraries are usually staffed very thinly.

No single institution can do this type of work alone. Collaboration is important.

Chris Freeland:

Two barriers we deal with: terms of service are an issue with archiving. We don’t own it, but can use it.

Libraries want to own the data in perpetuity. We don’t own our data.

There’s a disconnect in some of the business models for commercialization and archiving.

Issues with accessing data.

People were worried about becoming targets or losing jobs because of participation.

What is role of ethics of archiving this type of data? Allowing opting out?

What about redacting portions? anonymizing the contributions?

Ana Krahmer:

Publishers have a responsibility for archiving their product. Permission from publishers can be difficult.

We have a lot of underserved communities. What do we do with comments on stories?

Corporations may not continue to exist in the future and data will be lost.

Matthew Weber:

There’s a balance to be struck between the business side and the public good.

It’s hard to convince for-profits of the value of archiving for the social good.

Chris Freeland:

Next Q: What opportunities have revealed themselves in preserving news?

Finding commonalities and differences in projects is important.

What does it mean to us to archive different media types? (think diversity)

What’s happening in my community? in the nation? across the world?

The long-history in our archives will help us learn about each other.

Ana Krahmer:

We can only do so much with the resources we have.

We’ve worked on a cyber cemetery product in the past.

Someone else can use the tools we create within their initiatives.

Chris Freeland:

Repeating an audience question: What are the issues in archiving longer-form video data with regard to stories on Periscope?

Audience Question:

How do you channel the energy around archiving news archiving?

Matthew Weber:

Research in the area is all so new.

Audience Question:

Does anyone have any experience with legal wrangling with social services?

Chris Freeland:

The ACLU is waging a lawsuit against Twitter about archived tweets.

Ana Krahmer:

Outreach to community papers is very rhizomic.

Audience Question:

How do you take local examples and make them a national model?

Ana Krahmer:

We’re teenagers now in the evolution of what we’re doing.

Edward McCain:

Peter Arnett just said “This is all more interesting than I thought it would be.”

Next Presentation: NewsScape: preserving TV news

Tim Groeling:

I’ll be talking about the NewsScape project of Francis Steen, Director, Communication Studies Archive

I’m leading the archiving of the analog portion of the collection.

The oldest of our collection dates from the 1950’s. We’ve hosted them on YouTube which has created some traction.

Commenters have been an issue with posting to YouTube as well as copyright.

NewsScape is the largest collection of TV news and public affairs programs (local & national).

Prior to 2006, we don’t know what we’ve got.

Paul said “I’ll record everything I can and someone in the future can deal with it.”

We have 50K hours of Betamax.

VHS tapes are actually the most threatened, despite being the newest.

Our budget was seriously strapped.

Maintaining closed captioning is important to our archiving efforts.

We’ve done 36k hours of encoding this year.

We use a layer of dead VCRs over our good VCRs to prevent RF interference and audio buzzing. 🙂

Post-2006, we’re now recording straight to digital.

Preservation is the first step, but we need to be more than the world’s best DVR.

Searching the news is important too.

Showing a data visualization of news analysis with regard to the Healthcare Reform movement.

We’re doing facial analysis as well.

We have interactive tools at viz2016.com.

We’ve tracked how often candidates have smiled in election 2016. Hillary > Trump

We want to share details within our collection, but don’t have tools yet.

Having a good VCR repairman has helped us a lot.

Edward McCain:

Breaking for lunch…

Clifford Lynch:

Talk “Born-digital news preservation in perspective”

There’s a shared consensus that preserving scholarly publications is important.

While delivery models have shifted, there must be some fall back to allow content to survive publisher failure.

Preservation was a joint investment between memory institutions and publishers.

Keepers register their coverage of journals for redundancy.

In studying coverage, we’ve discovered Elsevier is REALLY well covered, but they’re not what we’re worried about.

It’s the small journals as edge cases that really need more coverage.

Smaller journals don’t have resources to get into the keeper services and it’s more expensive.

Many Open Access Journals are passion projects and heavily underfunded and they are poorly covered.

Being mindful of these business dynamics is key when thinking about archiving news.

There are a handful of large news outlets that are “too big to fail.”

There are huge numbers of small outlets like subject verticals, foreign diasporas, etc. that need to be watched

Different strategies should be used for different outlets.

The material on lots of links (as sources) disappears after a short period of time.

While Archive.org is a great resource, it can’t do everything.

Preserving underlying evidence is really important.

How we deal with massive databases and queries against them are a difficult problem.

I’m not aware of studies of link rot with relationship to online news.

Who steps up to preserve major data dumps like Snowden, PanamaPapers, or email breaches?

Social media is a collection of observations and small facts without necessarily being journalism.

Journalism is a deliberate act and is meant to be public while social media is not.

We need to come up with a consensus about what parts of social media should be preserved as news.

News does often delve into social media as part of its evidence base now.

Responsible journalism should include archival storage, but it doesn’t yet.

Under current law, we can’t protect a lot of this material without the permission of the creator(s).

The Library of Congress can demand deposit, but doesn’t.

With funding issues, I’m not wild about the Library of Congress being the only entity [for storage].

In the UK, there are multiple repositories.

ChrisAldrich:

testing to see if I’m still live

What happens if you livetweet too much in one day.
[Screenshot: password-change-required]
