Statistical Physics, Information Processing, and Biology Workshop at Santa Fe Institute

Bookmarked Information Processing and Biology by John Carlos Baez (Azimuth)
The Santa Fe Institute, in New Mexico, is a place for studying complex systems. I’ve never been there! Next week I’ll go there to give a colloquium on network theory, and also to participate in this workshop.

I just found out about this from John Carlos Baez and wish I could go! How have I not heard about it before?

Statistical Physics, Information Processing, and Biology

Workshop

November 16, 2016 – November 18, 2016
9:00 AM
Noyce Conference Room

Abstract.
This workshop will address a fundamental question in theoretical biology: Does the relationship between statistical physics and the need of biological systems to process information underpin some of their deepest features? It recognizes that a core feature of biological systems is that they acquire, store and process information (i.e., perform computation). However, to manipulate information in this way they require a steady flux of free energy from their environments. These two inter-related attributes of biological systems are often taken for granted; they are not part of standard analyses of either the homeostasis or the evolution of biological systems. In this workshop we aim to fill in this major gap in our understanding of biological systems, by gaining deeper insight into the relation between the need for biological systems to process information and the free energy they need to pay for that processing.

The goal of this workshop is to address these issues by focusing on a set of three specific questions:

  1. How has the fraction of free energy flux on earth that is used by biological computation changed with time?
  2. What is the free energy cost of biological computation / function?
  3. What is the free energy cost of the evolution of biological computation / function?

In all of these cases we are interested in the fundamental limits that the laws of physics impose on various aspects of living systems as expressed by these three questions.

Purpose: Research Collaboration
SFI Host: David Krakauer, Michael Lachmann, Manfred Laubichler, Peter Stadler, and David Wolpert


#DTMH2016: Saving Online News | NPR RAD recap

Liked #DTMH2016: Saving Online News (RAD recap) (with images, tweets) by NPR Research, Archives, & Data Strategy (Storify)
Dodging The Memory Hole is an action-oriented conference and event series that brings together journalists, technologists, and information specialists to strategize solutions for organizing and preserving born-digital news.

Web Science and Digital Libraries Research Group: 2016-10-13: Dodging The Memory Hole 2016 Trip Report (#dtmh2016)

Liked 2016-10-13: Dodging The Memory Hole 2016 Trip Report (#dtmh2016) by John Berlin (Web Science and Digital Libraries Research Group: ws-dl.blogspot.com)
A summary/recap of the Dodging the Memory Hole 2016 conference held at UCLA's Charles Young Research Library in Los Angeles, California over two days in October to discuss and highlight potential solutions to the issue of preserving born-digital news.

Photo Gallery from Dodging the Memory Hole 2016

Images from a conference at UCLA concerned with saving born digital news

Details for the conference can be found at Dodging the Memory Hole 2016.

The Journalism Digital News Archive has posted a nice bunch of photos as well.

My previous posts and notes about the conference:


Notes from Day 2 of Dodging the Memory Hole: Saving Online News | Friday, October 14, 2016

Some quick thoughts and an archive of the audio and my Twitter notes during the day

If you missed the notes from Day 1, see this post.

It may take me a week or so to finish putting some general thoughts and additional resources together based on the two day conference so that I might give a more thorough accounting of my opinions as well as next steps. Until then, I hope that the details and mini-archive of content below may help others who attended, or provide a resource for those who couldn’t make the conference.

Overall, it was an incredibly well programmed and run conference, so kudos to all those involved who kept things moving along. I’m now certainly much more aware of the gaping memory hole the internet is facing despite the heroic efforts of a small handful of people and institutions attempting to improve the situation. I’ll try to go into more detail later about a handful of specific topics and next steps as well as a listing of resources I came across which may prove to be useful tools for both those in the archiving/preserving and IndieWeb communities.

Archive of materials for Day 2

Audio Files

Below are the recorded audio files embedded in .m4a format (using a Livescribe Pulse Pen) for several sessions held throughout the day. To my knowledge, none of the breakout sessions were recorded except for the one which appears below.

Summarizing archival collections using storytelling techniques


Presentation: Summarizing archival collections using storytelling techniques by Michael Nelson, Ph.D., Old Dominion University

Saving the first draft of history


Special guest speaker: Saving the first draft of history: The unlikely rescue of the AP’s Vietnam War files by Peter Arnett, winner of the Pulitzer Prize for journalism
Peter Arnett talking about news reporting in Vietnam in the ’60s.

Kiss your app goodbye: the fragility of data journalism


Panel: Kiss your app goodbye: the fragility of data journalism
Featuring Meredith Broussard, New York University; Regina Lee Roberts, Stanford University; Ben Welsh, The Los Angeles Times; moderator Martin Klein, Ph.D., Los Alamos National Laboratory

The future of the past: modernizing The New York Times archive


Panel: The future of the past: modernizing The New York Times archive
Featuring The New York Times Technology Team: Evan Sandhaus, Jane Cotler and Sophia Van Valkenburg; moderated by Edward McCain, RJI and MU Libraries

Lightning Rounds: Six Presenters



Lightning rounds (in two parts)
Six + one presenters: Jefferson Bailey, Terry Britt, Katherine Boss (and team), Cynthia Joyce, Mark Graham, Jennifer Younger and Kalev Leetaru
  1. Jefferson Bailey, Internet Archive: “Supporting Data-Driven Research using News-Related Web Archives”
  2. Terry Britt, University of Missouri: “News archives as cornerstones of collective memory”
  3. Katherine Boss, Meredith Broussard and Eva Revear, New York University: “Challenges facing preservation of born-digital news applications”
  4. Cynthia Joyce, University of Mississippi: “Keyword ‘Katrina’: Re-collecting the unsearchable past”
  5. Mark Graham, Internet Archive/The Wayback Machine: “Archiving news at the Internet Archive”
  6. Jennifer Younger, Catholic Research Resources Alliance: “Digital Preservation, Aggregated, Collaborative, Catholic”
  7. Kalev Leetaru, senior fellow, The George Washington University and founder of the GDELT Project: “A Look Inside The World’s Largest Initiative To Understand And Archive The World’s News”

Technology and Community


Presentation: Technology and community: Why we need partners, collaborators, and friends by Kate Zwaard, Library of Congress

Breakout: Working with CMS


Working with CMS, led by Eric Weig, University of Kentucky

Alignment and reciprocity


Alignment & reciprocity by Katherine Skinner, Ph.D., executive director, the Educopia Institute

Closing remarks


Closing remarks by Edward McCain, RJI and MU Libraries and Todd Grappone, associate university librarian, UCLA

Live Tweet Archive

Reminder: In many cases my tweets don’t reflect direct quotes of the attributed speaker, but are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many original words of the participant as possible. Typically, for speed, there wasn’t much editing of these notes. Below I’ve changed the attribution of one or two tweets to reflect the proper person(s). For convenience, I’ve also added a few hyperlinks to useful resources after the fact that I didn’t have time to add to the original tweets. I’ve attached .m4a audio files of most of the audio for the day (apologies for shaky quality as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it. Presumably they will release the video on their website for a more immersive experience.

Peter Arnett:

Condoms were required issue in Vietnam–we used them to waterproof film containers in the field.

Do not stay close to the head of a column, medics, or radiomen. #warreportingadvice

I told the AP I would undertake the task of destroying all the reporters’ files from the war.

Instead the AP files moved around with me.

Eventually the 10 trunks of material went back to the AP when they hired a brilliant archivist.

“The negatives can outweigh the positives when you’re in trouble.”

Edward McCain:

Our first panel: Kiss your app goodbye: the fragility of data journalism

Meredith Broussard:

I teach data journalism at NYU

A news app is not what you’d install on your phone

Dollars for Docs is a good example of a news app

A news app is something that allows the user to put themself into the story.

Often there are three CMSs: web, print, and video.

News apps don’t live in any of the CMSs. They’re bespoke and live on a separate data server.

This has implications for crawlers which can’t handle them well.

Then how do we save news apps? We’re looking at examples and then generalizing.

Everyblock.com was a good example based on chicagocrime and later bought by NBC and shut down.

What?! The internet isn’t forever? Databases need to be saved differently than web pages.

Reprozip was developed by NYU Center for Data and we’re using it to save the code, data, and environment.

Ben Welsh:

My slides will be at http://bit.ly/frameworkfix. I work on the data desk @LATimes

We make apps that serve our audience.

We also make internal tools that empower the newsroom.

We also use our nerdy skills to do cool things.

Most of us aren’t good programmers, we “cheat” by using frameworks.

Frameworks do a lot of basic things for you, so you don’t have to know how to do it yourself.

Archiving tools often aren’t built into these frameworks.

Instagram, Pinterest, Mozilla, and the LA Times use Django as our framework.

Memento for WordPress is a great way to archive pages.

We must do more. We need archiving baked into the systems from the start.

Slides at http://bit.ly/frameworkfix

Regina Roberts:

Got data? I’m a librarian at Stanford University.

I’ll mention Christine Borgman’s book Big Data, Little Data, No Data.

Journalists are great data liberators: FOIA requests, cleaning data, visualizing, getting stories out of data.

But what happens to the data once the story is published?

BLDR: Big Local Digital Repository, an open repository for sharing open data.

Solutions that exist: Hydra at http://projecthydra.org or Open ICPSR www.openicpsr.org

For metadata: www.ddialliance.org, RDF, International Image Interoperability Framework (iiif) and MODS

Martin Klein:

We’ll open up for questions.

Audience Question:

What’s more important: obeying copyright laws or preserving the content?

Regina Roberts:

The new creative commons licenses are very helpful, but we have to be attentive to many issues.

Perhaps archiving it and embargoing for later?

Ben Welsh:

Saving the published work is more important to me, and the rest of the byproduct is gravy.

Evan Sandhaus:

I work for the New York Times, you may have heard of it…

Doing a quick demo of Times Machine from @NYTimes

Sophia van Valkenburg:

Talking about modernizing the born-digital legacy content.

Our problem was how to make an article from 2004 look like it had been published today.

There were hundreds of thousands of articles missing.

There was no one definitive list of missing articles.

Outlining the workflow for reconciling the archive XML and the definitive list of URLs for conversion.

It’s important to use more than one source for building an archive.

Jane Cotler:

I’m going to talk about all of “the little things” that came up along the way.

Article Matching: Fusion – How to convert print XML with web HTML that was scraped.

Primarily, we looked at common phrases between the corpus of the two different data sets.

We prioritized the print data over the digital data.

We maintain a system called switchboard that redirects from old URLs to the new ones to prevent link rot.

The case of the missing sections: some sections of the content were blank and not transcribed.

We made the decision of taking out data we had in lieu of making a better user experience for missing sections.

In the future, we’d also like to put photos back into the articles.

Evan Sandhaus:

Modernizing and archiving the @NYTimes archives is an ongoing challenge.

Edward McCain:

Can you discuss the decision to go with a more modern interface rather than a traditional archive of how it looked?

Evan Sandhaus:

Some of the decision was to get the data into an accessible format for modern users.

We do need to continue work on preserving the original experience.

Edward McCain:

Is there a way to distinguish between the print version and the online versions in the archive?

Audience Question:

Could a researcher do work on the entire corpus? Is it available for subscription?

Edward McCain:

We do have a sub-section of data available, but don’t have it prior to 1960.

Audience Question:

Have you documented the process you’ve used on this preservation project?

Sophia van Valkenburg:

We did save all of the code for the project within GitHub.

Jane Cotler:

We do have meeting notes which provide some documentation, though they’re not thorough.

ChrisAldrich:

Oh dear. Of roughly 1,155 tweets I counted about #DtMH2016 in the last week, roughly 25% came from me. #noisy

Opensource tool I had mentioned to several: @wallabagapp A self-hostable application for saving web pages https://www.wallabag.org


Notes from Day 1 of Dodging the Memory Hole: Saving Online News | Thursday, October 13, 2016

Some quick thoughts and an archive of my Twitter notes during the day

Today I spent the majority of the day attending the first day of a two-day conference at UCLA’s Charles Young Research Library entitled “Dodging the Memory Hole: Saving Online News.” While I mostly knew what I was getting into, it hadn’t really occurred to me how much of what is on the web is not backed up or archived in any meaningful way. As a part of human nature, people neglect to back up any of their data, but huge swaths of really important data with newsworthy and historic value are being heavily neglected. Fortunately it’s an interesting enough problem to draw the 100 or so scholars, researchers, technologists, and journalists who showed up for the start of an interesting group being brought together through the Reynolds Journalism Institute and several sponsors of the event.

What particularly strikes me is how many of the philosophies of the IndieWeb movement and tools developed by it are applicable to some of the problems that online news faces. I suspect that if more journalists were practicing members of the IndieWeb and used their sites not only for collecting and storing the underlying data upon which they base their stories, but to publish them as well, then some of the (future) archival process may be easier to accomplish. I’ve got so many disparate thoughts running around my mind after the first day that it’ll take a bit of time to process before I write out some more detailed thoughts.

Twitter List for the Conference

As a reminder to those attending, I’ve accumulated a list of everyone who’s tweeted with the hashtag #DtMH2016, so that attendees can more easily follow each other as well as communicate online following our few days together in Los Angeles. Twitter also allows subscribing to entire lists if that’s something in which people have interest.

Archiving the day

It seems only fitting that an attendee of a conference about saving and archiving digital news would make a reasonable attempt to archive some of his experience, right?! Toward that end, below is an archive of my tweetstorm during the day marked up with microformats and including hovercards for the speakers with appropriate available metadata. For those interested, I used a fantastic web app called Noter Live to capture, tweet, and more easily archive the stream.

Note that in many cases my tweets don’t reflect direct quotes of the attributed speaker, but are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many original words of the participant as possible. Typically, for speed, there wasn’t much editing of these notes. I’m also attaching .m4a audio files of most of the audio for the day (apologies for shaky quality as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it. Presumably they will release the video on their website for a more immersive experience.

If you prefer to read the stream of notes in the original Twitter format, so that you can like/retweet/comment on individual pieces, this link should give you the entire stream. Naturally, comments are also welcome below.

Audio Files

Below are the audio files for several sessions held throughout the day.

Greetings and Keynote


Greetings: Edward McCain, digital curator of journalism, Donald W. Reynolds Journalism Institute (RJI) and University of Missouri Libraries and Ginny Steel, university librarian, UCLA
Keynote: Digital salvage operations — what’s worth saving? given by Hjalmar Gislason, vice president of data, Qlik

Why save online news? and NewsScape


Panel: “Why save online news?” featuring Chris Freeland, Washington University; Matt Weber, Ph.D., Rutgers, The State University of New Jersey; Laura Wrubel, The George Washington University; moderator Ana Krahmer, Ph.D., University of North Texas
Presentation: “NewsScape: preserving TV news” given by Tim Groeling, Ph.D., UCLA Communication Studies Department

Born-digital news preservation in perspective


Speaker: Clifford Lynch, Ph.D., executive director, Coalition for Networked Information on “Born-digital news preservation in perspective”

Live Tweet Archive

ChrisAldrich:

Getting Noter Live fired up for Dodging the Memory Hole 2016: Saving Online News https://www.rjionline.org/dtmh2016

Ginny Steel:

I’m glad I’m not at NBC trying to figure out the details for releasing THE APPRENTICE tapes.

Edward McCain:

Let’s thank @UCLA and the library for hosting us all.

While you’re here, don’t forget to vote/provide feedback throughout the day for IMLS

Someone once pulled up behind me and said “Hi Tiiiigeeerrr!” #Mizzou

A server at the Missourian crashed as the system was obsolete and running on baling wire. We lost 15 years of archives

The dean & head of Libraries created a position to save born digital news.

We’d like to help define stake-holder roles in relation to the problem.

Newspaper is really an outmoded term now.

I’d like to celebrate that we have 14 student scholars here today.

We’d like to have you identify specific projects that we can take to funding sources to begin work after the conference

We’ll be going to our first speaker who will be introduced by Martin Klein from Los Alamos.

Martin Klein:

Hjalmar Gislason is a self-described digital nerd. He’s the Vice President of Data.

I wonder how one becomes the President of Data?

Hjalmar Gislason:

My Icelandic name may be the most complicated part of my talk this morning.

Speaking on “Digital Salvage Operations: What’s Worth Saving?”

My father in law accidentally threw away my wife’s favorite stuffed animal. #DeafTeddy

Some people just throw everything away because it’s not being used. Others keep everything and don’t throw it away.

The fundamental question: Do you want to save everything or do you want to get rid of everything?

I joined @qlik two years ago and moved to Boston.

Before that I was with spurl.net which was about saving copies of webpages users had previously visited.

I had also previously invested in kjarninn which is translated as core.

We used to have little data, now we’re with gigantic data and moving to gargantuan data soon.

One of my goals today is to broaden our perspective about what data needs saving.

There’s the Web, the “Deep” Web, then there’s “Other” data which is at the bottom of the pyramid.

I got to see into the process of #panamapapers but I’d like to discuss the consequences from April 3rd.

The number of meetings was almost more than could have been covered in real time in Iceland.

The #panamapapers were a soap opera, much like US politics.

Looking back at the process is highly interesting, but it’s difficult to look at all the data as it unfolded.

How can we capture all the media minute by minute as a story unfolds?

You can’t trust that you can go back to a story at a certain time and know that it hasn’t been changed. #1984 #Orwell

There was a relatively pro-HRC piece earlier this year @NYTimes that was changed.

Newsdiffs tracks changes in news over time. The HRC article had changed a lot.

Let’s say you referenced @CNN 10 years ago, likely now, the CMS and the story have both changed.

8 years ago, I asked, wouldn’t we like to have the social media from Iceland’s only Nobel Laureate as a teenager?

What is private/public, ethical/unethical when dealing with data?

Much data is hidden behind passwords or on systems which are not easily accessed from a database perspective.

Most of the content published on Facebook isn’t public. It’s hard to archive in addition to being big.

We as archivists have no claim on the hidden data within Facebook.

ChrisAldrich:

The #indieweb could help archivists in the future in accessing more personal data.

Hjalmar Gislason:

Then there’s “other” data: 500 hours of video is uploaded to YouTube per minute.

No organization can go around watching all of this video data. Which parts are newsworthy?

Content could surface much later or could surface through later research.

Hornbjargsviti lighthouse recorded the weather every three hours for years creating lots of data.

And that was just one of hundreds of sites that recorded this type of data in Iceland.

Lots of this data is lost. Much that has been found was by coincidence. It was never thought to archive it.

This type of weather data could be very valuable to researchers later on.

There was also a large archive of Icelandic data that was found.

Showing a timelapse of Icelandic earthquakes https://vimeo.com/24442762

You can watch the magma working its way through the ground before it makes its way up through the land.

National Geographic featured this video in a documentary.

Sometimes context is important when it comes to data. What is archived today may be more important later.

As the economic crisis unfolded in Greece, it turned out the data that was used to allow them into EU was wrong.

The data was published at the time of the crisis, but there was no record of what the data looked like 5 years earlier.

The only way to recreate the data was to take prior printed sources. This is usually only done in extraordinary circumstances.

We captured 150k+ data sets with more than 8 billion “facts” which was just a tiny fraction of what exists.

How can we delve deeper into large data sets, all with different configurations and proprietary systems?

“There’s a story in every piece of data.”

Once a year energy consumption seems to dip because February has fewer days than other months. Plotting it matters.

Year over year comparisons can be difficult because of things like 3 day weekends which shift over time.

Here’s a graph of the population of Iceland. We’ve had our fair share of diseases and volcanic eruptions.

To compare, here’s a graph of the population of sheep. They outnumber us by an order (or orders) of magnitude.

In the 1780’s there was an event that killed off lots of sheep, so people had the upper hand.

Do we learn more from reading today’s “newspaper” or one from 30, 50, or 100 years ago?

There was a letter to the editor about an eruption and people had to move into the city.

letter: “We can’t have all these people come here, we need to build for our own people first.”

This isn’t too different from our problems today with respect to Syria. In that case, the people actually lived closer.

In the born-digital age, what will the experience look like trying to capture today 40 years hence?

Will it even be possible?

Machine data connections will outnumber “people” data connections by a factor of 10 or more very quickly.

With data, we need to analyze, store, and discard data. How do we decide in a split-second what to keep & discard?

We’re back to the father-in-law and mother-in-law question: What to get rid of and what to save?

Computing is continually beating human tasks: chess, Go, driving a car. They build on lots more experience based on data

Whoever has the most data on driving cars and landscape will be the ultimate winner in that particular space.

Data is valuable, sometimes we just don’t know which yet.

Hoarding is not a strategy.

You can only guess at what will be important.

“Commercial use in Doubt” The third sub-headline in a newspaper about an early test of television.

There’s more to it than just the web.

Kate Zwaard:

“Hoarding isn’t a strategy” really resonates with librarians. What could that relationship look like?

Hjalmar Gislason:

One should bring in data science, industry may be ahead of libraries.

Cross-disciplinary approaches may be best. How can you get a data scientist to look at your problem? Get their attention?

Peter Arnett:

There’s 60K+ books about the Viet Nam War. How do we learn to integrate what we learn after an event (like that)?

Hjalmar Gislason:

Perspective always comes with time, as additional information arrives.

Scientific papers are archived in a good way, but the underlying data is a problem.

In the future you may have the ability to add supplementary data to what appears in a book (in a better way).

Archives can give the ability to have much greater depth on many topics.

Are there any centers of excellence on the topics we’re discussing today? This conference may be IT.

We need more people that come from the technical side of things to be watching this online news problem.

Hacks/Hackers is a meetup group that takes place all over the world.

It brings the journalists and computer scientists together regularly for beers. It’s some of the outreach we need.

Edward McCain:

If you’re not interested in money, this is a good area to explore. 10 minute break.

Don’t forget to leave your thoughts on the questions at the back of the room.

We’re going to get started with our first panel. Why is it important to save online news?

Matthew Weber:

I’m Matt Weber from Rutgers University and in communications.

I’ll talk about web archives and news media and how they interact.

I worked at Tribune Corp. for several years and covered politics in DC.

I wanted to study the way in which the news media is changing.

We’re increasingly seeing digital-only media with no offline surrogate.

It’s becoming increasingly difficult to do anything but look at it now as it exists.

There was no large scale online repository of online news to do research.

#OccupyWallStreet is one of the first examples of stories that exist online in occurrence and reportage.

There’s a growing need to archive content around local news particularly politics and democracy.

When there is a rich and vibrant local news environment, people are more likely to become engaged.

Local news is one of the least thought about from an archive perspective.

Laura Wrubel:

I’m at GWU Libraries in the scholarly technology group.

I’m involved in social feed manager which allows archivists to put together archives from social services.

Kimberly Gross, a faculty member, studies tweets of news outlets and journalists.

We created a prototype tool to allow them to collect data from social media.

In 2011, journalists were primarily using their Twitter presences to direct people to articles rather than for conversation.

We collect data of political candidates.

Chris Freeland:

I’m an associate librarian and representing “Documenting the Now” with WashU, UCRiverside, & UofMd

Documenting the Now revolves around Twitter documentation.

It started with the Ferguson story and documenting media, videos during the protests in the community.

What can we as memory institutions do to capture the data?

We gathered 14 million tweets relating to Ferguson within two weeks.

We tried to build a platform that others could use in the future for similar data capture relating to social.

Ethics is important in archiving this type of news data.

Ana Krahmer:

Digitally preserving pdfs from news organizations and hyper-local news in Texas.

We’re approaching 5 million pages of archived local news.

What is news that needs to be archived, and why?

Matthew Weber:

First, what is news? The definition is unique to each individual.

We need to capture as much of the social news and social representation of news which is fragmented.

It’s an important part of society today.

We no longer produce hard copies like we did a decade ago. We need to capture the online portion.

Laura Wrubel:

We’d like to get the perspective of journalists, and don’t have one on the panel today.

We looked at how midterm election candidates used Twitter. Is that news itself? What tools do we use to archive it?

What does it mean to archive news by private citizens?

Chris Freeland:

Twitter was THE place to find information in St. Louis during the Ferguson protests.

Local news outlets weren’t as good as Twitter during the protests.

I could hear the protest from 5 blocks away and only found news about it on Twitter.

The story was being covered very differently on Twitter than the local (mainstream) news.

Alternate voices in the mix were very interesting and important.

Twitter was in the moment and wasn’t being edited and causing a delay.

What can we learn from this massive number of Ferguson tweets?

It gives us information about organizing, and what language was being used.

Ana Krahmer:

I think about the archival portion of this question. By whom does it need to be archived?

What do we archive next?

How are we representing the current population now?

Who is going to take on the burden of archiving? Should it be corporate? Cultural memory institution?

Someone needs to curate it. Who does that?

Our next question: What do you view as primary barriers to news archiving?

Laura Wrubel:

How do we organize and staff? There’s no shortage of work.

Tools and software can help the process, but libraries are usually staffed very thinly.

No single institution can do this type of work alone. Collaboration is important.

Chris Freeland:

Two barriers we deal with: terms of service are an issue with archiving. We don’t own it, but can use it.

Libraries want to own the data in perpetuity. We don’t own our data.

There’s a disconnect in some of the business models for commercialization and archiving.

Issues with accessing data.

People were worried about becoming targets or losing jobs because of participation.

What is the role of ethics in archiving this type of data? Allowing opting out?

What about redacting portions? anonymizing the contributions?

Ana Krahmer:

Publishers have a responsibility for archiving their product. Permission from publishers can be difficult.

We have a lot of underserved communities. What do we do with comments on stories?

Corporations may not continue to exist in the future and data will be lost.

Matthew Weber:

There’s a balance to be struck between the business side and the public good.

It’s hard to convince for-profit companies of the value of archiving for the social good.

Chris Freeland:

Next Q: What opportunities have revealed themselves in preserving news?

Finding commonalities and differences in projects is important.

What does it mean to us to archive different media types? (think diversity)

What’s happening in my community? in the nation? across the world?

The long history in our archives will help us learn about each other.

Ana Krahmer:

We can only do so much with the resources we have.

We’ve worked on a cyber cemetery product in the past.

Someone else can use the tools we create within their initiatives.

Chris Freeland:

Repeating question: What are the issues in archiving longer-form video data with regard to stories on Periscope?

Audience Question:

How do you channel the energy around news archiving?

Matthew Weber:

Research in the area is all so new.

Audience Question:

Does anyone have any experience with legal wrangling with social services?

Chris Freeland:

The ACLU is waging a lawsuit against Twitter about archived tweets.

Ana Krahmer:

Outreach to community papers is very rhizomic.

Audience Question:

How do you take local examples and make them a national model?

Ana Krahmer:

We’re teenagers now in the evolution of what we’re doing.

Edward McCain:

Peter Arnett just said “This is all more interesting than I thought it would be.”

Next Presentation: NewsScape: preserving TV news

Tim Groeling:

I’ll be talking about the NewsScape project of Francis Steen, Director, Communication Studies Archive

I’m leading the archiving of the analog portion of the collection.

The oldest of our collection dates from the 1950’s. We’ve hosted them on YouTube which has created some traction.

Commenters have been an issue with posting to YouTube as well as copyright.

NewsScape is the largest collection of TV news and public affairs programs (local & national).

Prior to 2006, we don’t know what we’ve got.

Paul said “I’ll record everything I can and someone in the future can deal with it.”

We have 50K hours of Betamax.

VHS are actually most threatened, despite being newest tapes.

Our budget was seriously strapped.

Maintaining closed captioning is important to our archiving efforts.

We’ve done 36k hours of encoding this year.

We use a layer of dead VCRs over our good VCRs to prevent RF interference and audio buzzing. 🙂

Post-2006, we’re now going straight to digital.

Preservation is the first step, but we need to be more than the world’s best DVR.

Searching the news is important too.

Showing a data visualization of news analysis with regard to the Healthcare Reform movement.

We’re doing facial analysis as well.

We have interactive tools at viz2016.com.

We’ve tracked how often candidates have smiled in election 2016. Hillary > Trump

We want to share details within our collection, but don’t have tools yet.

Having a good VCR repairman has helped us a lot.

Edward McCain:

Breaking for lunch…

Clifford Lynch:

Talk “Born-digital news preservation in perspective”

There’s a shared consensus that preserving scholarly publications is important.

While delivery models have shifted, there must be some fall back to allow content to survive publisher failure.

Preservation was a joint investment between memory institutions and publishers.

Keepers register their coverage of journals for redundancy.

In studying coverage, we’ve discovered Elsevier is REALLY well covered, but they’re not what we’re worried about.

It’s the small journals as edge cases that really need more coverage.

Smaller journals don’t have resources to get into the keeper services and it’s more expensive.

Many Open Access Journals are passion projects and heavily underfunded and they are poorly covered.

Being mindful of these business dynamics is key when thinking about archiving news.

There are a handful of large news outlets that are “too big to fail.”

There are huge numbers of small outlets like subject verticals, foreign diasporas, etc. that need to be watched.

Different strategies should be used for different outlets.

The material on lots of links (as sources) disappears after a short period of time.

While Archive.org is a great resource, it can’t do everything.

Preserving underlying evidence is really important.

How we deal with massive databases and queries against them is a difficult problem.

I’m not aware of studies of link rot with relationship to online news.

Who steps up to preserve major data dumps like Snowden, PanamaPapers, or email breaches?

Social media is a collection of observations and small facts without necessarily being journalism.

Journalism is a deliberate act and is meant to be public while social media is not.

We need to come up with a consensus about what parts of social media should be preserved as news.

News does often delve into social media as part of its evidence base now.

Responsible journalism should include archival storage, but it doesn’t yet.

Under current law, we can’t protect a lot of this material without the permission of the creator(s).

The Library of Congress can demand deposit, but doesn’t.

With funding issues, I’m not wild about the Library of Congress being the only entity [for storage.]

In the UK, there are multiple repositories.

ChrisAldrich:

testing to see if I’m still live

What happens if you livetweet too much in one day.


Twitter List for #DtMH2016 Participants | Dodging the Memory Hole 2016: Saving Online News

Some thoughts on creating conference lists, live tweeting and archiving events.

Live Tweeting and Twitter Lists

While attending the upcoming conference Dodging the Memory Hole 2016: Saving Online News later this week, I’ll make an attempt to live Tweet as much as possible. (If you’re following me on Twitter on Thursday and Friday and find me too noisy, try using QuietTime.xyz to mute me on Twitter temporarily.) I’ll be using Kevin Marks’ excellent Noter Live web app to both send out the tweets as well as to store and archive them here on this site thereafter (kind of like my own version of Storify.)

In getting ramped up to live Tweet it, it helps significantly to have a pre-existing list of attendees (and remote participants) talking about #DtMH2016 on Twitter, so I started creating a Twitter list by hand. I realized that it would be nice to have a little bot to catch others as the week progresses. Ever lazy, I turned to IFTTT.com to see if something already existed, and sure enough there’s a Twitter search with a trigger that will allow one to add people who mention a particular hashtag to a Twitter list automatically.

Here’s the resultant list, which should grow as the event unfolds throughout the week:
🔖 People on Twitter talking about #DtMH2016

Feel free to follow or subscribe to the list as necessary. Hopefully this will make attending the conference more fruitful for those there live as well as remote.

Not on the list? Just tweet a (non-private) message with the conference hashtag: #DtMH2016 and you should be added to the list shortly.

Lazy like me? Click the bird to tweet: “I’m attending #DtMH2016 @rji | Dodging the Memory Hole 2016: Saving Online News http://ctt.ec/5RKt2+”

IFTTT Recipe for Creating Twitter Lists of Conference Attendees

For those interested in creating their own Twitter lists for future conferences (and honestly the hosts of all conferences should do this as they set up their conference hashtag and announce the conference), below is a link to the ifttt.com recipe I created for this, but which can be modified for use by others.

IFTTT Recipe: Create Twitter List of Attendees from search of people using conference hashtag

Naturally, it would also be nice if, as people registered for conferences, they were asked for their Twitter handles and websites so that the information could be used to create such online lists to help create longer lasting relationships both during the event and afterwards as well. (Naturally providing these details should be optional so that people who wish to maintain their privacy could do so.)
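For anyone curious about what such a recipe does conceptually, here is a minimal Python sketch of the matching step. It is not how IFTTT or Twitter implement it, and it calls no real API: the tweet records are plain dictionaries, and the field names `author`, `text`, and `protected` are hypothetical, chosen only for illustration. It collects the handles of people who mention a hashtag in a non-private tweet, matching case-insensitively so that #DTMH2016 and #DtMH2016 count as the same tag.

```python
import string

# Punctuation to trim from the ends of words, keeping "#" so tags survive.
_TRIM = string.punctuation.replace("#", "")


def hashtags_in(text):
    """Return the lowercased hashtags found in a tweet's text."""
    tags = set()
    for word in text.split():
        word = word.strip(_TRIM)
        if word.startswith("#") and len(word) > 1:
            tags.add(word[1:].lower())
    return tags


def handles_for_hashtag(tweets, hashtag):
    """Return author handles, in first-seen order, who mention the hashtag.

    `tweets` is a list of dicts with hypothetical keys "author", "text",
    and an optional "protected" flag marking private tweets, which are skipped.
    """
    wanted = hashtag.lstrip("#").lower()
    members = []
    for tweet in tweets:
        if tweet.get("protected"):
            continue
        handle = tweet["author"]
        if wanted in hashtags_in(tweet["text"]) and handle not in members:
            members.append(handle)
    return members


if __name__ == "__main__":
    sample = [
        {"author": "alice", "text": "Excited for #DtMH2016!"},
        {"author": "bob", "text": "Lunch time"},
        {"author": "carol", "text": "Notes from #DTMH2016: link rot"},
        {"author": "dave", "text": "Shh #DtMH2016", "protected": True},
    ]
    print(handles_for_hashtag(sample, "#DtMH2016"))
```

Keeping the handles in first-seen order means the resulting list reads roughly like a timeline of who joined the conversation when.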


Jacob Reitzin presents: “Bootstrapping Technology and Great User Experience” | Innovate Pasadena

The founder of Dude I Need a Truck talks about his startup experience

Innovate Pasadena presents Jacob Reitzin on: “Bootstrapping Technology and Great User Experience” at Cross Campus, Old Town Pasadena, 87 N Raymond Ave., Pasadena, CA 91103 on :

Jacob gave a nice and humanizing presentation on some of the philosophy behind his startup. Though he didn’t get very deep into the topic indicated by the title of his talk, he was very engaging in exactly the manner you’d expect that a dude with a truck could be.


Exhibition at BC Space | Amerikan Krazy: Life Out of Balance

Bookmarked Artists take aim at their country and their county by Antoine Boessenkool (The Orange County Register)
“Amerikan Krazy: Life Out of Balance” takes part of its name from the new book “Amerikan Krazy” (http://boffosockobooks.com/books/authors/henry-james-korn/amerikan-krazy/) by Henry James Korn (http://www.henryjameskorn.com). From 2008 to 2013, Korn worked at the Orange County Great Park. He was responsible for the creation of the Palm Court arts complex and culture, music, art and history programs.

“The book is very much about total corporate control of public and private space,” Korn said. The story follows a wounded Marine veteran haunted after having missed the chance to assassinate a presidential candidate who later causes massive human suffering and wreaks havoc on America’s wealth and democracy.

It’s a way of understanding what’s happening in politics now, Korn said.

“Because if ever there was a recognition that our public life and politics have gone crazy, it’s this moment.”

If you haven’t managed to make it down, this exhibition is running for another week at BC Space!


IndieWebCamp Los Angeles 2016 Announced for November 4-6

The IndieWeb movement is a global community that is building an open set of principles and methods that empower people to take back ownership of their online identity and data instead of relying on 3rd party websites. Come learn more about the next generation of the Web.

For the first time since 2013, when it appeared in Hollywood, IndieWebCamp is coming to Los Angeles! I’m definitely going, and I invite you to join us. For the past two years or so, I’ve been delving into the wealth of tools and resources the community has been developing. I’m excited to attend a local camp, help out in any way I can, and will help anyone who’s interested in learning more.

Join us in LA (Santa Monica) for two days of a BarCamp-style gathering of web creators building and sharing open web technologies to empower users to own their own identities & content, and advance the state of the #indieweb!

The IndieWeb movement is a global community that is building an open set of principles and methods that empower people to take back ownership of their identity and data instead of relying on 3rd party websites.

At IndieWebCamp you’ll learn about ways to empower yourself to own your data, create & publish content on your own site, and only optionally syndicate to third-party silos. Along the way you’ll get a solid grounding in the history and future of Microformats, domain ownership, IndieAuth, WebMention and more!

For remote participants, tune into the live chat (tons of realtime notes!) and the video livestream (URL TBD).

General IndieWeb Principles

Your content is yours
When you post something on the web, it should belong to you, not a corporation. Too many companies have gone out of business and lost all of their users’ data. By joining the IndieWeb, your content stays yours and in your control.
You are better connected
Your articles and status messages can go to all services, not just one, allowing you to engage with everyone. Even replies and likes on other services can come back to your site so they’re all in one place.
You are in control
You can post anything you want, in any format you want, with no one monitoring you. In addition, you share simple readable links such as example.com/ideas. These links are permanent and will always work.

 Where

Pivotal
1333 2nd Street, Suite 200
Santa Monica, CA, 90401
United States

When

Friday (optional): 2016-11-04
Saturday: 2016-11-05
Sunday: 2016-11-06

RSVP

Indie Event
Eventbrite
Lanyrd
Facebook

Guest List
For more details see: IndieWebCamp LA 2016

Tentative Schedule

Day 0 Prep Night

Day 0 is an optional prep night for people that want to button up their website a little bit to get ready for the IndieWebCamp proper.
18:30 Organizer setup
19:00 Doors open
19:15 Introductions
19:30 Build session
22:00 Day 0 closed

Day 1 Discussion

Day 1 is about discussing in a BarCamp-like environment. Bring a topic you’d like to discuss or join in on topics as they are added to the board. We make the schedule together!
08:00 Organizer setup
08:30 Doors open – badges
09:15 Introductions and demos
10:00 Session scheduling
10:30 Sessions
12:00 Group photo & Lunch
13:00 Sessions on the hour
16:00 Last session
17:00 Day 1 closing session, break, meet up later for dinner

Day 2 Building

Day 2 is about making things on and for your personal site! Work with others or on your own.
09:30 Doors open – badges
10:10 Day 2 kick-off, session scheduling
10:30 Build sessions
12:00 Catered lunch
14:30 Build sessions continue
15:00 Demos
16:30 Community clean-up
17:00 Camp closed!

Sponsorship opportunities are available for those interested.



“ALOHA to the Web”: Dr. Norm Abramson to give 2016 Viterbi Lecture at USC

Bookmarked USC - Viterbi School of Engineering - Dr. Norm Abramson (viterbi.usc.edu)

“ALOHA to the Web”

Dr. Norman Abramson, Professor Emeritus, University of Hawaii

Lecture Information

Thursday, April 14, 2016
Hughes Electrical Engineering Center (EEB)
Reception 3:00pm (EEB Courtyard)
Lecture 4:00pm (EEB 132)

Abstract

Wireless access to the Internet today is provided predominantly by random access ALOHA channels connecting a wide variety of user devices. ALOHA channels were first analyzed, implemented and demonstrated in the ALOHA network at the University of Hawaii in June, 1971. Information Theory has provided a constant guide for the design of more efficient channels and network architectures for ALOHA access to the web.

In this talk we examine the architecture of networks using ALOHA channels and the statistics of traffic within these channels. That traffic is composed of user and app oriented information augmented by protocol information inserted for the benefit of network operation. A simple application of basic Information Theory can provide a surprising guide to the amount of protocol information required for typical web applications.

We contrast this theoretical guide of the amount of protocol information required with measurements of protocol generated information taken on real network traffic. Wireless access to the web is not as efficient as you might guess.

Biography

Norman Abramson received an A.B. in physics from Harvard College in 1953, an M.A. in physics from UCLA in 1955, and a Ph.D. in Electrical Engineering from Stanford in 1958.

He was an assistant professor and associate professor of electrical engineering at Stanford from 1958 to 1965. From 1967 to 1995 he was Professor of Electrical Engineering, Professor of Information and Computer Science, Chairman of the Department of Information and Computer Science, and Director of the ALOHA System at the University of Hawaii in Honolulu. He is now Professor Emeritus of Electrical Engineering at the University of Hawaii. He has held visiting appointments at Berkeley (1965), Harvard (1966) and MIT (1980).

Abramson is the recipient of several major awards for his work on random access channels and the ALOHA Network, the first wireless data network. The ALOHA Network went into operation in Hawaii in June, 1971. Among these awards are the Eduard Rhein Foundation Technology Award (Munich, 2000), the IEEE Alexander Graham Bell Medal (Philadelphia, 2007) and the NEC C&C Foundation Award (Tokyo, 2011).


2016 North-American School of Information Theory, June 21-23

Bookmarked 2016 North-American School of Information Theory, June 21-23, 2016 (itsoc.org)

The 2016 School of Information Theory will be hosted at Duke University, June 21-23. It is sponsored by the IEEE Information Theory Society, Duke University, the Center for Science of Information, and the National Science Foundation. The school provides a venue where doctoral and postdoctoral students can learn from distinguished professors in information theory, meet with fellow researchers, and form collaborations.

Program and Lectures

The daily schedule will consist of morning and afternoon lectures separated by a lunch break with poster sessions. Students from all research areas are welcome to attend and present their own research via a poster during the school.  The school will host lectures on core areas of information theory and interdisciplinary topics. The following lecturers are confirmed:

  • Helmut Bölcskei (ETH Zurich): The Mathematics of Deep Learning
  • Natasha Devroye (University of Illinois, Chicago): The Interference Channel
  • René Vidal (Johns Hopkins University): Global Optimality in Deep Learning and Beyond
  • Tsachy Weissman (Stanford University): Information Processing under Logarithmic Loss
  • Aylin Yener (Pennsylvania State University): Information-Theoretic Security

Logistics

Applications will be available on March 15 and will be evaluated starting April 1.  Accepted students must register by May 15, 2016.  The registration fee of $200 will include food and 3 nights accommodation in a single-occupancy room.  We suggest that attendees fly into the Raleigh-Durham (RDU) airport located about 30 minutes from the Duke campus. Housing will be available for check-in on the afternoon of June 20th.  The main part of the program will conclude after lunch on June 23rd so that attendees can fly home that evening.

To Apply: click “register” here (fee will be accepted later, after acceptance)

Administrative Contact: Kathy Peterson, itschool2016@gmail.com

Organizing Committee

Henry Pfister (chair) (Duke University), Dror Baron (North Carolina State University), Matthieu Bloch (Georgia Tech), Rob Calderbank (Duke University), Galen Reeves (Duke University). Advisors: Gerhard Kramer (Technical University of Munich) and Andrea Goldsmith (Stanford)


Amerikan Krazy: Life Out of Balance | Exhibition at BC Space

I highly recommend everyone take the time to visit BC Space in the coming month to see their exhibition "Amerikan Krazy: Life Out of Balance"!

Yesterday, along with my friend Henry James Korn, I attended the opening of the BC Space Gallery exhibition Amerikan Krazy: Life Out of Balance, and it was fantastic! If you’ve got time to see it sometime in the next few weeks until it closes on May 20th, I guarantee you won’t be disappointed. I don’t think I’ve experienced so much shock and amazement at an exhibition in a long time.

Sadly, Henry won’t be there doing a live reading of his new novel Amerikan Krazy every day for the next month, but you’ll be continually astounded for the entire time you’re there emoting over all of the work on display in an exhibition that is not only aptly named but touches on many aspects of the cultural zeitgeist.

Jeff Gillette, Desert Debris Dismayland Castle

I walked through the gallery half a dozen times over four hours and was continually amazed by new things I’d run into that I somehow hadn’t seen on my first passes, or I’d experience new emotions in pieces I’d spent time studying after coming back to them after viewing others.

For those attending, I hope you’ll notice the experience begins almost as soon as you open the door, it continues even for those who visit the restrooms(!), and it doesn’t end until you’re dumbfounded even as you leave the gallery–in fact, I was so intrigued that I walked back up the stairs to leave a second time.

I was particularly enamored by many of the Glenn Brooks pieces, a fantastic video by Max Papeschi, and the haunting work of Tom Lamb, whom I had the pleasure of meeting at the gallery.

Below is a small sampling of some snapshots I took (along with a few professional shots), but don’t let the poor quality of my photography detract from experiencing it more viscerally in person.

 

Here’s the original invitation from Mark Chamberlain and the BC Space Gallery in Laguna Beach:

Dear Friends of BC Space

…Here we go again, as go we must.

BC Space Gallery is proud to present Amerikan Krazy: Life Out of Balance featuring the work of over twenty notable southland artists.

There will be an opening reception on Sunday, March 20, MMXVI, from 1-5 PM in celebration of the Vernal Equinox when our planet once again achieves balance between light and dark.

At the opening, from 2-4 PM, Henry James Korn will launch his new book Amerikan Krazy after which this show was named and thematically assembled. Henry’s comic masterpiece picks up where George Orwell, Jules Verne, and Edward Abbey left off, and turns political writing into art.

Henry Korn is the former director of the Art, Culture, and Heritage program at the Orange County Great Park. At the conclusion of his reading, there will be a discussion period on how the original grand dream for the transformation of the former Marine Corps air base has changed from a public serving project into a corporate theme park, sports complex, and housing development that mirrors the “Founding Father Land” depicted in Korn’s relentless satirical novel.

Amerikan Krazy: Life Out of Balance includes work by: Jorg Dubin, Joella March, Stephen Anderson, Jeff Gillette, F. Scott Hess, Tom Lamb, Douglas McCulloh, Haley Blatte, Jerry Burchfield, Mark Chamberlain, Ricardo Duffy, Jared Milar, Max Papeschi, Jessica DeStephano, Lynn Kubasek, Glenn Brooks, Ron English, Dustin Shuler, Clayton Spada, Jacques Garnier, Pat Spakuhl, and Dan Van Clapp.

This exhibition will be on display until May 20, 2016. Gallery hours are by arrangement. The opening reception is free to the public, but seating for the book launch is limited so reservations are encouraged.

For additional information please contact the gallery or Mark Chamberlain.

Source: BC Space

The gallery can be contacted at the details below:

BC Space Gallery
235 Forest Avenue
Laguna Beach, CA 92651
949.497.1880
bcspace@cox.net

Henry Korn chats with fans after reading from Amerikan Krazy