parsing | Chris Aldrich

In August 2012, I wrote a quick script to stream front-page Hackernews stories to an IRC channel on Freenode (##hackernews in case you're interested) so that I could quickly glance at popular stories there instead of needing to load Hackernews. Since IRC is my feed reader, I've always tried to pipe as much there as possible.

Very smart and reminiscent of some of the stuff Drew McLellan and Jeremy Keith were doing almost a decade before. There is a lot more power in microformats than most web developers give them credit for. Aaron has a great example and use case.

My favorite part here:

So in 2.5 years of parsing the HTML, I never had any problems. In 2 days of parsing the JSON API, I hit a glitch where all the stories were empty.
Since more people and programs see the HTML than use the API, the HTML ends up being more reliable.

After several years of giving back no data, apparently YouTube has changed at least some of the markup and metadata on their site so that parsers are returning richer data now. I’m thrilled to see that as of this morning putting in traditional YouTube permalinks now allows the parser in David Shanske’s awesome IndieWeb Post Kinds plugin to properly return the title, summary, site name, tags, and featured images from YouTube videos! If only they’d include an h-card to give back the author name, URL, and avatar…

Making watch posts for YouTube just got a whole lot easier!
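
For anyone curious what pulling that kind of data out of a page can look like, here is a minimal sketch in Python using only the standard library. It is not how the Post Kinds plugin works internally. It assumes the page exposes its data as Open Graph `<meta>` tags (`og:title`, `og:description`, `og:site_name`, `og:image`, `og:video:tag`), and the `SAMPLE` markup, the `MetaTagParser` class, and the `extract_metadata` function are all made up for illustration.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the <title> text and every <meta> property/name -> content pair."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content is not None:
                # Some keys (e.g. og:video:tag) can repeat, so keep a list per key.
                self.meta.setdefault(key, []).append(content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return title, summary, site name, featured image, and tags from a page."""
    parser = MetaTagParser()
    parser.feed(html)

    def first(key):
        values = parser.meta.get(key)
        return values[0] if values else None

    return {
        # Fall back to the plain <title> if there is no og:title.
        "title": first("og:title") or parser.title.strip() or None,
        "summary": first("og:description") or first("description"),
        "site_name": first("og:site_name"),
        "featured": first("og:image"),
        "tags": parser.meta.get("og:video:tag", []),
    }


# Hypothetical markup, not copied from any real page.
SAMPLE = """
<html><head>
<title>Example Video - YouTube</title>
<meta property="og:site_name" content="YouTube">
<meta property="og:title" content="Example Video">
<meta property="og:description" content="A made-up description.">
<meta property="og:image" content="https://example.com/thumb.jpg">
<meta property="og:video:tag" content="indieweb">
<meta property="og:video:tag" content="microformats">
</head><body></body></html>
"""

if __name__ == "__main__":
    for field, value in extract_metadata(SAMPLE).items():
        print(f"{field}: {value}")
```

Nothing in this sketch can return an author name, URL, or avatar, which is exactly the gap an h-card would fill.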