Welcome to the ‘Anatomy of an AI System’
Tag: algorithms
👓 Farewell Social Media | James Shelley
I recently purged the data from my Facebook account. This effort was shockingly labour intensive: it took a browser script all weekend to crunch, and still many aspects of the process required manual execution. Torching years and years of old Facebook activity felt so liberating that I found another...
IndieWeb Summit 2018 Recap
The year of the Indie Reader
Last year I wrote the post Feed Reader Revolution in response to a growing need I’ve seen in the social space for a new sort of functionality in feed readers. While there have been a few interesting attempts like Woodwind which have shown a proof-of-concept, not much work had been done until some initial work by Aaron Parecki and a session at last year’s IndieWeb Summit entitled Putting it all Together.
Over the past year I’ve been closely watching Aaron Parecki; Grant Richmond and Jonathan LaCour; Eddie Hinkle; and Kristof De Jaeger’s collective progress on the microsub specification as well as their respective projects Aperture/Monocle; Together; Indigenous/Indigenous for iOS; and Indigenous for Android. As a result in early May I was overjoyed to suggest a keynote session on readers and was stupefied this week as many of them have officially launched and are open to general registration as relatively solid beta web services.
I spent a few minutes in a session at the end of Tuesday and managed to log into Aperture and create an account (#16, though I suspect I may be one of the first to use it besides the initial group of five developers). I also managed to quickly and easily add a microsub endpoint to my website as well. Sadly I’ve got some tweaks to make to my own installation to properly log into any of the reader app front ends. Based on several of the demos I’ve seen over the past months, the functionality involved is not only impressive, but it’s a properly large step ahead of some of the basic user interface provided by the now-shuttered Woodwind.xyz service (though the code is still available for self-hosting.)
Several people have committed to make attempts at creating a microsub server including Jack Jamieson who has announced an attempt at creating one for WordPress after having recently built the Yarns reader for WordPress from scratch this past year. I suspect within the coming year we’ll see one or two additional servers as well as some additional reading front ends. In fact, Ryan Barrett spent the day on Wednesday hacking away at leveraging the NewsBlur API to make NewsBlur a front end for Aperture’s server functionality. I’m hoping others may do the same for other popular readers like Feedly or Inoreader to expand on the plurality of offerings. Increased competition for new reader offerings can only improve the entire space.
Even more reading related support
Just before the Summit, gRegor Morrill unveiled the beta version of his micropub client Indiebookclub.biz which allows one to log in with their own website and use it to post reading updates to their own website. For those who don’t yet support micropub, the service saves the data for eventual export. He kept working on it through the summit, improving an already impressive product. It’s the first micropub client of its kind amidst a growing field of websites (including WordPress and WithKnown which both have plugins) that offer reading post support. Micro.blog has recently updated its code to allow users of the platform the ability to post reads with indiebookclub.biz as well. As a result of this spurt of reading related support there’s now a draft proposal to add read-of and read-status support as new Microformats. Perhaps reads will be included in future updates of the post-type-discovery algorithm as well?
Given the growth of reading post support and a new micropub read client, I suspect it won’t take long before some of the new microsub-related readers begin supporting read post micropub functionality as well.
IndieAuth Servers
In addition to David Shanske’s recent valiant update to the IndieAuth plugin for WordPress, Manton Reece managed to finish up coding work to unveil another implementation of IndieAuth at the Summit. His version is for the micro.blog platform which is a significant addition to the community and will add several hundred additional users who will have broader access to a wide assortment of functionality as a result.
The Future
While work continues apace on a broad variety of fronts, I was happy to see that my proposal for a session on IndieAlgorithms was accepted (despite my leading another topic earlier in the day). It was well attended and sparked some interesting discussion about how individuals might also be able to exert greater control over what they’re presented to consume. With the rise of Indie feed readers this year, the ability to better control and filter one’s incoming content is going to take on a greater importance in the very near future. With an increasing number of readers to choose from, more people will hopefully be able to free themselves from the vagaries of the blackbox algorithms that drive content distribution and presentation in products like Facebook, Twitter, Instagram and others. Based on the architecture of servers like Aperture, perhaps we might be able to modify some of the microsub spec to allow more freedom and flexibility in what will assuredly be the next step in the evolution of the IndieWeb?
Diversity
While there are miles and miles to go before we sleep, I was happy to have seen a session on diversity pop up at the Summit. I hope we can all take the general topic to heart to be more inclusive and actively invite friends into our fold. Thanks to Jean for suggesting and guiding the conversation and everyone else for continuing it throughout the rest of the summit and beyond.
Other Highlights
Naturally, the above are just a few of the bigger highlights as I perceive them. I’m sure others will appear in the IndieNews feed or other blogposts about the summit. The IndieWeb is something subtly different to each person, so I hope everyone takes a moment to share (on your own sites naturally) what you got out of all the sessions and discussions. There was a tremendous amount of discussion, debate, and advancement of the state of the art of the continually growing IndieWeb. Fortunately almost all of it was captured in the IndieWeb chat, on Twitter, and on video available through either the IndieWeb wiki pages for the summit or directly from the IndieWeb YouTube channel.
I suspect David Shanske and I will have more to say in what is sure to be a recap episode in our next podcast.
Photos
Finally, below I’m including a bunch of photos I took over the course of my trip. I’m far from a professional photographer, but hopefully they’ll give a small representation of some of the fun we all had at camp.
Final Thanks
People
While I’m thinking about it, I wanted to take a moment to thank everyone who came to the summit. You all really made it a fantastic event!
I’d particularly like to thank Aaron Parecki, Tantek Çelik, gRegor Morrill, Marty McGuire, and David Shanske who did a lot of the organizing and volunteer work to help make the summit happen as well as to capture it so well for others to participate remotely or even view major portions of it after-the-fact. I would be remiss if I didn’t thank Martijn van der Ven for some herculean efforts on IRC/Chat in documenting things in real time as well as for some serious wiki gardening along the way. As always, there is a huge crew of others whose contributions large and small help to make up the rich fabric of the community and we wouldn’t be who we are without your help. Thank you all! (Or as I might say in chat: community++).
And finally, a special personal thanks to Greg McVerry for kindly letting me join him at the Hotel deLuxe for some late night discussions on the intersection of IndieWeb and Domain of One’s Own philosophies as they dovetail with the education sector. With growing interest and a wealth of ideas in this area, I’m confident it’s going to be a rapidly growing one over the coming years.
Sponsors
I’d also like to take a moment to say thanks to all the sponsors who helped to make the event a success including Name.com, GoDaddy, Okta, Mozilla, DreamHost, and likely a few others who I’m missing at the moment.
I’d also like to thank the Eliot Center for letting us host the event at their fabulous facility.
👓 Why 2016 Was the Year of the Algorithmic Timeline | Motherboard
2016 was the year that the likes of Instagram and Twitter decided they knew better than you what content you wanted to see in your feeds.
use algorithms to decide on what individual users most wanted to see. Depending on our friendships and actions, the system might deliver old news, biased news, or news which had already been disproven.
2016 was the year of politicians telling us what we should believe, but it was also the year of machines telling us what we should want.
The only way to ensure your posts gain notice is to bombard the feed and hope that some stick, which risks compromising on quality and annoying people.
Sreekumar added: “Interestingly enough, the change was made after Instagram opened the doors to brands to run ads.” But even once they pay for visibility, a brand is under pressure to remain engaging: “Playing devil’s advocate for a second here: All the money in the world cannot transform shitty content into good content.”
Artificially limiting reach of large accounts to then turn around and demand extortion money? It’s the social media mafia!
It disorients the reader, and distracts them with endless, timeless content.
👓 Real People Are Turning Their Accounts Into Bots On Instagram — And Cashing In | BuzzFeed
Verified accounts turning themselves into bots, millions of fake likes and comments, a dirty world of engagement trading inside Telegram groups. Welcome to the secret underbelly of Instagram.
Worse, they’re giving away their login credentials to outsiders to do this.
📺 Zeynep Tufekci: We’re building a dystopia just to make people click on ads | TED
We're building an artificial intelligence-powered dystopia, one click at a time, says techno-sociologist Zeynep Tufekci. In an eye-opening talk, she details how the same algorithms companies like Facebook, Google and Amazon use to get you to click on ads are also used to organize your access to political and social information. And the machines aren't even the real threat. What we need to understand is how the powerful might use AI to control us -- and what we can do in response.
📺 Zeynep Tufekci: Machine intelligence makes human morals more important | TED
Machine intelligence is here, and we're already using it to make subjective decisions. But the complex way AI grows and improves makes it hard to understand and even harder to control. In this cautionary talk, techno-sociologist Zeynep Tufekci explains how intelligent machines can fail in ways that don't fit human error patterns -- and in ways we won't expect or be prepared for. "We cannot outsource our responsibilities to machines," she says. "We must hold on ever tighter to human values and human ethics."
👓 Centroid street addresses considered harmful | Nelson Minar
The underlying problem here is the database has the polygon for the building but not the exact point of the front door. So it guesses a point by filling in the centroid of the polygon. Which is kinda close but not close enough. A better heuristic may be “center of the polyline that faces the matching street”. That’s also going to be wrong sometimes, but less often.
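To make the two heuristics concrete, here is a rough Python sketch comparing them. It assumes the building is a simple polygon given as a list of (x, y) vertices and that the matching street is approximated by a single line segment; the function names and the sample coordinates are made up for illustration and do not come from any particular geocoder.

```python
from math import hypot


def polygon_centroid(pts):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    area = area2 / 2.0
    return (cx / (6.0 * area), cy / (6.0 * area))


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def street_facing_midpoint(pts, street):
    """Midpoint of the polygon edge whose midpoint lies closest to the street."""
    best_mid, best_dist = None, float("inf")
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        mid = ((x0 + x1) / 2.0, (y0 + y1) / 2.0)
        dist = point_segment_distance(mid, street[0], street[1])
        if dist < best_dist:
            best_mid, best_dist = mid, dist
    return best_mid


# Hypothetical 10 x 20 building with a street running along its south side.
building = [(0, 0), (10, 0), (10, 20), (0, 20)]
street = ((-50, -5), (50, -5))

print(polygon_centroid(building))                 # middle of the building
print(street_facing_midpoint(building, street))   # middle of the south wall
```

The second point is still a guess, as the quote says, but it at least lands on the wall facing the street rather than in the middle of the floor plan.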
🎧 The Daily: Russian Trolls’ Favorite Weapon | The New York Times
The indictment secured by the special counsel makes it clear that Facebook was used extensively in the campaign to disrupt the 2016 election. How did Russia do it?
👓 I Cracked Facebook’s New Algorithm And Tortured My Friends | Buzzfeed
Or, how to lose friends and influence people.
Reply to Laying the Standards for a Blogging Renaissance by Aaron Davis
A lot of your post also reminds me of Bryan Alexander’s relatively recent post I defy the world and go back to RSS.
I completely get the concept of what you’re getting at with harkening back to the halcyon days of RSS. I certainly love, use, and rely on it heavily both for consumption as well as production. Of course there’s also the competing standard of Atom still powering large parts of the web (including GNU Social networks like Mastodon). But almost no one looks back fondly on the feed format wars…
I think that while many are looking back on the “good old days” of the web, we should not forget the difficult and fraught history that has gotten us to where we are. We should learn from the mistakes made during the feed format wars and try to simplify things to not only move back, but to move forward at the same time.
Today, the easier pared-down standard that is better and simpler than either of these old and difficult specs is simply adding Microformat classes to HTML (aka P.O.S.H) to create feeds. Unless one is relying on pre-existing infrastructure like WordPress, building and maintaining RSS feed infrastructure can be difficult at best, and updates almost never occur, particularly for specifications that support new social media related feeds including replies, likes, favorites, reposts, etc. The nice part is that if one knows how to write basic html, then one can create a simple feed by hand without having to learn the mark up or specifics of RSS. Most modern feed readers (except perhaps Feedly) support these new h-feeds as they’re known. Interestingly, some CMSes like WordPress support Microformats as part of their core functionality, though in WordPress’ case they only support a subsection of Microformats v1 instead of the more modern v2.
For those like you who are looking both backward and simultaneously forward there’s a nice chart of “Lost Infrastructure” on the IndieWeb wiki which was created following a post by Anil Dash entitled The Lost Infrastructure of Social Media. Hopefully we can take back a lot of the ground the web has lost to social media and refashion it for a better and more flexible future. I’m not looking for just a “hipster-web”, but a new and demonstrably better web.
Some of the desire to go back to RSS is built into the problems we’re looking at with respect to algorithmic filtering of our streams (we’re looking at you Facebook.) While algorithms might help to filter out some of the cruft we’re not looking for, we’ve been ceding too much control to third parties like Facebook who have different motivations in presenting us material to read. I’d rather my feeds were closer to the model of fine dining rather than the junk food that the-McDonald’s-of-the-internet Facebook is providing. As I’m reading Cathy O’Neil’s book Weapons of Math Destruction, I’m also reminded that the black box of Facebook’s algorithm is causing scale and visibility/transparency problems like the Russian ad buys which could have potentially heavily influenced the 2016 election in the United States. The fact that we can’t see or influence the algorithm is both painful and potentially destructive. If I could have access to tweaking a third-party transparent algorithm, I think it would provide me a lot more value.
As for OPML, it’s amazing what kind of power it has to help one find and subscribe to all sorts of content, particularly when it’s been hand curated and is continually self-dogfooded. Among my favorite tools are readers that allow one to subscribe to the OPML feeds of others; that way, if a person adds new feeds to an interesting collection, the changes propagate to everyone following that feed. With this kind of simple technology those who are interested in curating things for particular topics (like the newsletter crowd) or even creating master feeds for class material in a planet-like fashion can easily do so. I can also see some worthwhile uses for this in journalism for newspapers and magazines. As an example, imagine if one could subscribe not only to 100 people writing about #edtech, but to only their bookmarked articles that have the tag edtech (thus filtering out their personal posts, or things not having to do with edtech). I don’t believe that Feedly supports subscribing to OPML (though it does support importing OPML files, which is subtly different), but other readers like Inoreader do.
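For the curious, the difference between importing an OPML file once and subscribing to it can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are invented, the example.com feed URLs are placeholders, and a real reader would fetch the OPML over HTTP on a schedule rather than read it from a string.

```python
import xml.etree.ElementTree as ET

# A curator's published OPML list (placeholder URLs, inlined for the example).
CURATED_OPML = """<opml version="2.0">
  <head><title>edtech reading list</title></head>
  <body>
    <outline text="edtech">
      <outline type="rss" text="Blog A" xmlUrl="https://example.com/a.xml"/>
      <outline type="rss" text="Blog B" xmlUrl="https://example.com/b.xml"/>
      <outline type="rss" text="Blog C" xmlUrl="https://example.com/c.xml"/>
    </outline>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Collect every feed URL (xmlUrl attribute) found in an OPML document."""
    root = ET.fromstring(opml_text)
    return {
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if "xmlUrl" in outline.attrib
    }


def sync_subscriptions(current, opml_text):
    """Return (to_add, to_remove) so that `current` matches the curated list."""
    curated = feed_urls(opml_text)
    return curated - current, current - curated


# What the reader followed after the last check of the curated list.
my_feeds = {"https://example.com/a.xml", "https://example.com/b.xml"}

to_add, to_remove = sync_subscriptions(my_feeds, CURATED_OPML)
print(sorted(to_add))     # feeds the curator has added since then
print(sorted(to_remove))  # feeds the curator has dropped
```

A one-time import only ever runs the first function; a subscription keeps re-running the second, which is why the curator's later additions show up for everyone following the list.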
I’m hoping to finish up some work on my own available OPML feeds to make subscribing to interesting curated content a bit easier within WordPress (over the built in, but now deprecated link manager functionality.) Since you mentioned it, I tried checking out the OPML file on your blog hoping for something interesting in the #edtech space. Alas… 😉 Perhaps something in the future?
📗 Started reading Weapons of Math Destruction by Cathy O’Neil
Based on the opening, I’m expecting some great examples, many of which are going to be as heavily biased as things like redlining seen in lending practices in the last century. They’ll come about as the result of missing data, missing assumptions, and even incorrect assumptions.
I’m aware that one of the biggest problems in so-called Big Data is that one needs to spend an inordinate amount of time cleaning up the data (often by hand) to get something even remotely usable. Even with this done I’ve heard about people not testing out their data and then relying on the results only to later find ridiculous error rates (sometimes over 100%!)
Of course there is some space here for the intelligent mathematician, scientist, or quant to create alternate models to take advantage of overlays in such areas, and particularly markets. By overlay here, I mean the gambling definition of the word in which the odds of a particular wager are higher than they should be, thus tending to favor an individual player (who typically has more knowledge or information about the game) rather than the house, which usually relies on a statistically biased game or by taking a rake off of the top of a parimutuel financial structure, or the bulk of other players who aren’t aware of the inequity. The mathematical models based on big data (aka Weapons of Math Destruction or WMDs) described here, particularly in financial markets, are going to often create such large inequities that users of alternate means can take tremendous advantage of the differences for their own benefits. Perhaps it’s the evolutionary competition that will more actively drive these differences to zero? If this is the case, it’s likely that it’s going to be a long time before they equilibrate based on current usage, especially when these algorithms are so opaque.
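To put a number on what I mean by an overlay, here is a small Python sketch. The probabilities and odds are invented for illustration, the function names are my own, and I’m assuming decimal odds (total payout per unit staked, stake included). In real markets the true probability is never known and can only be estimated.

```python
def fair_decimal_odds(p_win):
    """Decimal odds at which a wager with win probability p_win breaks even."""
    return 1.0 / p_win


def expected_edge(p_win, offered_decimal_odds):
    """Expected profit per unit staked: positive means an overlay."""
    return p_win * offered_decimal_odds - 1.0


def is_overlay(p_win, offered_decimal_odds):
    """True when the offered odds pay more than the fair odds would."""
    return offered_decimal_odds > fair_decimal_odds(p_win)


# Hypothetical wager: the player estimates a 25% chance of winning.
p = 0.25
print(fair_decimal_odds(p))    # break-even odds for that probability
print(expected_edge(p, 5.0))   # offered odds above fair: positive edge
print(expected_edge(p, 3.5))   # offered odds below fair: negative edge
print(is_overlay(p, 5.0), is_overlay(p, 3.5))
```

The better-informed player only bets when the edge is positive; an opaque model that misprices the odds hands out such bets without knowing it.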
I suspect that some of this book will highlight uses of statistical errors and logical fallacies like cherry picking data, which are hidden behind much more opaque mathematical algorithms, thereby making them even harder to detect than in simple policy decisions which use the simpler form. It’s this type of opacity that has caused major market shifts like the 2008 economic crash, in markets which are still largely unregulated when it comes to protecting the masses.
I suspect that folks within Bryan Alexander’s book club will find the example of Sarah Wysocki to be very compelling and damning evidence of how these big data algorithms work (or don’t work, as the case may be.) In this particular example, there are so many signals which are difficult to measure, if they can be measured at all, that the thing they’re attempting to measure is so swamped with noise as to be unusable. Equally interesting, but not presented here, would be the alternate case of someone tremendously incompetent (perhaps who is cheating as indicated in the example) who actually scored tremendously high on the scale and was kept in their job.
Highlights, Quotes, & Marginalia
Do you see the paradox? An algorithm processes a slew of statistics and comes up with a probability that a certain person might be a bad hire, a risky borrower, a terrorist, or a miserable teacher. That probability is distilled into a score, which can turn someone’s life upside down. And yet when the person fights back, “suggestive” countervailing evidence simply won’t cut it. The case must be ironclad. The human victims of WMDs, we’ll see time and again, are held to a far higher standard of evidence than the algorithms themselves.
Highlight (yellow) – Introduction > Location xxxx
Added on Sunday, October 9, 2017
[WMDs are] opaque, unquestioned, and unaccountable, and they operate at a scale to sort, target or “optimize” millions of people. By confusing their findings with on-the-ground reality, most of them create pernicious WMD feedback loops.
Highlight (yellow) – Introduction > Location xxxx
Added on Sunday, October 9, 2017
The software is doing its job. The trouble is that profits end up serving as a stand-in, or proxy, for truth. We’ll see this dangerous confusion crop up again and again.
Highlight (yellow) – Introduction > Location xxxx
Added on Sunday, October 9, 2017
I’m reading this as part of Bryan Alexander’s online book club.
🔖 "Opposite-of"-information improves similarity calculations in phenotype ontologies
One of the most important use cases of ontologies is the calculation of similarity scores between a query and items annotated with classes of an ontology. The hierarchical structure of an ontology does not necessarily reflect all relevant aspects of the domain it is modelling, and this can reduce the performance of ontology-based search algorithms. For instance, the classes of phenotype ontologies may be arranged according to anatomical criteria, but individual phenotypic features may affect anatomic entities in opposite ways. Thus, "opposite" classes may be located in close proximity in an ontology; for example enlarged liver and small liver are grouped under abnormal liver size. Using standard similarity measures, these would be scored as being similar, despite in fact being opposites. In this paper, we use information about opposite ontology classes to extend two large phenotype ontologies, the human and the mammalian phenotype ontology. We also show that this information can be used to improve rankings based on similarity measures that incorporate this information. In particular, cosine similarity based measures show large improvements. We hypothesize this is due to the natural embedding of opposite phenotypes in vector space. We support the idea that the expressivity of semantic web technologies should be explored more extensively in biomedical ontologies and that similarity measures should be extended to incorporate more than the pure graph structure defined by the subclass or part-of relationships of the underlying ontologies.
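As a toy illustration of the abstract's point (and explicitly not the paper's actual method), the Python sketch below compares two ways of turning phenotype terms into vectors. The first uses only the hierarchy, so sibling classes share their parent; the second is my own simplified assumption of what an embedding of opposites might look like, placing opposite classes on one axis with opposite signs. The PARENT and OPPOSITE_AXIS tables and all function names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hierarchy only: each term activates itself and its parent class.
PARENT = {
    "enlarged liver": "abnormal liver size",
    "small liver": "abnormal liver size",
}


def hierarchy_vector(terms):
    vec = {}
    for term in terms:
        vec[term] = 1.0
        if term in PARENT:
            vec[PARENT[term]] = 1.0
    return vec


# With opposite-of information: opposites share one axis with opposite signs.
OPPOSITE_AXIS = {
    "enlarged liver": ("liver size", 1.0),
    "small liver": ("liver size", -1.0),
}


def signed_vector(terms):
    vec = {}
    for term in terms:
        axis, sign = OPPOSITE_AXIS.get(term, (term, 1.0))
        vec[axis] = vec.get(axis, 0.0) + sign
    return vec


query, item = ["enlarged liver"], ["small liver"]

# Siblings look similar because they share the parent class.
print(cosine(hierarchy_vector(query), hierarchy_vector(item)))
# Opposites point in opposite directions along the shared axis.
print(cosine(signed_vector(query), signed_vector(item)))
```

With the hierarchy-only vectors the two opposite phenotypes come out positively similar, while the signed vectors score them as maximally dissimilar, which is the behaviour the abstract argues for.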