Reply to Laying the Standards for a Blogging Renaissance by Aaron Davis

Laying the Standards for a Blogging Renaissance by Aaron Davis (Read Write Respond)
With the potential demise of social media, does this offer a possible rebirth of blogging communities and the standards they are built upon?

Aaron, some excellent thoughts and pointers.

A lot of your post also reminds me of Bryan Alexander’s relatively recent post I defy the world and to go back to RSS.

I completely get the concept of what you’re getting at with harkening back to the halcyon days of RSS. I certainly love, use, and rely on it heavily both for consumption as well as production. Of course there’s also still the competing standard of Atom still powering large parts of the web (including GNU Social networks like Mastodon). But almost no one looks back fondly on the feed format wars…

I think that while many are looking back on the “good old days” of the web, that we not forget the difficult and fraught history that has gotten us to where we are. We should learn from the mistakes made during the feed format wars and try to simplify things to not only move back, but to move forward at the same time.

Today, the easier pared-down standards that are better and simpler than either of these old and and difficult specs is simply adding Microformat classes to HTML (aka P.O.S.H) to create feeds. Unless one is relying on pre-existing infrastructure like WordPress, building and maintaining RSS feed infrastructure can be difficult at best, and updates almost never occur, particularly for specifications that support new social media related feeds including replies, likes, favorites, reposts, etc. The nice part is that if one knows how to write basic html, then one can create a simple feed by hand without having to learn the mark up or specifics of RSS. Most modern feed readers (except perhaps Feedly) support these new h-feeds as they’re known. Interestingly, some CMSes like WordPress support Microformats as part of their core functionality, though in WordPress’ case they only support a subsection of Microformats v1 instead of the more modern v2.

For those like you who are looking both backward and simultaneously forward there’s a nice chart of “Lost Infractructure” on the IndieWeb wiki which was created following a post by Anil Dash entitled The Lost Infrastructure of Social Media. Hopefully we can take back a lot of the ground the web has lost to social media and refashion it for a better and more flexible future. I’m not looking for just a “hipster-web”, but a new and demonstrably better web.

The Lost Infrastructure of the Web from the IndieWeb Wiki (CC0)

Some of the desire to go back to RSS is built into the problems we’re looking at with respect to algorithmic filtering of our streams (we’re looking at you Facebook.) While algorithms might help to filter out some of the cruft we’re not looking for, we’ve been ceding too much control to third parties like Facebook who have different motivations in presenting us material to read. I’d rather my feeds were closer to the model of fine dining rather than the junk food that the-McDonald’s-of-the-internet Facebook is providing. As I’m reading Cathy O’Neil’s book Weapons of Math Distraction, I’m also reminded that the black box that Facebook’s algorithm is is causing scale and visibility/transparency problems like the Russian ad buys which could have potentially heavily influenced the 2017 election in the United States. The fact that we can’t see or influence the algorithm is both painful and potentially destructive. If I could have access to tweaking a third-party transparent algorithm, I think it would provide me a lot more value.

As for OPML, it’s amazing what kind of power it has to help one find and subscribe to all sorts of content, particularly when it’s been hand curated and is continually self-dogfooded. One of my favorite tools are readers that allow one to subscribe to the OPML feeds of others, that way if a person adds new feeds to an interesting collection, the changes propagate to everyone following that feed. With this kind of simple technology those who are interested in curating things for particular topics (like the newsletter crowd) or even creating master feeds for class material in a planet-like fashion can easily do so. I can also see some worthwhile uses for this in journalism for newspapers and magazines. As an example, imagine if one could subscribe not only to 100 people writing about #edtech, but to only their bookmarked articles that have the tag edtech (thus filtering out their personal posts, or things not having to do with edtech). I don’t believe that Feedly supports subscribing to OPML (though it does support importing OPML files, which is subtly different), but other readers like Inoreader do.

I’m hoping to finish up some work on my own available OPML feeds to make subscribing to interesting curated content a bit easier within WordPress (over the built in, but now deprecated link manager functionality.) Since you mentioned it, I tried checking out the OPML file on your blog hoping for something interesting in the #edtech space. Alas… 😉 Perhaps something in the future?

Syndicated copies to:

📗 Started reading Weapons of Math Destruction by Cathy O’Neil

📖 Read introduction of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O’Neil

Based on the opening, I’m expecting some great examples many which are going to be as heavily biased as things like redlining seen in lending practices in the last century. They’ll come about as the result of missing data, missing assumptions, and even incorrect assumptions.

I’m aware that one of the biggest problems in so-called Big Data is that one needs to spend an inordinate amount of time cleaning up the data (often by hand) to get something even remotely usable. Even with this done I’ve heard about people not testing out their data and then relying on the results only to later find ridiculous error rates (sometimes over 100%!)

Of course there is some space here for the intelligent mathematician, scientist, or quant to create alternate models to take advantage of overlays in such areas, and particularly markets. By overlay here, I mean the gambling definition of the word in which the odds of a particular wager are higher than they should be, thus tending to favor an individual player (who typically has more knowledge or information about the game) rather than the house, which usually relies on a statistically biased game or by taking a rake off of the top of a parimutuel financial structure, or the bulk of other players who aren’t aware of the inequity. The mathematical models based on big data (aka Weapons of Math Destruction or WMDs) described here, particularly in financial markets, are going to often create such large inequities that users of alternate means can take tremendous advantage of the differences for their own benefits. Perhaps it’s the evolutionary competition that will more actively drive these differences to zero? If this is the case, it’s likely that it’s going to be a long time before they equilibrate based on current usage, especially when these algorithms are so opaque.

I suspect that some of this book will highlight uses of statistical errors and logical fallacies like cherry picking data, but which are hidden behind much more opaque mathematical algorithms thereby making them even harder to detect than simple policy decisions which use the simpler form. It’s this type of opacity that has caused major market shifts like the 2008 economic crash, which is still heavily unregulated to protect the masses.

I suspect that folks within Bryan Alexander’s book club will find that the example of Sarah Wysocki to be very compelling and damning evidence of how these big data algorithms work (or don’t work, as the case may be.) In this particular example, there are so many signals which are not only difficult to measure, if at all, that the thing they’re attempting to measure is so swamped with noise as to be unusable. Equally interesting, but not presented here, would be the alternate case of someone tremendously incompetent (perhaps who is cheating as indicated in the example) who actually scored tremendously high on the scale who was kept in their job.

Highlights, Quotes, & Marginalia

Introduction

Do you see the paradox? An algorithm processes a slew of statistics and comes up with a probability that a certain person might be a bad hire, a risky borrower, a terrorist, or a miserable teacher. That probability is distilled into a score, which can turn someone’s life upside down. And yet when the person fights back, “suggestive” countervailing evidence simply won’t cut it. The case must be ironclad. The human victims of WMDs, we’ll see time and again, are held to a far higher standard of evidence than the algorithms themselves.

Highlight (yellow) – Introduction > Location xxxx
Added on Sunday, October 9, 2017

[WMDs are] opaque, unquestioned, and unaccountable, and they operate at a scale to sort, target or “optimize” millions of people. By confusing their findings with on-the-ground reality, most of them create pernicious WMD feedback loops.

Highlight (yellow) – Introduction > Location xxxx
Added on Sunday, October 9, 2017

The software is doing it’s job. The trouble is that profits end up serving as a stand-in, or proxy, for truth. We’ll see this dangerous confusion crop up again and again.

Highlight (yellow) – Introduction > Location xxxx
Added on Sunday, October 9, 2017

I’m reading this as part of Bryan Alexander’s online book club.

Syndicated copies to:

🔖 "Opposite-of"-information improves similarity calculations in phenotype ontologies

"Opposite-of"-information improves similarity calculations in phenotype ontologies by Sebastian Koehler, Peter N. Robinson, Christopher J. Mungall (bioRxiv)
One of the most important use cases of ontologies is the calculation of similarity scores between a query and items annotated with classes of an ontology. The hierarchical structure of an ontology does not necessarily reflect all relevant aspects of the domain it is modelling, and this can reduce the performance of ontology-based search algorithms. For instance, the classes of phenotype ontologies may be arranged according to anatomical criteria, but individual phenotypic features may affect anatomic entities in opposite ways. Thus, "opposite" classes may be located in close proximity in an ontology; for example enlarged liver and small liver are grouped under abnormal liver size. Using standard similarity measures, these would be scored as being similar, despite in fact being opposites. In this paper, we use information about opposite ontology classes to extend two large phenotype ontologies, the human and the mammalian phenotype ontology. We also show that this information can be used to improve rankings based on similarity measures that incorporate this information. In particular, cosine similarity based measures show large improvements. We hypothesize this is due to the natural embedding of opposite phenotypes in vector space. We support the idea that the expressivity of semantic web technologies should be explored more extensively in biomedical ontologies and that similarity measures should be extended to incorporate more than the pure graph structure defined by the subclass or part-of relationships of the underlying ontologies.
Syndicated copies to: