The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
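As a rough illustration of how added random errors push a channel's capacity toward zero, here is a minimal sketch using the textbook binary symmetric channel, whose capacity is C = 1 - H(p) bits per symbol. This is not the sequence-to-structure channel described above, which is built from protein coordinate statistics; the function names and the error rates are assumptions chosen only for illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy H(p), in bits, of a binary variable with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(error_rate: float) -> float:
    """Capacity, in bits per symbol, of a binary symmetric channel."""
    return 1.0 - binary_entropy(error_rate)


if __name__ == "__main__":
    # Hypothetical error rates, from a clean channel to pure noise.
    for p in (0.0, 0.01, 0.1, 0.3, 0.5):
        print(f"error rate {p:.2f} -> capacity {bsc_capacity(p):.3f} bits/symbol")
```

The capacity falls steadily as the error rate rises and reaches zero at an error rate of 0.5, where the output carries no information about the input.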
In the simplest view of transcriptional regulation, the expression of a gene is turned on or off by changes in the concentration of a transcription factor (TF). We use recent data on noise levels in gene expression to show that it should be possible to transmit much more than just one regulatory bit. Realizing this optimal information capacity would require that the dynamic range of TF concentrations used by the cell, the input/output relation of the regulatory module, and the noise in gene expression satisfy certain matching relations, which we derive. These results provide parameter-free, quantitative predictions connecting independently measurable quantities. Although we have considered only the simplified problem of a single gene responding to a single TF, we find that these predictions are in surprisingly good agreement with recent experiments on the Bicoid/Hunchback system in the early Drosophila embryo and that this system achieves ∼90% of its theoretical maximum information transmission.
To understand the structure of a large-scale biological, social, or technological network, it can be helpful to decompose the network into smaller subunits or modules. In this article, we develop an information-theoretic foundation for the concept of modularity in networks. We identify the modules of which the network is composed by finding an optimal compression of its topology, capitalizing on regularities in its structure. We explain the advantages of this approach and illustrate them by partitioning a number of real-world and model networks.
Information theory in biology by Henry Quastler, Editor. 1953. 273 pp. Urbana: University of Illinois Press
There are two kinds of scientific books worth reading. One is the monograph or treatise type, in which a more or less large field of science is presented in a systematic way, and in the form of a product as finished as possible at the given time. This kind of book may be considered a source of knowledge then available. The other type of book may present a collection of chapters or individual articles which do not claim to be a complete and systematic treatment of the subject; however the reader not only finds interesting ideas there, but the reading as such suggests new ideas. Such books are useful. For, although a rough and unfinished idea per se does not even remotely have the value of a well-elaborated scientific study, yet no elaborate study, no important theory, can be developed without first having a few rough ideas.
The book under consideration definitely belongs to the second category: it is a collection of essays. As the editor states in the Introduction (p. 2): "The papers in this volume are of a very different degree of maturity. They range from authoritative reviews of well-known facts to hesitant and tentative formulations of embryonic ideas." He further states (p. 3): "We are aware of the fact that this volume is largely exploratory."
If the above is to be considered as a shortcoming, then the reviewer does not need to dwell on it, because the editor, and undoubtedly the authors, are fully aware of it, and duly warn the reader. If we evaluate the book from the point of view of how many ideas it suggests to the reader, then, at least so far as this reviewer is concerned, it must be considered a great success.
Background: Elucidating gene regulatory networks is crucial for understanding normal cell physiology and complex pathologic phenotypes. Existing computational methods for the genome-wide "reverse engineering" of such networks have been successful only for lower eukaryotes with simple genomes. Here we present ARACNE, a novel algorithm, using microarray expression profiles, specifically designed to scale up to the complexity of regulatory networks in mammalian cells, yet general enough to address a wider range of network deconvolution problems. This method uses an information-theoretic approach to eliminate the majority of indirect interactions inferred by co-expression methods. Results: We prove that ARACNE reconstructs the network exactly (asymptotically) if the effect of loops in the network topology is negligible, and we show that the algorithm works well in practice, even in the presence of numerous loops and complex topologies. We assess ARACNE's ability to reconstruct transcriptional regulatory networks using both a realistic synthetic dataset and a microarray dataset from human B cells. On synthetic datasets ARACNE achieves very low error rates and outperforms established methods, such as Relevance Networks and Bayesian Networks. Application to the deconvolution of genetic networks in human B cells demonstrates ARACNE's ability to infer validated transcriptional targets of the c-MYC proto-oncogene. We also study the effects of misestimation of mutual information on network reconstruction, and show that algorithms based on mutual information ranking are more resilient to estimation errors.
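The abstract does not spell out how indirect interactions are eliminated. ARACNE is generally described as applying the data processing inequality: within any fully connected triplet of genes, the edge with the lowest mutual information is treated as indirect and dropped. The following is a minimal sketch of that pruning step under that assumption. The function name, the dictionary layout, and the mutual information values are hypothetical, and estimating mutual information from expression data is not shown.

```python
from itertools import combinations


def dpi_prune(mi, threshold=0.0):
    """Drop the weakest edge of every fully connected triplet.

    mi maps frozenset({gene_a, gene_b}) to a mutual information estimate.
    Edges at or below `threshold` are discarded before pruning.
    """
    edges = {pair: value for pair, value in mi.items() if value > threshold}
    nodes = sorted(set().union(*edges)) if edges else []
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in edges for edge in triangle):
            # Mark rather than delete, so every triplet is judged
            # against the same unpruned network.
            to_remove.add(min(triangle, key=lambda edge: edges[edge]))
    return {pair: value for pair, value in edges.items() if pair not in to_remove}


if __name__ == "__main__":
    # Hypothetical chain X -> Y -> Z: the X-Z dependence is indirect.
    estimates = {
        frozenset(("X", "Y")): 0.8,
        frozenset(("Y", "Z")): 0.7,
        frozenset(("X", "Z")): 0.3,
    }
    for pair in dpi_prune(estimates):
        print(sorted(pair))
```

In the toy chain the X-Z edge is removed and the two direct edges survive, which is the behavior the abstract attributes to the method when loops are negligible.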
MOTIVATION: Traditional sequence distances require an alignment and therefore are not directly applicable to the problem of whole genome phylogeny where events such as rearrangements make full length alignments impossible. We present a sequence distance that works on unaligned sequences using the information theoretical concept of Kolmogorov complexity and a program to estimate this distance.
RESULTS: We establish the mathematical foundations of our distance and illustrate its use by constructing a phylogeny of the Eutherian orders using complete unaligned mitochondrial genomes. This phylogeny is consistent with the commonly accepted one for the Eutherians. A second, larger mammalian dataset is also analyzed, yielding a phylogeny generally consistent with the commonly accepted one for the mammals.
AVAILABILITY: The program to estimate our sequence distance is available at http://www.cs.cityu.edu.hk/~cssamk/gencomp/GenCompress1.htm. The distance matrices used to generate our phylogenies are available at http://www.math.uwaterloo.ca/~mli/distance.html.
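Kolmogorov complexity is not computable, so in practice a distance of this kind is approximated with a real compressor. The sketch below is not the authors' program: it swaps in Python's general-purpose zlib compressor for GenCompress and uses the normalized compression distance, a closely related formula, rather than the exact distance defined in the paper. The function names and test sequences are assumptions for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a crude stand-in for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two unaligned sequences."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sequences: an original, a lightly mutated copy,
    # and an unrelated random sequence of the same length.
    original = bytes(rng.choice(b"ACGT") for _ in range(2000))
    mutated = bytearray(original)
    for position in rng.sample(range(len(mutated)), 20):
        mutated[position] = rng.choice(b"ACGT")
    unrelated = bytes(rng.choice(b"ACGT") for _ in range(2000))

    print("original vs mutated:  ", round(ncd(original, bytes(mutated)), 3))
    print("original vs unrelated:", round(ncd(original, unrelated), 3))
```

The mutated copy should score much closer to the original than the unrelated sequence does, with no alignment step. A general-purpose compressor such as zlib only sees repeats within its 32 KB window, so whole genomes would need a compressor designed for DNA, which is the role GenCompress plays in the paper.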
MOTIVATION: As an increasing number of protein structures become available, the need for algorithms that can quantify the similarity between protein structures increases as well. Thus, the comparison of proteins' structures, and their clustering according to a given similarity measure, is at the core of today's biomedical research. In this paper, we show how an algorithmic information theory inspired Universal Similarity Metric (USM) can be used to calculate similarities between protein pairs. The method, besides being theoretically supported, is surprisingly simple to implement and computationally efficient.
RESULTS: Structural similarity between proteins in four different datasets was measured using the USM. The sample employed represented alpha, beta, alpha-beta, TIM-barrel, globin and serpin protein types. The use of the proposed metric allows for a correct measurement of similarity and classification of the proteins in the four datasets.
AVAILABILITY: All the scripts and programs used for the preparation of this paper are available at http://www.cs.nott.ac.uk/~nxk/USM/protocol.html. On that web page the reader will find a brief description of how to use the various scripts and programs.
PMID: 14751983 DOI: 10.1093/bioinformatics/bth031
Living systems are distinguished in nature by their ability to maintain stable, ordered states far from equilibrium. This is despite constant buffeting by thermodynamic forces that, if unopposed, will inevitably increase disorder. Cells maintain a steep transmembrane entropy gradient by continuous application of information that permits cellular components to carry out highly specific tasks that import energy and export entropy. Thus, the study of information storage, flow and utilization is critical for understanding first principles that govern the dynamics of life. Initial biological applications of information theory (IT) used Shannon's methods to measure the information content in strings of monomers such as genes, RNA, and proteins. Recent work has used bioinformatic and dynamical systems to provide remarkable insights into the topology and dynamics of intracellular information networks. Novel applications of Fisher-, Shannon-, and Kullback-Leibler informations are promoting increased understanding of the mechanisms by which genetic information is converted to work and order. Insights into evolution may be gained by analysis of the fitness contributions from specific segments of genetic information as well as the optimization process in which fitness is constrained by the substrate cost for its storage and utilization. Recent IT applications have recognized the possible role of nontraditional information storage structures including lipids and ion gradients as well as information transmission by molecular flux across cell membranes. Many fascinating challenges remain, including defining the intercellular information dynamics of multicellular organisms and the role of disordered information storage and flow in disease.
PMID: 17083004 DOI: 10.1007/s11538-006-9141-5
The article covers most of the story fairly well, but leaves out some fundamental pieces of the business picture. It discusses a few particular cases of some very well-known authors in the publishing world, including the likes of Stephen King, Seth Godin, Paulo Coelho, Greg Bear, and Neal Stephenson, and how new digital publishing platforms are slowly changing the publishing business.
Indeed, many authors are bypassing traditional publishing routes and self-publishing their works directly online, and many are taking a much larger slice of the financial rewards in doing so.
The article, however, completely fails to mention or address how new online methods will handle editorial and publicity functions differently than they’re handled now, and the publishing business, both now and in the future, relies significantly on both.
It is interesting, and somewhat ironic, to note that the newspaper business in which this particular article finds its outlet has changed possibly more drastically than the book publishing business. If reading the article online, one is forced to click through four different pages, on each of which a minimum of five different (and, in my opinion, terrifically intrusive) ads appear. Without getting into the details of advertising, it is even more interesting that many of these ads are served up by Google Ads based on keywords, so three on the first page alone were specifically publishing-related.
Two of the ads were soliciting people to self-publish their own work. One touts how easy it is to publish, while the other glosses over the publicity portion with a glib statement offering an additional “555 Book Promotion Tips”! (I’m personally wondering if there can possibly be so many book promotion tips?)
Following the link in the third ad on the first page to its advertised site one discovers it states:
Although I find the portion about “baby steps” particularly entertaining, the first thing I’ll note is that the typical person is likely more readily equipped with the ability to distribute and market a children’s book than they might be at crafting one. Sadly, however, there are very few who are capable of any of these tasks at a particularly high level, which is why there are relatively few new children’s books on the market each year and the majority of sales are older tried-and-true titles.
I hope the average reader sees the above come-on as the twenty-first century equivalent of the snake oil salesman who is tempting the typical wanna-be-author to call about their so-called “Free” Children’s Book Publishing Guide. I’m sure recipients of the guide end up paying the publisher to get their book out the door and, more likely than not, it doesn’t end up in mainstream brick-and-mortar establishments like Barnes & Noble or Borders, but only sells a handful of copies in easy-to-reach online venues like Amazon. I might suggest that the majority of sales will come directly from the author and his or her friends and family. I would further argue that neither now nor in the immediate or even distant future will many aspiring authors be self-publishing much of anything and managing to make even a modest living by doing so.
Now of course all of the above begs the question of why exactly it is that people need/want a traditional publisher. What role or function do publishers actually perform for the business and why might they be around in the coming future?
The typical publishing houses perform three primary functions: filtering/editing material, distributing material, and promoting material. The current significant threat to the publishing business from online retailers like Amazon.com, Barnes & Noble, Borders, and even the recently launched Google Books is the distribution platforms themselves. It certainly doesn’t take much to strike low-cost deals with online retailers to distribute books, and even less so when they’re distributing them as e-books, which cuts out the most significant cost in the business: that of the paper to print them on. This leaves traditional publishing houses with two remaining functions: filtering/editing material and the promotion/publicity function.
The Los Angeles Times article certainly doesn’t state it, but everyone you meet on the street could tell you that writers like Stephen King don’t really need any more publicity than what they’ve got already. Their fan followings are so significantly large that they only need to tell two people online that they’ve got a new book and they’ll sell thousands of copies of any book they release. In fact, I might wager that Stephen King could release ten horrific (don’t mistake this for horror) novels before their low quality would likely begin to significantly erode his sales numbers. If he’s releasing them on Amazon.com and keeping 70% of the income compared to the average 6-18% most writers are receiving, he’s in phenomenally good shape. (I’m sure given his status and track record in the publishing business, he’s receiving a much larger portion of his book sales from his publisher than 18% by the way; I’d also be willing to bet if he approached Amazon directly, he could get a better distribution deal than the currently offered 70/30 split.)
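The royalty comparison above can be made concrete with a little arithmetic. The sketch below assumes the same sale price under both arrangements, an assumption the post does not make and one that ignores advances, print costs, and pricing differences. It asks what fraction of the traditional sales volume an author keeping 70% would need in order to earn the same income. The function name is hypothetical.

```python
def break_even_sales_fraction(traditional_royalty: float,
                              platform_share: float = 0.70) -> float:
    """Fraction of traditional unit sales needed to match income when the
    author keeps `platform_share` of each sale instead of a royalty."""
    return traditional_royalty / platform_share


# The 6% and 18% royalty rates and the 70% share are the figures quoted above.
for royalty in (0.06, 0.18):
    fraction = break_even_sales_fraction(royalty)
    print(f"{royalty:.0%} royalty -> break even at {fraction:.1%} of the copies")
```

Under these assumptions an author on an 18% royalty would match their income by selling roughly 26% as many copies at a 70% share, and an author on 6% would need only about 9%.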
What will eventually sway the majority of the industry is when completely unknown new writers can publish into these electronic platforms and receive the marketing push they need to become the next Stephen King or Neal Stephenson. At the moment, none of the major e-book publishing platforms are giving much, if any, of this type of publicity to any of their new authors, and many aren’t even giving it to the major writers. Thus, currently, even the major writers are relying primarily on their traditional publishers for publicity to push their sales.
I will admit that when 80% of all readers are online and consuming their reading material in e-book format and utilizing the full support of social media and cross-collateralization of the best portion of their word-of-mouth, that perhaps authors won’t need as much PR help. But until that day, platforms will need to ramp it up significantly. Financially one wonders what a platform like Amazon.com will charge for a front and center advertisement for a new best-seller to push sales? Will they be looking for a 50/50 split on those sales? Exclusivity in their channel? This is where the business will become even more dicey. Suddenly authors who think they’re shedding the chains of their current publishers will be shackling themselves with newer and more significant manacles and leg irons.
The last piece of the business that needs to be subsumed is the editorial portion of the manufacturing process. Agents and editors serve a significant role in that they filter out thousands and thousands of terrifically unreadable books. In fact, one might argue that even now they’re letting far too many marginal books through the system and into the market.
If we consider the millions of books housed in the Library of Congress and their general circulation, one might realize that only one tenth of a percent or less of books are receiving all the attention. Certainly classics like William Shakespeare and Charles Dickens are more widely read than the millions of nearly unknown writers who take up just as much shelf space in that esteemed library.
Most houses publish on the order of ten to a hundred titles per year, but they rely heavily on only one or two of them being major hits to cover not only the cost of the total failures, but to provide the company with some semblance of profit. (This model is not unlike the way the feature film business works in Hollywood; if you throw enough spaghetti, something is bound to stick.)
The question then becomes: “how does the e-publishing business accomplish this editing and publicity in a better and less expensive way?” This question needs to be looked at from a pre-publication as well as a post-publication perspective.
From the pre-publication viewpoint the Los Angeles Times article interestingly mentions that many authors appreciate having a “conversation” with their readers and allowing it to inform their work. However, creators of the stature of Stephen King cannot possibly take in and consume criticism from their thousands of fans in any reasonable way, not to mention the detriment to their output if they were forced to read and deal with all that criticism and feedback. Even smaller stature authors often find it overwhelming to take in criticism from their agents, editors, and even a small handful of close friends, family, and colleagues. Taking a quick look at the acknowledgement portions of a few dozen books generally reveals fewer than 10 people being thanked, let alone hundreds of names from their general reading public, people they neither know well nor trust implicitly.
From the post-publication perspective, both printing on demand and e-book formats excise one of the largest costs of the supply chain management portions of the publishing world, but staff costs and salaries certainly follow closely behind. One might argue that social media is the answer here and we can rely on services like LibraryThing, GoodReads, and others to supply this editorial/publicity process, and that eventually broad sampling and positive and negative reviews will win the day and carry good but unknown writers into the popular consciousness. This may sound reasonable on the surface, but take a look at similar large recommendation services in the social media space like Yelp. These services already have hundreds of thousands of users, but they’re not nearly as useful as they need to be from a recommendation perspective and they’re not terrifically reliable in that they’re very often easily gamed. (Consider the number of positive reviews that appear on Yelp that are most likely written by the proprietors of the establishments themselves.) This outlet for editorial certainly has the potential to improve in the coming years, but it will still be quite some time before it has the possibility of totally ousting the current editorial and filtering regime.
From a mathematical and game theoretical perspective one must also consider how many people are going to subject themselves (willingly and for free) to some really bad reading material and then bother to write either a good or bad review of their experience. This is particularly so when the vast majority of readers are more than content to ride the coattails of the “suckers” who do the majority of the review work.
There are certainly a number of other factors at play in the publishing business as it changes form, but those discussed above are certainly significant in its continuing evolution. Given the state of technology and its speed, if people feel that the traditional publishing world will collapse, then we should take its evolution to the nth degree. By this argument, even platforms like Amazon and Google Books will eventually need to narrow their financial split with authors down to infinitesimal margins, as authors should be able to control every portion of their work without any interlopers taking any portion of their proceeds. We’ll leave the discussion of whether all of this might fit into the concept of the tragedy of the commons for a future date.
The Los Angeles Times published an online article entitled “Barnes & Noble says e-books outsell physical books online.” While I understand that this is a quiet holiday week, the Times should be doing better work than simply republishing press releases from corporations trying to garner post-holiday sales. Some of the thoughts they might have included:
“Customers bought or downloaded 1 million e-books on Christmas day alone”?
There is certainly no debating the continuous growth of the electronic book industry; even Amazon.com has said they’re selling more electronic books than physical books. The key word in the quoted sentence above is “or”. I seriously doubt a significant portion of the 1 million e-books were actually purchased on Christmas day. The real investigative journalism here would have discovered the percentage of free (primarily public domain) e-books that were downloaded versus those that were purchased.
Given that analysts estimate 2 million Nooks have sold (the majority within the last six months and likely the preponderance of them since Thanksgiving), this means that as many as half of all Nook users could have downloaded a book on Christmas day. Perhaps this isn’t surprising for those who received a Nook as a holiday present and may have downloaded one or more e-books to test out its functionality. The real question remains: how many of these 2 million users will actually be purchasing books in e-book format 6 months from now?
I’d also be curious to know if the analyst estimate is 2 million units sold to consumers or 2 million shipped to retail? I would bet that it is units shipped and not sold.
I hope the Times will be doing something besides transcription (or worse: cut and paste) after the holidays.
Among other interesting observations in it, he calls attention to the fact that, “according to the journal Nature, a third of all studies never even get cited, let alone repeated.”
For scholars of Fisher, Popper, and Kuhn, some of this discussion won’t be quite so novel, but for anyone designing scientific experiments, the effects discussed here are certainly worthy of notice and further study and scrutiny.
Data analytics are changing the ways to judge the influence of papers and journals.
The basic question is whether citations are the best indicator of impact, or whether there are better emerging methods of indicating the impact of scholarly work.