We describe the evolution of macromolecules as an information transmission process and apply tools from Shannon information theory to it. This allows us to isolate three independent, competing selective pressures that we term compression, transmission, and neutrality selection. The first two affect genome length: the pressure to conserve resources by compressing the code, and the pressure to acquire additional information that improves the channel, increasing the rate of information transmission into each offspring. Noisy transmission channels (replication with mutations) give rise to a third pressure that acts on the actual encoding of information; it maximizes the fraction of mutations that are neutral with respect to the phenotype. This neutrality selection has important implications for the evolution of evolvability. We demonstrate each selective pressure in experiments with digital organisms.
To be published in J. theor. Biology 222 (2003) 477-483
This is the third in a series of three papers devoted to energy flow and entropy changes in chemical and biological processes, and their relations to the thermodynamics of computation. The previous two papers developed reversible chemical transformations as idealizations for studying physiology and natural selection, and derived, from the second law of thermodynamics, bounds relating the information gain in an ensemble to the chemical work required to produce it. This paper concerns the explicit mapping of chemistry to computation, and particularly the Landauer decomposition of irreversible computations, in which reversible logical operations generating no heat are separated from heat-generating erasure steps which are logically irreversible but thermodynamically reversible. The Landauer arrangement of computation is shown to produce the same entropy-flow diagram as that of the chemical Carnot cycles used in the second paper of the series to idealize physiological cycles. The specific application of computation to data compression and error-correcting encoding also makes possible a Landauer analysis of the somewhat different problem of optimal molecular recognition, which has been considered as an information theory problem. It is shown here that bounds on maximum sequence discrimination from the enthalpy of complex formation, although derived from the same logical model as the Shannon theorem for channel capacity, arise from exactly the opposite model for erasure.
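The heat cost of the erasure steps mentioned in this abstract is usually quoted as the Landauer limit, k_B T ln 2 per erased bit. A minimal Python sketch of that arithmetic follows; the function name and the 300 K example temperature are illustrative assumptions and do not come from the paper.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI definition).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_limit_joules(temperature_kelvin: float, bits: float = 1.0) -> float:
    """Minimum heat released when erasing `bits` bits at the given temperature."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive")
    return bits * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Hypothetical example temperature: 300 K, roughly room temperature.
    print(f"{landauer_limit_joules(300.0):.3e} J per erased bit at 300 K")
```

The limit scales linearly with both temperature and the number of bits erased, so the cost of any erasure step can be read off directly once the temperature is fixed.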
This is the second in a series of three papers devoted to energy flow and entropy changes in chemical and biological processes, and to their relations to the thermodynamics of computation. In the first paper of the series, it was shown that a general-form dimensional argument from the second law of thermodynamics captures a number of scaling relations governing growth and development across many domains of life. It was also argued that models of physiology based on reversible transformations provide sensible approximations within which the second-law scaling is realized. This paper provides a formal basis for decomposing general cyclic, fixed-temperature chemical reactions, in terms of the chemical equivalent of Carnot's cycle for heat engines. It is shown that the second law relates the minimal chemical work required to perform a cycle to the Kullback–Leibler divergence produced in its chemical output ensemble from that of a Gibbs equilibrium. Reversible models of physiology are used to create reversible models of natural selection, which relate metabolic energy requirements to information gain under optimal conditions. When dissipation is added to models of selection, the second-law constraint is generalized to a relation between metabolic work and the combined energies of growth and maintenance.
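As a rough illustration of the quantity this abstract refers to, the Python sketch below computes the Kullback-Leibler divergence of a discrete distribution from a reference (equilibrium) distribution and converts it to an energy scale. It assumes the bound takes the familiar form W_min = k_B T D_KL(p || p_eq); the paper's exact statement may differ, and the function names are hypothetical.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI definition).
BOLTZMANN_J_PER_K = 1.380649e-23


def kl_divergence_nats(p, q):
    """D_KL(p || q) in nats for two discrete distributions over the same states."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of states")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0.0:
            return math.inf  # p puts weight where q has none
        total += p_i * math.log(p_i / q_i)
    return total


def minimal_work_joules(p, p_eq, temperature_kelvin):
    """Assumed bound: k_B * T * D_KL(p || p_eq). An illustration, not the paper's formula."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * kl_divergence_nats(p, p_eq)


if __name__ == "__main__":
    equilibrium = [0.5, 0.5]   # hypothetical two-state Gibbs ensemble
    selected = [0.9, 0.1]      # hypothetical output ensemble after a cycle
    print(minimal_work_joules(selected, equilibrium, 300.0))
```

Under this assumption, an output ensemble identical to equilibrium costs no work, and the cost grows as the output ensemble becomes more sharply selected.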
This is the first of three papers analyzing the representation of information in the biosphere, and the energetic constraints limiting the imposition or maintenance of that information. Biological information is inherently a chemical property, but is equally an aspect of control flow and a result of processes equivalent to computation. The current paper develops the constraints on a theory of biological information capable of incorporating these three characterizations and their quantitative consequences. The paper illustrates the need for a theory linking energy and information by considering the problem of existence and resilience of the biosphere, and presents empirical evidence from growth and development at the organismal level suggesting that the theory developed will capture relevant constraints on real systems. The main result of the paper is that the limits on the minimal energetic cost of information flow will be tractable and universal whereas the assembly of more literal process models into a system-level description often is not. The second paper in the series then goes on to construct reversible models of energy and information flow in chemistry which achieve the idealized limits, and the third paper relates these to fundamental operations of computation.
No one can escape a sense of wonder when looking at an organism from within. From the humblest amoeba to man, from the smallest cell organelle to the amazing human brain, life presents us with example after example of highly ordered cellular matter, precisely organized and shaped to perform coordinated functions. But where does this order spring from? How does a living organism manage to do what nonliving things cannot do--bring forth and maintain all that order against the unrelenting, disordering pressures of the universe? In The Touchstone of Life, world-renowned biophysicist Werner Loewenstein seeks answers to these ancient riddles by applying information theory to recent discoveries in molecular biology. Taking us into a fascinating microscopic world, he lays bare an all-pervading communication network inside and between our cells--a web of extraordinary beauty, where molecular information flows in gracefully interlaced circles. Loewenstein then takes us on an exhilarating journey along that web and we meet its leading actors, the macromolecules, and see how they extract order out of the erratic quantum world; and through the powerful lens of information theory, we are let in on their trick, the most dazzling of magician's acts, whereby they steal form out of formlessness. The Touchstone of Life flashes with fresh insights into the mystery of life. Boldly straddling the line between biology and physics, the book offers a breathtaking view of that hidden world where molecular information turns the wheels of life. Loewenstein makes these complex scientific subjects lucid and fascinating, as he sheds light on the most fundamental aspects of our existence.
This highly interdisciplinary book discusses the phenomenon of life, including its origin and evolution (and also human cultural evolution), against the background of thermodynamics, statistical mechanics, and information theory. Among the central themes is the seeming contradiction between the second law of thermodynamics and the high degree of order and complexity produced by living systems. This paradox has its resolution in the information content of the Gibbs free energy that enters the biosphere from outside sources, as the author shows. The role of information in human cultural evolution is another focus of the book. One of the final chapters discusses the merging of information technology and biotechnology into a new discipline — bio-information technology.
Information Theory, Evolution and the Origin of Life presents a timely introduction to the use of information theory and coding theory in molecular biology. The genetical information system, because it is linear and digital, resembles the algorithmic language of computers. George Gamow pointed out that the application of Shannon's information theory breaks genetics and molecular biology out of the descriptive mode into the quantitative mode, and Dr Yockey develops this theme, discussing how information theory and coding theory can be applied to molecular biology. He discusses how these tools for measuring the information in the sequences of the genome and the proteome are essential for our complete understanding of the nature and origin of life. The author writes for the computer-competent reader who is interested in evolution and the origins of life.
The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
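The qualitative behaviour described in this abstract, capacity falling as random errors increase, can be seen in the simplest textbook channel. The Python sketch below computes the capacity of a binary symmetric channel, C = 1 - H(p); this is a toy stand-in chosen for illustration, not the sequence-to-structure channel the authors estimate from protein coordinates, and the function names are assumptions.

```python
import math


def binary_entropy_bits(p: float) -> float:
    """Entropy in bits of a two-outcome source with probabilities p and 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity_bits(error_rate: float) -> float:
    """Capacity of a binary symmetric channel with the given bit-flip probability."""
    return 1.0 - binary_entropy_bits(error_rate)


if __name__ == "__main__":
    for error_rate in (0.0, 0.01, 0.1, 0.3, 0.5):
        print(f"error rate {error_rate:4.2f} -> {bsc_capacity_bits(error_rate):.3f} bits/symbol")
```

In this toy channel the capacity reaches zero when the flip probability is one half, the point at which the output is statistically independent of the input.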
In the simplest view of transcriptional regulation, the expression of a gene is turned on or off by changes in the concentration of a transcription factor (TF). We use recent data on noise levels in gene expression to show that it should be possible to transmit much more than just one regulatory bit. Realizing this optimal information capacity would require that the dynamic range of TF concentrations used by the cell, the input/output relation of the regulatory module, and the noise in gene expression satisfy certain matching relations, which we derive. These results provide parameter-free, quantitative predictions connecting independently measurable quantities. Although we have considered only the simplified problem of a single gene responding to a single TF, we find that these predictions are in surprisingly good agreement with recent experiments on the Bicoid/Hunchback system in the early Drosophila embryo and that this system achieves ∼90% of its theoretical maximum information transmission.
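The claim that a regulatory element can carry more than one bit can be made concrete with mutual information between a discretized input (TF concentration) and output (expression level). The Python sketch below computes it from a joint probability or count table; the three-level table in the example is invented for illustration and is not data from the paper.

```python
import math


def mutual_information_bits(joint):
    """Mutual information I(X;Y) in bits from a joint table of probabilities or counts.

    Rows index input levels, columns index output levels; the table is normalized here.
    """
    row_sums = [sum(row) for row in joint]
    col_sums = [sum(col) for col in zip(*joint)]
    total = sum(row_sums)
    if total <= 0:
        raise ValueError("joint table must have positive total weight")
    info = 0.0
    for i, row in enumerate(joint):
        for j, weight in enumerate(row):
            if weight > 0:
                p_xy = weight / total
                p_x = row_sums[i] / total
                p_y = col_sums[j] / total
                info += p_xy * math.log2(p_xy / (p_x * p_y))
    return info


if __name__ == "__main__":
    on_off_switch = [[1, 0], [0, 1]]                   # two perfectly resolved levels
    three_levels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # hypothetical noiseless 3-level readout
    print(mutual_information_bits(on_off_switch))
    print(mutual_information_bits(three_levels))
```

A perfect on/off switch is capped at one bit, while any readout that reliably distinguishes more than two input levels exceeds it; noise in gene expression determines how many levels are distinguishable in practice.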
Information theory in biology by Henry Quastler, Editor. 1953. 273 pp. Urbana: University of Illinois Press
There are two kinds of scientific books worth reading. One is the monograph or treatise type, in which a more or less large field of science is presented in a systematic way, and in the form of a product as finished as possible at the given time. This kind of book may be considered a source of knowledge then available. The other type of book may present a collection of chapters or individual articles which do not claim to be a complete and systematic treatment of the subject; however the reader not only finds interesting ideas there, but the reading as such suggests new ideas. Such books are useful. For, although a rough and unfinished idea per se does not even remotely have the value of a well-elaborated scientific study, yet no elaborate study, no important theory, can be developed without first having a few rough ideas.
The book under consideration definitely belongs to the second category: it is a collection of essays. As the editor states in the Introduction (p. 2): "The papers in this volume are of a very different degree of maturity. They range from authoritative reviews of well-known facts to hesitant and tentative formulations of embryonic ideas." He further states (p. 3): "We are aware of the fact that this volume is largely exploratory."
If the above is to be considered as a shortcoming, then the reviewer does not need to dwell on it, because the editor, and undoubtedly the authors, are fully aware of it, and duly warn the reader. If we evaluate the book from the point of view of how many ideas it suggests to the reader, then, at least so far as this reviewer is concerned, it must be considered a great success.
MOTIVATION: As an increasing number of protein structures become available, the need for algorithms that can quantify the similarity between protein structures increases as well. Thus, the comparison of protein structures, and their clustering according to a given similarity measure, is at the core of today's biomedical research. In this paper, we show how a Universal Similarity Metric (USM) inspired by algorithmic information theory can be used to calculate similarities between protein pairs. The method, besides being theoretically supported, is surprisingly simple to implement and computationally efficient.
RESULTS: Structural similarity between proteins in four different datasets was measured using the USM. The sample employed represented alpha, beta, alpha-beta, TIM-barrel, globin and serpin protein types. The use of the proposed metric allows for a correct measurement of similarity and classification of the proteins in the four datasets.
AVAILABILITY: All the scripts and programs used for the preparation of this paper are available at http://www.cs.nott.ac.uk/~nxk/USM/protocol.html. On that web page the reader will find a brief description of how to use the various scripts and programs.
PMID: 14751983 DOI: 10.1093/bioinformatics/bth031
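Metrics of this kind are defined through Kolmogorov complexity, which cannot be computed exactly, so in practice a real compressor is substituted. The Python sketch below shows one common approximation, the normalized compression distance; the use of zlib and of raw byte strings as input are assumptions made for illustration, and are not necessarily the compressor or the structure encoding used by the authors.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def normalized_compression_distance(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    size_x = compressed_size(x)
    size_y = compressed_size(y)
    size_xy = compressed_size(x + y)
    return (size_xy - min(size_x, size_y)) / max(size_x, size_y)


if __name__ == "__main__":
    # Hypothetical inputs: any byte serialization of a structure could be used here.
    repetitive = b"ACDEFGHIKL" * 50
    unrelated = bytes(range(256))
    print(normalized_compression_distance(repetitive, repetitive))
    print(normalized_compression_distance(repetitive, unrelated))
```

Values near 0 indicate that one input adds almost nothing once the other is known, and values near 1 indicate little shared structure. Because real compressors are imperfect, the distance of an input to itself is small but not exactly zero.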
Living systems are distinguished in nature by their ability to maintain stable, ordered states far from equilibrium. This is despite constant buffeting by thermodynamic forces that, if unopposed, will inevitably increase disorder. Cells maintain a steep transmembrane entropy gradient by continuous application of information that permits cellular components to carry out highly specific tasks that import energy and export entropy. Thus, the study of information storage, flow and utilization is critical for understanding first principles that govern the dynamics of life. Initial biological applications of information theory (IT) used Shannon's methods to measure the information content in strings of monomers such as genes, RNA, and proteins. Recent work has used bioinformatic and dynamical systems to provide remarkable insights into the topology and dynamics of intracellular information networks. Novel applications of Fisher, Shannon, and Kullback-Leibler information are promoting increased understanding of the mechanisms by which genetic information is converted to work and order. Insights into evolution may be gained by analysis of the fitness contributions from specific segments of genetic information, as well as of the optimization process in which fitness is constrained by the substrate cost of storing and using that information. Recent IT applications have recognized the possible role of nontraditional information storage structures including lipids and ion gradients as well as information transmission by molecular flux across cell membranes. Many fascinating challenges remain, including defining the intercellular information dynamics of multicellular organisms and the role of disordered information storage and flow in disease.
PMID: 17083004 DOI: 10.1007/s11538-006-9141-5
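The earliest applications mentioned in this abstract, measuring the information content of monomer strings, amount to estimating Shannon entropy from symbol frequencies. A minimal Python sketch follows; it treats symbols as independent, which is a simplifying assumption, and the example sequences are invented.

```python
import math
from collections import Counter


def shannon_entropy_bits(sequence: str) -> float:
    """Per-symbol Shannon entropy in bits, estimated from symbol frequencies."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    length = len(sequence)
    # Sum -p * log2(p) over the observed symbols.
    return sum(-(n / length) * math.log2(n / length) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy_bits("ACGT" * 10))   # four equally frequent nucleotides
    print(shannon_entropy_bits("AAAAAAAAAA"))  # a single repeated symbol
```

A sequence using four nucleotides equally often reaches the 2 bits per symbol maximum for a four-letter alphabet, while a homopolymer carries none.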