One of the most important use cases of ontologies is the calculation of similarity scores between a query and items annotated with classes of an ontology. The hierarchical structure of an ontology does not necessarily reflect all relevant aspects of the domain it models, and this can reduce the performance of ontology-based search algorithms. For instance, the classes of phenotype ontologies may be arranged according to anatomical criteria, but individual phenotypic features may affect anatomical entities in opposite ways. Thus, "opposite" classes may be located in close proximity in an ontology; for example, enlarged liver and small liver are both grouped under abnormal liver size. Standard similarity measures would score these as being similar, despite being, in fact, opposites. In this paper, we use information about opposite ontology classes to extend two large phenotype ontologies, the Human Phenotype Ontology and the Mammalian Phenotype Ontology. We also show that incorporating this information into similarity measures improves the resulting rankings. In particular, cosine-similarity-based measures show large improvements, which we hypothesize is due to the natural embedding of opposite phenotypes in vector space. We support the idea that the expressivity of semantic web technologies should be explored more extensively in biomedical ontologies and that similarity measures should be extended to incorporate more than the pure graph structure defined by the subclass or part-of relationships of the underlying ontologies.
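As a minimal illustration of why vector-space measures can separate opposite phenotypes (the two-dimensional embedding and vectors below are invented for the example, not taken from either ontology):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    return dot / (norm_u * norm_v)

# Toy embedding: first axis = "liver size is abnormal",
# second axis = direction of the abnormality, with opposite
# phenotypes given opposite signs.
abnormal_liver_size = [1.0, 0.0]
enlarged_liver = [1.0, 1.0]
small_liver = [1.0, -1.0]

# Both children remain similar to their common parent ...
print(cosine(enlarged_liver, abnormal_liver_size))  # ≈ 0.707
# ... while the opposites score as dissimilar to each other.
print(cosine(enlarged_liver, small_liver))          # 0.0
```

A purely hierarchy-based measure would give the two child classes a high score simply because they share a close common ancestor; the signed second axis is what lets the cosine measure pull them apart.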
@lpachter Your cup of tea over at UCLA next week? Regulatory & Epigenetic Stochasticity in Development & Disease http://www.ipam.ucla.edu/programs/workshops/regulatory-and-epigenetic-stochasticity-in-development-and-disease
Pachter, a computational biologist and Caltech alumnus, returns to the Institute to study the role and function of RNA.
Lior Pachter (BS ’94) is Caltech’s new Bren Professor of Computational Biology. Recently, he was elected a fellow of the International Society for Computational Biology, one of the highest honors in the field. We sat down with him to discuss the emerging field of applying computational methods to biology problems, the transition from mathematics to biology, and his return to Pasadena.
The interplay between structural connections and emerging information flow in the human brain remains an open research problem. A recent study observed global patterns of directional information flow in empirical data using the measure of transfer entropy. For higher frequency bands, the overall direction of information flow was from posterior to anterior regions, whereas an anterior-to-posterior pattern was observed in lower frequency bands. In this study, we applied a simple Susceptible-Infected-Susceptible (SIS) epidemic spreading model on the human connectome with the aim of revealing the topological properties of the structural network that give rise to these global patterns. We found that direct structural connections induced higher transfer entropy between two brain regions and that transfer entropy decreased with increasing distance between nodes (in terms of hops in the structural network). Applying the SIS model, we were able to confirm the empirically observed opposite information flow patterns, and posterior hubs in the structural network seem to play a dominant role in the network dynamics. For small time scales, when these hubs acted as strong receivers of information, the global pattern of information flow was in the posterior-to-anterior direction, and in the opposite direction when they were strong senders. Our analysis suggests that these global patterns of directional information flow are the result of an unequal spatial distribution of the structural degree between posterior and anterior regions, and their directions seem to be linked to different time scales of the spreading process.
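A minimal sketch of SIS spreading dynamics of the kind described above (the five-node chain, rates, and seed node are invented placeholders; the study runs the model on the human connectome):

```python
import random

def sis_step(adj, infected, beta, mu, rng):
    """One synchronous SIS update on an adjacency-list graph.
    Infected nodes recover with probability mu; each infected node
    tries to infect each susceptible neighbour with probability beta."""
    new_infected = set()
    for node in infected:
        if rng.random() > mu:              # node stays infected
            new_infected.add(node)
        for nbr in adj[node]:              # infection attempts
            if nbr not in infected and rng.random() < beta:
                new_infected.add(nbr)
    return new_infected

# Hypothetical chain graph standing in for a structural network.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rng = random.Random(42)
state = {0}                                # seed the infection
for _ in range(50):
    state = sis_step(adj, state, beta=0.4, mu=0.2, rng=rng)
print(sorted(state))
```

Recording each node's binary activity over time in such a run is what then allows pairwise transfer entropy between regions to be estimated.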
Epigenetics refers to information transmitted during cell division other than the DNA sequence per se, and it is the language that distinguishes stem cells from somatic cells, one organ from another, and even identical twins from each other. In contrast to the DNA sequence, the epigenome is relatively susceptible to modification by the environment as well as stochastic perturbations over time, adding to phenotypic diversity in the population. Despite its strong ties to the environment, epigenetics has never been well reconciled to evolutionary thinking, and in fact there is now strong evidence against the transmission of so-called “epi-alleles,” i.e. epigenetic modifications that pass through the germline.
However, genetic variants that regulate stochastic fluctuation of gene expression and phenotypes in the offspring appear to be transmitted as an epigenetic or even Lamarckian trait. Furthermore, even the normal process of cellular differentiation from a single cell to a complex organism is not understood well from a mathematical point of view. There is increasingly strong evidence that stem cells are highly heterogeneous and in fact stochasticity is necessary for pluripotency. This process appears to be tightly regulated through the epigenome in development. Moreover, in these biological contexts, “stochasticity” is hardly synonymous with “noise”, which often refers to variation which obscures a “true signal” (e.g., measurement error) or which is structural, as in physics (e.g., quantum noise). In contrast, “stochastic regulation” refers to purposeful, programmed variation; the fluctuations are random but there is no true signal to mask.
This workshop will serve as a forum for scientists and engineers with an interest in computational biology to explore the role of stochasticity in regulation, development and evolution, and its epigenetic basis. Just as thinking about stochasticity was transformative in physics and in some areas of biology, it promises to fundamentally transform modern genetics and help to explain phase transitions such as differentiation and cancer.
This workshop will include a poster session; a request for poster titles will be sent to registered participants in advance of the workshop.
Adam Arkin (Lawrence Berkeley Laboratory)
Gábor Balázsi (SUNY Stony Brook)
Domitilla Del Vecchio (Massachusetts Institute of Technology)
Michael Elowitz (California Institute of Technology)
Andrew Feinberg (Johns Hopkins University)
Don Geman (Johns Hopkins University)
Anita Göndör (Karolinska Institutet)
John Goutsias (Johns Hopkins University)
Garrett Jenkinson (Johns Hopkins University)
Andre Levchenko (Yale University)
Olgica Milenkovic (University of Illinois)
Johan Paulsson (Harvard University)
Leor Weinberger (University of California, San Francisco (UCSF))
📖 Read pages 191 – 215 of At Home in the Universe by Stuart Kauffman
In chapter 9 Kauffman applies his NK landscape model to explain the evolution seen in the Cambrian explosion and the re-population following the Permian extinction. He then follows it up with some interesting discussion which applies it to technological innovation, learning curves, and growth in areas of economics. The chapter has given me a few thoughts on the shape and structure (or “landscape”) of mathematics. I’ll come back to this section to see if I can’t extend the analogy to come up with something unique in math.
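For reference, a minimal sketch of the NK fitness function the chapter builds on (the genome, N, and K values below are arbitrary, and contributions are drawn from a seeded RNG rather than a stored table):

```python
import random

def nk_fitness(genome, k, landscape_seed=0):
    """Fitness of a binary genome on a random NK landscape.
    Each locus contributes a value that depends on its own allele and
    those of its k neighbours to the right (cyclic); the genome's
    fitness is the mean of the contributions."""
    n = len(genome)
    total = 0.0
    for i in range(n):
        context = tuple(genome[(i + j) % n] for j in range(k + 1))
        # Deterministic pseudo-random contribution for this
        # (locus, context) pair, reproducible within a run.
        total += random.Random(hash((landscape_seed, i) + context)).random()
    return total / n

genome = [1, 0, 1, 1, 0, 0, 1, 0]
print(nk_fitness(genome, k=1))   # low K: smooth, correlated landscape
print(nk_fitness(genome, k=7))   # K = N - 1: maximally rugged landscape
```

Raising K increases the number of conflicting constraints among loci, which is exactly the ruggedness Kauffman uses to reason about the Cambrian explosion and about learning curves.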
At the beginning of Chapter 10 he discusses power laws and covers the concept of emergence in ecosystems, coevolution, and the evolution of coevolution. In one part he evokes Adam Smith’s invisible hand, by which each actor’s selfishness seemingly benefits everyone. Though this has largely seemed to hold in the time since it was written, I do wonder on what timescales and under what conditions it works. As an example, selfishness at the individual, corporate, national, and other higher levels may not necessarily be so positive with respect to potential issues like climate change, which may drastically affect the landscape on and in which we live.
This book originated from a series of papers which were published in "Die Naturwissenschaften" in 1977/78. Its division into three parts reflects a logical structure, which may be abstracted in the form of three theses:
A. Hypercycles are a principle of natural self-organization allowing an integration and coherent evolution of a set of functionally coupled self-replicative entities.
B. Hypercycles are a novel class of nonlinear reaction networks with unique properties, amenable to a unified mathematical treatment.
C. Hypercycles are able to originate in the mutant distribution of a single Darwinian quasi-species through stabilization of its diverging mutant genes. Once nucleated, hypercycles evolve to higher complexity by a process analogous to gene duplication and specialization.

In order to outline the meaning of the first statement we may refer to another principle of material self-organization, namely Darwin's principle of natural selection. This principle, as we see it today, represents the only understood means for creating information, be it the blueprint for a complex living organism which evolved from less complex ancestral forms, or be it a meaningful sequence of letters whose selection can be simulated by evolutionary model games.
The intimate relation between biology and cognition can be formally examined through statistical models constrained by the asymptotic limit theorems of communication theory, augmented by methods from statistical mechanics and nonequilibrium thermodynamics. Cognition, often involving submodules that act as information sources, is ubiquitous across the living state. Less metabolic free energy is consumed by permitting crosstalk between biological information sources than by isolating them, leading to evolutionary exaptations that assemble shifting, tunable cognitive arrays at multiple scales and levels of organization to meet dynamic patterns of threat and opportunity. Cognition is thus necessary for life, but it is not sufficient: an organism represents a highly patterned outcome of path-dependent, blind variation, selection, interaction, and chance extinction in the context of an adequate flow of free energy and an environment fit for development. Complex, interacting cognitive processes within an organism both record and instantiate those evolutionary and developmental trajectories.
A system responding to a stochastic driving signal can be interpreted as computing, by means of its dynamics, an implicit model of the environmental variables. The system’s state retains information about past environmental fluctuations, and a fraction of this information is predictive of future ones. The remaining nonpredictive information reflects model complexity that does not improve predictive power, and thus represents the ineffectiveness of the model. We expose the fundamental equivalence between this model inefficiency and thermodynamic inefficiency, measured by dissipation. Our results hold arbitrarily far from thermodynamic equilibrium and are applicable to a wide range of systems, including biomolecular machines. They highlight a profound connection between the effective use of information and efficient thermodynamic operation: any system constructed to keep memory about its environment and to operate with maximal energetic efficiency has to be predictive.
Notions like meaning, signal, and intentionality are difficult to relate to the physical world. I study a purely physical definition of "meaningful information", from which these notions can be derived. It is inspired by a model recently illustrated by Kolchinsky and Wolpert, and improves on Dretske's classic work on the relation between knowledge and information. I discuss what makes a physical process into a "signal".
Understanding the emergence and robustness of life requires accounting for both chemical specificity and statistical generality. We argue that the reverse of a common observation—that life requires a source of free energy to persist—provides an appropriate principle to understand the emergence, organization, and persistence of life on earth. Life, and in particular core biochemistry, has many properties of a relaxation channel that was driven into existence by free energy stresses from the earth's geochemistry. Like lightning or convective storms, the carbon, nitrogen, and phosphorus fluxes through core anabolic pathways make sense as the order parameters in a phase transition from an abiotic to a living state of the geosphere. Interpreting core pathways as order parameters would both explain their stability over billions of years, and perhaps predict the uniqueness of specific optimal chemical pathways.
Life was long thought to obey its own set of rules. But as simple systems show signs of lifelike behavior, scientists are arguing about whether this apparent complexity is all a consequence of thermodynamics.
This is a nice little general interest article by Philip Ball that does a relatively good job of covering several of my favorite topics (information theory, biology, complexity) for the layperson. While it stays relatively basic, it links to a handful of really great references, many of which I’ve already read, though several appear to be new to me. 
While Ball has a broad area of interests and coverage in his work, he’s certainly one of the best journalists working in this subarea of interests today. I highly recommend his work to those who find this area interesting.
Scientists are uncovering how our bodies — and everything within them — tell right from left.
NIMBioS will host a Tutorial on Uncertainty Quantification for Biological Models
Uncertainty Quantification for Biological Models
Meeting dates: June 26-28, 2017
Location: NIMBioS at the University of Tennessee, Knoxville
Marisa Eisenberg, School of Public Health, Univ. of Michigan
Ben Fitzpatrick, Mathematics, Loyola Marymount Univ.
James Hyman, Mathematics, Tulane Univ.
Ralph Smith, Mathematics, North Carolina State Univ.
Clayton Webster, Computational and Applied Mathematics (CAM), Oak Ridge National Laboratory; Mathematics, Univ. of Tennessee
Mathematical modeling and computer simulations are widely used to predict the behavior of complex biological phenomena. However, increased computational resources have allowed scientists to ask a deeper question, namely, “how do the uncertainties ubiquitous in all modeling efforts affect the output of such predictive simulations?” Examples include both epistemic (lack of knowledge) and aleatoric (intrinsic variability) uncertainties and encompass uncertainty coming from inaccurate physical measurements, bias in mathematical descriptions, as well as errors coming from numerical approximations in computational simulations. Research in uncertainty quantification (UQ) ultimately aims to address these challenges, which is essential for dealing with realistic experimental data and for assessing the reliability of predictions based on numerical simulations.
Uncertainty quantification (UQ) uses quantitative methods to characterize and reduce uncertainties in mathematical models, and techniques from sampling, numerical approximation, and sensitivity analysis can help to apportion the uncertainty from models to different variables. Essential to achieving validated predictive computations, both forward and inverse UQ analyses have become critical modeling components for a wide range of scientific applications. Techniques from these fields are rapidly evolving to keep pace with the increasing emphasis on models that require quantified uncertainties for large-scale applications. This tutorial will focus on the application of these methods and techniques to mathematical models in the life sciences and will provide researchers with the basic concepts, theory, and algorithms necessary to quantify input and response uncertainties and perform sensitivity analysis for simulation models. Concepts to be covered may include: probability and statistics, parameter selection techniques, frequentist and Bayesian model calibration, propagation of uncertainties, quantification of model discrepancy, adaptive surrogate model construction, high-dimensional approximation, random sampling and sparse grids, as well as local and global sensitivity analysis.
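As a small example of the forward-propagation step such a tutorial covers, here is a Monte Carlo sketch pushing assumed parameter uncertainty through a toy logistic-growth model (the model, parameter ranges, and sample size are invented for illustration):

```python
import math
import random
import statistics

def logistic(r, K, t=5.0, x0=10.0):
    # Closed-form logistic growth: population at time t,
    # starting from x0, with rate r and carrying capacity K.
    return K / (1.0 + (K / x0 - 1.0) * math.exp(-r * t))

rng = random.Random(1)
outputs = []
for _ in range(5000):
    r = rng.uniform(0.3, 0.7)      # uncertain growth rate
    K = rng.uniform(80.0, 120.0)   # uncertain carrying capacity
    outputs.append(logistic(r, K))

# Summarise the uncertainty induced in the model output.
print(statistics.mean(outputs), statistics.stdev(outputs))
```

Inverse UQ and global sensitivity analysis build on the same machinery: the former conditions these parameter distributions on data, while the latter apportions the output variance between r and K.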
This tutorial is intended for graduate students, postdocs and researchers in mathematics, statistics, computer science and biology. A basic knowledge of probability, linear algebra, and differential equations is assumed.
Application deadline: March 1, 2017
To apply, you must complete an application on our online registration system:
- Click here to access the system
- Login or register
- Complete your user profile (if you haven’t already)
- Find this tutorial event under Current Events Open for Application and click on Apply
Participation in NIMBioS tutorials is by application only. Individuals with a strong interest in the topic are encouraged to apply, and successful applicants will be notified within two weeks after the application deadline. If needed, financial support for travel, meals, and lodging is available for tutorial attendees.
The application process is now closed.
Summary Report. TBA
Live Stream. The Tutorial will be streamed live. Note that NIMBioS Tutorials involve open discussion and not necessarily a succession of talks. In addition, the schedule as posted may change during the Workshop. To view the live stream, visit http://www.nimbios.org/videos/livestream. A live chat of the event will take place via Twitter using the hashtag #uncertaintyTT. The Twitter feed will be displayed to the right of the live stream. We encourage you to post questions/comments and engage in discussion with respect to our Social Media Guidelines.
Abstract: Despite the obvious advantage of simple life forms capable of fast replication, different levels of cognitive complexity have been achieved by living systems in terms of their potential to cope with environmental uncertainty. Against the inevitable cost associated to detecting environmental cues and responding to them in adaptive ways, we conjecture that the potential for predicting the environment can overcome the expenses associated to maintaining costly, complex structures. We present a minimal formal model grounded in information theory and selection, in which successive generations of agents are mapped into transmitters and receivers of a coded message. Our agents are guessing machines and their capacity to deal with environments of different complexity defines the conditions to sustain more complex agents.