Personal and private Web archives are proliferating due to the increase in the tools to create them and the realization that Internet Archive and other public Web archives are unable to capture personalized (e.g., Facebook) and private (e.g., banking) Web pages. We introduce a framework to mitigate issues of aggregation in private, personal, and public Web archives without compromising potential sensitive information contained in private captures. We amend Memento syntax and semantics to allow TimeMap enrichment to account for additional attributes to be expressed inclusive of the requirements for dereferencing private Web archive captures. We provide a method to involve the user further in the negotiation of archival captures in dimensions beyond time. We introduce a model for archival querying precedence and short-circuiting, as needed when aggregating private and personal Web archive captures with those from public Web archives through Memento. Negotiation of this sort is novel to Web archiving and allows for the more seamless aggregation of various types of Web archives to convey a more accurate picture of the past Web.
Observational data about human behavior is often heterogeneous, i.e., generated by subgroups within the population under study that vary in size and behavior. Heterogeneity predisposes analysis to Simpson's paradox, whereby the trends observed in data that has been aggregated over the entire population may be substantially different from those of the underlying subgroups. I illustrate Simpson's paradox with several examples coming from studies of online behavior and show that aggregate response leads to wrong conclusions about the underlying individual behavior. I then present a simple method to test whether Simpson's paradox is affecting results of analysis. The presence of Simpson's paradox in social data suggests that important behavioral differences exist within the population, and failure to take these differences into account can distort the studies' findings.
It is argued that if the non-unitary measurement transition, as codified by Von Neumann, is a real physical process, then the "probability assumption" needed to derive the Second Law of Thermodynamics naturally enters at that point. The existence of a real, indeterministic physical process underlying the measurement transition would therefore provide an ontological basis for Boltzmann's Stosszahlansatz and thereby explain the unidirectional increase of entropy against a backdrop of otherwise time-reversible laws. It is noted that the Transactional Interpretation (TI) of quantum mechanics provides such a physical account of the non-unitary measurement transition, and TI is brought to bear in finding a physically complete, non-ad hoc grounding for the Second Law.
Logical probability theory was developed as a quantitative measure based on Boole's logic of subsets. But information theory was developed into a mature theory by Claude Shannon with no such connection to logic. But a recent development in logic changes this situation. In category theory, the notion of a subset is dual to the notion of a quotient set or partition, and recently the logic of partitions has been developed in a parallel relationship to the Boolean logic of subsets (subset logic is usually mis-specified as the special case of propositional logic). What then is the quantitative measure based on partition logic in the same sense that logical probability theory is based on subset logic? It is a measure of information that is named "logical entropy" in view of that logical basis. This paper develops the notion of logical entropy and the basic notions of the resulting logical information theory. Then an extensive comparison is made with the corresponding notions based on Shannon entropy.
Based on a cursory look of his website(s), I’m going to have to start following more of this work.
Advances in computing power, natural language processing, and digitization of text now make it possible to study our a culture's evolution through its texts using a "big data" lens. Our ability to communicate relies in part upon a shared emotional experience, with stories often following distinct emotional trajectories, forming patterns that are meaningful to us. Here, by classifying the emotional arcs for a filtered subset of 1,737 stories from Project Gutenberg's fiction collection, we find a set of six core trajectories which form the building blocks of complex narratives. We strengthen our findings by separately applying optimization, linear decomposition, supervised learning, and unsupervised learning. For each of these six core emotional arcs, we examine the closest characteristic stories in publication today and find that particular emotional arcs enjoy greater success, as measured by downloads.
This is the draft version of a textbook, which aims to introduce the quantum information science viewpoints on condensed matter physics to graduate students in physics (or interested researchers). We keep the writing in a self-consistent way, requiring minimum background in quantum information science. Basic knowledge in undergraduate quantum physics and condensed matter physics is assumed. We start slowly from the basic ideas in quantum information theory, but wish to eventually bring the readers to the frontiers of research in condensed matter physics, including topological phases of matter, tensor networks, and symmetry-protected topological phases.
Information is a precise concept that can be defined mathematically, but its relationship to what we call "knowledge" is not always made clear. Furthermore, the concepts "entropy" and "information", while deeply related, are distinct and must be used with care, something that is not always achieved in the literature. In this elementary introduction, the concepts of entropy and information are laid out one by one, explained intuitively, but defined rigorously. I argue that a proper understanding of information in terms of prediction is key to a number of disciplines beyond engineering, such as physics and biology.
Comments: 19 pages, 2 figures. To appear in Philosophical Transaction of the Royal Society A
Subjects: Adaptation and Self-Organizing Systems (nlin.AO); Information Theory (cs.IT); Biological Physics (physics.bio-ph); Quantitative Methods (q-bio.QM)
Cite as:arXiv:1601.06176 [nlin.AO] (or arXiv:1601.06176v1 [nlin.AO] for this version)
“The Mathematics Literature Project intends to survey the state of the freely accessible mathematics literature. In particular, it will index freely accessible URLs for mathematics articles. These are legitimately hosted copies of the article (i.e. at publishers, the arXiv, institutional repositories, or authors’ homepages), which are freely available in any browser, anywhere in the world.”