Entropy Is Universal Rule of Language | Wired Science

Entropy Is Universal Rule of Language (Wired)
The amount of information carried in the arrangement of words is the same across all languages, even languages that aren't related to each other. This consistency could hint at a single common ancestral language, or universal features of how human brains process speech. "It doesn't matter what language or style you take," said systems biologist…

The research this article is based on is quite interesting for those doing language research.

The amount of information carried in the arrangement of words is the same across all languages, even languages that aren’t related to each other. This consistency could hint at a single common ancestral language, or universal features of how human brains process speech.

“It doesn’t matter what language or style you take,” said systems biologist Marcelo Montemurro of England’s University of Manchester, lead author of a study May 13 in PLoS ONE. “In languages as diverse as Chinese, English and Sumerian, a measure of the linguistic order, in the way words are arranged, is something that seems to be a universal of languages.”

Language carries meaning both in the words we choose, and the order we put them in. Some languages, like Finnish, carry most of their meaning in tags on the words themselves, and are fairly free-form in how words are arranged. Others, like English, are more strict: “John loves Mary” means something different from “Mary loves John.”

Montemurro realized that he could quantify the amount of information encoded in word order by computing a text’s “entropy,” or a measure of how evenly distributed the words are. Drawing on methods from information theory, Montemurro and co-author Damián Zanette of the National Atomic Energy Commission in Argentina calculated the entropy of thousands of texts in eight different languages: English, French, German, Finnish, Tagalog, Sumerian, Old Egyptian and Chinese.

Then the researchers randomly rearranged all the words in the texts, which ranged from the complete works of Shakespeare to The Origin of Species to prayers written on Sumerian tablets.

“If we destroy the original text by scrambling all the words, we are preserving the vocabulary,” Montemurro said. “What we are destroying is the linguistic order, the patterns that we use to encode information.”

The researchers found that the original texts spanned a variety of entropy values in different languages, reflecting differences in grammar and structure.

But strangely, the difference in entropy between the original, ordered text and the randomly scrambled text was constant across languages. This difference is a way to measure the amount of information encoded in word order, Montemurro says. The amount of information lost when they scrambled the text was about 3.5 bits per word.
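To make the scrambling comparison concrete, here is a minimal Python sketch of the general idea: estimate an entropy rate for the original word sequence and for a randomly shuffled copy, and take the difference as a rough proxy for the bits per word carried by word order. This is only an illustration, not the estimator used in the paper; the bigram-based estimate below is an assumption on my part, it is badly biased on short texts, and the file name in the usage comment is hypothetical.

```python
import math
import random
from collections import Counter


def conditional_entropy(words):
    """Entropy (bits per word) of the next word given the previous word,
    estimated from raw bigram counts. A crude stand-in for a proper
    entropy-rate estimator."""
    bigrams = Counter(zip(words, words[1:]))
    unigrams = Counter(words[:-1])
    n = len(words) - 1  # total number of bigrams
    h = 0.0
    for (w1, w2), count in bigrams.items():
        p_pair = count / n              # joint probability of the bigram
        p_cond = count / unigrams[w1]   # probability of w2 given w1
        h -= p_pair * math.log2(p_cond)
    return h


def word_order_information(text, seed=0):
    """Entropy gained by shuffling the words of a text: a rough proxy
    for the information (bits/word) encoded in word order."""
    words = text.lower().split()
    shuffled = words[:]
    random.Random(seed).shuffle(shuffled)
    return conditional_entropy(shuffled) - conditional_entropy(words)


# Example usage (hypothetical file name):
# print(word_order_information(open("origin_of_species.txt").read()))
```

On a long enough text the shuffled copy should show a higher entropy estimate than the original, and the gap is the quantity the researchers report as roughly constant across languages.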

“We found, very interestingly, that for all languages we got almost exactly the same value,” he said. “For some reason these languages evolved to be constrained in this framework, in these patterns of word ordering.”

This consistency could reflect some cognitive constraints that all human brains run up against, or give insight into the evolution of language, Montemurro suggests.

Cognitive scientists are still debating whether languages have universal features. Some pioneering linguists suggested that languages should evolve according to a limited set of rules, which would produce similar features of grammar and structure. But a study published last month that looked at the structure and syntax of thousands of languages found no such rules.

It may be that universal properties of language show up only at a higher level of organization, suggests linguist Kenny Smith of the University of Edinburgh.

“Maybe these broad-brushed features get down to what’s really essential” about language, he said. “Having words, and having rules for how the words are ordered, maybe those are the things that help you do the really basic functions of language. And the places where linguists traditionally look to see universals are not where the fundamentals of language are.”

Image: James Morrison/Flickr.

Citation: “Universal Entropy of Word Ordering Across Linguistic Families.” Marcelo A. Montemurro and Damián H. Zanette. PLoS ONE, Vol. 6, Issue 5, May 13, 2011. DOI: 10.1371/journal.pone.0019875.

via Wired.com

 


Bob Frankston on Communications

Triangulation 4: Bob Frankston from TWiT Network
Computer pioneer who helped create the first spreadsheet, Bob Frankston, is this week's guest.

On a recent episode of Leo Laporte and Tom Merritt’s show Triangulation, they interviewed Bob Frankston of VisiCalc fame. They have a great discussion of the current state of broadband in the U.S. and how it might be much better. It gets just a bit technical in places, but it’s a fantastic and very accessible discussion of communications that every American should be aware of.

Synthetic Biology’s Hunt for the Genetic Transistor | IEEE Spectrum

Synthetic Biology's Hunt for the Genetic Transistor (spectrum.ieee.org)
How genetic circuits will unlock the true potential of bioengineering

This is a great short article on bioengineering and synthetic biology written for the layperson. It’s also one of the best crash courses I’ve read on genetics in a while.


‘The Information’ by James Gleick – Book Review by Janet Maslin | New York Times

‘The Information’ by James Gleick - Review (nytimes.com)
“The Information,” by James Gleick, is to the nature, history and significance of data what the beach is to sand.

This book is assuredly going to have to skip up to the top of my current reading list.

“The Information” is so ambitious, illuminating and sexily theoretical that it will amount to aspirational reading for many of those who have the mettle to tackle it. Don’t make the mistake of reading it quickly. Imagine luxuriating on a Wi-Fi-equipped desert island with Mr. Gleick’s book, a search engine and no distractions. “The Information” is to the nature, history and significance of data what the beach is to sand.

In this relaxed setting, take the time to differentiate among the Brownian (motion), Bodleian (library) and Boolean (logic) while following Mr. Gleick’s version of what Einstein called “spukhafte Fernwirkung,” or “spooky action at a distance.” Einstein wasn’t precise about what this meant, and Mr. Gleick isn’t always precise either. His ambitions for this book are diffuse and far flung, to the point where providing a thumbnail description of “The Information” is impossible.

So this book’s prologue is its most slippery section. It does not exactly outline a unifying thesis. Instead it hints at the amalgam of logic, philosophy, linguistics, research, appraisal and anecdotal wisdom that will follow. If Mr. Gleick has one overriding goal it is to provide an animated history of scientific progress, specifically the progress of the technology that allows information to be recorded, transmitted and analyzed. This study’s range extends from communication by drumbeat to cognitive assault by e-mail.

As an illustration of Mr. Gleick’s versatility, consider what he has to say about the telegraph. He describes the mechanical key that made telegraphic transmission possible; the compression of language that this new medium encouraged; that it literally was a medium, a midway point between fully verbal messages and coded ones; the damaging effect its forced brevity had on civility; the confusion it created as to what a message actually was (could a mother send her son a dish of sauerkraut?) and the new conceptual thinking that it helped implement. The weather, which had been understood on a place-by-place basis, was suddenly much more than a collection of local events.

Beyond all this Mr. Gleick’s telegraph chapter, titled “A Nervous System for the Earth,” finds time to consider the kind of binary code that began to make sense in the telegraph era. It examines the way letters came to be treated like numbers, the way systems of ciphers emerged. It cites the various uses to which ciphers might be put by businessmen, governments or fiction writers (Lewis Carroll, Jules Verne and Edgar Allan Poe). Most of all it shows how this phase of communication anticipated the immense complexities of our own information age.

Although “The Information” unfolds in a roughly chronological way, Mr. Gleick is no slave to linearity. He freely embarks on colorful digressions. Some are included just for the sake of introducing the great eccentrics whose seemingly marginal inventions would prove to be prophetic. Like Richard Holmes’s “Age of Wonder” this book invests scientists with big, eccentric personalities. Augusta Ada Lovelace, the daughter of Lord Byron, may have been spectacularly arrogant about what she called “my immense reasoning faculties,” claiming that her brain was “something more than merely mortal.” But her contribution to the writing of algorithms can, in the right geeky circles, be mentioned in the same breath as her father’s contribution to poetry.

The segments of “The Information” vary in levels of difficulty. Grappling with entropy, randomness and quantum teleportation is the price of enjoying Mr. Gleick’s simple, entertaining riffs on the Oxford English Dictionary’s methodology, which has yielded 30-odd spellings of “mackerel” and an enchantingly tongue-tied definition of “bada-bing,” and on the cyber-battles waged via Wikipedia. (As he notes, there are people who have bothered to fight over Wikipedia’s use of the word “cute” to accompany a picture of a young polar bear.) That Amazon boasts of being able to download a book called “Data Smog” in less than a minute does not escape his keen sense of the absurd.

As it traces our route to information overload, “The Information” pays tribute to the places that made it possible. He cites and honors the great cogitation hives of yore. In addition to the Institute for Advanced Study in Princeton, N.J., the Mount Rushmore of theoretical science, he acknowledges the achievements of corporate facilities like Bell Labs and I.B.M.’s Watson Research Center in the halcyon days when many innovations had not found practical applications and progress was its own reward.

“The Information” also lauds the heroics of mathematicians, physicists and computer pioneers like Claude Shannon, who is revered in the computer-science realm for his information theory but not yet treated as a subject for full-length, mainstream biography. Mr. Shannon’s interest in circuitry using “if … then” choices conducting arithmetic in a binary system had novelty when he began formulating his thoughts in 1937. “Here in a master’s thesis by a research assistant,” Mr. Gleick writes, “was the essence of the computer revolution yet to come.”

Among its many other virtues “The Information” has the rare capacity to work as a time machine. It goes back much further than Shannon’s breakthroughs. And with each step backward Mr. Gleick must erase what his readers already know. He casts new light on the verbal flourishes of the Greek poetry that preceded the written word: these turns of phrase could be as useful for their mnemonic power as for their art. He explains why the Greeks arranged things in terms of events, not categories; how one Babylonian text that ends with “this is the procedure” is essentially an algorithm; and why the telephone and the skyscraper go hand in hand. Once the telephone eliminated the need for hand-delivered messages, the sky was the limit.

In the opinion of “The Information” the world of information still has room for expansion. We may be drowning in spam, but the sky’s still the limit today.

2011 USC Viterbi Lecture “Adventures in Coding Theory” by Elwyn Berlekamp

2011 Andrew Viterbi Lecture
Ming Hsieh Department of Electrical Engineering

“Adventures in Coding Theory”

Professor Elwyn Berlekamp
University of California, Berkeley

Gerontology Auditorium, Thursday, March 3, 4:30 to 5:30 p.m.

>> Click here for live webcast

Abstract
The inventors of error-correcting codes were initially motivated by problems in communications engineering. But coding theory has since also influenced several other fields, including memory technology, theoretical computer science, game theory, portfolio theory, and symbolic manipulation. This talk will recall some forays into these subjects.

Biography
Elwyn Berlekamp has been professor of mathematics and of electrical engineering and computer science at UC Berkeley since 1971; halftime since 1983, and Emeritus since 2002. He also has been active in several small companies in the sectors of computers-communications and finance. He is now chairman of Berkeley Quantitative LP, a small money-management company. He was chairman of the Board of Trustees of MSRI from 1994-1998, and was at the International Computer Science Institute from 2001-2003. He is a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences. Berlekamp has 12 patented inventions, some of which were co-authored with USC Professor Emeritus Lloyd Welch. Some of Berlekamp’s algorithms for decoding Reed-Solomon codes are widely used on compact discs; others are NASA standards for deep space communications. He has more than 100 publications, including two books on algebraic coding theory and seven books on the mathematical theory of combinatorial games, including the popular Dots-and-Boxes Game: Sophisticated Child’s Play.

 
I wish I could be at this lecture in person today, but I’ll have to live with the live webcast.


Poor State of Automated Machine-Based Language Translation

You know that automated machine language translation is not in good shape when the editor-in-chief of the IEEE’s Signal Processing Magazine says:

As an anecdote, during the early stage in creating the Chinese translation of the [Signal Processing] magazine, we experimented with automated machine translation first, only to quickly switch to professional human translation. This makes us appreciate why “universal translation” is the “needs and wants” of the future rather than of the present; see [3] for a long list of future needs and wants to be enabled by signal processing technology.

 



Global classical solutions of the Boltzmann equation with long-range interactions

Global classical solutions of the Boltzmann equation with long-range interactions (pnas.org)

Finally, after 140 years, Robert Strain and Philip Gressman at the University of Pennsylvania have found a mathematical proof of global solutions to Boltzmann’s equation, which predicts the motion of gas molecules.

Abstract

This is a brief announcement of our recent proof of global existence and rapid decay to equilibrium of classical solutions to the Boltzmann equation without any angular cutoff, that is, for long-range interactions. We consider perturbations of the Maxwellian equilibrium states and include the physical cross-sections arising from an inverse-power intermolecular potential r^{-(p-1)} with p > 2, and more generally. We present here a mathematical framework for unique global in time solutions for all of these potentials. We consider it remarkable that this equation, derived by Boltzmann (1) in 1872 and Maxwell (2) in 1867, grants a basic example where a range of geometric fractional derivatives occur in a physical model of the natural world. Our methods provide a new understanding of the effects due to grazing collisions.
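For context, here is the Boltzmann equation in its standard textbook form (a sketch for orientation, not the exact formulation used in the paper), written in LaTeX for the phase-space density f(t, x, v):

```latex
% Standard form of the Boltzmann equation: free transport on the left,
% the quadratic collision operator Q(f, f) on the right.
\frac{\partial f}{\partial t} + v \cdot \nabla_x f = Q(f, f)
```

The “long-range interactions / no angular cutoff” setting referred to in the abstract means the angular part of the collision kernel inside Q is not integrable, which is where the grazing-collision and fractional-derivative effects the authors mention come from.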

via pnas.org