We describe the evolution of macromolecules as an information transmission process and apply tools from Shannon information theory to it. This allows us to isolate three independent, competing selective pressures that we term compression, transmission, and neutrality selection. The first two affect genome length: the pressure to conserve resources by compressing the code, and the pressure to acquire additional information that improves the channel, increasing the rate of information transmission into each offspring. Noisy transmission channels (replication with mutations) give rise to a third pressure that acts on the actual encoding of information; it maximizes the fraction of mutations that are neutral with respect to the phenotype. This neutrality selection has important implications for the evolution of evolvability. We demonstrate each selective pressure in experiments with digital organisms.
To be published in J. theor. Biology 222 (2003) 477-483
Profound as it may be, the Internet revolution still pales in comparison to that earlier revolution that first brought screens into millions of homes: the TV revolution. Americans still spend more of their non-sleep, non-work time watching TV than on any other activity. And now the immovable object (the couch potato) and the irresistible force (the business-model destroying Internet) are colliding.
For decades, the limitations of technology only allowed viewers to watch TV programs as they were broadcast. Although limiting, this way of watching TV has the benefit of simplicity: the viewer only has to turn on the set and select a channel. They then get to see what was deemed broadcast-worthy at that particular time. This is the exact opposite of the Web, where users type a search query or click a link and get their content whenever they want. Unsurprisingly, TV over the Internet, a combination that adds Web-like instant gratification to the TV experience, has seen an enormous growth in popularity since broadband became fast enough to deliver decent quality video. So is the Internet going to wreck TV, or is TV going to wreck the Internet? Arguments can certainly be made either way.
The process of distributing TV over a data network such as the Internet, a process often called IPTV, is a little more complex than just sending files back and forth. Unless, that is, a TV broadcast is recorded and turned into a file. The latter, file-based model is one that Apple has embraced with its iTunes Store, where shows are simply downloaded like any other file. This has the advantage that shows can be watched later, even when there is no longer a network connection available, but the download model doesn’t exactly lend itself to live broadcasts—or instant gratification, for that matter.
Most of the new IPTV services, like Netflix and Hulu, and all types of live broadcasts use a streaming model. Here, the program is sent out in real time. The computer (or, usually by way of a set-top box, the TV) decodes the incoming stream of audio and video and then displays it pretty much immediately. This has the advantage that the video starts within seconds. However, it also means that the network must be fast enough to carry the audio/video at the bitrate that it was encoded with. The bitrate can vary a lot depending on the type of program (talking heads compress a lot better than car crashes), but for standard definition (SD) video, think two megabits per second (Mbps).
To get a sense of just how significant this 2Mbps number is, it’s worth placing it in the context of the history of the Internet, as it has moved from transmitting text to images to audio and video. A page of text that takes a minute to read is a few kilobytes in size. Images are tens to a few hundred kilobytes. High quality audio starts at about 128 kilobits per second (kbps), or about a megabyte per minute. SD TV can be shoehorned into some two megabits per second (Mbps), or about 15 megabytes per minute. HDTV starts around 5Mbps, or about 40 megabytes per minute. So someone watching HDTV over the Internet uses about the same bandwidth as half a million early-1990s text-only Web surfers. Even today, watching video uses at least ten times as much bandwidth as non-video use of the network.
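The per-minute figures above follow from simple unit conversion. As a rough sketch (the function name and the table of rates are illustrative, and a megabyte is assumed to be one million bytes), the arithmetic looks like this in Python:

```python
def megabytes_per_minute(bits_per_second: float) -> float:
    """Convert a stream bitrate into data volume in megabytes per minute.

    Assumes 8 bits per byte and 1 megabyte = 1,000,000 bytes.
    """
    bytes_per_second = bits_per_second / 8
    return bytes_per_second * 60 / 1_000_000


# Bitrates quoted in the text; the labels are just for display.
RATES_BPS = {
    "audio, 128 kbps": 128_000,
    "SD video, 2 Mbps": 2_000_000,
    "HD video, 5 Mbps": 5_000_000,
}

for label, bps in RATES_BPS.items():
    print(f"{label}: {megabytes_per_minute(bps):.2f} MB per minute")
```

This gives a little under 1 megabyte per minute for audio, 15 for SD, and 37.5 for HD, which the text rounds to 40.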
In addition to raw capacity, streaming video also places other demands on the network. Most applications communicate through TCP, a layer in the network stack that takes care of retransmitting lost data and delivering data to the receiving application in the right order. This is despite the fact that the IP packets that do TCP’s bidding may arrive out of order. And when the network gets congested, TCP’s congestion control algorithms slow down the transmission rate at the sender, so the network remains usable.
However, for real-time audio and video, TCP isn’t such a good match. If a fraction of a second of audio or part of a video frame gets lost, it’s much better to just skip over the lost data and continue with what follows, rather than wait for a retransmission to arrive. So streaming audio and video tended to run on top of UDP rather than TCP. UDP is the thinnest possible layer on top of IP and doesn’t care about lost packets and such. But UDP also means that TCP’s congestion control is out the door, so a video stream may continue at full speed even though the network is overloaded and many packets—also from other users—get lost. However, more advanced streaming solutions are able to switch to lower quality video when network conditions worsen. And Apple has developed a way to stream video using standard HTTP on top of TCP, by splitting the stream into small files that are downloaded individually. Should a file fail to download because of network problems, it can be skipped, continuing playback with the next file.
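The idea of splitting a stream into small files and skipping any that fail to arrive can be sketched in a few lines of Python. This is only an illustration of the skip-on-failure logic, not Apple's actual player: the segment names, the `fetch` callback, and the simulated failure are all made up for the example.

```python
from typing import Callable, Iterable, List, Optional


def play_segments(segment_names: Iterable[str],
                  fetch: Callable[[str], Optional[bytes]]) -> List[str]:
    """Fetch segments in order and return the names of those that were played.

    A segment whose download fails is skipped, so playback continues
    with the next segment instead of stalling.
    """
    played = []
    for name in segment_names:
        try:
            data = fetch(name)
        except OSError:
            data = None  # treat a network error as a lost segment
        if data is None:
            continue  # skip the missing segment and move on
        played.append(name)  # a real player would decode and display here
    return played


def fake_fetch(name: str) -> Optional[bytes]:
    """Stand-in for an HTTP download; the second segment always fails."""
    if name == "seg2.ts":
        raise OSError("simulated network timeout")
    return b"audio-video-bytes"


result = play_segments(["seg1.ts", "seg2.ts", "seg3.ts"], fake_fetch)
print(result)
```

Because each segment is an ordinary file fetched over TCP, congestion control stays in place, and a loss costs at most one segment's worth of playback.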
Where are the servers? Follow the money
Like any Internet application, streaming of TV content can happen from across town or across the world. However, as the number of users increases, the costs of sending such large amounts of data over large distances become significant. For this reason, content delivery networks (CDNs), of which Akamai is probably the best known, try to place servers as close to the end-users as possible, either close to important interconnect locations where lots of Internet traffic comes together, or actually inside the networks of large ISPs.
Interestingly, it appears that CDNs are actually paying large ISPs for this privilege. This makes the IPTV business a lot like the cable TV business. On the Internet, the assumption is that both ends (the consumer and the provider of over-the-Internet services) pay their own ISPs for the traffic costs, and the ISPs just transport the bits and aren’t involved otherwise. In the cable TV world, this is very different. An ISP provides access to the entire Internet; a cable TV provider doesn’t provide access to all possible TV channels. Often, the cable companies pay for access to content.
So far, we’ve only been looking at IPTV over the public Internet. However, many ISPs around the world already provide cable-like service on top of ADSL or Fiber-To-The-Home (FTTH). With such complete solutions, the ISPs can control the whole service, from streaming servers to the set-top box that decodes the IPTV data and delivers it to a TV. This “walled garden” type of IPTV typically provides a better and more TV-like experience—changing channels is faster, image quality is better, and the service is more reliable.
Such an IPTV Internet access service is a lot like what cable networks provide, but there is a crucial difference: with cable, the bandwidth of the analog cable signal is split into channels, which can be used for analog or digital TV broadcasts or for data. TV and data don’t get in each other’s way. With IPTV on the other hand, TV and Internet data are communicating vessels: what is used by one is unavailable to the other. And to ensure a good experience, IPTV packets are given higher priority than other packets. When bandwidth is plentiful, this isn’t an issue, but when a network fills up to the point that Internet packets regularly have to take a backseat to IPTV packets, this could easily become a network neutrality headache.
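To see why prioritization matters once a link fills up, here is a minimal sketch of a strict-priority queue in Python. It is a toy model, not how any particular ISP's equipment works; the class name, the two priority levels, and the packet labels are assumptions made for illustration.

```python
import heapq
from itertools import count


class StrictPriorityQueue:
    """Toy scheduler: always sends the highest-priority waiting packet first."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = count()  # tie-breaker keeps arrival order within a class

    def enqueue(self, packet: str, priority: int) -> None:
        # A lower number means a higher priority.
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


IPTV, BEST_EFFORT = 0, 1  # hypothetical traffic classes

queue = StrictPriorityQueue()
queue.enqueue("web-1", BEST_EFFORT)
queue.enqueue("tv-1", IPTV)
queue.enqueue("web-2", BEST_EFFORT)
queue.enqueue("tv-2", IPTV)

sent = [queue.dequeue() for _ in range(len(queue))]
print(sent)
```

Both TV packets leave before either Web packet, even though a Web packet arrived first. If TV packets keep arriving fast enough, the Web packets wait indefinitely, which is the neutrality concern in a nutshell.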
Multicast to the rescue
Speaking of networks that fill up: for services like Netflix or Hulu, where everyone is watching their own movie or their own show, streaming makes a lot of sense. Not so much with live broadcasts. If 30 million people were to tune in to Dancing with the Stars using streaming, that would mean 30 million copies of each IPTV packet must flow down the tubes. That’s not very efficient, especially given that routers and switches have the capability to take one packet and deliver a copy to anyone who’s interested. This ability to make multiple copies of a packet is called multicast, and it occupies territory between broadcasts, which go to everyone, and regular communications (called unicast), which go to only one recipient. Multicast packets are addressed to a special group address. Only systems listening for the right group address get a copy of the packet.
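A back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes the 2Mbps SD bitrate mentioned earlier and uses Python's standard `ipaddress` module to check whether an address falls in the multicast range; the function names and the sample addresses are illustrative.

```python
import ipaddress


def unicast_load_bps(viewers: int, stream_bps: int) -> int:
    """Total bitrate the source must send when every viewer gets a separate copy."""
    return viewers * stream_bps


def is_multicast_group(address: str) -> bool:
    """True if the address lies in the range reserved for multicast groups."""
    return ipaddress.ip_address(address).is_multicast


viewers = 30_000_000
sd_stream_bps = 2_000_000  # assumed SD bitrate of 2 Mbps

total_bps = unicast_load_bps(viewers, sd_stream_bps)
print(f"Unicast: {total_bps / 1e12:.0f} Tbps leaving the source")
print(f"Multicast: {sd_stream_bps / 1e6:.0f} Mbps per link, copied by routers as needed")

print(is_multicast_group("239.1.2.3"))  # an address in the IPv4 multicast range
print(is_multicast_group("192.0.2.1"))  # an ordinary unicast address
```

With unicast, the source would have to push out about 60 terabits per second. With multicast, each link carries a single 2Mbps copy no matter how many viewers sit behind it.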
Multicast is already used in some private IPTV networks, but it has never gained traction on the public Internet. Partially, this is a chicken/egg situation, where there is no demand because there is no supply and vice versa. But multicast is also hard to make work as the network gets larger and the number of multicast groups increases. However, multicast is very well suited to broadcast type network infrastructures, such as cable networks and satellite transmission. Launching multiple satellites that just send thousands of copies of the same packets to thousands of individual users would be a waste of perfectly good rockets.
Peer-to-peer and downloading
Multicast works well for a relatively limited number of streams that are each watched by a reasonably sized group of people—but having very many multicast groups takes up too much memory in routers and switches. For less popular content, there’s another delivery method that requires no or few streaming servers: peer-to-peer streaming. This was the technology used by the Joost service in 2007 and 2008. With peer-to-peer streaming, all the systems interested in a given stream get blocks of audio/video data from upstream peers, and then send those on to downstream peers. This approach has two downsides: the bandwidth of the stream has to be limited to fit within the upload capacity of most peers, and changing channels is a very slow process because a whole new set of peers must be contacted.
For less time-critical content, downloading can work very well. Especially in a form like podcasts, where an RSS feed allows a computer to download new episodes of shows without user intervention. It’s possible to imagine a system where regular network TV shows are made available for download one or two days before they air—but in encrypted form. Then, “airing” the show would just entail distributing the decryption keys to viewers. This could leverage unused network capacity at night. Downloads might also happen using IP packets with a lower priority, so they don’t get in the way of interactive network use.
IP addresses and home networks
A possible issue with IPTV could be the extra IP addresses required. There are basically two approaches to handling this issue: the one where the user is in full control, and the one where an IPTV service provider (usually the ISP) has some control. In the former case, streaming and downloading happen through the user’s home network and no extra addresses are required. However, wireless home networks may not be able to provide bandwidth with enough consistency to make streaming work well, so pulling Ethernet cabling may be required.
When the IPTV provider provides a set-top box, it’s often necessary to address packets toward that set-top box, so the box must be addressable in some way. This can eat up a lot of addresses, which is a problem in these IPv4-starved times. For really large ISPs, the private address ranges in IPv4 may not even be sufficient to provide a unique address to every customer. Issues in this area are why Comcast has been working on adopting IPv6 in the non-public part of its network for many years. When an IPTV provider provides a home gateway, this gateway is often outfitted with special quality-of-service mechanisms that make (wireless) streaming work better than run-of-the-mill home gateways that treat all packets the same.
Predicting the future
Converging to a single IP network that can carry the Web, other data services, telephony, and TV seems like a no-brainer. The phone companies have been working on this for years because that will allow them to buy cheap off-the-shelf routers and switches, rather than the specialty equipment they use now. So it seems highly likely that in the future, we’ll be watching our TV shows over the Internet—or at least over an IP network of some sort. The extra bandwidth required is going to be significant, but so far, the Internet has been able to meet all challenges thrown at it in this area. Looking at the technologies, it would make sense to combine nightly pushed downloads for popular non-live content, multicast for popular live content, and regular streaming or peer-to-peer streaming for back catalog shows and obscure live content.
However, the channel flipping model of TV consumption has proven to be quite popular over the past half century, and many consumers may want to stick with it—for at least part of their TV viewing time. If nothing else, this provides an easy way to discover new shows. The networks are also unlikely to move away from this model voluntarily, because there is no way they’ll be able to sell 16 minutes of commercials per hour using most of the other delivery methods. However, we may see some innovations. For instance, if you stumble upon a show in progress, wouldn’t it be nice to be able to go back to the beginning? In the end, TV isn’t going anywhere, and neither is the Internet, so they’ll have to find a way to live together.
Correction: The original article incorrectly stated that cable providers get paid by TV networks. For broadcast networks, cable operators are required by the law’s “must carry” provisions to carry all of the TV stations broadcast in a market. Ars regrets the error.
Does where you live have an impact on your overall health? Bill Davenhall believes that the location of our homes is critical to our medical history.
This is a great thing to think about the next time your doctor asks for your medical history. Perhaps more data, better visualized, would bring home the messages about pollution and global warming.
“The Information,” by James Gleick, is to the nature, history and significance of data what the beach is to sand.
“The Information” is so ambitious, illuminating and sexily theoretical that it will amount to aspirational reading for many of those who have the mettle to tackle it. Don’t make the mistake of reading it quickly. Imagine luxuriating on a Wi-Fi-equipped desert island with Mr. Gleick’s book, a search engine and no distractions. “The Information” is to the nature, history and significance of data what the beach is to sand.
In this relaxed setting, take the time to differentiate among the Brownian (motion), Bodleian (library) and Boolean (logic) while following Mr. Gleick’s version of what Einstein called “spukhafte Fernwirkung,” or “spooky action at a distance.” Einstein wasn’t precise about what this meant, and Mr. Gleick isn’t always precise either. His ambitions for this book are diffuse and far flung, to the point where providing a thumbnail description of “The Information” is impossible.
So this book’s prologue is its most slippery section. It does not exactly outline a unifying thesis. Instead it hints at the amalgam of logic, philosophy, linguistics, research, appraisal and anecdotal wisdom that will follow. If Mr. Gleick has one overriding goal it is to provide an animated history of scientific progress, specifically the progress of the technology that allows information to be recorded, transmitted and analyzed. This study’s range extends from communication by drumbeat to cognitive assault by e-mail.
As an illustration of Mr. Gleick’s versatility, consider what he has to say about the telegraph. He describes the mechanical key that made telegraphic transmission possible; the compression of language that this new medium encouraged; that it literally was a medium, a midway point between fully verbal messages and coded ones; the damaging effect its forced brevity had on civility; the confusion it created as to what a message actually was (could a mother send her son a dish of sauerkraut?) and the new conceptual thinking that it helped implement. The weather, which had been understood on a place-by-place basis, was suddenly much more than a collection of local events.
Beyond all this Mr. Gleick’s telegraph chapter, titled “A Nervous System for the Earth,” finds time to consider the kind of binary code that began to make sense in the telegraph era. It examines the way letters came to be treated like numbers, the way systems of ciphers emerged. It cites the various uses to which ciphers might be put by businessmen, governments or fiction writers (Lewis Carroll, Jules Verne and Edgar Allan Poe). Most of all it shows how this phase of communication anticipated the immense complexities of our own information age.
Although “The Information” unfolds in a roughly chronological way, Mr. Gleick is no slave to linearity. He freely embarks on colorful digressions. Some are included just for the sake of introducing the great eccentrics whose seemingly marginal inventions would prove to be prophetic. Like Richard Holmes’s “Age of Wonder” this book invests scientists with big, eccentric personalities. Augusta Ada Lovelace, the daughter of Lord Byron, may have been spectacularly arrogant about what she called “my immense reasoning faculties,” claiming that her brain was “something more than merely mortal.” But her contribution to the writing of algorithms can, in the right geeky circles, be mentioned in the same breath as her father’s contribution to poetry.
The segments of “The Information” vary in levels of difficulty. Grappling with entropy, randomness and quantum teleportation is the price of enjoying Mr. Gleick’s simple, entertaining riffs on the Oxford English Dictionary’s methodology, which has yielded 30-odd spellings of “mackerel” and an enchantingly tongue-tied definition of “bada-bing,” and on the cyber-battles waged via Wikipedia. (As he notes, there are people who have bothered to fight over Wikipedia’s use of the word “cute” to accompany a picture of a young polar bear.) That Amazon boasts of being able to download a book called “Data Smog” in less than a minute does not escape his keen sense of the absurd.
As it traces our route to information overload, “The Information” pays tribute to the places that made it possible. Mr. Gleick cites and honors the great cogitation hives of yore. In addition to the Institute for Advanced Study in Princeton, N.J., the Mount Rushmore of theoretical science, he acknowledges the achievements of corporate facilities like Bell Labs and I.B.M.’s Watson Research Center in the halcyon days when many innovations had not found practical applications and progress was its own reward.
“The Information” also lauds the heroics of mathematicians, physicists and computer pioneers like Claude Shannon, who is revered in the computer-science realm for his information theory but not yet treated as a subject for full-length, mainstream biography. Mr. Shannon’s interest in circuitry using “if … then” choices conducting arithmetic in a binary system had novelty when he began formulating his thoughts in 1937. “Here in a master’s thesis by a research assistant,” Mr. Gleick writes, “was the essence of the computer revolution yet to come.”
Among its many other virtues “The Information” has the rare capacity to work as a time machine. It goes back much further than Shannon’s breakthroughs. And with each step backward Mr. Gleick must erase what his readers already know. He casts new light on the verbal flourishes of the Greek poetry that preceded the written word: these turns of phrase could be as useful for their mnemonic power as for their art. He explains why the Greeks arranged things in terms of events, not categories; how one Babylonian text that ends with “this is the procedure” is essentially an algorithm; and why the telephone and the skyscraper go hand in hand. Once the telephone eliminated the need for hand-delivered messages, the sky was the limit.
In the opinion of “The Information” the world of information still has room for expansion. We may be drowning in spam, but the sky’s still the limit today.
2011 Andrew Viterbi Lecture
Ming Hsieh Department of Electrical Engineering
“Adventures in Coding Theory”
Professor Elwyn Berlekamp
University of California, Berkeley
Gerontology Auditorium, Thursday, March 3, 4:30 to 5:30 p.m.
The inventors of error-correcting codes were initially motivated by problems in communications engineering. But coding theory has since also influenced several other fields, including memory technology, theoretical computer science, game theory, portfolio theory, and symbolic manipulation. This talk will recall some forays into these subjects.
Elwyn Berlekamp has been professor of mathematics and of electrical engineering and computer science at UC Berkeley since 1971; halftime since 1983, and Emeritus since 2002. He also has been active in several small companies in the sectors of computers-communications and finance. He is now chairman of Berkeley Quantitative LP, a small money-management company. He was chairman of the Board of Trustees of MSRI from 1994 to 1998, and was at the International Computer Science Institute from 2001 to 2003. He is a member of the National Academy of Sciences, the National Academy of Engineering, and the American Academy of Arts and Sciences. Berlekamp has 12 patented inventions, some of which were co-authored with USC Professor Emeritus Lloyd Welch. Some of Berlekamp’s algorithms for decoding Reed-Solomon codes are widely used on compact discs; others are NASA standards for deep space communications. He has more than 100 publications, including two books on algebraic coding theory and seven books on the mathematical theory of combinatorial games, including the popular Dots-and-Boxes Game: Sophisticated Child’s Play.
I wish I could be at this lecture in person today, but I’ll have to live with the live webcast.
My first realization I was hooked on Oscar was when I seriously began pondering one of mankind's most profound dilemmas: whether to rent or buy a tux. That first step, as with any descent down a...
This must certainly be the quote of the week from English author Alan Bennett’s play Forty Years On: