DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with greater capacity than known error-correcting codes, which we can determine analytically for any set of parameters.
I'm a biomedical and electrical engineer with interests in information theory, complexity, evolution, genetics, signal processing, IndieWeb, theoretical mathematics, and big history.
I'm also a talent manager-producer-publisher in the entertainment industry with expertise in representation, distribution, finance, production, content delivery, and new media.
View all posts by Chris Aldrich