DNA Data Storage
doi: 10.11999/JEIT190852 cstr: 32379.14.JEIT190852
1. Institute of Molecular Medicine, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
2. Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200127, China
Abstract: Molecular data storage is a durable, high-density way to store data, and it shows great potential for closing the widening gap between the volume of information being produced and the available storage capacity. As a typical form of molecular data storage, DNA data storage offers an alternative and potentially transformative medium that could overcome the physical limits of existing storage media and meet the growing demand for data storage. This review outlines the history, workflow, and current state of DNA data storage, and discusses its open problems, challenges, and development trends.
Key words:
- Molecular data storage /
- DNA data storage /
- Encoding /
- Decoding /
- Read
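Table 1 Comparison of in vitro DNA data storage studies

| Reference | Data capacity | Synthesis method | Sequencing method | Physical redundancy (coverage) | Reassembly | Strand length (bases) | Logical density (bit/base) | Logical density (payload) | Random access |
|---|---|---|---|---|---|---|---|---|---|
| Ref. [31] | 650 kB | Phosphoramidite (deposition) | Sequencing by synthesis | 3000× | Joined by index sequence | 115 | 0.60 | 0.83 | No |
| Ref. [32] | 630 kB | Phosphoramidite (deposition) | Sequencing by synthesis | 51× | Joined by overlapping sequence | 117 | 0.19 | 0.29 | No |
| Ref. [17] | 80 kB | Phosphoramidite (electrochemical) | Sequencing by synthesis | 372× | Joined by index sequence | 158 | 0.86 | 1.16 | No |
| Ref. [37,45] | 3 kB | Phosphoramidite (deposition) | Nanopore sequencing | 200× | Joined by index sequence | 880~1000 | 1.71 | 1.74 | Yes |
| Ref. [38] | 2 MB | Phosphoramidite (deposition) | Sequencing by synthesis | 10.5× | Joined by seed sequence | 152 | 1.18 | 1.55 | No |
| Ref. [46] | 22 MB | Phosphoramidite (deposition) | Sequencing by synthesis | 160× | Joined by index sequence | 230 | 0.89 | 1.08 | No |
| Ref. [36] | 150 kB | Phosphoramidite (electrochemical) | Sequencing by synthesis | 40× | Joined by index sequence | 117 | 0.57 | 0.85 | Yes |
| Ref. [12] | 200 MB | Phosphoramidite (deposition) | Sequencing by synthesis | 5× | Joined by index sequence | 150~200 | 0.81 | 1.10 | Yes |
| Ref. [43] | 8.5 MB | Phosphoramidite (deposition) | Sequencing by synthesis | 164× | Joined by index sequence | 194 | 1.94 | 2.64 | No |
| Ref. [44] | 854 kB | Phosphoramidite (column) | Sequencing by synthesis | 250× | Joined by index sequence | 85 | 1.78 | 3.37 | No |
| Ref. [12] | 33 kB | Phosphoramidite (deposition) | Nanopore sequencing | 36× | Joined by index sequence | 150 | 0.81 | 1.10 | Yes |
| Ref. [47] | 18 B | Enzymatic (column-based) | Nanopore sequencing | 175× | None (monomer) | 150~200 | 1.57 | 1.57 | No |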
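The logical density column in Table 1 can be turned into a rough base count for a given payload. The following is a minimal sketch, not a method taken from the review: the function names `bases_needed` and `fraction_of_ceiling` are hypothetical, 1 kB is assumed to mean 1000 bytes, and the 2 bit/base ceiling assumes only the four natural bases with no sequence constraints or error-correction overhead.

```python
import math

# Assumption: four natural bases (A, C, G, T), so one base carries at most
# log2(4) = 2 bits. Schemes using extra or composite letters can exceed this.
MAX_BITS_PER_BASE = math.log2(4)


def bases_needed(payload_bytes: int, bits_per_base: float) -> int:
    """Approximate number of bases needed to hold payload_bytes
    at a given logical density (bit/base), rounded up."""
    if bits_per_base <= 0:
        raise ValueError("bits_per_base must be positive")
    return math.ceil(payload_bytes * 8 / bits_per_base)


def fraction_of_ceiling(bits_per_base: float) -> float:
    """Logical density expressed as a fraction of the 2 bit/base ceiling."""
    return bits_per_base / MAX_BITS_PER_BASE


if __name__ == "__main__":
    # Illustrative inputs read from Table 1 (capacity in bytes, density in bit/base).
    for label, size_bytes, density in [
        ("Ref. [31]", 650_000, 0.60),
        ("Ref. [38]", 2_000_000, 1.18),
    ]:
        n = bases_needed(size_bytes, density)
        share = fraction_of_ceiling(density)
        print(f"{label}: about {n} bases, {share:.0%} of the 2 bit/base ceiling")
```

The estimate covers a single copy of the data only; the physical redundancy (coverage) column multiplies the number of molecules that must actually be synthesized and sequenced.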
[1] GANTZ J and REINSEL D. The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far East[R]. IDC iView, 2012: 1–16.
[2] EXTANCE A. How DNA could store all the world’s data[J]. Nature, 2016, 537(7618): 22–24. doi: 10.1038/537022a
[3] ZHIRNOV V, ZADEGAN R M, SANDHU G S, et al. Nucleic acid memory[J]. Nature Materials, 2016, 15(4): 366–370. doi: 10.1038/nmat4594
[4] COLQUHOUN H and LUTZ J F. Information-containing macromolecules[J]. Nature Chemistry, 2014, 6(6): 455–456. doi: 10.1038/nchem.1958
[5] WANG Junke, YIN Jue, NIU Renjie, et al. DNA computing and DNA nanotechnology[J]. Journal of Electronics & Information Technology, 2020, 42(6): 1313–1325. doi: 10.11999/JEIT190826
[6] XU Jin, QIANG Xiaoli, ZHANG Kai, et al. A DNA computing model for the graph vertex coloring problem based on a probe graph[J]. Engineering, 2018, 4(1): 61–77. doi: 10.1016/j.eng.2018.02.011
[7] LAN Wenfei, XING Zhibao, HUANG Jun, et al. The DNA self-assembly computing model for solving perfect matching problem of bipartite graph[J]. Journal of Computer Research and Development, 2016, 53(11): 2583–2593. doi: 10.7544/issn1000-1239.2016.20150312
[8] ZHU Weijun, ZHOU Qinglei, and ZHANG Qinxian. A LTL model checking approach based on DNA computing[J]. Chinese Journal of Computers, 2016, 39(12): 2578–2597. doi: 10.11897/SP.J.1016.2016.02578
[9] XIA Hong and ZHANG Shijun. Constructing the logical model based on molecular computing[J]. Bulletin of Science and Technology, 2016, 32(5): 11–15. doi: 10.3969/j.issn.1001-7119.2016.05.003
[10] ZHOU Xu, ZHOU Yantao, OUYANG Aijia, et al. An efficient tile assembly model for maximum clique problem[J]. Journal of Computer Research and Development, 2014, 51(6): 1253–1262. doi: 10.7544/issn1000-1239.2014.20120904
[11] ZHOU Xu, ZHOU Yantao, LI Kenli, et al. Efficient maximum matching problem algorithms in the tile assembly model[J]. Acta Electronica Sinica, 2015, 43(2): 262–268. doi: 10.3969/j.issn.0372-2112.2015.02.009
[12] ORGANICK L, ANG S D, CHEN Y J, et al. Random access in large-scale DNA data storage[J]. Nature Biotechnology, 2018, 36(3): 242–248. doi: 10.1038/nbt.4079
[13] RUTTEN M G T A, VAANDRAGER F W, ELEMANS J A A W, et al. Encoding information into polymers[J]. Nature Reviews Chemistry, 2018, 2(11): 365–381. doi: 10.1038/s41570-018-0051-5
[14] DNA to the rescue for data storage[J]. Chemical & Engineering News, 2015, 93(35): 40–41.
[15] CHEN Weigang, HUANG Gang, LI Bingzhi, et al. DNA information storage for audio and video files[J]. Scientia Sinica Vitae, 2020, 50(1): 81–85. doi: 10.1360/SSV-2019-0211
[16] GREENGARD S. Cracking the code on DNA storage[J]. Communications of the ACM, 2017, 60(7): 16–18. doi: 10.1145/3088493
[17] GRASS R N, HECKEL R, PUDDU M, et al. Robust chemical preservation of digital information on DNA in silica with error-correcting codes[J]. Angewandte Chemie International Edition, 2015, 54(8): 2552–2555. doi: 10.1002/anie.201411378
[18] LUNT B M. How long is long-term data storage?[C]. Archiving Conference, Society for Imaging Science and Technology, 2011: 29–33.
[19] SHRIVASTAVA S and BADLANI R. Data storage in DNA[J]. International Journal of Electrical Energy, 2014, 2(2): 119–124.
[20] GREENBERG A, HAMILTON J, MALTZ D A, et al. The cost of a cloud: Research problems in data center networks[J]. ACM SIGCOMM Computer Communication Review, 2008, 39(1): 68–73. doi: 10.1145/1496091.1496103
[21] SHETH R U and WANG H H. DNA-based memory devices for recording cellular events[J]. Nature Reviews Genetics, 2018, 19(11): 718–732. doi: 10.1038/s41576-018-0052-8
[22] WIENER N. Interview: Machines smarter than men[J]. US News World Report, 1964, 56: 84–86.
[23] NEIMAN M S. On the molecular memory systems and the directed mutations[J]. Radiotekhnika, 1965, 6: 1–8.
[24] DAVIS J. Microvenus[J]. Art Journal, 1996, 55(1): 70–74. doi: 10.1080/00043249.1996.10791743
[25] CLELLAND C T, RISCA V, and BANCROFT C. Hiding messages in DNA microdots[J]. Nature, 1999, 399(6736): 533–534. doi: 10.1038/21092
[26] BANCROFT C, BOWLER T, BLOOM B, et al. Long-term storage of information in DNA[J]. Science, 2001, 293(5536): 1763–1765.
[27] AILENBERG M and ROTSTEIN O D. An improved Huffman coding method for archiving text, images, and music characters in DNA[J]. BioTechniques, 2009, 47(3): 747–754. doi: 10.2144/000113218
[28] WONG P C, WONG K K, and FOOTE H. Organic data memory using the DNA approach[J]. Communications of the ACM, 2003, 46(1): 95–98. doi: 10.1145/602421.602426
[29] ARITA M and OHASHI Y. Secret signatures inside genomic DNA[J]. Biotechnology Progress, 2004, 20(5): 1605–1607. doi: 10.1021/bp049917i
[30] YACHIE N, SEKIYAMA K, SUGAHARA J, et al. Alignment-based approach for durable data storage into living organisms[J]. Biotechnology Progress, 2007, 23(2): 501–505. doi: 10.1021/bp060261y
[31] CHURCH G M, GAO Yuan, and KOSURI S. Next-generation digital information storage in DNA[J]. Science, 2012, 337(6102): 1628. doi: 10.1126/science.1226355
[32] GOLDMAN N, BERTONE P, CHEN Siyuan, et al. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA[J]. Nature, 2013, 494(7435): 77–80. doi: 10.1038/nature11875
[33] GIBSON D G, GLASS J I, LARTIGUE C, et al. Creation of a bacterial cell controlled by a chemically synthesized genome[J]. Science, 2010, 329(5987): 52–56. doi: 10.1126/science.1190719
[34] HECKEL R, SHOMORONY I, RAMCHANDRAN K, et al. Fundamental limits of DNA storage systems[C]. 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 2017: 3130–3134.
[35] KOSURI S and CHURCH G M. Large-scale de novo DNA synthesis: Technologies and applications[J]. Nature Methods, 2014, 11(5): 499–507. doi: 10.1038/nmeth.2918
[36] BORNHOLT J, LOPEZ R, CARMEAN D M, et al. A DNA-based archival storage system[J]. ACM SIGPLAN Notices, 2016, 50(4): 637–649.
[37] YAZDI S M H T, YUAN Yongbo, MA Jian, et al. A rewritable, random-access DNA-based storage system[J]. Scientific Reports, 2015, 5: 14138. doi: 10.1038/srep14138
[38] ERLICH Y and ZIELINSKI D. DNA fountain enables a robust and efficient storage architecture[J]. Science, 2017, 355(6328): 950–954. doi: 10.1126/science.aaj2038
[39] TAN Li, SUN Jifeng, and GUO Lihua. DNA sequence data compression method based on memetic algorithm[J]. Journal of Electronics & Information Technology, 2014, 36(1): 121–127.
[40] SHANNON C E. A mathematical theory of communication[J]. The Bell System Technical Journal, 1948, 27(3): 379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x
[41] HECKEL R, MIKUTIS G, and GRASS R N. A characterization of the DNA data storage channel[J]. Scientific Reports, 2019, 9(1): 9663. doi: 10.1038/s41598-019-45832-6
[42] REED I S and SOLOMON G. Polynomial codes over certain finite fields[J]. Journal of the Society for Industrial and Applied Mathematics, 1960, 8(2): 300–304. doi: 10.1137/0108018
[43] ANAVY L, VAKNIN I, ATAR O, et al. Improved DNA based storage capacity and fidelity using composite DNA letters[J]. bioRxiv, 2018. doi: 10.1101/433524
[44] CHOI Y, RYU T, LEE A C, et al. Addition of degenerate bases to DNA-based data storage for increased information capacity[J]. bioRxiv, 2018. doi: 10.1101/367052
[45] YAZDI S M H T, GABRYS R, and MILENKOVIC O. Portable and error-free DNA-based data storage[J]. Scientific Reports, 2017, 7: 5011. doi: 10.1038/s41598-017-05188-1
[46] BLAWAT M, GAEDKE K, HÜTTER I, et al. Forward error correction for DNA data storage[J]. Procedia Computer Science, 2016, 80: 1011–1022. doi: 10.1016/j.procs.2016.05.398
[47] LEE H H, KALHOR R, GOELA N, et al. Enzymatic DNA synthesis for digital information storage[J]. bioRxiv, 2018. doi: 10.1101/348987
[48] BAUM E. Building an associative memory vastly larger than the brain[J]. Science, 1995, 268(5210): 583–585. doi: 10.1126/science.7725109
[49] CARUTHERS M H. The chemical synthesis of DNA/RNA: Our gift to science[J]. Journal of Biological Chemistry, 2013, 288(2): 1420–1427. doi: 10.1074/jbc.X112.442855
[50] GOODWIN S, MCPHERSON J D, and MCCOMBIE W R. Coming of age: Ten years of next-generation sequencing technologies[J]. Nature Reviews Genetics, 2016, 17(6): 333–351. doi: 10.1038/nrg.2016.49
[51] SHENDURE J, BALASUBRAMANIAN S, CHURCH G M, et al. DNA sequencing at 40: Past, present and future[J]. Nature, 2017, 550(7676): 345–353. doi: 10.1038/nature24286
[52] DEAMER D, AKESON M, and BRANTON D. Three decades of nanopore sequencing[J]. Nature Biotechnology, 2016, 34(5): 518–524. doi: 10.1038/nbt.3423
[53] FONTANA JR R E and DECAD G M. Moore’s law realities for recording systems and memory storage components: HDD, tape, NAND, and optical[J]. AIP Advances, 2018, 8(5): 056506. doi: 10.1063/1.5007621
[54] BONNET J, COLOTTE M, COUDY D, et al. Chain and conformation stability of solid-state DNA: Implications for room temperature storage[J]. Nucleic Acids Research, 2010, 38(5): 1531–1546. doi: 10.1093/nar/gkp1060
[55] PRAKADAN S M, SHALEK A K, and WEITZ D A. Scaling by shrinking: Empowering single-cell 'omics' with microfluidic devices[J]. Nature Reviews Genetics, 2017, 18(6): 345–361. doi: 10.1038/nrg.2017.15
[56] NEWMAN S, STEPHENSON A P, WILLSEY M, et al. High density DNA data storage library via dehydration with digital microfluidic retrieval[J]. Nature Communications, 2019, 10(1): 1706. doi: 10.1038/s41467-019-09517-y