基于多目標(biāo)進(jìn)化策略算法的DNA核酸編碼設(shè)計
doi: 10.11999/JEIT190869 cstr: 32379.14.JEIT190869
-
1.
武漢科技大學(xué)計算機(jī)科學(xué)與技術(shù)學(xué)院 武漢 430065
-
2.
智能信息處理和實時工業(yè)系統(tǒng)湖北省重點實驗室 武漢 430065
A Multiobjective Evolution Strategy Algorithm for DNA Sequence Design
-
1.
School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China
-
2.
Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, China)
-
摘要: 設(shè)計高質(zhì)量的核酸分子集合能有效提高DNA計算的可靠性、有效性和可求解問題的規(guī)模。DNA分子需要滿足熱力學(xué)約束、相似度約束、GC含量約束等多個相互沖突的目標(biāo)函數(shù),是典型的多目標(biāo)優(yōu)化問題。該文提出一種多目標(biāo)進(jìn)化策略(MOES)算法求解DNA分子序列設(shè)計問題,算法設(shè)計了隨機(jī)堿基變異算子實現(xiàn)高效的局部搜索和全局搜索。改進(jìn)的評價函數(shù)綜合考慮了候選解的支配關(guān)系和沖突目標(biāo)的平衡程度,選取符合DNA編碼約束的核酸序列。實驗結(jié)果證明,該文提出的算法具有高效的搜索效率和快速收斂能力,可以產(chǎn)生高質(zhì)量的DNA序列集合,優(yōu)于其他對比算法產(chǎn)生的DNA分子序列集合。
-
關(guān)鍵詞:
- DNA計算 /
- DNA序列設(shè)計 /
- 多目標(biāo)進(jìn)化算法 /
- 進(jìn)化策略
Abstract: It is important to design high-quality DNA sequences set, which can improve the reliability and efficiency of DNA computing. DNA sequence design problem is an multiobjective optimization problem that needs to satisfy multiple conflict objectives which are thermodynamic constraint, similarity constraint and GC content constraint simultaneously. A MultiObjective Evolutionary Strategy (MOES) is proposed to solve the DNA sequence design problem. The random base mutation operator is designed for exploration and exploitation the search space. The fitness function is improved for obtaining balanced similarity and H-measure objective functions. Some state-of-the-art approaches are chosen to evaluate the effectivity of proposed algorithm. The experiment results show that the proposed multiobjective evolution strategy algorithm obtains very promising DNA sequences and outperforms previous approaches. -
表 1 個體變異算子偽代碼
輸入: X=$5' $–x1x2$ \cdots $xn–$3' $ 輸出: Y=$5' $–y1y2$ \cdots $yn–$3' $ 1: for j=1 to n 2: List.add(j) 3: end for 4: k=random(n) 5: for j=1 to k 6: i = random(List.count) 7: yi=(xi+random(3)+1) mod 4 8: List.delete(i) 9: end for 下載: 導(dǎo)出CSV
表 2 算法總體流程偽代碼
1: Initialization 2: while (t < max iteration) 3: for i=1 to P 4: p = Pt(i) 5: q = Individual Mutation(p) 6: if q ? p then 7: Pt(i)=q 8: else if (q ? p) and (p ? q) then 9: if Fitness(q) > Fitness(p) then 10: Pt(i)=q 11: end if 12: end if 13: end for 14: t=t+1 15: end while 下載: 導(dǎo)出CSV
表 3 7條長度為20的DNA序列的結(jié)果比較
方法(序列) 連續(xù)性 發(fā)卡結(jié)構(gòu) H-measure 相似度 Tm GC(%) MGA[7] TAGACCACTGTTGCACATGG 0 0 58 52 50.2794 50 ATTCGGTCAGACTTGCTGTG 0 0 64 52 48.6650 50 ATAGTGCGGACAGTAGTTCC 0 0 66 59 50.1634 50 AATACGCGGAACGTAACCTC 0 0 61 85 50.4158 50 AATACGCGGAACGTAACCTC 0 0 61 85 50.4158 50 ACAGCCTTAAGCCTAACTCC 0 3 65 54 49.0641 50 ATGCTTCCGACATGGAATGG 0 3 63 57 49.8160 50 f (x) 0 6 438 444 1.7508 0 IWO[8] ACACCAGCACACCAGAAACA 9 0 55 55 48.4670 50 GTTCAATCGCCTCTCGGTAT 0 0 57 57 49.3935 50 GCTACCTCTTCCACCATTCT 0 0 55 55 49.2453 50 GAATCAATGGCGGTCAGAAG 0 0 66 66 49.9440 50 TTGGTCCGGTTATTCCTTCG 0 0 65 65 50.6418 50 CCATCTTCCGTACTTCACTG 0 0 56 56 51.0993 50 TTCGACTCGGTTCCTTGCTA 0 0 58 58 47.6049 50 f (x) 9 0 412 412 3.4944 0 pMO-ABC[10] TGTGGTTGGTTAGTCGGTTG 0 0 46 49 51.0421 50 GGTGGTATTGGTGGTATTGG 0 0 47 47 53.8027 50 CTTCTCTTCTCTTGCCGCTT 0 0 39 56 46.4112 50 AACAACCTCCACACCGAACA 0 0 62 32 49.1737 50 CTCTCTCTCTCACTCTCTCA 0 0 41 48 46.5220 50 CTCTCATTCCTTCTTACCCC 16 0 43 51 50.8283 50 TGGTGTTGCTGGTGTAGGTT 0 0 48 51 49.3985 50 f (x) 16 0 326 334 7.3915 0 MOES GGAGAGGAGAAGAAGAAGAG 0 0 52 25 48.1727 50 CCTCCACATCACCATTAACC 0 0 56 31 52.3807 50 CTCTCTCTCTCTCTCTCTCT 0 0 34 37 45.6658 50 TTCCTTCCTTCCTTCCTTCC 0 0 36 39 48.7500 50 TTGGTTGGTTGGTTGGTTGG 0 0 30 46 51.3054 50 TTGTTGTTGGTGGTGGTGGT 0 0 30 48 50.2236 50 TGTGTGTGGTGTGTGTTGTG 0 0 30 46 51.0025 50 f (x) 0 0 268 272 6.7149 0 下載: 導(dǎo)出CSV
表 4 14條長度為20的DNA序列的結(jié)果比較
方法(序列) 連續(xù)性 發(fā)卡結(jié)構(gòu) H-measure 相似度 Tm GC(%) MGA[7] CTCATCTAATCAGCCTCGCA 0 0 135 114 48.1554 50 CTAATAGTGACAGCTGCGTG 0 3 131 119 50.2421 50 GCATCGTTAGAGACACCTAC 0 3 134 124 50.7932 50 GCATCAATATGCGCGACTAC 0 0 131 125 50.2815 50 CATTAAGTAGACGCTGTCGG 0 3 132 114 50.9507 50 TATGGATGAGGAGGACCTAG 0 3 133 117 50.6387 50 CAGAGATGTTCTGTACCACC 0 3 128 117 51.2232 50 CGTCGAGAATTCGTAGCTCA 0 0 137 119 48.3224 50 TCTGTTACCGTATCGGATCG 0 3 129 115 50.8791 50 AGAAGAGTTCGACTTGCTGG 0 3 134 121 47.5507 50 GCAAGGAATTCACCGTCTGT 0 3 133 129 48.9881 50 CGTGTGAAGAGAGTGGTTCA 0 0 127 123 48.9355 50 CGACTGAATCATGGACCTGT 0 3 134 126 49.7624 50 TACCGAGAAGTAGGACTGCA 0 3 134 124 48.3847 50 f (x) 0 30 1852 1687 3.6725 0 INSGA-II[9] CGAGACATCGTGCATATCGT 0 4 143 124 49.6393 50 TATAGCACGAGTGCGCGTAT 0 3 137 130 48.5659 50 GATCTACGATCATGAGAGCG 0 4 135 126 49.6673 50 TCTGTACTGCTGACTCGAGT 0 3 163 124 47.1312 50 CGAGTAGTCACACGATGAGA 0 0 152 132 49.2836 50 AGATGATCAGCAGCGACACT 0 3 133 133 46.5546 50 TGTGCTCGTCTCTGCATACT 0 4 159 130 47.1507 50 AGACGAGTCGTACAGTACAG 0 0 152 134 49.9091 50 ATGTACGTGAGATGCAGCAG 0 0 139 121 48.9270 50 ATCACTACTCGCTCGTCACT 0 3 141 132 47.5190 50 TCAGAGATACTCACGTCACG 0 3 142 123 49.2836 50 GACAGAGCTATCAGCTACTG 0 3 129 124 49.2927 50 GCTGACATAGAGTGCGATAC 0 0 130 133 50.1725 50 ACATCGACACTACTACGCAC 0 3 133 144 50.1554 50 f (x) 0 33 1988 1810 3.6179 0 pMO-ABC[10] GTTATTGGTGGTGTGCGTTG 0 0 143 82 51.9305 50 ACGGAAGTAGGAAGGAGAGA 0 0 137 106 47.8089 50 GGAAGACGCAGAAGAGAAAG 9 0 121 110 48.2609 50 CCTCCTTATTGCCTTCCTTC 0 0 114 102 50.3081 50 AACTAACCACCGACCAACCA 0 0 95 118 50.1102 50 ACACACAACACACACACTCC 0 0 88 119 50.4577 50 ACACCACCACATTACCACAC 0 0 97 119 51.9161 50 CTTCCGTCTCTTCTCTCTCT 0 0 134 105 46.9561 50 AAGGAGCGAGGAAGCGAAAA 16 0 107 95 45.8306 50 AACACCAGAACATCCACACC 0 0 90 131 50.5474 50 CCAACACCATACAACAGACC 0 0 95 130 52.3720 50 AAGGCGGAAGGATAGAAGAG 0 0 128 115 48.5370 50 TCTGCCGCTTCTTCTTCTTC 0 0 118 95 46.4000 50 TCCTCTCGTCTATTCTCCTC 0 0 111 98 48.4427 50 f (x) 25 0 1578 1525 6.5414 0 MOES CATACACACTCACACTCACC 0 0 112 89 51.6792 50 TTGTTGTGGGTTGTCCGGTT 9 0 105 90 49.7949 50 ACACACACACACACACACAC 0 0 93 78 50.9244 50 TTGTGGTCCTGGTGTTCTCT 0 0 112 90 48.4957 50 GAGAGAGAGAGAGAGAGAGA 0 0 72 100 45.6568 50 TGGTGTGGTGTGGTTAGGTT 0 0 96 93 50.5325 50 TTGGTGGTGGTGGTTGTAGT 0 0 96 95 50.5325 50 CCAACCAACCAACCAACCAA 0 0 95 78 51.3054 50 AACAAGCCAGAAGCCAGAAG 0 0 94 102 47.5066 50 GTTGGTGCTGTTGTTGAGGT 0 0 101 99 49.4550 50 GAAGAAGGGAGGAGAGAAGA 9 0 77 108 47.4961 50 AATGGAAGCGAAGCGAAGTG 0 0 93 104 47.6766 50 AACCATCAACCGCCGAAGAA 0 3 104 95 48.1694 50 AAGGTGGAGAGGAAGGAGAA 0 0 82 111 47.4098 50 f (x) 18 3 1332 1332 6.0224 0 下載: 導(dǎo)出CSV
-
ADLEMAN L M. Molecular computation of solutions to combinatorial problems[J]. Science, 1994, 266(5187): 1021–1024. doi: 10.1126/science.7973651 DE SILVA P Y and GANEGODA G U. New trends of digital data storage in DNA[J]. BioMed Research International, 2016: 8072463. BRAICH R S, CHELYAPOV N, JOHNSON C, et al. Solution of a 20-variable 3-SAT problem on a DNA computer[J]. Science, 2002, 296(5567): 499–502. doi: 10.1126/science.1069528 ZIMMERMANN K H. Efficient DNA sticker algorithms for NP-complete graph problems[J]. Computer Physics Communications, 2002, 144(3): 297–309. doi: 10.1016/S0010-4655(02)00270-9 XU Jin, QIANG Xiaoli, ZHANG Kai, et al. A DNA computing model for the graph vertex coloring problem based on a probe graph[J]. Engineering, 2018, 4(1): 61–77. doi: 10.1016/j.eng.2018.02.011 CHAVES-GONZáLEZ J M and VEGA-RODRíGUEZ M A. A multiobjective approach based on the behavior of fireflies to generate reliable DNA sequences for molecular computing[J]. Applied Mathematics and Computation, 2014, 227: 291–308. doi: 10.1016/j.amc.2013.11.032 PENG Ximei, ZHENG Xuedong, WANG Bin, et al. A micro-genetic algorithm for DNA encoding sequences design[C]. The 2nd International Conference on Control Science and Systems Engineering, Singapore, 2016: 10–14. YANG Gaijing, WANG Bin, ZHENG Xuedong, et al. IWO algorithm based on niche crowding for DNA sequence design[J]. Interdisciplinary Sciences: Computational Life Sciences, 2017, 9(3): 341–349. doi: 10.1007/s12539-016-0160-0 WANG Yanfeng, SHEN Yongpeng, ZHANG Xuncai, et al. An improved non-dominated sorting genetic algorithm-Ⅱ (INSGA-Ⅱ) applied to the design of DNA codewords[J]. Mathematics and Computers in Simulation, 2018, 151: 131–139. doi: 10.1016/j.matcom.2018.03.011 CHAVES-GONZáLEZ J M and MARTíNEZ-GIL J. An efficient design for a multi-objective evolutionary algorithm to generate DNA libraries suitable for computation[J]. Interdisciplinary Sciences: Computational Life Sciences, 2018, 11(3): 542–558. YANG Shuming, SHAO Dongguo, and LUO Yangjie. A novel evolution strategy for multiobjective optimization problem[J]. Applied Mathematics and Computation, 2005, 170(2): 850–873. doi: 10.1016/j.amc.2004.12.025 ARNOLD D V and BEYER H G. Investigation of the (μ, λ)-ES in the presence of noise[C]. The 2001 Congress on Evolutionary Computation, Seoul, South Korea, 2001, 1: 332–339. EBENAU C, ROTTSCH?FER J, and THIERAUF G. An advanced evolutionary strategy with an adaptive penalty function for mixed-discrete structural optimisation[J]. Advances in Engineering Software, 2005, 36(1): 29–38. doi: 10.1016/j.advengsoft.2003.10.008 MEZURA-MONTES E and COELLO C A C. A simple multimembered evolution strategy to solve constrained optimization problems[J]. IEEE Transactions on Evolutionary Computation, 2005, 9(1): 1–17. doi: 10.1109/TEVC.2004.836819 XIAO J, XU Jin, CHEN Zhihua, et al. A hybrid quantum chaotic swarm evolutionary algorithm for DNA encoding[J]. Computers & Mathematics with Applications, 2009, 57(11/12): 1949–1958. CHAVES-GONZáLEZ J M, VEGA-RODRíGUEZ M A, and GRANADO-CRIADO J M. A multiobjective swarm intelligence approach based on artificial bee colony for reliable DNA sequence design[J]. Engineering Applications of Artificial Intelligence, 2013, 26(9): 2045–2057. doi: 10.1016/j.engappai.2013.04.011 MUHAMMAD M S, SELVAN K V, MASRA S M W, et al. An improved binary particle swarm optimization algorithm for DNA encoding enhancement[C]. 2011 IEEE Symposium on Swarm Intelligence, Paris, France, 2011: 1–8. CHAVES-GONZáLEZ J M and VEGA-RODRíGUEZ M A. DNA strand generation for DNA computing by using a multi-objective differential evolution algorithm[J]. Biosystems, 2014, 116: 49–64. doi: 10.1016/j.biosystems.2013.12.005 BUI L T and ALAM S. Multi-Objective Optimization in Computational Intelligence: Theory and Practice[M]. Hershey: IGI Global, 2008. SHIN S Y, LEE I H, KIM D, et al. Multiobjective evolutionary optimization of DNA sequences for reliable DNA computing[J]. IEEE Transactions on Evolutionary Computation, 2005, 9(2): 143–158. doi: 10.1109/TEVC.2005.844166 -