Fault-Tolerant Algorithm for the Large-Scale Parallel Higher-Order Method of Moments
doi: 10.11999/JEIT161308 cstr: 32379.14.JEIT161308
The National Natural Science Foundation of China (61301069), the Program for New Century Excellent Talents in University of China (NCET-13-0949), the Fundamental Research Funds for the Central Universities (JB160218), the National 863 Program of China (2012AA01A308)
Keywords:
- Supercomputer /
- Parallel method of moments /
- Fault-tolerant algorithm /
- Scene protection /
- Reliability
Abstract: Large-scale parallel electromagnetic computation on supercomputers is of great significance for solving complicated electromagnetic problems in practical engineering. However, process crashes caused by node failures are far more likely on a supercomputer than on an ordinary computer. Because traditional electromagnetic computation cannot cope effectively with process crashes, this paper proposes an efficient fault-tolerant algorithm for the large-scale parallel higher-order Method of Moments (MoM). Building on the existing parallel higher-order MoM, an efficient and highly reliable scene protection algorithm is designed based on disk caching and direct memory access, together with an efficient breakpoint recovery algorithm. The effectiveness of the algorithm lies mainly in its fixed scene protection points, which allow the computation to proceed in an orderly manner even when failures occur, whereas the original algorithm must restart from the beginning after every failure. Numerical simulations verify the effectiveness of the fault-tolerant algorithm in handling process crashes, and show that it greatly improves the reliability of the large-scale parallel higher-order MoM.
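The idea of fixed scene protection points with breakpoint recovery can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the abstract gives no code, and the actual algorithm targets an MPI-parallel higher-order MoM solver using disk caching and direct memory access. Here the function names (`save_checkpoint`, `load_checkpoint`, `run`), the `interval` and `crash_at` parameters, the use of `pickle`, and the sum-of-squares workload are all assumptions chosen only to make the control flow visible.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically: dump to a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk before the rename
    os.replace(tmp, path)     # the old checkpoint stays valid until this succeeds


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(n_steps, interval, path, crash_at=None):
    """Run n_steps units of work, saving state at every fixed multiple of `interval`.

    `crash_at` is a hypothetical hook that simulates a process crash at that step.
    """
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    for step in range(state["step"], n_steps):
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated process crash")
        state["acc"] += step * step       # stand-in for one block of solver work
        state["step"] = step + 1
        if state["step"] % interval == 0:  # fixed protection point
            save_checkpoint(path, state)
    return state["acc"]
```

If a first call such as `run(10, 3, path, crash_at=7)` fails, calling `run(10, 3, path)` again resumes from the last protection point (step 6) instead of step 0, and only the work done after that point is repeated. Writing to a temporary file and renaming it keeps the previous checkpoint intact if a crash happens during the save itself.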