基于國產(chǎn)眾核超級計算機(jī)的6×105核并行矩量法
doi: 10.11999/JEIT180562 cstr: 32379.14.JEIT180562
-
西安電子科技大學(xué)陜西省超大規(guī)模電磁計算重點(diǎn)實(shí)驗室 ??西安 ??710071
Parallel MoM Using the Six Hundred Thousand Cores on Domestically-made and Many-core Supercomputer
-
Shaanxi Key Laboratory of Large Scale Electromagnetic Computing, Xidian University, Xi’an 710077, China
-
摘要:
為實(shí)現(xiàn)電磁計算的安全可靠和自主可控,該文基于“天河二號”國產(chǎn)眾核超級計算機(jī)平臺,開展大規(guī)模并行矩量法(MoM)的開發(fā)工作。為減輕大規(guī)模并行計算時計算機(jī)集群的通信壓力以及加速矩量法積分方程求解,通過分析矩量法電場積分方程離散生成的矩陣具有對角占優(yōu)特性,提出一種新型LU分解算法,即對角塊矩陣選主元LU分解(BDPLU)算法,該算法減少了panel列分解的計算量,更重要的是,完全消除了選主元過程的MPI通信開銷。利用BDPLU算法,并行矩量法突破了6×105 CPU核并行規(guī)模,這是目前在國產(chǎn)超級計算平臺上實(shí)現(xiàn)的最大規(guī)模的并行矩量法計算,其矩陣求解并行效率可達(dá)51.95%。數(shù)值結(jié)果表明,并行矩量法可準(zhǔn)確高效地在國產(chǎn)超級計算平臺上解決大規(guī)模電磁問題。
-
關(guān)鍵詞:
- 矩量法 /
- LU分解 /
- 國產(chǎn)超級計算機(jī) /
- 6×105核
Abstract:In order to realize safety, reliability and self-control of electromagnetic computing, the large-scale parallel MoM is studied based on domestically-made many-core supercomputer platform named " Tianhe-2”. A new LU decomposition algorithm named Block Diagonal matrix Pivoting LU decomposition (BDPLU) algorithm, is proposed by analyzing the diagonally dominant characteristics of the matrix generated through dispersing electric field integral equation of MoM, for the purpose of communication pressure reduction to computer cluster and solution acceleration to MoM integral equation during large-scale parallel computation. The BDPLU algorithm reduces the amount of calculation in the process of panel factorization. More importantly, the algorithm completely eliminates MPI communication when pivoting. Using BDPLU algorithm, the maximum number of CPU cores break through 6×105 CPU cores, which is the largest scale of parallel MoM computation in domestically-made and many-core supercomputing platform at present, and the parallel efficiency of solving matrix can reach 51.95%. Numerical results show that parallel MoM can accurately and efficiently solve large-scale electromagnetic field problems on domestic supercomputing platform.
-
表 1 CALU算法與BDPLU算法矩陣求解時間對比
FT2000+核數(shù) 矩陣求解時間(s) 并行效率(%) CALU BDPLU CALU BDPLU 2000 796.54 742.57 100 100 3000 567.78 518.68 93.53 95.44 4000 463.93 421.12 85.85 88.17 5000 386.89 338.24 82.35 87.82 10000 226.57 187.83 70.31 79.07 15000 172.91 139.05 61.42 71.20 20000 133.97 118.38 59.46 62.73 40000 72.83 64.61 54.68 57.47 下載: 導(dǎo)出CSV
表 2 BDPLU算法求解矩陣的加速比和并行效率
FT2000+核數(shù) 矩陣求解時間(s) 加速比 并行效率(%) 9600 29183.33 1 100 48000 6336.53 4.61 92.11 96000 3501.73 8.33 83.34 192000 2035.35 14.34 71.69 240000 1764.53 16.54 66.16 336000 1328.13 21.97 62.78 384000 1227.91 23.77 59.42 432000 1133.64 25.74 57.21 480000 1043.45 27.97 55.94 504000 997.94 29.24 55.70 552000 937.82 31.12 54.12 600000 898.81 32.49 51.95 下載: 導(dǎo)出CSV
-
HARRINGTION R F. Field Computation by Moment Methods[M]. New York; IEEE Press, 1993. 王長清. 現(xiàn)代計算電磁學(xué)基礎(chǔ)[M]. 北京: 北京大學(xué)出版社, 2005: 116–157.WANG Changqing. Computational Advanced Electromagnetics[M]. Beijing: PeKing University Press, 2005: 116–157. ZHANG Yu and SARKAR T K. Parallel Solution of Integral Equation Based EM Problems in the Frequency Domain [M].Hoboken, USA: Wiley-IEEE, 2009: 107–136. 張玉, 趙勛旺, 陳巖, 等. 計算電磁學(xué)中的大規(guī)模并行矩量法[M]. 西安: 西安電子科技大學(xué)出版社, 2016: 11210.ZHANG Yu, ZHAO Xunwang, CHEN Yan, et al. Massively Parallel Method of Moment in Computational Electromagnetics[M].Xi’an: Xidian University Press, 2016: 11210. 林中朝, 張爽, 王星, 等. 高階矩量法在國產(chǎn)超級計算機(jī)上的并行性能[J]. 微波學(xué)報, 2014, 30(S1): 44–47LIN Zhongchao, ZHANG Shuang, WANG Xing, et al. Parallel performance of higher-order MoM on a domestically-made supercomputer[J]. Journal of Microwaves, 2014, 30(S1): 44–47 林中朝, 陳巖, 張玉, 等. 國產(chǎn)CPU平臺中并行高階矩量法研究[J]. 西安電子科技大學(xué)學(xué)報, 2015, 42(3): 43–47 doi: 10.3969/j.issn.1001-2400.2015.03.008LIN Zhongchao, CHEN Yan, ZHANG Yu, et al. Study of the parallel higher-order MoM on a domestically-made CPU platform[J]. Journal of Xidian University, 2015, 42(3): 43–47 doi: 10.3969/j.issn.1001-2400.2015.03.008 ZHANG Yu, LIN Zhongchao, ZHAO Xunwang, et al. Performance of a massively parallel higher-order method of moment code using thousands of CPUs and its applications[J]. IEEE Transactions on Antenna Propagation, 2014, 62(12): 6317–6324 doi: 10.1109/TAP.2014.2361135 ZHAO Xunwang, CHEN Yan, ZHANG Huanhuan, et al. A New Decomposition Solver for Complex Electromagnetic Problems[J]. IEEE Antennas & Propagation Magazine, 2017, 59(3): 131–140 doi: 10.1109/MAP.2017.2687119 CHEN Yan, ZHANG Yu, ZHANG Guanghui, et al. Hybrid MIC/CPU parallel implementation of MoM on MIC cluster for electromagnetic problems[J]. IEICE Transactions on Electronics, 2016, 99(7): 735–743 doi: 10.1587/transele.E99.C.735 CHEN Yan, ZHANG Guanghui, LIN Zhongchao, et al. Solution of EM problems using hybrid parallel MIC/CPU implementation of higher-order MoM[C]. IEEE International Symposium on Microwave, Antenna, Propagation, and Emc Technologies. Shanghai, China, 2016: 789–791. 左勝, 陳巖, 張玉, 等. 一種可擴(kuò)展異構(gòu)并行核外高階矩量法[J]. 西安電子科技大學(xué)學(xué)報, 2017, 44(1): 146–151 doi: 10.3969/j.issn.1001-2400.2017.01.026ZUO Sheng, CHEN Yan, ZHANG Yu, et al. Study of the scalable heterogeneous parallel out-of-core higher order method of moments[J]. Journal of Xidian University, 2017, 44(1): 146–151 doi: 10.3969/j.issn.1001-2400.2017.01.026 CHEN Yan, ZUO Sheng, ZHANG Yu, et al. Large-scale parallel method of moments on CPU/MIC heterogeneous clusters[J]. IEEE Transactions on Antennas & Propagation, 2017, 65(7): 3782–3787 doi: 10.1109/TAP.2017.2700871 TANG Min, ZHAO Jieyi, TONG Ruofeng, et al. GPU accelerated convex hull computation[J]. Computers & Graphics, 2012, 36(5): 498–506 doi: 10.1016/j.cag.2012.03.015 陳巖. 高性能矩量法及其在復(fù)雜目標(biāo)電磁模擬中的應(yīng)用[D]. [博士論文], 西安電子科技大學(xué), 2017: 86–91.Chen Yan. High performance method of moments and its application in electromagnetic simulation of complex targets[D]. [Ph.D. dissertation], Xidian University, 2017: 86–91. ZHANG Yu, CHEN Yan, ZHANG Guanghui, et al. A highly efficient communication avoiding LU algorithm for Methods of Moments[C]. IEEE International Symposium on Antennas and Propagation & Usnc/ursi National Radio Science Meeting, Vancouver, Canada, 2015: 1672–1673. Intel® Developer Zone: Intel® Math Kernel Library [OL]. https://software.intel.com/en-us/forums/intel-math-kernel-library/, 2018. 徐曉飛, 曹祥玉, 高軍, 等. 基于矩量法的電大目標(biāo)RCS核外并行計算[J]. 電子與信息學(xué)報, 2011, 33(3): 758–762 doi: 10.3724/SP.J.1146.2010.00519XU Xiaofei, CAO Xiangyu, GAO Jun, et al. Parallel out-of-core calculation of electrically large objects’ RCS based on MoM[J]. Journal of Electronics &Information Technology, 2011, 33(3): 758–762 doi: 10.3724/SP.J.1146.2010.00519 馬驥, 龔書喜, 王興, 等. 一種快速計算目標(biāo)寬帶雷達(dá)截面的電磁算法[J]. 西安電子科技大學(xué)學(xué)報, 2012, 39(4): 98–102 doi: 10.3969/j.issn.1001-2400.2012.04.018MA Ji, GONG Shuxi, WANG Xing, et al. Fast computation of the wide-band radar cross section of arbitrary objects[J]. Journal of Xidian University, 2012, 39(4): 98–102 doi: 10.3969/j.issn.1001-2400.2012.04.018 國家超級計算廣州中心: 產(chǎn)品中心[OL]. http://www.nscc-gz.cn/Product/HighPerformanceComputingService/ServiceCharacteristics.html, 2018.6. -