Performance Prediction Based on Random Forest for the Stream Processing Checkpoint
doi: 10.11999/JEIT190552 cstr: 32379.14.JEIT190552
School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Abstract: The development of the Internet of Things (IoT) keeps increasing both the volume and the variety of streaming data. As real-time processing scenarios multiply and configuration strategies based on empirical knowledge show their deficiencies, checkpoint configuration for stream processing faces major challenges: it is time-consuming and labor-intensive, and it can easily cause system anomalies. To address these challenges, this paper proposes a regression-based method for predicting checkpoint performance. The method first analyzes six kinds of features that affect checkpoint performance, then feeds the feature vectors of the training set into the Random Forest (RF) regression algorithm for training, and finally uses the trained model to predict on the test set. Experimental results show that, compared with other machine learning algorithms, RF regression predicts checkpoint performance with lower error, higher accuracy, and faster execution on the CPU-intensive, memory-intensive, and network-intensive benchmarks.
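The train-then-predict workflow described in the abstract can be illustrated with a minimal pure-Python sketch of random forest regression (bootstrap sampling plus a random feature subset at each split). This is not the authors' implementation: the synthetic six-feature data, the target function, the 80/20 split, and every hyperparameter (`n_trees`, `max_depth`, `min_leaf`, `n_try`) are assumptions chosen only for illustration.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, rng, depth=0, max_depth=6, min_leaf=3, n_try=2):
    """Grow one regression tree; a leaf is a float, a split is a tuple."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    best = None
    # Consider only a random subset of features at this node.
    for f in rng.sample(range(len(X[0])), n_try):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return sum(y) / len(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, depth + 1, max_depth, min_leaf, n_try)

    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=15, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

# Synthetic stand-in data: six features per sample (hypothetical values).
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(6)] for _ in range(150)]
y = [4 * r[0] + 3 * r[1] + 2 * r[2] + r[3] + data_rng.gauss(0, 0.1) for r in X]

split = len(X) * 4 // 5  # 80% training, 20% test
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

forest = fit_forest(X_train, y_train)
preds = [predict_forest(forest, row) for row in X_test]

mae_rf = sum(abs(p - t) for p, t in zip(preds, y_test)) / len(y_test)
train_mean = sum(y_train) / len(y_train)
mae_baseline = sum(abs(train_mean - t) for t in y_test) / len(y_test)
print(f"forest MAE: {mae_rf:.3f}, mean-baseline MAE: {mae_baseline:.3f}")
```

Comparing against a predict-the-training-mean baseline is only a sanity check here; in practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used instead of hand-written trees.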
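Table 1 Summary of dynamic features

| Feature name | Description |
| --- | --- |
| Local input records | Number of local records the operator receives per second. |
| Remote input records | Number of remote records the operator receives per second. |
| Local buffered records | Number of local records the operator buffers per second. |
| Remote buffered records | Number of remote records the operator buffers per second. |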
Table 2 Dataset description

| Benchmark | Samples | Features | Training samples | Prediction samples |
| --- | --- | --- | --- | --- |
| CKCPU | 47100 | 332 | 37680 | 9420 |
| CKMEM | 10290 | 172 | 8232 | 2058 |
| CKNET | 18900 | 524 | 15120 | 3780 |
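The training and prediction counts in Table 2 are consistent with an 80/20 split of each dataset. The short check below reproduces them from the sample counts; the helper name `split_counts` and the integer-percentage formulation are illustrative assumptions, not taken from the paper.

```python
# Sample counts per benchmark, copied from Table 2.
samples = {"CKCPU": 47100, "CKMEM": 10290, "CKNET": 18900}

def split_counts(n_samples, train_percent=80):
    """Return (training, prediction) counts for a percentage split."""
    n_train = n_samples * train_percent // 100
    return n_train, n_samples - n_train

splits = {name: split_counts(n) for name, n in samples.items()}
for name, (n_train, n_test) in splits.items():
    print(name, n_train, n_test)
```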
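Table 3 Prediction errors of different regression algorithms

| Benchmark | Regression algorithm | MAE | RMSE | MedAE |
| --- | --- | --- | --- | --- |
| CKCPU | SVR poly | 0.107006 | 1.900023 | 37.921288 |
| CKCPU | SVR linear | 0.095006 | 27.06338 | 37.529361 |
| CKCPU | KNN | 0.108006 | 0.323870 | 0.286494 |
| CKCPU | BPNN | 0.042380 | 0.070043 | 0.129856 |
| CKCPU | RF | 0.040178 | 0.068811 | 0.125560 |
| CKMEM | SVR poly | 0.115007 | 0.037560 | 10.924428 |
| CKMEM | SVR linear | 0.178010 | 2.524596 | 4.085918 |
| CKMEM | KNN | 0.148008 | 0.370660 | 0.373577 |
| CKMEM | BPNN | 0.097356 | 0.199461 | 0.214980 |
| CKMEM | RF | 0.096046 | 0.196619 | 0.206272 |
| CKNET | SVR poly | 0.091005 | 0.645619 | 0.634070 |
| CKNET | SVR linear | 0.301017 | 0.545833 | 0.523365 |
| CKNET | KNN | 0.102006 | 0.742873 | 0.742375 |
| CKNET | BPNN | 0.020343 | 0.103857 | 0.147659 |
| CKNET | RF | 0.019501 | 0.089315 | 0.089082 |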
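For reference, the three error measures in Table 3 can be computed as in the sketch below. It assumes the standard definitions of mean absolute error (MAE) and root mean squared error (RMSE), and it assumes that the table's third error column denotes the median absolute error; the sample values are made up for illustration and are not from the paper.

```python
import math
from statistics import median

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def median_ae(y_true, y_pred):
    """Median absolute error (assumed meaning of the third column)."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))

# Hypothetical values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 6.0]

print(mae(y_true, y_pred))        # 0.875
print(rmse(y_true, y_pred))       # about 1.1456
print(median_ae(y_true, y_pred))  # 0.75
```

The median absolute error is less sensitive to a few large misses than MAE or RMSE, which is one reason to report all three together.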