Deployment Algorithm of Service Function Chain Based on Transfer Actor-Critic Learning
doi: 10.11999/JEIT190542 cstr: 32379.14.JEIT190542
1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communication, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract: In 5G network slicing, the randomness and unpredictability of service requests lead to unreasonable resource allocation and, in turn, to high system delay. To address this problem, this paper proposes a Service Function Chain (SFC) deployment algorithm based on Transfer Actor-Critic (A-C) learning (TACA). First, an end-to-end delay minimization model is built on Virtual Network Function (VNF) placement and the joint allocation of computing resources, link bandwidth resources, and fronthaul network resources, and the model is transformed into a discrete-time Markov Decision Process (MDP). Next, an A-C learning algorithm is applied to this MDP to adjust the SFC deployment policy dynamically through continual interaction with the environment, thereby optimizing the end-to-end delay. Furthermore, to enable and accelerate convergence of the A-C algorithm in similar target tasks (for example, tasks in which service request arrival rates are generally higher), a transfer A-C learning algorithm uses the SFC deployment knowledge learned in the source task to quickly find a deployment policy for the target task. Simulation results show that the proposed algorithm reduces and stabilizes the queue backlog of SFC packets, optimizes the system end-to-end delay, and improves resource utilization.
Keywords: network slicing; service function chain deployment; Markov decision process; actor-critic learning; transfer learning
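Table 1 SFC deployment algorithm based on transfer A-C learning
Input: Gaussian policy $\pi_\theta(s,a) \sim N(\mu(s), \sigma^2)$ and its gradient $\nabla_\theta \ln \pi_\theta(s,a)$, state distribution $d^{\pi}(s)$, learning rates $\varepsilon_{a,t}$ and $\varepsilon_{c,t}$, discount factor $\beta$
(1) for ${\rm episode} = 0, 1, 2, \cdots, Ep_{\max}$ do
(2)   Initialize: policy parameter vector $\theta_t$, state-action value function parameter vector $\omega_t$, state value function parameter vector $\upsilon_t$, initial state $s_0 \sim d^{\pi}(s)$, native deployment policy $\pi_\theta^n(s,a)$, exotic (transferred) deployment policy $\pi_\theta^e(s,a)$
(3)   for each step $t = 0, 1, \cdots, T$ of the episode do
(4)     Obtain the overall deployment policy from Eq. (20); follow the overall policy $\pi_\theta(s,a)$ to select action $a^{(t)}$ and perform VNF placement and resource allocation; then update the environment state to $s^{(t+1)}$ and obtain the immediate reward $R_t = -\tau(t)$
(5)   end for
(6)   Critic process:
      (a) Compute compatible features: obtain the basis function vector of state $s$ from Eq. (10), and combine it with Eqs. (14) and (15) to obtain the compatible features
      (b) Compatible approximation: obtain the state-action value function approximation from Eq. (11) and the state value function approximation from Eq. (16)
      (c) TD error computation: obtain the TD errors of the state-action value function and the state value function from Eqs. (12) and (17), respectively
      (d) Update critic parameters: update the state-action value function parameter vector by Eq. (13) and the state value function parameter vector by Eq. (18)
(7)   Actor process:
      (a) Compute the advantage function
      (b) Rewrite the policy gradient: substitute the advantage function and obtain the policy gradient from Eq. (19)
      (c) Update actor parameters: update the policy parameter vector by Eq. (8)
(8) end for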
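The actor and critic steps of Table 1 can be illustrated with a minimal Python sketch. It is not the paper's implementation. Eqs. (8)-(20) are not reproduced in this excerpt, so several pieces are assumptions: the basis functions `features`, the toy environment `toy_step`, the convex mix of native and transferred policy means in `mixed_mean`, and the decaying transfer rate `zeta` are all hypothetical. For brevity the sketch also swaps in a single state-value critic and uses the one-step TD error as the advantage estimate, rather than the two critics with compatible features listed in the table.

```python
import math
import random


def features(s):
    """Hypothetical basis function vector for a scalar state: bias plus state."""
    return [1.0, s]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def gaussian_log_grad(a, mu, sigma, phi):
    """Gradient of ln N(a; mu, sigma^2) w.r.t. theta when mu = theta . phi."""
    scale = (a - mu) / (sigma ** 2)
    return [scale * p for p in phi]


def td_error(r, v_s, v_next, beta):
    """One-step TD error: r + beta * V(s') - V(s)."""
    return r + beta * v_next - v_s


def mixed_mean(mu_native, mu_exotic, zeta):
    """Assumed stand-in for Eq. (20): convex mix of native and transferred means."""
    return (1.0 - zeta) * mu_native + zeta * mu_exotic


def toy_step(s, a, rng):
    """Toy environment (assumption): delay grows with the gap between the
    clipped allocation and a demand of 2*s, plus a small backlog term."""
    a_eff = min(2.0, max(0.0, a))
    delay = (a_eff - 2.0 * s) ** 2 + 0.1 * s
    s_next = min(1.0, max(0.0, 0.5 * s + 0.5 * rng.random()))
    return s_next, -delay  # immediate reward R_t = -tau(t)


def train(theta_exotic, episodes=200, steps=50, beta=0.9,
          eps_a=0.01, eps_c=0.05, sigma=0.3, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]    # actor (native policy) parameters
    upsilon = [0.0, 0.0]  # critic (state value function) parameters
    for ep in range(episodes):
        zeta = 1.0 / (1.0 + ep)  # assumed transfer rate, decays per episode
        s = rng.random()
        for _ in range(steps):
            phi = features(s)
            mu = mixed_mean(dot(theta, phi), dot(theta_exotic, phi), zeta)
            a = rng.gauss(mu, sigma)  # sample action from the Gaussian policy
            s_next, r = toy_step(s, a, rng)
            delta = td_error(r, dot(upsilon, phi),
                             dot(upsilon, features(s_next)), beta)
            # Critic update: move V(s) toward the TD target.
            upsilon = [u + eps_c * delta * p for u, p in zip(upsilon, phi)]
            # Actor update: only the native part of the mean depends on theta,
            # so the log-policy gradient carries a factor (1 - zeta).
            grad = gaussian_log_grad(a, mu, sigma, phi)
            theta = [t + eps_a * delta * (1.0 - zeta) * g
                     for t, g in zip(theta, grad)]
            s = s_next
    return theta, upsilon


if __name__ == "__main__":
    # theta_exotic stands for policy parameters learned on a source task.
    theta, upsilon = train(theta_exotic=[0.0, 2.0])
    print("native policy parameters:", theta)
    print("critic parameters:", upsilon)
```

In this sketch the transferred policy dominates early episodes and its influence fades as `zeta` decays, which mirrors the idea of reusing source-task deployment knowledge to speed up learning in the target task. The decay schedule itself is a design choice made here for illustration.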