Deployment Algorithm of Service Function Chain Based on Transfer Actor-Critic Learning

TANG Lun; HE Xiaoyu; WANG Xiao; CHEN Qianbin

doi:10.11999/JEIT190542

cstr: 32379.14.JEIT190542


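Author biographies:
TANG Lun: male, born in 1973, professor and doctoral supervisor. His main research interests include next-generation wireless communication networks and heterogeneous cellular networks.

HE Xiaoyu: female, born in 1995, master's student. Her research interests are network slicing resource allocation and reinforcement learning.

WANG Xiao: male, born in 1995, master's student. His research interests are network slicing resource optimization and machine learning.

CHEN Qianbin: male, born in 1967, professor and doctoral supervisor. His main research interests include personal communications, multimedia information processing and transmission, next-generation mobile communication networks, and heterogeneous cellular networks.

Corresponding author:
HE Xiaoyu, Hexy1995@163.com

CLC number: TN915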
Publication history
- Received: 2019-07-18
- Revised: 2020-03-07
- Available online: 2020-04-08
- Published: 2020-11-16

Deployment Algorithm of Service Function Chain Based on Transfer Actor-Critic Learning

1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communication, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Funds: The National Natural Science Foundation of China (61571073), The Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M20180601)

Abstract: In 5G network slicing, the randomness and unpredictability of service requests lead to unreasonable resource allocation and, in turn, to high system delay. To address this problem, this paper proposes a Service Function Chain (SFC) deployment algorithm based on Transfer Actor-Critic (A-C) learning (TACA). First, an end-to-end delay minimization model is built on Virtual Network Function (VNF) placement and the joint allocation of computing resources, link bandwidth resources, and fronthaul network resources, and the model is then transformed into a discrete-time Markov Decision Process (MDP). Next, an A-C learning algorithm is applied in this MDP to adjust the SFC deployment policy dynamically through continuous interaction with the environment, so as to optimize the end-to-end delay. Furthermore, to realize and accelerate the convergence of the A-C algorithm in similar target tasks (for example, tasks in which the arrival rate of service requests is generally higher), a transfer A-C learning algorithm is adopted that uses the SFC deployment knowledge learned in the source task to quickly find the deployment policy in the target task. Simulation results show that the proposed algorithm can reduce and stabilize the queue backlog of SFC packets, optimize the system end-to-end delay, and improve resource utilization.
Key words:
- Network slice
- Service Function Chain (SFC) deployment
- Markov Decision Process (MDP)
- Actor-Critic (A-C) learning
- Transfer learning
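
As a rough illustration of the transfer step described in the abstract, the sketch below mixes a policy learned on the source task with the policy being learned on the target task. It assumes a convex combination weighted by a transfer rate that decays as target-task experience accumulates, which is one common choice in transfer actor-critic methods; the paper's own combination rule is not reproduced here. The names `transfer_rate` and `sample_overall_action`, the decay constant, and the Gaussian parameters are all hypothetical.

```python
import random


def transfer_rate(step, decay=0.01):
    """Hypothetical schedule: weight of the source-task policy at a given step.

    Starts at 1 (rely fully on transferred knowledge) and decays toward 0,
    so the locally learned policy gradually takes over.
    """
    return 1.0 / (1.0 + decay * step)


def sample_overall_action(step, native_mu, exotic_mu, sigma, rng):
    """Draw one action from the mixture (1 - z) * native + z * exotic.

    Both component policies are assumed Gaussian with a shared standard
    deviation. Picking a component with probability z and then sampling
    from it is an exact draw from the mixture.
    """
    z = transfer_rate(step)
    mu = exotic_mu if rng.random() < z else native_mu
    return rng.gauss(mu, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 100, 10_000):
        action = sample_overall_action(step, native_mu=2.0, exotic_mu=5.0,
                                       sigma=0.1, rng=rng)
        print(step, round(transfer_rate(step), 3), round(action, 3))
```

Early in training almost every action comes from the transferred policy; as `step` grows, the mixture leans on the locally learned one. How fast that handover should happen is a tuning choice, and the decay used here is only a placeholder.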

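Fig. 1 System architecture

Fig. 2 A-C learning framework

Fig. 3 Convergence of the A-C algorithm with different actor learning rates

Fig. 4 Convergence of the A-C algorithm with different critic learning rates

Fig. 5 Convergence of the A-C algorithm with different optimizers

Fig. 6 Packet arrival rates of the three slices compared with the variation of the total queue backlog

Fig. 7 Statistics of the VNF placement modes selected for the three slices

Fig. 8 System delay at convergence for different algorithms

Fig. 9 Resource utilization of different algorithms

Fig. 10 Convergence of the TACA algorithm with different transfer rate factors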

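Table 1 SFC deployment algorithm based on transfer A-C learning

Input: Gaussian policy ${\pi_\theta}(s,a)\sim N(\mu (s),{\sigma ^2})$ and its gradient ${\nabla_\theta}\ln {\pi_\theta}(s,a)$, state distribution ${d^{\pi}}(s)$, learning rates ${\varepsilon _{a,t}}$ and ${\varepsilon _{c,t}}$, discount factor $\beta $
(1) for ${\rm{episode}} = 0,1,2, \cdots ,E{p_{\max}}$ do
(2) Initialize: policy parameter vector ${\theta_t}$, state-action value function parameter vector ${\omega _t}$, state value function parameter vector ${\upsilon_t}$, initial state ${s_0}\sim{d^{\pi}}(s)$, native deployment policy $\pi_\theta ^n(s,a)$, exotic (transferred) deployment policy $\pi_\theta ^e(s,a)$
(3) for each step $t = 0,1, \cdots ,T$ of the episode do
(4) Obtain the overall deployment policy from Eq. (20); following the overall policy ${\pi_\theta}(s,a)$, select action ${a^{(t)}}$ and perform VNF placement and resource allocation; then update the environment state to ${s^{(t + 1)}}$ and obtain the immediate reward ${R_t} = - \tau (t)$
(5) end for
(6) Critic process:
(a) Compute the compatible features: obtain the basis function vector for state $s$ from Eq. (10), and combine it with Eqs. (14) and (15) to obtain the compatible features
(b) Compatible approximation: obtain the state-action value function approximation from Eq. (11) and the state value function approximation from Eq. (16)
(c) TD error computation: obtain the TD errors of the state-action value function and the state value function from Eqs. (12) and (17), respectively
(d) Update the critic parameters: update the state-action value function parameter vector by Eq. (13) and the state value function parameter vector by Eq. (18)
(7) Actor process:
(a) Compute the advantage function
(b) Rewrite the policy gradient: substitute the advantage function and obtain the policy gradient from Eq. (19)
(c) Update the actor parameters: update the policy parameter vector by Eq. (8)
(8) end for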
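
The critic and actor steps in Table 1 follow the usual actor-critic pattern: compute a TD error, use it as an estimate of the advantage, and move the policy parameters along the score function. The sketch below shows one such update for a scalar action under the Gaussian policy named in the table's input line. It is a generic TD actor-critic step with linear function approximation, not a reproduction of Eqs. (8) to (19), which are not shown in this section. The feature vectors, the linear mean $\mu(s)=\theta^\top\phi(s)$, the default hyperparameters, and all function names are assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def gaussian_score(theta, phi, action, sigma):
    """Gradient of ln pi_theta(s, a) with respect to theta for a Gaussian
    policy N(mu(s), sigma^2) with the assumed linear mean mu(s) = theta . phi(s)."""
    mu = dot(theta, phi)
    return [(action - mu) / sigma ** 2 * f for f in phi]


def actor_critic_step(theta, upsilon, phi, phi_next, action, delay,
                      sigma=1.0, beta=0.9, eps_a=0.01, eps_c=0.1):
    """One update of the critic (upsilon) and the actor (theta).

    Returns new parameter lists and the TD error. The reward is the
    negative delay, mirroring R_t = -tau(t) in Table 1.
    """
    reward = -delay
    # TD error of the state value function V(s) = upsilon . phi(s)
    td_error = reward + beta * dot(upsilon, phi_next) - dot(upsilon, phi)
    # Critic: move V(s) toward the one-step bootstrapped target
    new_upsilon = [w + eps_c * td_error * f for w, f in zip(upsilon, phi)]
    # Actor: the TD error serves as a sample of the advantage function
    score = gaussian_score(theta, phi, action, sigma)
    new_theta = [w + eps_a * td_error * g for w, g in zip(theta, score)]
    return new_theta, new_upsilon, td_error


if __name__ == "__main__":
    theta, upsilon = [0.0, 0.0], [0.0, 0.0]
    theta, upsilon, delta = actor_critic_step(
        theta, upsilon, phi=[1.0, 0.0], phi_next=[0.0, 1.0],
        action=1.0, delay=2.0)
    print(theta, upsilon, delta)
```

In this toy call the chosen action incurs a positive delay, so the TD error is negative and the policy mean is pushed away from that action, which is the intended direction when the reward is a negative delay.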

