Deep Reinforcement Learning-based Adaptive Wireless Resource Allocation Algorithm for Heterogeneous Cloud Wireless Access Network

doi: 10.11999/JEIT190511    cstr: 32379.14.JEIT190511

1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract: To meet the demand created by the rapid growth of wireless data traffic, resource optimization in the Heterogeneous Cloud Radio Access Network (H-CRAN) remains an important problem that needs to be solved urgently. For the H-CRAN downlink scenario, this paper proposes a wireless resource allocation algorithm based on Deep Reinforcement Learning (DRL). First, a stochastic optimization model that maximizes the total network throughput is established to jointly optimize congestion control, user association, subcarrier allocation, and power allocation under the constraint of queue stability. Second, considering the complexity of the scheduling problem, the DRL algorithm uses a neural network as a nonlinear approximation function to efficiently overcome the curse of dimensionality. Finally, to cope with the complexity and dynamic variability of the wireless network environment, a Transfer Learning (TL) algorithm is introduced; its small-sample learning capability enables the DRL algorithm to obtain the optimal resource allocation strategy even with only a few samples. In addition, TL further accelerates the convergence of the DRL algorithm by transferring the weight parameters of the DRL model. Simulation results show that the proposed algorithm can effectively increase network throughput and improve network stability.

Keywords:
- Heterogeneous Cloud Radio Access Network (H-CRAN)
- Resource allocation
- Deep Reinforcement Learning (DRL)
- Transfer Learning (TL)
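As a reading aid, the joint scheduling problem sketched in the abstract can be summarized, under assumed notation that is not necessarily the paper's own symbols, as a queue-stability-constrained throughput maximization in the style of stochastic network optimization:

$$ \max_{\{\gamma(t),\, x(t),\, c(t),\, p(t)\}} \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\!\left[ \sum_{u} R_u(t) \right] \quad {\rm s.t.} \quad \lim_{t \to \infty} \frac{\mathbb{E}\left[ Q_u(t) \right]}{t} = 0, \;\; \forall u $$

where $\gamma(t)$ denotes the admitted (congestion-controlled) traffic, $x(t)$, $c(t)$ and $p(t)$ denote the user-association, subcarrier and power decisions, $R_u(t)$ is the rate delivered to user $u$, $Q_u(t)$ is the traffic queue of user $u$, and the constraint is the mean-rate stability condition that keeps every queue stable.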
Table 1  Algorithm 1

Algorithm 1: DQN training of the evaluation-network parameters
(1) Initialize the experience replay buffer
(2) Randomly initialize the evaluation-network parameters $w$; initialize the target-network parameters $w^-$ with $w^- = w$
(3) For episode $k = 0, 1, \cdots, K-1$ do
(4)   Randomly initialize a state $s_0$
(5)   For $t = 0, 1, \cdots, T-1$ do
(6)     Randomly draw a probability $p$
(7)     if $p \le \varepsilon$, the resource manager randomly selects an action $a(t)$
(8)     else the resource manager selects the action according to the evaluation network: $a^*(t) = \arg\max_a Q(s, a; w)$
(9)     Execute action $a(t)$, obtain the reward $r(t)$ according to Eq. (9), and observe the next state $s(t+1)$
(10)    Store the tuple $(s(t), a(t), r(t), s(t+1))$ in the experience replay buffer
(11)    Randomly sample a batch of tuples $(s(t), a(t), r(t), s(t+1))$ from the experience replay buffer
(12)    From the loss function of the evaluation-network and target-network outputs, compute the first- and second-order moments using Eqs. (13) and (14)
(13)    The Adam algorithm computes the bias-correction terms of the first- and second-order moments using Eqs. (15) and (16)
(14)    Update the evaluation-network weights $w$ via back-propagation using Eq. (17)
(15)    Every $\delta$ steps, copy the evaluation-network parameters $w$ to the target-network parameters $w^-$
(16)  End for
(17) End for
(18) Obtain the optimal weight parameters $w$ of the DQN
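The following is a minimal, hedged sketch of the training loop in Algorithm 1, written in PyTorch. The dimensions, hyper-parameters, and the random environment stub are illustrative assumptions; the paper's state, action, and reward (Eq. (9)) come from the H-CRAN scheduler and are not reproduced here.

# Minimal DQN training-loop sketch for Algorithm 1 (assumptions noted in comments).
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 16          # assumed sizes of the scheduler state/action spaces
GAMMA, EPSILON, DELTA = 0.9, 0.1, 50   # assumed discount, exploration rate, target-sync period

def build_net():
    # Evaluation/target Q-network: a small fully connected approximator.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))

eval_net, target_net = build_net(), build_net()
target_net.load_state_dict(eval_net.state_dict())              # step (2): w^- = w
optimizer = torch.optim.Adam(eval_net.parameters(), lr=1e-3)   # Adam covers steps (12)-(14)
replay = deque(maxlen=10000)                                   # step (1): experience replay buffer

def select_action(state):
    # Steps (6)-(8): epsilon-greedy selection from the evaluation network.
    if random.random() <= EPSILON:
        return random.randrange(ACTION_DIM)
    with torch.no_grad():
        return int(eval_net(state).argmax())

def train_step(batch_size=32):
    # Steps (11)-(14): sample a mini-batch and take one Adam step on the TD loss.
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch])
    s_next = torch.stack([b[3] for b in batch])
    q = eval_net(s).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * target_net(s_next).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Steps (3)-(17): outer loop. Reward and next state are placeholders for the
# real resource-allocation environment of the paper.
step = 0
for episode in range(5):
    s = torch.rand(STATE_DIM)                                  # step (4): random initial state
    for t in range(20):
        a = select_action(s)
        r, s_next = random.random(), torch.rand(STATE_DIM)     # placeholder environment step
        replay.append((s, a, r, s_next))                       # step (10)
        train_step()
        step += 1
        if step % DELTA == 0:                                  # step (15): copy w to w^-
            target_net.load_state_dict(eval_net.state_dict())
        s = s_next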
Table 2  Algorithm 2

Algorithm 2: TLDQN-based policy knowledge transfer
(1) Initialization:
(2) DQN parameters $w$ of the source base station, policy-network temperature parameter $T$, DQN parameters $w'$ of the target network
(3) For each state $s \in S$, source base station action $\overline{a}$, and candidate target base station action $a$ do
(4)   Run Algorithm 1 to obtain the evaluation-network parameters $w$ and the $Q$-value function of the output layer
(5)   Convert the $Q$-value function of the source base station into the policy network $\pi_i(\overline{a} \mid s)$ according to Eq. (18)
(6)   Convert the $Q$-value function of the target base station into the policy network $\pi_{\rm TG}(a \mid s)$ according to Eq. (19)
(7)   Construct the cross-entropy of the policy-imitation loss $H(w)$ using Eq. (20)
(8)   Iteratively update the cross-entropy according to Eq. (21), then compute the partial derivatives of the policy-imitation loss
(9)   Repeat until the policy selected by the target base station satisfies $Q_{\rm TG}(s,a) \to Q^*_{\rm TG}(s,a)$
(10) End for
(11) The target base station obtains the corresponding network parameters $w'$
(12) Run Algorithm 1; the target base station obtains the optimal resource-allocation strategy
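Below is a minimal sketch of the policy-knowledge transfer in Algorithm 2, written as temperature-softmax policy distillation between a source and a target Q-network. The temperature value, network sizes, and the exact mapping of Eqs. (18)-(21) onto the softmax and cross-entropy expressions are assumptions made for illustration, not the paper's exact formulas.

# Policy-imitation transfer sketch for Algorithm 2 (assumed hyper-parameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, TEMPERATURE = 8, 16, 0.5   # assumed dimensions and temperature T

def build_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))

source_net = build_net()          # trained with Algorithm 1 on the source base station (parameters w)
target_net = build_net()          # target base station network (parameters w')
optimizer = torch.optim.Adam(target_net.parameters(), lr=1e-3)

def transfer_step(states):
    """One policy-imitation update over a batch of states (steps (4)-(8))."""
    with torch.no_grad():
        # Eq. (18): soften the source Q-values into the teacher policy pi_i(a|s).
        teacher_policy = F.softmax(source_net(states) / TEMPERATURE, dim=1)
    # Eq. (19): the target Q-values define the student policy pi_TG(a|s) (log form).
    student_log_policy = F.log_softmax(target_net(states) / TEMPERATURE, dim=1)
    # Eq. (20): cross-entropy H(w) between teacher and student policies.
    loss = -(teacher_policy * student_log_policy).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()               # Eq. (21): gradient of the imitation loss w.r.t. w'
    optimizer.step()
    return float(loss)

# Steps (3)-(10): iterate until the target policy approaches Q*_TG; the state batch
# here is a random placeholder for states visited by the target base station.
for _ in range(200):
    transfer_step(torch.rand(32, STATE_DIM))
# Step (12): afterwards, Algorithm 1 continues fine-tuning target_net on the target cell.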