Deep Reinforcement Learning-based Adaptive Wireless Resource Allocation Algorithm for Heterogeneous Cloud Wireless Access Network

doi: 10.11999/JEIT190511    cstr: 32379.14.JEIT190511

1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract: To meet the demand created by the rapid growth of wireless data traffic, resource optimization in the Heterogeneous Cloud Radio Access Network (H-CRAN) remains an important problem that needs to be solved urgently. For the H-CRAN downlink scenario, this paper proposes a wireless resource allocation algorithm based on Deep Reinforcement Learning (DRL). First, a stochastic optimization model that maximizes the total network throughput is established to jointly optimize congestion control, user association, subcarrier allocation, and power allocation under the constraint of queue stability. Second, considering the complexity of the scheduling problem, the DRL algorithm uses a neural network as a nonlinear approximation function to efficiently overcome the curse of dimensionality. Finally, to cope with the complexity and dynamic variability of the wireless network environment, a Transfer Learning (TL) algorithm is introduced; its small-sample learning capability enables the DRL algorithm to obtain the optimal resource allocation strategy even with only a few samples. In addition, TL further accelerates the convergence of the DRL algorithm by transferring the weight parameters of the DRL model. Simulation results show that the proposed algorithm can effectively increase network throughput and improve network stability.

Keywords:
- Heterogeneous Cloud Radio Access Network (H-CRAN)
- Resource allocation
- Deep Reinforcement Learning (DRL)
- Transfer Learning (TL)
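As a reading aid, the joint scheduling problem sketched in the abstract can be summarized, under assumed notation that is not necessarily the paper's own symbols, as a queue-stability-constrained throughput maximization in the style of stochastic network optimization:

$$ \max_{\{\gamma(t),\, x(t),\, c(t),\, p(t)\}} \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\!\left[ \sum_{u} R_u(t) \right] \quad {\rm s.t.} \quad \lim_{t \to \infty} \frac{\mathbb{E}\left[ Q_u(t) \right]}{t} = 0, \;\; \forall u $$

where $\gamma(t)$ denotes the admitted (congestion-controlled) traffic, $x(t)$, $c(t)$ and $p(t)$ denote the user-association, subcarrier and power decisions, $R_u(t)$ is the rate delivered to user $u$, $Q_u(t)$ is the traffic queue of user $u$, and the constraint is the mean-rate stability condition that keeps every queue stable.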
Table 1  Algorithm 1

Algorithm 1: DQN training of the evaluation-network parameters
(1) Initialize the experience replay buffer
(2) Randomly initialize the evaluation-network parameters $w$; initialize the target-network parameters $w^-$ with $w^- = w$
(3) For episode $k = 0, 1, \cdots, K-1$ do
(4)   Randomly initialize a state $s_0$
(5)   For $t = 0, 1, \cdots, T-1$ do
(6)     Randomly draw a probability $p$
(7)     if $p \le \varepsilon$, the resource manager randomly selects an action $a(t)$
(8)     else the resource manager selects the action according to the evaluation network: $a^*(t) = \arg\max_a Q(s, a; w)$
(9)     Execute action $a(t)$, obtain the reward $r(t)$ according to Eq. (9), and observe the next state $s(t+1)$
(10)    Store the tuple $(s(t), a(t), r(t), s(t+1))$ in the experience replay buffer
(11)    Randomly sample a batch of tuples $(s(t), a(t), r(t), s(t+1))$ from the experience replay buffer
(12)    From the loss function of the evaluation-network and target-network outputs, compute the first- and second-order moments using Eqs. (13) and (14)
(13)    The Adam algorithm computes the bias-correction terms of the first- and second-order moments using Eqs. (15) and (16)
(14)    Update the evaluation-network weights $w$ via back-propagation using Eq. (17)
(15)    Every $\delta$ steps, copy the evaluation-network parameters $w$ to the target-network parameters $w^-$
(16)  End for
(17) End for
(18) Obtain the optimal weight parameters $w$ of the DQN
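The following is a minimal, hedged sketch of the training loop in Algorithm 1, written in PyTorch. The dimensions, hyper-parameters, and the random environment stub are illustrative assumptions; the paper's state, action, and reward (Eq. (9)) come from the H-CRAN scheduler and are not reproduced here.

# Minimal DQN training-loop sketch for Algorithm 1 (assumptions noted in comments).
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 16          # assumed sizes of the scheduler state/action spaces
GAMMA, EPSILON, DELTA = 0.9, 0.1, 50   # assumed discount, exploration rate, target-sync period

def build_net():
    # Evaluation/target Q-network: a small fully connected approximator.
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))

eval_net, target_net = build_net(), build_net()
target_net.load_state_dict(eval_net.state_dict())              # step (2): w^- = w
optimizer = torch.optim.Adam(eval_net.parameters(), lr=1e-3)   # Adam covers steps (12)-(14)
replay = deque(maxlen=10000)                                   # step (1): experience replay buffer

def select_action(state):
    # Steps (6)-(8): epsilon-greedy selection from the evaluation network.
    if random.random() <= EPSILON:
        return random.randrange(ACTION_DIM)
    with torch.no_grad():
        return int(eval_net(state).argmax())

def train_step(batch_size=32):
    # Steps (11)-(14): sample a mini-batch and take one Adam step on the TD loss.
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch])
    s_next = torch.stack([b[3] for b in batch])
    q = eval_net(s).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * target_net(s_next).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Steps (3)-(17): outer loop. Reward and next state are placeholders for the
# real resource-allocation environment of the paper.
step = 0
for episode in range(5):
    s = torch.rand(STATE_DIM)                                  # step (4): random initial state
    for t in range(20):
        a = select_action(s)
        r, s_next = random.random(), torch.rand(STATE_DIM)     # placeholder environment step
        replay.append((s, a, r, s_next))                       # step (10)
        train_step()
        step += 1
        if step % DELTA == 0:                                  # step (15): copy w to w^-
            target_net.load_state_dict(eval_net.state_dict())
        s = s_next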
Table 2  Algorithm 2

Algorithm 2: TLDQN-based policy knowledge transfer
(1) Initialization:
(2) DQN parameters $w$ of the source base station, policy-network temperature parameter $T$, DQN parameters $w'$ of the target network
(3) For each state $s \in S$, source base station action $\overline{a}$, and candidate target base station action $a$ do
(4)   Run Algorithm 1 to obtain the evaluation-network parameters $w$ and the $Q$-value function of the output layer
(5)   Convert the $Q$-value function of the source base station into the policy network $\pi_i(\overline{a} \mid s)$ according to Eq. (18)
(6)   Convert the $Q$-value function of the target base station into the policy network $\pi_{\rm TG}(a \mid s)$ according to Eq. (19)
(7)   Construct the cross-entropy of the policy-imitation loss $H(w)$ using Eq. (20)
(8)   Iteratively update the cross-entropy according to Eq. (21), then compute the partial derivatives of the policy-imitation loss
(9)   Repeat until the policy selected by the target base station satisfies $Q_{\rm TG}(s,a) \to Q^*_{\rm TG}(s,a)$
(10) End for
(11) The target base station obtains the corresponding network parameters $w'$
(12) Run Algorithm 1; the target base station obtains the optimal resource-allocation strategy
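Below is a minimal sketch of the policy-knowledge transfer in Algorithm 2, written as temperature-softmax policy distillation between a source and a target Q-network. The temperature value, network sizes, and the exact mapping of Eqs. (18)-(21) onto the softmax and cross-entropy expressions are assumptions made for illustration, not the paper's exact formulas.

# Policy-imitation transfer sketch for Algorithm 2 (assumed hyper-parameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, TEMPERATURE = 8, 16, 0.5   # assumed dimensions and temperature T

def build_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))

source_net = build_net()          # trained with Algorithm 1 on the source base station (parameters w)
target_net = build_net()          # target base station network (parameters w')
optimizer = torch.optim.Adam(target_net.parameters(), lr=1e-3)

def transfer_step(states):
    """One policy-imitation update over a batch of states (steps (4)-(8))."""
    with torch.no_grad():
        # Eq. (18): soften the source Q-values into the teacher policy pi_i(a|s).
        teacher_policy = F.softmax(source_net(states) / TEMPERATURE, dim=1)
    # Eq. (19): the target Q-values define the student policy pi_TG(a|s) (log form).
    student_log_policy = F.log_softmax(target_net(states) / TEMPERATURE, dim=1)
    # Eq. (20): cross-entropy H(w) between teacher and student policies.
    loss = -(teacher_policy * student_log_policy).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()               # Eq. (21): gradient of the imitation loss w.r.t. w'
    optimizer.step()
    return float(loss)

# Steps (3)-(10): iterate until the target policy approaches Q*_TG; the state batch
# here is a random placeholder for states visited by the target base station.
for _ in range(200):
    transfer_step(torch.rand(32, STATE_DIM))
# Step (12): afterwards, Algorithm 1 continues fine-tuning target_net on the target cell.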