Dynamic Resource Allocation and Energy Management Algorithm for Hybrid Energy Supply in Heterogeneous Cloud Radio Access Networks
doi: 10.11999/JEIT190499 cstr: 32379.14.JEIT190499
1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract: To address the dynamic resource allocation and energy management problem in the 5G Heterogeneous Cloud Radio Access Network (H-CRAN) architecture with hybrid energy supply, this paper proposes a dynamic network resource allocation and energy management algorithm based on deep reinforcement learning. First, given the fluctuation of renewable energy arrivals and the randomness of user traffic arrivals, and accounting for system stability, energy sustainability, and users' Quality of Service (QoS) requirements, the resource allocation and energy management problem in H-CRANs is formulated as a Constrained infinite-horizon Markov Decision Process (CMDP) whose objective is to maximize the service provider's average net profit. The Lagrange multiplier method is then used to transform the CMDP into an unconstrained Markov Decision Process (MDP). Finally, because both the action space and the state space are continuous sets, deep reinforcement learning is applied to solve the resulting MDP. Simulation results show that the proposed algorithm effectively guarantees user QoS and energy sustainability while increasing the service provider's average net profit and reducing energy consumption.
Keywords:
- Heterogeneous cloud radio access networks /
- Hybrid energy /
- Resource allocation /
- Energy management /
- Deep reinforcement learning
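The abstract's reduction from CMDP to unconstrained MDP follows the standard Lagrangian relaxation. Since the paper's own equations (19)–(25) are not reproduced on this page, the formulation below is a hedged reconstruction with assumed symbols: $r_t$ is the per-slot net profit, $\bar Q_u$ the time-averaged queue length of user $u$, and $Q_{\max}$ its QoS bound, matching the multipliers $\xi_u$, $\nu_u$ updated in Eqs. (24) and (25) of Table 1.

$$\max_\pi \; \lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\left[ r_t \right] \quad {\rm s.t.} \quad \bar Q_u \le Q_{\max}, \; \forall u \in U_{\rm RRH} \cup U_{\rm MBS}$$

$$L(\pi, \xi, \nu) = \lim_{T \to \infty} \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\left[ r_t \right] + \sum_{u \in U_{\rm RRH}} \xi_u \left( Q_{\max} - \bar Q_u \right) + \sum_{u \in U_{\rm MBS}} \nu_u \left( Q_{\max} - \bar Q_u \right)$$

Maximizing $L$ for fixed multipliers is an unconstrained MDP solvable by DDPG, while the subgradient updates in Table 1 drive the multipliers toward a feasible, QoS-respecting policy.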
Table 1 Algorithm flow
Algorithm 1: DDPG-based resource allocation and energy management algorithm
(1) Initialization
    Randomly initialize the parameters ${\theta ^Q}$ and ${\theta ^\mu }$; initialize the target network parameters ${\theta ^{Q'}} \leftarrow {\theta ^Q}$, ${\theta ^{\mu '}} \leftarrow {\theta ^\mu }$;
    initialize the Lagrange multipliers ${\xi _u} \ge 0, \forall u \in {U_{\rm{RRH}}}$ and ${\upsilon _u} \ge 0, \forall u \in {U_{\rm{MBS}}}$; initialize the experience replay buffer D
(2) Learning phase
For episode = 1 to M do
    Initialize a random process ${{N}}$ as exploration noise and observe the initial state ${x_0}$
    For t = 1 to T do
        Select an action according to ${a_t} = \mu ({x_t}|\theta _t^\mu ) + {{N}}$
        if constraints C3–C13 are satisfied:
            Execute action ${a_t}$ and obtain the reward ${r_t}$ and the next state ${x_{t + 1}}$
            Store the transition $ < {x_t},{a_t},{r_t},{x_{t + 1}} > $ in the replay buffer D
            Randomly sample ${N_{\rm{D}}}$ transitions from D, each indexed by i
            (a) Update the critic network
                Obtain $\mu '({x_{i + 1}}|{\theta ^{\mu '}})$ from the actor target network
                Obtain $Q({x_{i + 1}},\mu '({x_{i + 1}}|{\theta ^{\mu '}})|{\theta ^{Q'}})$ from the critic target network
                Compute ${y_i}$ from Eq. (20) and obtain $Q({x_i},{a_i}|{\theta ^Q})$ from the critic network
                Compute the loss from Eq. (21) and update the critic parameters ${\theta ^Q}$ according to Eq. (19)
            (b) Update the actor network
                Obtain $Q({x_i},{a_i}|{\theta ^Q})$ from the critic network and compute the policy gradient from Eq. (22)
                Update the actor parameters ${\theta ^\mu }$ with the policy gradient
            (c) Update the actor and critic target networks
                Update the target network parameters according to Eq. (23)
            (d) Update the Lagrange multipliers with the standard subgradient method [15]:
                ${\xi _{u,t + 1}} \leftarrow {\left[ {\xi _{u,t}} - {\alpha _\xi }({Q_{\max}} - {\bar Q_u}) \right]^ + }, \forall u \in {U_{\rm{RRH}}}$   (24)
                ${\nu _{u,t + 1}} \leftarrow {\left[ {\nu _{u,t}} - {\alpha _\nu }({Q_{\max}} - {\bar Q_u}) \right]^ + }, \forall u \in {U_{\rm{MBS}}}$   (25)
    End for
End for
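Read as code, Table 1 is a standard DDPG loop with one extra step: the projected-subgradient update of the Lagrange multipliers. The PyTorch sketch below mirrors steps (a)–(d) under stated assumptions: the state/action dimensions, layer widths, learning rates, and replay-batch format are hypothetical placeholders, and the environment interaction and constraint check C3–C13 are omitted.

```python
# Minimal DDPG training step mirroring Table 1 (a sketch, not the paper's code).
# Hypothetical assumptions: state/action sizes, layer widths, learning rates,
# and the replay-batch format; env interaction and constraints C3-C13 omitted.
import copy
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 4     # placeholder dimensions
GAMMA = 0.99                      # discount factor (Table 2)
VARSIGMA = 0.01                   # soft update factor (Table 2)

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_tgt = copy.deepcopy(actor)      # theta^{mu'} <- theta^mu
critic_tgt = copy.deepcopy(critic)    # theta^{Q'} <- theta^Q
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def soft_update(tgt, src):
    # Eq. (23): theta' <- varsigma * theta + (1 - varsigma) * theta'
    for tp, p in zip(tgt.parameters(), src.parameters()):
        tp.data.mul_(1.0 - VARSIGMA).add_(VARSIGMA * p.data)

def train_step(x, a, r, x_next):
    """Steps (a)-(c) of Table 1 on one minibatch sampled from replay buffer D."""
    # (a) Critic update: TD target y_i from the target networks, cf. Eq. (20)
    with torch.no_grad():
        a_next = actor_tgt(x_next)                       # mu'(x_{i+1} | theta^{mu'})
        y = r + GAMMA * critic_tgt(torch.cat([x_next, a_next], 1))
    q = critic(torch.cat([x, a], 1))
    critic_loss = nn.functional.mse_loss(q, y)           # cf. Eq. (21)
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    # (b) Actor update: ascend Q(x, mu(x)), cf. the policy gradient of Eq. (22)
    actor_loss = -critic(torch.cat([x, actor(x)], 1)).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    # (c) Soft-update both target networks
    soft_update(actor_tgt, actor)
    soft_update(critic_tgt, critic)

def update_multiplier(xi, avg_queue, q_max, alpha):
    # (d) Eqs. (24)-(25): projected subgradient step; [.]^+ keeps xi >= 0,
    # so xi grows while the average queue length violates Q_max
    return max(0.0, xi - alpha * (q_max - avg_queue))
```

Calling `train_step` on each sampled minibatch and `update_multiplier` once per slot reproduces the loop in Table 1; since the reward is the Lagrangian-penalized net profit, rising multipliers steer the learned policy back toward the QoS constraints.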
Table 2 Simulation parameters
- Maximum RRH transmit power: 3 W
- Maximum MBS transmit power: 10 W
- Thermal noise power spectral density: –102 dBm/Hz
- Number of subcarriers $N$: 12
- Bandwidth per resource block: 180 kHz
- Slot length $\tau$: 10 ms
- Packet size $L$: 4 kbit/packet
- MUE path-loss model: 31.5 + 35 lg(d) (d in km)
- RUE path-loss model: 31.5 + 40 lg(d) (d in km)
- Discount factor $\gamma$: 0.99
- Soft update factor $\varsigma$: 0.01
- ${r_{{{\varGamma}_1}}}$: 4 Mbps
- ${r_{{{\varGamma}_2}}}$: 4.5 Mbps
- ${r_{\rm{MBS}}}$: 512 kbps
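One quick way to relate the units in Table 2: with packet size $L$ = 4 kbit and slot length $\tau$ = 10 ms, each rate target translates directly into packets served per slot, which is how the queue dynamics consume arrivals. The sketch below (variable names are illustrative) performs that arithmetic.

```python
# Packets served per 10 ms slot at the Table 2 rate targets.
# tau and L come from Table 2; the label/rate pairs are the three rate targets.
tau = 0.01   # slot length: 10 ms
L = 4000     # packet size: 4 kbit/packet

for name, rate_bps in [("r_Gamma1", 4.0e6), ("r_Gamma2", 4.5e6), ("r_MBS", 512e3)]:
    print(f"{name}: {rate_bps * tau / L:.2f} packets/slot")
# -> r_Gamma1: 10.00, r_Gamma2: 11.25, r_MBS: 1.28
```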
References
PENG Mugen and AI Yuan. Heterogeneous cloud radio access networks: Principle, architecture, techniques and challenges[J]. Telecommunications Science, 2015, 31(5): 41–45.
ALNOMAN A, CARVALHO G H S, ANPALAGAN A, et al. Energy efficiency on fully cloudified mobile networks: Survey, challenges, and open issues[J]. IEEE Communications Surveys & Tutorials, 2018, 20(2): 1271–1291. doi: 10.1109/COMST.2017.2780238
AKTAR M R, JAHID A, AL-HASAN M, et al. User association for efficient utilization of green energy in cloud radio access network[C]. 2019 International Conference on Electrical, Computer and Communication Engineering, Cox's Bazar, Bangladesh, 2019: 1–5. doi: 10.1109/ECACE.2019.8679128
ALQERM I and SHIHADA B. Sophisticated online learning scheme for green resource allocation in 5G heterogeneous cloud radio access networks[J]. IEEE Transactions on Mobile Computing, 2018, 17(10): 2423–2437. doi: 10.1109/TMC.2018.2797166
LIU Qiang, HAN Tao, ANSARI N, et al. On designing energy-efficient heterogeneous cloud radio access networks[J]. IEEE Transactions on Green Communications and Networking, 2018, 2(3): 721–734. doi: 10.1109/TGCN.2018.2835451
WU Xiaomin. Resources optimization and control in the energy harvesting heterogeneous network[D]. [Ph.D. dissertation], University of Science and Technology of China, 2016.
ZHANG Deyu, CHEN Zhigang, CAI L X, et al. Resource allocation for green cloud radio access networks with hybrid energy supplies[J]. IEEE Transactions on Vehicular Technology, 2018, 67(2): 1684–1697. doi: 10.1109/TVT.2017.2754273
KONG Qiao. Research on energy cost minimization problem in heterogeneous cellular networks with hybrid energy supplies[D]. [Master dissertation], Huazhong University of Science and Technology, 2016.
YANG Jian, YANG Qinghai, SHEN Zhong, et al. Suboptimal online resource allocation in hybrid energy supplied OFDMA cellular networks[J]. IEEE Communications Letters, 2016, 20(8): 1639–1642. doi: 10.1109/LCOMM.2016.2575834
WEI Yifei, YU F R, SONG Mei, et al. User scheduling and resource allocation in HetNets with hybrid energy supply: An actor-critic reinforcement learning approach[J]. IEEE Transactions on Wireless Communications, 2018, 17(1): 680–692. doi: 10.1109/TWC.2017.2769644
PENG Mugen, ZHANG Kecheng, JIANG Jiamo, et al. Energy-efficient resource assignment and power allocation in heterogeneous cloud radio access networks[J]. IEEE Transactions on Vehicular Technology, 2015, 64(11): 5275–5287. doi: 10.1109/TVT.2014.2379922
CHEN Qianbin, YANG Youchao, ZHOU Yu, et al. Deployment algorithm of service function chain of access network based on stochastic learning[J]. Journal of Electronics & Information Technology, 2019, 41(2): 417–423. doi: 10.11999/JEIT180310
Deep reinforcement learning: Principle and implementation of the DDPG algorithm[EB/OL]. https://www.jianshu.com/p/6fe18d0d8822, 2018.
QI Yue and HUANG Shuohua. Portfolio management based on DDPG algorithm of deep reinforcement learning[J]. Computer and Modernization, 2018(5): 93–99. doi: 10.3969/j.issn.1006-2475.2018.05.019
California ISO[EB/OL]. http://www.caiso.com, 2019.
WANG Xin, ZHANG Yu, CHEN Tianyi, et al. Dynamic energy management for smart-grid-powered coordinated multipoint systems[J]. IEEE Journal on Selected Areas in Communications, 2016, 34(5): 1348–1359. doi: 10.1109/JSAC.2016.2520220
LI Jian, PENG Mugen, YU Yuling, et al. Energy-efficient joint congestion control and resource optimization in heterogeneous cloud radio access networks[J]. IEEE Transactions on Vehicular Technology, 2016, 65(12): 9873–9887. doi: 10.1109/TVT.2016.2531184