Deep Reinforcement Learning Based Beamforming Algorithm for IRS Assisted Cognitive Radio System
doi: 10.11999/JEIT240447  cstr: 32379.14.JEIT240447
1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Chongqing Key Laboratory of Optoelectronic Information Sensing and Microsystems, Chongqing 400065, China
Abstract: To further improve the spectrum utilization of multi-user wireless communication systems, this paper proposes a deep reinforcement learning based algorithm that maximizes the sum rate of Secondary Users (SUs) in an Intelligent Reflecting Surface (IRS) assisted cognitive radio network. A resource allocation model that jointly optimizes the beamforming at the Secondary Base Station (SBS) and the IRS phase-shift matrix is first established, subject to the maximum transmit power of the SBS, the interference tolerance of the Primary Users (PUs), and the unit-modulus constraint on the IRS phase shifts. A joint active and passive beamforming algorithm based on the Deep Deterministic Policy Gradient (DDPG) is then proposed to optimize both sets of variables and maximize the SU sum rate. Simulation results show that the proposed algorithm achieves sum-rate performance close to that of traditional optimization algorithms with lower time complexity.
Keywords: Intelligent Reflecting Surface / Cognitive Radio / Deep Reinforcement Learning / Beamforming
Abstract:

Objective  With the rapid development of wireless communication technologies, the demand for spectrum resources has increased significantly. Cognitive Radio (CR) has emerged as a promising solution for improving spectrum utilization by enabling Secondary Users (SUs) to access licensed spectrum bands without causing harmful interference to Primary Users (PUs). However, traditional CR networks struggle to achieve high spectral efficiency because they have limited control over the wireless environment. Intelligent Reflecting Surfaces (IRS) have recently been introduced as a revolutionary technology that enhances communication performance by dynamically reconfiguring the propagation environment. This paper aims to maximize the sum rate of SUs in an IRS-assisted CR network by jointly optimizing the active beamforming at the Secondary Base Station (SBS) and the passive beamforming at the IRS, subject to constraints on the maximum transmit power of the SBS, the interference tolerance of the PUs, and the unit modulus of the IRS phase shifts.

Methods  To address this non-convex and highly coupled optimization problem, a Deep Reinforcement Learning (DRL) based algorithm is proposed. The problem is formulated as a Markov Decision Process (MDP): the state space comprises the Channel State Information (CSI) of the entire system and the Signal-to-Interference-plus-Noise Ratio (SINR) of the SU network, while the action space consists of the SBS beamforming vectors and the IRS phase-shift matrix. The reward function is designed to maximize the sum rate of the SUs while penalizing constraint violations. The Deep Deterministic Policy Gradient (DDPG) algorithm is used to solve the MDP, owing to its ability to handle continuous action spaces. The DDPG framework consists of an actor network, which outputs the actions, and a critic network, which evaluates those actions according to the reward function. Training proceeds by interacting with the environment to learn the optimal policy, and the algorithm is tuned to ensure convergence and robustness under varying system conditions.
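To make the MDP formulation above concrete, the following minimal Python sketch shows one way the SU sum rate and a penalized reward could be computed. The dimensions, channel names (H_d, H_r, G, h_pu), and the additive penalty form are illustrative assumptions, not the paper's exact Eq. (3) or Eq. (15).

```python
import numpy as np

# Hypothetical dimensions (not from the paper): M SBS antennas,
# K secondary users, N IRS reflecting elements.
M, K, N = 4, 2, 10

def unit_modulus(theta):
    """Map real phases to unit-modulus IRS coefficients e^{j*theta}."""
    return np.exp(1j * theta)

def sum_rate(W, theta, H_d, H_r, G, noise_pow):
    """SU sum rate for beamforming W (M x K) and IRS phases theta (N,).

    H_d: direct SBS->SU channels (K x M); G: SBS->IRS channel (N x M);
    H_r: IRS->SU channels (K x N). All channel names are illustrative.
    """
    Theta = np.diag(unit_modulus(theta))          # IRS phase-shift matrix
    H_eff = H_d + H_r @ Theta @ G                 # effective K x M channel
    rate = 0.0
    for k in range(K):
        sig = np.abs(H_eff[k] @ W[:, k]) ** 2
        intf = sum(np.abs(H_eff[k] @ W[:, j]) ** 2 for j in range(K) if j != k)
        sinr = sig / (intf + noise_pow)           # standard MU-MISO SINR (role of Eq. (3))
        rate += np.log2(1.0 + sinr)
    return rate

def reward(W, theta, H_d, H_r, G, h_pu, P_max, I_th, noise_pow, penalty=10.0):
    """Sum rate minus penalties for constraint violations.

    One simple penalty form consistent with the Methods description;
    the paper's actual reward (Eq. (15)) may differ.
    """
    r = sum_rate(W, theta, H_d, H_r, G, noise_pow)
    if np.linalg.norm(W, 'fro') ** 2 > P_max:     # SBS transmit-power constraint
        r -= penalty
    Theta = np.diag(unit_modulus(theta))
    g_eff = h_pu['d'] + h_pu['r'] @ Theta @ G     # effective SBS->PU channel (assumed names)
    if np.sum(np.abs(g_eff @ W) ** 2) > I_th:     # PU interference-tolerance constraint
        r -= penalty
    return r
```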
Results and Discussions  Simulation results show that, once trained, the proposed scheme achieves comparable sum-rate performance with much lower time complexity than traditional optimization algorithms. The proposed algorithm significantly outperforms the no-IRS and random-phase-shift schemes (Fig. 5). It attains a sum rate close to that of alternating optimization based approaches (Fig. 5) while substantially reducing computational complexity (Fig. 5, Table 2). The impact of the number of IRS elements on the sum rate is also examined (Fig. 6): as expected, the average reward increases with the number of reflecting elements, while the convergence time remains stable, indicating the robustness of the proposed algorithm. Starting from the identity matrix, the DRL-based algorithm learns to adjust the beamforming vectors and phase shifts toward the optimal solution through interaction with the environment (Fig. 7). The variance of the instantaneous reward is also observed to increase with the transmit power, because the instantaneous reward has a larger dynamic range at higher power levels, resulting in greater fluctuations and slower convergence. The relationship between the average reward and the time steps under different transmit power levels highlights the sensitivity of the algorithm to high signal-to-noise ratios (Fig. 8). Moreover, a learning rate of 0.001 yields the best performance, while excessively high or low learning rates degrade it (Fig. 9). The discount factor has a smaller impact on performance than the learning rate (Fig. 10).

Conclusions  This paper proposes a DRL-based algorithm for joint active and passive beamforming optimization in an IRS-assisted CR network. The algorithm uses the DDPG framework to maximize the sum rate of the SUs while respecting the constraints on transmit power, interference, and IRS phase shifts. Simulation results demonstrate that the proposed algorithm achieves sum-rate performance comparable to traditional optimization methods with significantly lower computational complexity. The results also highlight the impact of the DRL parameter settings on performance. Future work will extend the proposed algorithm to multi-cell scenarios and incorporate imperfect CSI to improve its robustness in practical environments.
Algorithm 1  Training of the DDPG-based joint active and passive beamforming algorithm
Input: all CSI of the IRS-assisted downlink multi-user MISO-CR system
Output: optimal action ${\boldsymbol{a}} = \left\{ {{{\boldsymbol{W}}_{\text{s}}},{\boldsymbol{\varTheta}} } \right\}$ and the Q-value function
Initialize: experience replay buffer $\mathcal{M}$ of size $\mathcal{D}$; randomly initialize the actor and critic network parameters ${\theta _\mu }$ and ${\theta _Q}$; set $ {\theta _{Q'}} \leftarrow {\theta _Q},{\text{ }}{\theta _{\mu '}} \leftarrow {\theta _\mu } $
for episode $= 1,2,3, \cdots ,{T_1}$ do
  Initialize the transmit beamforming matrix ${\boldsymbol{W}}_{\text{s}}^{\left( 0 \right)}$ and the phase-shift matrix ${{\boldsymbol{\varTheta}} ^{\left( 0 \right)}}$ as identity matrices to form ${{\boldsymbol{a}}^{\left( 0 \right)}}$, and construct the initial state $ {{\boldsymbol{s}}^{\left( 0 \right)}} $
  for time step $= 1,2,3, \cdots ,{T_2}$ do
    Obtain action ${{\boldsymbol{a}}^{\left( t \right)}}$ from the actor network
    Compute the instantaneous reward ${r^{\left( t \right)}}$ by Eq. (15)
    Compute the SINR $ \gamma _{{\text{SU}}}^{\left( t \right)} $ of all SUs by Eq. (3)
    Construct the next state ${{\boldsymbol{s}}^{\left( {t + 1} \right)}}$ under action ${{\boldsymbol{a}}^{\left( t \right)}}$
    Store the experience tuple $\left( {{{\boldsymbol{s}}^{\left( t \right)}},{{\boldsymbol{a}}^{\left( t \right)}},{r^{\left( t \right)}},{{\boldsymbol{s}}^{\left( {t + 1} \right)}}} \right)$ in the replay buffer
    Randomly sample a mini-batch of size ${N_{\mathrm{B}}}$ from $\mathcal{M}$
    Compute the target Q value by Eq. (6)
    Compute the online critic loss $ L({\theta _Q}) $ by Eq. (7)
    Compute the online actor policy gradient $ {\nabla _{{\theta _\mu }}}J(\mu ) $ by Eq. (8)
    Update the critic network parameters by Eq. (9) with learning rate $ {\iota _Q} $
    Update the actor network parameters by Eq. (10) with learning rate ${\iota _\mu }$
    Update the target critic network parameters by Eq. (11) with rate ${\tau _Q}$
    Update the target actor network parameters by Eq. (12) with rate ${\tau _\mu }$
    Update the state ${{\boldsymbol{s}}^{\left( t \right)}} \leftarrow {{\boldsymbol{s}}^{\left( {t + 1} \right)}}$
  end for
end for
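For readers who prefer code to pseudocode, the following is a minimal, self-contained PyTorch sketch of the DDPG update at the core of Algorithm 1. The state/action dimensions and the flattened real-valued action encoding of $\{{\boldsymbol{W}}_{\text{s}},{\boldsymbol{\varTheta}}\}$ are assumptions; hyperparameters are taken from Table 1 where they apply (two 1024-neuron hidden layers, learning rate 0.001, soft-update rate 0.001, batch size 16, replay size 100000, discount 0.99).

```python
import copy
import random
from collections import deque
import torch
import torch.nn as nn

# Illustrative dimensions: the real action vector encodes {W_s, Θ}
# (e.g. real/imag parts of W_s plus N phases); the encoding is an assumption.
STATE_DIM, ACTION_DIM = 64, 32
GAMMA, TAU, LR, BATCH = 0.99, 0.001, 0.001, 16   # values from Table 1

def mlp(in_dim, out_dim, out_act):
    # Two hidden layers of 1024 neurons each (L1 = L2 = 1024 in Table 1).
    return nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                         nn.Linear(1024, 1024), nn.ReLU(),
                         nn.Linear(1024, out_dim), out_act)

actor = mlp(STATE_DIM, ACTION_DIM, nn.Tanh())            # outputs action a
critic = mlp(STATE_DIM + ACTION_DIM, 1, nn.Identity())   # outputs Q(s, a)
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)  # target nets
# λ_a, λ_c = 1e-5 from Table 1, interpreted here as Adam weight decay (an assumption).
opt_a = torch.optim.Adam(actor.parameters(), lr=LR, weight_decay=1e-5)
opt_c = torch.optim.Adam(critic.parameters(), lr=LR, weight_decay=1e-5)
replay = deque(maxlen=100_000)                           # replay buffer M of size D

def soft_update(target, online, tau=TAU):
    # θ' <- (1 - τ)θ' + τθ, cf. Eqs. (11)-(12).
    for pt, p in zip(target.parameters(), online.parameters()):
        pt.data.mul_(1 - tau).add_(tau * p.data)

def train_step():
    s, a, r, s2 = map(torch.stack, zip(*random.sample(replay, BATCH)))
    with torch.no_grad():                                # target Q, cf. Eq. (6)
        q_tgt = r.unsqueeze(1) + GAMMA * critic_t(torch.cat([s2, actor_t(s2)], 1))
    q = critic(torch.cat([s, a], 1))
    loss_c = nn.functional.mse_loss(q, q_tgt)            # critic loss, cf. Eq. (7)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    loss_a = -critic(torch.cat([s, actor(s)], 1)).mean() # policy gradient, cf. Eq. (8)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    soft_update(critic_t, critic); soft_update(actor_t, actor)
```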
Table 1  DDPG algorithm parameters
Hyperparameter | Description | Value
$\gamma $ | Discount factor | 0.99
${\iota _\mu },{\iota _Q}$ | Learning rates of the actor and critic networks | 0.001
${\tau _\mu },{\tau _Q}$ | Learning rates of the target actor and target critic networks | 0.001
$ {\lambda _a},{\lambda _c} $ | Decay rates for training the actor and critic networks | 0.00001
${L_1},{L_2}$ | Number of neurons in each DNN hidden layer | 1024
$\mathcal{D}$ | Size of the experience replay buffer $\mathcal{M}$ | 100000
${T_1}$ | Number of episodes | 10
${T_2}$ | Number of time steps per episode | 1000000
${N_{\mathrm{B}}}$ | Mini-batch size | 16
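As a usage example, a hypothetical driver below wires the sketch after Algorithm 1 to an environment object exposing reset() and step() (placeholder interfaces, not from the paper), using the episode and step counts $T_1$ and $T_2$ from Table 1.

```python
import torch

# Reuses actor, replay, BATCH, and train_step() from the earlier sketch.
T1, T2 = 10, 1_000_000   # episodes and steps per episode, from Table 1

def train(env):
    for episode in range(T1):
        s = env.reset()                    # W_s, Θ start from identity matrices
        for t in range(T2):
            a = actor(torch.as_tensor(s, dtype=torch.float32)).detach()
            s2, r = env.step(a.numpy())    # env computes reward per Eq. (15)
            replay.append((torch.as_tensor(s, dtype=torch.float32), a,
                           torch.tensor(r, dtype=torch.float32),
                           torch.as_tensor(s2, dtype=torch.float32)))
            if len(replay) >= BATCH:
                train_step()
            s = s2
```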
Table 2  Runtime comparison of different algorithms
Number of IRS elements | Alternating optimization based (ms) | Proposed algorithm (ms)
N=4 | 968.76 | 16.24
N=10 | 1367.41 | 16.84
N=20 | 2248.25 | 16.36
N=30 | 3018.52 | 16.65
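The near-constant runtime of the proposed algorithm in Table 2 is consistent with its per-decision cost being a single actor forward pass, dominated by the two 1024-neuron hidden layers rather than by the number of IRS elements N. A rough, illustrative way to measure that per-decision cost (reusing the actor network from the earlier sketch):

```python
import time
import torch

def time_actor(actor, state_dim, trials=100):
    """Average wall-clock time of one actor forward pass, in milliseconds."""
    x = torch.randn(1, state_dim)
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(trials):
            actor(x)
    return (time.perf_counter() - start) / trials * 1e3

# e.g. time_actor(actor, STATE_DIM)
```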