
Deep Reinforcement Learning Based Beamforming Algorithm for IRS Assisted Cognitive Radio System

LI Guoquan, CHENG Tao, GUO Yongcun, PANG Yu, LIN Jinzhao

Citation: LI Guoquan, CHENG Tao, GUO Yongcun, PANG Yu, LIN Jinzhao. Deep Reinforcement Learning Based Beamforming Algorithm for IRS Assisted Cognitive Radio System[J]. Journal of Electronics & Information Technology, 2025, 47(3): 657-665. doi: 10.11999/JEIT240447


doi: 10.11999/JEIT240447 cstr: 32379.14.JEIT240447
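Details
    About the authors:

    LI Guoquan: male, professor, doctoral supervisor. Research interests: wireless resource management, intelligent reflecting surface optimization, etc.

    CHENG Tao: male, master's student. Research interests: wireless resource management, intelligent reflecting surfaces

    GUO Yongcun: male, master's student. Research interests: wireless resource management, intelligent reflecting surfaces

    PANG Yu: male, professor, doctoral supervisor. Research interests: integrated circuit design, wireless communication, artificial intelligence, etc.

    LIN Jinzhao: male, professor, doctoral supervisor. Research interests: wireless communication transmission technology and optimization, etc.

    Corresponding author:

    LI Guoquan ligq@cqupt.edu.cn

  • CLC number: TN929.5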

Deep Reinforcement Learning Based Beamforming Algorithm for IRS Assisted Cognitive Radio System

Funds: The National Natural Science Foundation of China (U21A20447), The Foundation for Innovative Research Groups of the Natural Science Foundation of Chongqing (cstc2020jcyj-cxttX0002)
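  • Abstract: To further improve the spectrum utilization of multi-user wireless communication systems, this paper proposes a deep reinforcement learning based algorithm that maximizes the sum rate of the Secondary Users (SUs) in an Intelligent Reflecting Surface (IRS) assisted cognitive radio network. First, a resource allocation model that jointly optimizes the beamforming at the Secondary Base Station (SBS) and the IRS phase-shift matrix is established, subject to the maximum transmit power constraint of the SBS, the interference tolerance constraint on the interference caused by the SBS to the Primary Users (PUs), and the unit-modulus constraint on the IRS phase-shift matrix. Then, an active and passive beamforming algorithm based on Deep Deterministic Policy Gradient (DDPG) is proposed, which optimizes these variables jointly to maximize the SU sum rate. Simulation results show that, compared with traditional optimization algorithms, the proposed algorithm achieves a similar sum rate with lower time complexity.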
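
The three constraints named in the abstract can be illustrated with a small feasibility check. The system model is not given on this page, so everything below is an assumption made for illustration only: the function names, the representation of the beamforming matrix as a list of per-user beam vectors, and the single effective SBS-to-PU channel vector `h_pu`, which is taken to already include the IRS-reflected path.

```python
import cmath
import math

def irs_coefficients(phases):
    """Unit-modulus reflection coefficients e^{j*phi}, one per IRS element."""
    return [cmath.exp(1j * phi) for phi in phases]

def transmit_power(beams):
    """Total SBS transmit power: sum of squared magnitudes over all beams."""
    return sum(abs(w) ** 2 for beam in beams for w in beam)

def scale_to_power(beams, p_max):
    """Scale all beams by one common factor so total power is at most p_max."""
    p = transmit_power(beams)
    if p <= p_max:
        return beams
    s = math.sqrt(p_max / p)
    return [[w * s for w in beam] for beam in beams]

def interference_at_pu(h_pu, beams):
    """Interference power at a PU: sum over users k of |h_pu^H w_k|^2."""
    return sum(
        abs(sum(h.conjugate() * w for h, w in zip(h_pu, beam))) ** 2
        for beam in beams
    )

def is_feasible(beams, theta, h_pu, p_max, i_max, tol=1e-9):
    """Check unit modulus, transmit power, and interference tolerance."""
    unit_modulus = all(abs(abs(c) - 1.0) <= tol for c in theta)
    power_ok = transmit_power(beams) <= p_max + tol
    interference_ok = interference_at_pu(h_pu, beams) <= i_max + tol
    return unit_modulus and power_ok and interference_ok

# Hypothetical toy values: 2 SBS antennas, 2 SUs, 2 IRS elements.
beams = scale_to_power([[2 + 0j, 0j], [0j, 2 + 0j]], p_max=2.0)
theta = irs_coefficients([0.0, math.pi / 2])
print(is_feasible(beams, theta, [0.1 + 0j, 0.1 + 0j], p_max=2.0, i_max=0.05))
```

Writing the IRS coefficients as `e^{j*phi}` satisfies the unit-modulus constraint by construction, and a common scaling factor is one simple way to enforce the power budget. How the paper itself maps the actor output to a feasible action is not stated here.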
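  • Fig. 1  IRS-assisted cognitive radio system model

    Fig. 2  DDPG algorithm framework

    Fig. 3  DNN structure of the actor and critic networks

    Fig. 4  Simulation scenario

    Fig. 5  SU sum rate versus SBS transmit power

    Fig. 6  Convergence performance of the algorithm for different numbers of reflecting elements

    Fig. 7  Reward versus time steps for different SBS transmit powers

    Fig. 8  Average reward versus time steps for different SBS transmit powers

    Fig. 9  Average reward versus time steps for different learning rates

    Fig. 10  Average reward versus time steps for different decay rates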

    Algorithm 1  Training of the DDPG-based active and passive beamforming algorithm

     Input: all CSI of the IRS-assisted downlink multi-user MISO-CR system
     Output: the optimal action ${\boldsymbol{a}} = \left\{ {{{\boldsymbol{W}}_{\text{s}}},{\boldsymbol{\varTheta}} } \right\}$ and the Q-value function
     Initialization: create an experience replay pool $\mathcal{M}$ of size $\mathcal{D}$; randomly initialize the actor and critic network parameters ${\theta _\mu }$ and ${\theta _Q}$; set $ {\theta _{Q'}} \leftarrow {\theta _Q}{\text{ }},{\text{ }}{\theta _{\mu '}} \leftarrow {\theta _\mu } $
     for episode = $1,2,3, \cdots ,{T_1}$ do
      Initialize the transmit beamforming matrix ${\boldsymbol{W}}_{\text{s}}^{\left( 0 \right)}$ and the phase-shift matrix ${{\boldsymbol{\varTheta}} ^{\left( 0 \right)}}$ as identity matrices to form ${{\boldsymbol{a}}^{\left( 0 \right)}}$
      Construct the initial state $ {{\boldsymbol{s}}^{\left( 0 \right)}} $
      for time step = $1,2,3, \cdots ,{T_2}$ do
       Obtain the action ${{\boldsymbol{a}}^{\left( t \right)}}$ from the actor network
       Compute the immediate reward ${r^{\left( t \right)}}$ by Eq. (15)
       Compute the signal-to-interference-plus-noise ratio $ \gamma _{{\text{SU}}}^{\left( t \right)} $ of all SUs by Eq. (3)
       Construct the state ${{\boldsymbol{s}}^{\left( {t + 1} \right)}}$ under the action ${{\boldsymbol{a}}^{\left( t \right)}}$
       Store the experience tuple $\left( {{{\boldsymbol{s}}^{\left( t \right)}},{{\boldsymbol{a}}^{\left( t \right)}},{r^{\left( t \right)}},{{\boldsymbol{s}}^{\left( {t + 1} \right)}}} \right)$ in the experience replay pool
       Randomly sample a mini-batch of ${N_{\mathrm{B}}}$ experiences from $\mathcal{M}$
       Obtain the target Q value by Eq. (6)
       Obtain the loss function $ L({\theta _Q}) $ of the online critic network by Eq. (7)
       Obtain the policy gradient $ {\nabla _{{\theta _\mu }}}J(\mu ) $ of the online actor network by Eq. (8)
       Update the critic network parameters ${\theta _Q}$ by Eq. (9), with learning rate $ {\iota _Q} $
       Update the actor network parameters ${\theta _\mu }$ by Eq. (10), with learning rate ${\iota _\mu }$
       Update the target critic network parameters ${\theta _{Q'}}$ by Eq. (11), with learning rate ${\tau _Q}$
       Update the target actor network parameters ${\theta _{\mu '}}$ by Eq. (12), with learning rate ${\tau _\mu }$
       Update the state ${{\boldsymbol{s}}^{\left( t \right)}} \leftarrow {{\boldsymbol{s}}^{\left( {t + 1} \right)}}$
      end for
     end for
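
The bookkeeping steps of the training procedure above (storing experience, sampling a mini-batch, forming the target Q value, and updating the target networks) can be sketched in plain Python. Eqs. (6), (11) and (12) are not reproduced on this page, so the forms used here are the standard DDPG ones and are assumed, not confirmed, to match the paper. Network parameters are stood in for by flat lists of floats, and `ReplayPool`, `target_q` and `soft_update` are hypothetical names.

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-size experience replay pool; the oldest tuples are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored tuples.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def target_q(reward, next_q, gamma=0.99):
    """Standard DDPG target: y = r + gamma * Q'(s', mu'(s'))."""
    return reward + gamma * next_q

def soft_update(target_params, online_params, tau=0.001):
    """Standard soft update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]

# Toy usage with scalar states and actions (placeholders, not real CSI).
pool = ReplayPool(capacity=100)
for t in range(32):
    pool.store(float(t), 0.0, 1.0, float(t + 1))
batch = pool.sample(16)
targets = [target_q(r, next_q=0.0) for (_, _, r, _) in batch]
theta_target = soft_update([0.0, 0.0], [1.0, 2.0])
```

With a small `tau`, the target parameters track the online parameters slowly, which is the usual reason for keeping separate target networks in DDPG.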

    Table 1  DDPG algorithm parameters

    Hyperparameter                 Description                                                     Value
    $\gamma $                      Discount rate                                                   0.99
    ${\iota _\mu },{\iota _Q}$     Learning rates of the actor and critic networks                 0.001
    ${\tau _\mu },{\tau _Q}$       Learning rates of the target actor and target critic networks   0.001
    $ {\lambda _a},{\lambda _c} $  Decay rates for training the actor and critic networks          0.00001
    ${L_1},{L_2}$                  Number of neurons in the DNN hidden layers                      1024
    $\mathcal{D}$                  Size of the experience replay pool $\mathcal{M}$                100000
    ${T_1}$                        Number of episodes                                              10
    ${T_2}$                        Number of time steps per episode                                1000000
    ${N_{\mathrm{B}}}$             Mini-batch size                                                 16
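
For readers who want to reuse these settings, the table can be captured as an immutable configuration object. This is only a sketch: the values are copied from the table, but the class name `DDPGConfig` and all field names are hypothetical, and the assumption that both hidden layers have 1024 neurons follows from the single value listed for ${L_1},{L_2}$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DDPGConfig:
    gamma: float = 0.99                # discount rate
    lr_actor: float = 1e-3             # iota_mu
    lr_critic: float = 1e-3            # iota_Q
    tau_actor: float = 1e-3            # tau_mu, target actor
    tau_critic: float = 1e-3           # tau_Q, target critic
    decay_actor: float = 1e-5          # lambda_a
    decay_critic: float = 1e-5         # lambda_c
    hidden_units: tuple = (1024, 1024) # L_1, L_2
    replay_size: int = 100_000         # D, size of the replay pool M
    episodes: int = 10                 # T_1
    steps_per_episode: int = 1_000_000 # T_2
    batch_size: int = 16               # N_B

    def total_steps(self) -> int:
        """Total number of time steps over all episodes, T_1 * T_2."""
        return self.episodes * self.steps_per_episode

config = DDPGConfig()
print(config.total_steps(), config.replay_size)
```

Freezing the dataclass prevents a hyperparameter from being changed by accident partway through a training run.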

    Table 2  Running time comparison of different algorithms

    Number of IRS reflecting elements   Alternating optimization (ms)   Proposed algorithm (ms)
    N=4                                 968.76                          16.24
    N=10                                1367.41                         16.84
    N=20                                2248.25                         16.36
    N=30                                3018.52                         16.65
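
The running times in this table can be turned into speed-up ratios with a few lines of Python. The numbers are taken directly from the table; the names `runtimes_ms` and `speedup` are only illustrative.

```python
# Running times in milliseconds, keyed by the number of IRS reflecting
# elements N: (alternating optimization, proposed algorithm).
runtimes_ms = {
    4: (968.76, 16.24),
    10: (1367.41, 16.84),
    20: (2248.25, 16.36),
    30: (3018.52, 16.65),
}

def speedup(baseline_ms, proposed_ms):
    """Ratio of the baseline running time to the proposed algorithm's time."""
    return baseline_ms / proposed_ms

for n, (ao_ms, drl_ms) in runtimes_ms.items():
    print(f"N={n}: {speedup(ao_ms, drl_ms):.1f}x")
```

The ratio grows from roughly 60 at N=4 to roughly 180 at N=30, because the alternating-optimization time rises with N while the proposed algorithm's time stays between 16 and 17 ms.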
Publication history
  • Received:  2024-06-04
  • Revised:  2025-02-17
  • Available online:  2025-02-26
  • Published:  2025-03-01
