802.11ax Uplink Scheduling Algorithm Based on Reinforcement Learning
doi: 10.11999/JEIT210590 cstr: 32379.14.JEIT210590
College of Electronic and Information Engineering, Tongji University, Shanghai 201800, China
Abstract: With the arrival of the Internet of Things (IoT) era, wireless network saturation has become increasingly severe. To cope with dense terminal access, the IEEE Standards Association (IEEE-SA) has formulated the latest wireless local area network standard, IEEE 802.11ax. The standard uses Orthogonal Frequency Division Multiple Access (OFDMA) to divide the wireless channel more finely into groups of tones; the resulting sub-channels are called Resource Units (RUs). To solve the channel resource scheduling problem of the 802.11ax uplink in dense user environments, this paper proposes an RU scheduling algorithm based on reinforcement learning. The algorithm uses the Actor-Critic method to train a pointer network, which solves the adaptive RU scheduling problem and allocates RUs to users while guaranteeing both priority and fairness. Simulation results show that the scheduling algorithm is more effective than traditional scheduling methods in the IEEE 802.11ax uplink and generalizes well, making it suitable for IoT scenarios in dense user environments.
Key words:
- Internet of Things (IoT) /
- IEEE 802.11ax /
- Reinforcement learning /
- Uplink /
- Actor-Critic
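Table 1 Mapping between QoS values and service types

| QoS | Service type |
|---|---|
| 1 | Probe requests, fire alarms, traffic accident alarms, etc. |
| 2 | Patient monitoring, industrial equipment monitoring, etc. |
| 3 | Smart home, smart agriculture, warehouse management, etc. |
| 4 | Surveillance video, smart water meters, smart electricity meters, etc. |
| 5 | Channel quality indicators, radio measurement services, etc. |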
Table 2 Data rates (Mbps) for different MCSs and RU sizes

| MCS index | MCS | 26 tones | 52 tones | 106 tones | 242 tones | 484 tones | 996 tones |
|---|---|---|---|---|---|---|---|
| 1 | BPSK, 1/2 | 0.8 | 1.7 | 3.5 | 8.1 | 16.3 | 34.0 |
| 2 | QPSK, 1/2 | 1.7 | 3.3 | 7.1 | 16.3 | 32.5 | 68.1 |
| 3 | QPSK, 3/4 | 2.5 | 5.0 | 10.6 | 24.4 | 48.8 | 102.1 |
| 4 | 16-QAM, 1/2 | 3.3 | 6.7 | 14.2 | 32.5 | 65.0 | 136.1 |
| 5 | 16-QAM, 3/4 | 5.0 | 10.0 | 21.3 | 48.8 | 97.5 | 204.2 |
| 6 | 64-QAM, 2/3 | 6.7 | 13.3 | 28.3 | 65.0 | 130.0 | 272.2 |
| 7 | 64-QAM, 3/4 | 7.5 | 15.0 | 31.9 | 73.1 | 146.3 | 306.3 |
| 8 | 64-QAM, 5/6 | 8.3 | 16.7 | 35.4 | 81.3 | 162.5 | 340.3 |
| 9 | 256-QAM, 3/4 | 10.0 | 20.0 | 42.5 | 97.5 | 195.0 | 408.3 |
| 10 | 256-QAM, 5/6 | 11.1 | 22.2 | 47.2 | 108.3 | 216.7 | 453.7 |
| 11 | 1024-QAM, 3/4 | – | – | – | 121.9 | 243.8 | 510.4 |
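As a rough illustration of how a scheduler might use Table 2, the sketch below looks up the data rate for an (MCS index, RU size) pair and estimates how long a payload occupies the RU. The names `RATE_MBPS`, `data_rate_mbps`, and `payload_airtime_ms` are hypothetical, only an excerpt of the table is included, and the estimate ignores preambles, headers, and trigger-frame overhead.

```python
# RU sizes (in tones) in the column order of Table 2.
RU_TONES = (26, 52, 106, 242, 484, 996)

# Excerpt of Table 2 in Mbps, keyed by the table's MCS index.
# None marks combinations the table leaves blank.
RATE_MBPS = {
    2: (1.7, 3.3, 7.1, 16.3, 32.5, 68.1),          # QPSK, 1/2
    5: (5.0, 10.0, 21.3, 48.8, 97.5, 204.2),       # 16-QAM, 3/4
    10: (11.1, 22.2, 47.2, 108.3, 216.7, 453.7),   # 256-QAM, 5/6
    11: (None, None, None, 121.9, 243.8, 510.4),   # 1024-QAM, 3/4
}


def data_rate_mbps(mcs_index, tones):
    """Return the tabulated rate, or raise if the combination is unavailable."""
    rate = RATE_MBPS[mcs_index][RU_TONES.index(tones)]
    if rate is None:
        raise ValueError(f"MCS {mcs_index} is not tabulated for a {tones}-tone RU")
    return rate


def payload_airtime_ms(payload_bytes, mcs_index, tones):
    """Payload-only airtime in milliseconds (assumption: no protocol overhead)."""
    bits = payload_bytes * 8
    # bits / (Mbps * 1e6) seconds, times 1e3 for ms, simplifies to bits / (Mbps * 1e3).
    return bits / (data_rate_mbps(mcs_index, tones) * 1e3)


if __name__ == "__main__":
    # 1500-byte payload on a 106-tone RU with 16-QAM, 3/4.
    print(round(payload_airtime_ms(1500, 5, 106), 3))
```

Doubling the RU size roughly halves the payload airtime at a fixed MCS, which is why the choice of RU size per user matters for waiting time.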
Table 3 Training the pointer network with the Actor-Critic algorithm

(1) Initialize the hyperparameters, initialize the training set $C^{\text{in}}$, set the total number of training steps $T$, and set the batch size $N$
(2) Initialize the pointer network parameters $\theta$
(3) Initialize the Critic network parameters $\theta_v$
(4) for t = 1 to $T$:
(5)   Sample inputs from the training set: $\boldsymbol{c}_i \sim \text{SampleInput}(C^{\text{in}})$ for $i \in \{1,2,\cdots,N\}$
(6)   Use $\theta$ to select a subset of items: $\pi_i \sim \text{SampleSolution}(p_\theta(\cdot|\boldsymbol{c}_i))$ for $i \in \{1,2,\cdots,N\}$
(7)   Use $\theta_v$ to compute the baseline: $b(\boldsymbol{c}_i) = b_{\theta_v}(\boldsymbol{c}_i)$ for $i \in \{1,2,\cdots,N\}$
(8)   Compute the gradient of the Actor objective: $\nabla_\theta J(\theta) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^{N} \left( V(\pi_i|\boldsymbol{c}_i) - b(\boldsymbol{c}_i) \right) \nabla_\theta \ln p_\theta(\pi_i|\boldsymbol{c}_i)$
(9)   Compute the Critic loss: $L(\theta_v) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^{N} \parallel b_{\theta_v}(\boldsymbol{c}_i) - V(\pi_i|\boldsymbol{c}_i) \parallel_2^2$
(10)  Update $\theta$ with the Adam optimizer: $\theta = \text{Adam}(\theta, \nabla_\theta J(\theta))$
(11)  Update $\theta_v$ with the Adam optimizer: $\theta_v = \text{Adam}(\theta_v, \nabla_{\theta_v} L(\theta_v))$
(12) end
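Steps (8) and (9) of Table 3 can be illustrated with a toy sketch. It is not the paper's pointer network: the policy here is a plain softmax over a handful of actions with logits standing in for $\theta$, the baselines are passed in as numbers rather than produced by a Critic network, and $V$ is assumed to be a reward to be maximized. All function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(logits, action):
    """Gradient of ln softmax(logits)[action] w.r.t. the logits: one_hot - probs."""
    probs = softmax(logits)
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def actor_gradient(logits, actions, values, baselines):
    """Step (8): (1/N) * sum_i (V_i - b_i) * grad ln p(pi_i)."""
    n = len(actions)
    grad = [0.0] * len(logits)
    for action, value, baseline in zip(actions, values, baselines):
        g = grad_log_prob(logits, action)
        for k in range(len(logits)):
            grad[k] += (value - baseline) * g[k] / n
    return grad


def critic_loss(values, baselines):
    """Step (9): mean squared error between the baseline and the observed value."""
    return sum((b - v) ** 2 for v, b in zip(values, baselines)) / len(values)


if __name__ == "__main__":
    logits = [0.0, 0.0]          # toy stand-in for the Actor parameters
    actions = [0, 1]             # sampled solutions, one per batch element
    values = [1.0, 0.0]          # observed V for each sample
    baselines = [0.5, 0.5]       # Critic predictions b(c_i)
    print(actor_gradient(logits, actions, values, baselines))
    print(critic_loss(values, baselines))
```

Subtracting the baseline leaves the expected gradient unchanged but reduces its variance: a sample only pushes its log-probability up when it does better than the Critic predicted. In a full implementation the gradient and loss would feed the Adam updates of steps (10) and (11).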
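Table 4 Average waiting time (ms) of five representative STAs under the four algorithms

| Algorithm | STA1 | STA21 | STA41 | STA61 | STA81 |
|---|---|---|---|---|---|
| Round-robin algorithm | 8.73 | 8.83 | 8.73 | 8.60 | 9.01 |
| PRA algorithm | 5.42 | 7.36 | 10.87 | 13.84 | 16.90 |
| Adaptive grouping algorithm | 9.10 | 9.14 | 9.12 | 9.13 | 9.61 |
| Proposed algorithm | 4.49 | 5.65 | 7.97 | 9.31 | 11.56 |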
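The figures in Table 4 can be summarized with a few lines of arithmetic. The sketch below computes, for each algorithm, the mean waiting time over the five representative STAs and the gap between the slowest and fastest STA, and reports which algorithm serves each STA fastest. These summaries are derived here from the table and are not metrics reported by the paper; the dictionary and function names are hypothetical.

```python
STAS = ["STA1", "STA21", "STA41", "STA61", "STA81"]

# Average waiting times in ms, copied from Table 4.
WAIT_MS = {
    "Round-robin": [8.73, 8.83, 8.73, 8.60, 9.01],
    "PRA": [5.42, 7.36, 10.87, 13.84, 16.90],
    "Adaptive grouping": [9.10, 9.14, 9.12, 9.13, 9.61],
    "Proposed": [4.49, 5.65, 7.97, 9.31, 11.56],
}


def mean_wait(algorithm):
    """Mean waiting time over the five representative STAs."""
    waits = WAIT_MS[algorithm]
    return sum(waits) / len(waits)


def spread(algorithm):
    """Gap between the slowest and fastest representative STA."""
    waits = WAIT_MS[algorithm]
    return max(waits) - min(waits)


def best_per_sta():
    """For each STA, the algorithm with the lowest waiting time."""
    return {
        sta: min(WAIT_MS, key=lambda name: WAIT_MS[name][i])
        for i, sta in enumerate(STAS)
    }


if __name__ == "__main__":
    for name in WAIT_MS:
        print(f"{name}: mean={mean_wait(name):.2f} ms, spread={spread(name):.2f} ms")
    print(best_per_sta())
```

On these five STAs the proposed algorithm has the lowest mean (about 7.80 ms against 8.78 ms for round-robin) and a smaller spread than PRA, while round-robin is still fastest for STA61 and STA81.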