802.11ax Uplink Scheduling Algorithm Based on Reinforcement Learning

HUANG Xinlin; ZHENG Renhua

doi: 10.11999/JEIT210590 cstr: 32379.14.JEIT210590

College of Electronic and Information Engineering, Tongji University, Shanghai 201800, China

Funds: The National Natural Science Foundation of China (62071332), Shanghai Rising-Star Program (19QA1409100), The Fundamental Research Funds for the Central Universities

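About the authors:
HUANG Xinlin: male, born in 1985, professor and doctoral supervisor. His research interests are machine learning and intelligent communications.

ZHENG Renhua: male, born in 1996, master's student. His research interests are reinforcement learning and intelligent communications.

Corresponding author:
ZHENG Renhua 471539350@qq.com

CLC number: TN915; TP393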
Publication history
- Received: 2021-06-17
- Revised: 2022-01-16
- Accepted: 2022-01-14
- Published online: 2022-02-02
- Issue date: 2022-05-25


Abstract

Abstract: With the arrival of the Internet of Things (IoT) era, wireless network saturation has become increasingly serious. To cope with dense terminal access, the IEEE Standards Association (IEEE-SA) formulated the latest wireless local area network standard, IEEE 802.11ax. The standard uses Orthogonal Frequency Division Multiple Access (OFDMA) to divide the wireless channel more finely into groups of tones, and the resulting sub-channels are called Resource Units (RUs). To solve the channel resource scheduling problem of the 802.11ax uplink in dense user environments, this paper proposes an RU scheduling algorithm based on reinforcement learning. The algorithm trains a pointer network with the Actor-Critic algorithm to solve the adaptive RU scheduling problem, and allocates RUs to users while guaranteeing both priority and fairness. Simulation results show that the proposed algorithm is more effective than traditional scheduling methods in the IEEE 802.11ax uplink and generalizes well, which makes it suitable for IoT scenarios in dense user environments.

Keywords: Internet of Things (IoT); IEEE 802.11ax; Reinforcement learning; Uplink; Actor-Critic

Figure 1 Division of a 20 MHz channel using RUs of various sizes

Figure 2 OFDMA-based scheduled access procedure of the 802.11ax uplink

Figure 3 Structure of the pointer network

Figure 4 Simulated throughput of the proposed algorithm over time

Figure 5 Simulated throughput of STA₁ and STA₆₃ over time under the four algorithms

Figure 6 Simulated total value of the uplink data flows over time for the four algorithms

Figure 7 Average total value of the uplink data flows versus the number of STAs for the four algorithms

Table 1 Correspondence between QoS values and service types

QoS	Service type
1	Probe requests, fire alarms, traffic accident alarms, etc.
2	Patient monitoring, industrial equipment monitoring, etc.
3	Smart home, smart agriculture, warehouse management, etc.
4	Surveillance video, smart water meters, smart electricity meters, etc.
5	Channel quality indicators, radio measurement services, etc.

Table 2 Data rates (Mbps) for different MCSs and RU sizes

MCS index	Modulation, coding rate	26 tones	52 tones	106 tones	242 tones	484 tones	996 tones
1	BPSK, 1/2	0.8	1.7	3.5	8.1	16.3	34.0
2	QPSK, 1/2	1.7	3.3	7.1	16.3	32.5	68.1
3	QPSK, 3/4	2.5	5.0	10.6	24.4	48.8	102.1
4	16-QAM, 1/2	3.3	6.7	14.2	32.5	65.0	136.1
5	16-QAM, 3/4	5.0	10.0	21.3	48.8	97.5	204.2
6	64-QAM, 2/3	6.7	13.3	28.3	65.0	130.0	272.2
7	64-QAM, 3/4	7.5	15.0	31.9	73.1	146.3	306.3
8	64-QAM, 5/6	8.3	16.7	35.4	81.3	162.5	340.3
9	256-QAM, 3/4	10.0	20.0	42.5	97.5	195.0	408.3
10	256-QAM, 5/6	11.1	22.2	47.2	108.3	216.7	453.7
11	1024-QAM, 3/4	–	–	–	121.9	243.8	510.4
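As a rough illustration of how the rates in Table 2 translate into airtime, the sketch below looks up a rate by MCS index and RU size and estimates how long a payload occupies the RU. The dictionary holds only an excerpt of the table, and the function name `airtime_ms` and the 1500-byte payload are assumptions made for this example. The estimate ignores preambles, MAC headers, padding and trigger-frame overhead, so it is a lower bound rather than a model of the actual exchange.

```python
# Excerpt of Table 2: data rate in Mbps keyed by (MCS index, RU size in tones).
# Combinations marked "–" in the table are simply absent from the dictionary.
RATE_MBPS = {
    (2, 26): 1.7, (2, 242): 16.3,
    (5, 26): 5.0, (5, 242): 48.8,
    (11, 242): 121.9,
}


def airtime_ms(payload_bytes, mcs_index, ru_tones):
    """Estimate payload airtime in milliseconds (hypothetical helper).

    Only the payload bits are counted; all protocol overhead is ignored.
    """
    rate = RATE_MBPS.get((mcs_index, ru_tones))
    if rate is None:
        raise ValueError(f"no rate listed for MCS {mcs_index} on a {ru_tones}-tone RU")
    bits = payload_bytes * 8
    return bits / (rate * 1e6) * 1e3  # seconds -> milliseconds
```

For a 1500-byte payload at MCS index 5, a 26-tone RU gives about 2.4 ms of airtime, while a 242-tone RU gives about 0.25 ms, which is the trade-off a scheduler weighs when it chooses RU sizes for individual STAs.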

Table 3 Procedure for training the pointer network with the Actor-Critic algorithm

(1) Initialize the hyperparameters, initialize the training set $ C^{\text{in}} $, set the total number of training steps $ T $, and set the batch size $ N $
(2) Initialize the pointer network parameters $ \theta $
(3) Initialize the Critic network parameters $ \theta_v $
(4) for t = 1 to $ T $:
(5)     Sample inputs from the training set: $ \boldsymbol{c}_i \sim \text{SampleInput}(C^{\text{in}}) \text{ for } i \in \{1,2,\cdots,N\} $
(6)     Use $ \theta $ to select an item subset: $ \pi_i \sim \text{SampleSolution}(p_\theta(\cdot \mid \boldsymbol{c}_i)) \text{ for } i \in \{1,2,\cdots,N\} $
(7)     Use $ \theta_v $ to compute the baseline: $ b(\boldsymbol{c}_i) = b_{\theta_v}(\boldsymbol{c}_i) \text{ for } i \in \{1,2,\cdots,N\} $
(8)     Compute the gradient of the Actor objective: $ \nabla_\theta J(\theta) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^{N} \left( V(\pi_i \mid \boldsymbol{c}_i) - b(\boldsymbol{c}_i) \right) \nabla_\theta \ln p_\theta(\pi_i \mid \boldsymbol{c}_i) $
(9)     Compute the Critic loss: $ L(\theta_v) = \dfrac{1}{N}\displaystyle\sum\limits_{i=1}^{N} \left\| b_{\theta_v}(\boldsymbol{c}_i) - V(\pi_i \mid \boldsymbol{c}_i) \right\|_2^2 $
(10)     Update $ \theta $ with the Adam optimizer: $ \theta = \text{Adam}(\theta, \nabla_\theta J(\theta)) $
(11)     Update $ \theta_v $ with the Adam optimizer: $ \theta_v = \text{Adam}(\theta_v, \nabla_{\theta_v} L(\theta_v)) $
(12) end
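The training loop in Table 3 can be sketched in a few dozen lines of standard-library Python. This is a simplified illustration under several assumptions, not the paper's implementation: a linear softmax scoring policy stands in for the pointer network, a linear function of summary features stands in for the Critic network, plain gradient steps replace Adam, and each input is a hypothetical list of `(value, cost)` items selected under a hypothetical `capacity` limit.

```python
import math
import random


def sample_solution(theta, items, capacity, rng):
    """Step (6): sample a feasible item subset and return it with grad ln p_theta."""
    remaining = set(range(len(items)))
    room = capacity
    chosen, grad = [], [0.0, 0.0]
    while True:
        feasible = sorted(i for i in remaining if items[i][1] <= room)
        if not feasible:
            return chosen, grad
        logits = [theta[0] * items[i][0] + theta[1] * items[i][1] for i in feasible]
        top = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(z - top) for z in logits]
        total = sum(weights)
        probs = [w / total for w in weights]
        pick = rng.choices(feasible, weights=probs)[0]
        # With linear logits, d ln softmax_k / d theta = x_k - sum_j p_j x_j.
        for d in range(2):
            mean_d = sum(p * items[i][d] for p, i in zip(probs, feasible))
            grad[d] += items[pick][d] - mean_d
        chosen.append(pick)
        remaining.discard(pick)
        room -= items[pick][1]


def features(items):
    """Summary features fed to the linear critic (an assumption of this sketch)."""
    n = len(items)
    return [sum(v for v, _ in items) / n, sum(c for _, c in items) / n, 1.0]


def train(train_set, capacity, steps=100, batch=8, lr=0.05, lr_v=0.05, seed=0):
    rng = random.Random(seed)
    theta, theta_v = [0.0, 0.0], [0.0, 0.0, 0.0]              # steps (2), (3)
    for _ in range(steps):                                    # step (4)
        g_theta, g_v = [0.0, 0.0], [0.0, 0.0, 0.0]
        for _ in range(batch):
            items = rng.choice(train_set)                     # step (5)
            chosen, g_logp = sample_solution(theta, items, capacity, rng)
            value = sum(items[i][0] for i in chosen)          # V(pi_i | c_i)
            phi = features(items)
            baseline = sum(a * b for a, b in zip(theta_v, phi))  # step (7)
            for d in range(2):                                # step (8)
                g_theta[d] += (value - baseline) * g_logp[d] / batch
            for d in range(3):                                # gradient of step (9)
                g_v[d] += 2.0 * (baseline - value) * phi[d] / batch
        # Steps (10), (11): plain gradient ascent on J and descent on L, not Adam.
        theta = [t + lr * g for t, g in zip(theta, g_theta)]
        theta_v = [t - lr_v * g for t, g in zip(theta_v, g_v)]
    return theta, theta_v
```

Calling `train(train_set, capacity=1.0)` on a list of randomly generated instances returns the two parameter vectors. Subtracting the baseline in step (8) leaves the expected gradient unchanged but reduces its variance, which is why the Critic is trained alongside the Actor.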

Table 4 Average waiting time (ms) of five representative STAs under the four algorithms

Algorithm	STA₁	STA₂₁	STA₄₁	STA₆₁	STA₈₁
Round-robin algorithm	8.73	8.83	8.73	8.60	9.01
PRA algorithm	5.42	7.36	10.87	13.84	16.90
Adaptive grouping algorithm	9.10	9.14	9.12	9.13	9.61
Proposed algorithm	4.49	5.65	7.97	9.31	11.56
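One simple way to read Table 4 is to average each row. The sketch below does that arithmetic. The unweighted mean over the five representative STAs is a summary chosen for this example, not a metric reported in the paper, and the dictionary keys are shorthand labels.

```python
# Average waiting time (ms) per algorithm for STA1, STA21, STA41, STA61, STA81,
# copied from Table 4.
WAIT_MS = {
    "round_robin": [8.73, 8.83, 8.73, 8.60, 9.01],
    "pra": [5.42, 7.36, 10.87, 13.84, 16.90],
    "adaptive_grouping": [9.10, 9.14, 9.12, 9.13, 9.61],
    "proposed": [4.49, 5.65, 7.97, 9.31, 11.56],
}


def mean_wait(table):
    """Unweighted mean waiting time per algorithm over the listed STAs."""
    return {name: sum(row) / len(row) for name, row in table.items()}


def spread(table):
    """Difference between the largest and smallest waiting time per algorithm."""
    return {name: max(row) - min(row) for name, row in table.items()}
```

On these five STAs the proposed algorithm has the lowest mean waiting time (about 7.80 ms, against 8.78 ms for round-robin, 9.22 ms for adaptive grouping and 10.88 ms for PRA). Its spread between STAs (about 7.07 ms) is smaller than that of PRA (about 11.48 ms) and larger than that of the two algorithms that treat all STAs alike.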
