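Joint Task Allocation, Communication Base Station Association and Flight Strategy Optimization Design for Distributed Sensing Unmanned Aerial Vehicles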

HE Jiang; YU Wanxin; HUANG Hao; JIANG Weiheng

doi:10.11999/JEIT240738

cstr: 32379.14.JEIT240738

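1. Southwest China Institute of Electronic Technology, Chengdu 610036, China
2. School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China

Funds: The Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJQN202203101)

About the authors:
HE Jiang: male, engineer; research interest: UAV swarm technology

YU Wanxin: female, master's student; research interests: UAV swarms and multi-agent technology

HUANG Hao: male, master's student; research interests: communication signal processing and deep reinforcement learning

JIANG Weiheng: male, associate research fellow; research interest: intelligence-enabled wireless communications

Corresponding author:
JIANG Weiheng, whjiang@cqu.edu.cn

CLC number: TN929.52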
Publication history
- Received: 2024-08-26
- Revised: 2025-02-21
- Published online: 2025-03-06
- Issue date: 2025-05-01

Joint Task Allocation, Communication Base Station Association and Flight Strategy Optimization Design for Distributed Sensing Unmanned Aerial Vehicles

1.
Southwest China Institute of Electronic Technology, Chengdu 610036, China
2.
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China

Funds: The Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJQN202203101)

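Abstract

Abstract: This paper studies distributed sensing with multiple Unmanned Aerial Vehicles (UAVs). To coordinate the behavior of the UAVs, a task sensing-data backhaul protocol is designed, and a Mixed-Integer Non-Linear Programming (MINLP) model is established for the joint optimization of UAV task allocation, base station association for data backhaul, and flight strategy. Because the problem has a complex mathematical structure, and because centralized optimization algorithms suffer from high computational complexity and heavy information-exchange overhead, the problem is reformulated as a cooperative Markov Game (MG) with a revenue function that combines cost and utility. Considering the complex coupling between the continuous and discrete action spaces of the MG, an Independent Learner (IL) based Multi-Agent Compound-Action Actor-Critic (MA-IL-CA2C) algorithm is designed to solve it. Simulation results show that, compared with the baseline algorithms, the proposed algorithm significantly improves system revenue and reduces network energy consumption.
- Unmanned aerial vehicle
- Distributed sensing
- Joint optimization
- Reinforcement learning
- Markov game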
Abstract:

Objective: The demand for Unmanned Aerial Vehicles (UAVs) in distributed sensing applications has increased significantly due to their low cost, flexibility, mobility, and ease of deployment. In these applications, the coordination of multi-UAV sensing tasks, communication strategies, and flight trajectory optimization presents a significant challenge. Although there have been preliminary studies on the joint optimization of UAV communication strategies and flight trajectories, most existing work overlooks the impact of the randomly distributed and dynamically updated task airspace model on the optimal design of UAV communication and flight strategies. Furthermore, accurate UAV energy consumption modeling is often lacking when establishing system design goals. Energy consumption during flight, sensing, and data transmission is a critical issue, especially given the UAV's limited payload capacity and energy supply. Achieving an accurate energy consumption model is essential for extending UAV operational time. To address the requirements of multiple UAVs performing distributed sensing, particularly when tasks are dynamically updated and data must be transmitted to ground base stations, this paper explores the optimal design of joint UAV sensing task allocation, base station association for data backhaul, flight strategy planning, and transmit power control.

Methods: To coordinate the relationships among UAVs, base stations, and sensing tasks, a protocol framework for multi-UAV distributed task sensing applications is first proposed. This framework divides the UAVs' behavior during distributed sensing into four stages: cooperation, movement, sensing, and transmission. The framework ensures coordination in the UAVs' movement to the task area, task sensing, and the backhaul transmission of sensed data. A sensing task model based on dynamic updates, a UAV movement model, a UAV sensing behavior model, and a data backhaul transmission model are then established. A revenue function, combining task sensing utility and task execution costs, is designed, leading to a joint optimization problem of UAV task allocation, communication base station association, and flight strategy. The objective is to maximize the long-term weighted utility-cost. Given that the optimization problem involves high-dimensional decision variables in both discrete and continuous forms, and the objective function is non-convex with respect to these variables, the problem is a typical non-convex Mixed-Integer Non-Linear Programming (MINLP) problem. It falls within the NP-Hard complexity class. Centralized optimization algorithms for this formulation require a central node with high computational capacity and the collection of substantial additional information, such as channel state and UAV location data. This results in high information-interaction overhead and poor scalability.

To overcome these challenges, the problem is reformulated as a Markov Game (MG). An effective algorithm is designed by leveraging the distributed coordination concept of Multi-Agent (MA) systems and the exploration capability of deep Reinforcement Learning (RL) within the optimization solution space. Specifically, due to the complex coupling between the continuous and discrete action spaces in the MG problem, a novel solution algorithm called Multi-Agent Independent-Learning Compound-Action Actor-Critic (MA-IL-CA2C) based on Independent Learning (IL) is proposed. The core idea is as follows: first, the independent-learning algorithm is applied to extend single-agent RL to an MA environment. Then, deep learning is used to represent the high-dimensional action and state spaces. To handle the combined discrete and continuous action spaces, the UAV action space is decomposed into discrete and continuous components, with the DQN algorithm applied to the discrete space and the DDPG algorithm to the continuous space.

Results and Discussions: The computational complexity of action selection and training for the proposed MA-IL-CA2C algorithm is theoretically analyzed. The results show that its complexity is almost equivalent to that of the two benchmark algorithms, DQN and DDPG. Additionally, the performance of the proposed algorithm is simulated and analyzed. When compared with the DQN, DDPG, and Greedy algorithms, the MA-IL-CA2C algorithm demonstrates lower network energy consumption throughout the network operation (Fig. 6), improved system revenue (Fig. 5, Fig. 8, and Fig. 9), and optimized UAV flight strategies (Fig. 7).

Conclusions: This paper addresses the optimal design of joint UAV sensing task allocation, data backhaul base station association, flight strategy planning, and transmit power control for multi-UAV distributed task sensing. A new MA-IL-CA2C algorithm based on IL is proposed. The simulation results show that the proposed algorithm achieves better system revenue while reducing UAV energy consumption.
- Unmanned Aerial Vehicle (UAV)
- Distributed sensing
- Joint optimization
- Reinforcement Learning (RL)
- Markov Game (MG)

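Fig. 1 UAV network scenario for distributed sensing applications

Fig. 2 Time-slot structure of the sensing-transmission protocol

Fig. 3 Flight direction angle ${\boldsymbol{\delta}}_n^t = \left( {\alpha _n^t,\beta _n^t} \right)$ of ${\text{UAV}}_n$

Fig. 4 Joint communication and flight strategy optimization by ${\text{UAV}}_n$ using the MA-IL-CA2C algorithm

Fig. 5 Comparison of system revenue among different algorithms

Fig. 6 Comparison of system cost among different algorithms

Fig. 7 3D flight trajectories of UAVs performing tasks under different DRL algorithms

Fig. 8 Comparison of the average revenue of the selected tasks among different algorithms

Fig. 9 System revenue of the MA-IL-CA2C algorithm under different power allocation and speed control settings
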
Algorithm 1 MA-IL-CA2C algorithm

(1) Initialization: set $t = 0$ and the maximum number of decision periods $T$; choose the experience replay capacity ${N_{\mathrm{c}}}$, batch size ${N_{\mathrm{b}}}$, network learning rates ${\alpha _{{\boldsymbol{\theta}} _n^t}}$ and ${\alpha _{{\boldsymbol{\omega}} _n^t}}$, and soft-update parameter $\rho$;
(2) For each agent $n \in \mathcal{N}$:
    Randomly initialize the network parameters ${\boldsymbol{\theta}}_n^t$, ${\hat {\boldsymbol{\theta}}}_n^t$, ${\boldsymbol{\omega}}_n^t$, ${\hat {\boldsymbol{\omega}}}_n^t$, and set the initial state ${\boldsymbol{s}}^0$;
# Main loop
(3) If $t \le T$:
  (a) For each agent $n \in \mathcal{N}$:
    According to Eq. (28), select the discrete action ${\boldsymbol{a}}_n^{{\text{dis}},t}$ at ${\boldsymbol{s}}_n^t$, i.e., select sensing task $m$ and ${\text{BS}}_k$;
    # Cooperation stage
    Report the decision $D_n^{\mathrm{c}} = \left\{ {n,{\boldsymbol{a}}_n^{{\text{dis}},t}} \right\}$ on the control channel and receive the decisions of the other UAVs;
    Based on the discrete action ${\boldsymbol{a}}_n^{{\text{dis}},t}$, determine the continuous action ${\boldsymbol{a}}_n^{{\text{con}},t} = v_n^t\left( {{\boldsymbol{s}}^t,{\boldsymbol{a}}_n^{{\text{dis}},t}} \right)$, i.e., the flight direction angle ${\boldsymbol{\delta}}_n^t$, moving speed $v_n^t$, and transmit power $P_n^t$;
    # Movement stage
    Fly to the sensing position ${\boldsymbol{x}}_n^{{\mathrm{s}},t}$ according to the flight direction angle ${\boldsymbol{\delta}}_n^t$ and moving speed $v_n^t$;
    # Sensing stage
    Perform the sensing task and collect the task data $D_n^{{\mathrm{s}},t}$;
    # Transmission stage
    Backhaul the task data to ${\text{BS}}_k$ with transmit power $P_n^t$;
    Obtain the revenue $r_n^{t + 1}$ according to Eq. (23) and observe ${\boldsymbol{s}}^{t + 1}$;
    Store the experience tuple $\left( {{\boldsymbol{s}}^t,{\boldsymbol{a}}_n^t,r_n^{t + 1},{\boldsymbol{s}}^{t + 1}} \right)$ in the experience replay module ${\mathcal{D}_n}$;
    If $t > {N_{\mathrm{c}}}$:
      Remove the oldest experience tuple from ${\mathcal{D}_n}$;
    # Network training
    Randomly sample a batch of ${N_{\mathrm{b}}}$ experience tuples $\left( {{\boldsymbol{s}}^t,{\boldsymbol{a}}_n^t,r_n^{t + 1},{\boldsymbol{s}}^{t + 1}} \right)$ from ${\mathcal{D}_n}$;
    Update the current network parameters ${\boldsymbol{\theta}}_n^t$ and ${\boldsymbol{\omega}}_n^t$ according to Eqs. (29)–(34);
    Update the target network parameters ${\hat {\boldsymbol{\theta}}}_n^t$ and ${\hat {\boldsymbol{\omega}}}_n^t$ according to Eqs. (36) and (37);
  (b) Let $t = t + 1$, ${\boldsymbol{s}}^t \leftarrow {\boldsymbol{s}}^{t + 1}$;
(4) Repeat step (3) until the algorithm terminates.
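
The compound action selection in step (3)(a) can be illustrated with a minimal sketch. It is not the authors' implementation: the linear "actor", the function names `select_discrete` and `select_continuous`, the flattening of the (task, BS) pair into a single index, and the angle and power ranges are all assumptions made for illustration. The sketch only shows the order of decisions: an epsilon-greedy discrete choice first, then a bounded continuous action conditioned on the state and on that choice.

```python
import math
import random

M, K = 10, 2          # sensing tasks and base stations (values from Table 1)
V_MAX = 15.0          # maximum flight speed in m/s (Table 1)
P_MAX_DBM = 30.0      # maximum transmit power in dBm (Table 1)
EPSILON = 0.1         # greedy rate (Table 2)


def select_discrete(q_values, epsilon, rng):
    """Epsilon-greedy choice over the flattened (task, BS) pairs."""
    if rng.random() < epsilon:
        idx = rng.randrange(len(q_values))          # explore
    else:
        idx = max(range(len(q_values)), key=q_values.__getitem__)  # exploit
    return divmod(idx, K)   # decode the flat index into (task m, BS k)


def select_continuous(state, task, bs, weights):
    """Hypothetical linear actor conditioned on state and discrete choice."""
    one_hot = [0.0] * (M * K)
    one_hot[task * K + bs] = 1.0
    features = list(state) + one_hot
    # tanh squashes every output into [-1, 1] before rescaling
    z = [math.tanh(sum(w * x for w, x in zip(row, features)))
         for row in weights]
    alpha = math.pi * z[0]                      # assumed range [-pi, pi]
    beta = 0.5 * math.pi * z[1]                 # assumed range [-pi/2, pi/2]
    speed = 0.5 * (z[2] + 1.0) * V_MAX          # [0, V_MAX]
    power_dbm = 0.5 * (z[3] + 1.0) * P_MAX_DBM  # assumed range [0, P_MAX_DBM]
    return alpha, beta, speed, power_dbm


if __name__ == "__main__":
    rng = random.Random(0)
    state = [rng.gauss(0.0, 1.0) for _ in range(6)]
    q_values = [rng.gauss(0.0, 1.0) for _ in range(M * K)]
    weights = [[rng.gauss(0.0, 0.1) for _ in range(6 + M * K)]
               for _ in range(4)]
    task, bs = select_discrete(q_values, EPSILON, rng)
    print(task, bs, select_continuous(state, task, bs, weights))
```

In a full implementation, the Q-values would come from the DQN branch and the continuous outputs from the DDPG actor; the random numbers here merely stand in for them.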

Table 1 Simulation parameters

Parameter	Value
Number of UAVs $N$, sensing tasks $M$, and BSs $K$	3, 10, 2
Network coverage radius ${r_{\text{c}}}$	500 m
Channel bandwidth $W$	1 MHz
BS height ${H_0}$	25 m
UAV minimum and maximum altitude ${h_{\min }},{h_{\max }}$	50 m, 100 m
UAV maximum flight speed ${v_{\max }}$	15 m/s
UAV maximum transmit power ${P_{\max }}$	30 dBm
Sensing parameter $\lambda $	0.01
Environment parameters ${\mathrm{a}},{\mathrm{b}}$	9.61, 0.16
Additional path loss for LoS and NLoS ${\eta ^{{\text{LoS}}}},{\eta ^{{\text{NLoS}}}}$	1 dB, 20 dB
Carrier frequency ${f_{\text{c}}}$	2 GHz
Noise power ${N_0}$	-96 dBm
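
To give a feel for the scale of the values in Table 1, the following sketch evaluates a UAV-to-BS link budget. The page does not reproduce the paper's channel equations, so the model here is an assumption: the commonly used elevation-angle LoS probability $1/(1 + a\,e^{-b(\theta - a)})$ with $\theta$ in degrees, free-space path loss plus the probability-weighted additional losses, and ${N_0}$ treated as the total noise power over the band. The function names are hypothetical.

```python
import math

A, B = 9.61, 0.16                     # environment parameters a, b (Table 1)
ETA_LOS_DB, ETA_NLOS_DB = 1.0, 20.0   # additional LoS / NLoS losses (Table 1)
FC_HZ = 2e9                           # carrier frequency (Table 1)
BANDWIDTH_HZ = 1e6                    # channel bandwidth (Table 1)
NOISE_DBM = -96.0                     # noise power, assumed total over the band
BS_HEIGHT_M = 25.0                    # BS height (Table 1)
LIGHT_SPEED = 299_792_458.0           # m/s


def los_probability(elevation_deg):
    """Assumed LoS probability as a function of elevation angle in degrees."""
    return 1.0 / (1.0 + A * math.exp(-B * (elevation_deg - A)))


def mean_path_loss_db(horizontal_m, uav_height_m):
    """Free-space loss plus the probability-weighted additional loss."""
    dh = uav_height_m - BS_HEIGHT_M
    dist = math.hypot(horizontal_m, dh)
    elevation = math.degrees(math.atan2(dh, horizontal_m))
    fspl = 20.0 * math.log10(4.0 * math.pi * FC_HZ * dist / LIGHT_SPEED)
    p_los = los_probability(elevation)
    return fspl + p_los * ETA_LOS_DB + (1.0 - p_los) * ETA_NLOS_DB


def rate_bps(tx_dbm, horizontal_m, uav_height_m):
    """Shannon rate for the assumed link budget."""
    snr_db = tx_dbm - mean_path_loss_db(horizontal_m, uav_height_m) - NOISE_DBM
    return BANDWIDTH_HZ * math.log2(1.0 + 10.0 ** (snr_db / 10.0))


if __name__ == "__main__":
    for d in (100.0, 300.0, 500.0):
        print(d, round(mean_path_loss_db(d, 100.0), 1),
              round(rate_bps(30.0, d, 100.0) / 1e6, 2))
```

Under these assumptions, moving the UAV away from the BS at a fixed altitude lowers both the elevation angle and the LoS probability, so the path loss grows faster than free-space loss alone would suggest.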

Table 2 Model hyperparameters

Hyperparameter	Value
Initial learning rates of the Actor and Critic networks ${\alpha _{{\boldsymbol{\theta}} _n^t}}$, ${\alpha _{{\boldsymbol{\omega}} _n^t}}$	0.001, 0.002
Soft-update weight $\rho $	0.01
Greedy rate $\varepsilon $	0.1
Activation function	ReLU
Batch size ${N_{\text{b}}}$	64
Experience replay capacity ${N_{\text{c}}}$	20 000
Initial learning rate of the DQN network	0.01
Target network update period of the DQN	100
Number of layers in the Actor and Critic networks	4, 4
Number of hidden-layer neurons	128
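
As a rough illustration of how these hyperparameters are typically used, the sketch below collects them in a configuration object and implements a bounded replay buffer and a soft target update. The class and function names are hypothetical, and the update rule $\hat{\theta} \leftarrow \rho\,\theta + (1 - \rho)\,\hat{\theta}$ is the standard soft-update form, assumed here because Eqs. (36) and (37) are not reproduced on this page.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParams:
    """Values copied from Table 2; the field names are illustrative."""
    actor_lr: float = 0.001
    critic_lr: float = 0.002
    soft_update_rho: float = 0.01
    epsilon: float = 0.1
    batch_size: int = 64
    replay_capacity: int = 20_000
    dqn_lr: float = 0.01
    dqn_target_period: int = 100
    hidden_units: int = 128


class ReplayBuffer:
    """Fixed-capacity buffer that discards the oldest tuple when full."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self._data.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement from the stored tuples
        return rng.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)


def soft_update(target, current, rho):
    """Return rho * current + (1 - rho) * target, element by element."""
    return [rho * c + (1.0 - rho) * t for t, c in zip(target, current)]


if __name__ == "__main__":
    hp = HyperParams()
    rng = random.Random(0)
    buffer = ReplayBuffer(hp.replay_capacity)
    for step in range(200):
        buffer.store(step, 0, 0.0, step + 1)
    batch = buffer.sample(hp.batch_size, rng)
    print(len(buffer), len(batch))
    print(soft_update([0.0, 1.0], [1.0, 1.0], hp.soft_update_rho))
```

With $\rho = 0.01$, each update moves the target parameters only 1% of the way toward the current network, which keeps the learning targets slowly varying.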
