An Intelligent Driving Strategy Optimization Algorithm Assisted by Directed Acyclic Graph Blockchain and Deep Reinforcement Learning
doi: 10.11999/JEIT240407 cstr: 32379.14.JEIT240407
School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funds: The National Natural Science Foundation of China (62371082, 62001076), Guangxi Science and Technology Project (AB24010317), The Natural Science Foundation of Chongqing (CSTB2023NSCQ-MSX0726, cstc2020jcyj-msxmX0878)
Abstract: The application of Deep Reinforcement Learning (DRL) in intelligent driving decision-making is increasingly widespread, as it effectively enhances decision-making capabilities through continuous interaction with the environment. However, DRL faces challenges in practical applications due to low learning efficiency and poor data-sharing security. To address these issues, a Directed Acyclic Graph (DAG) blockchain-assisted deep reinforcement learning Intelligent Driving Strategy Optimization (D-IDSO) algorithm is proposed. First, a dual-layer secure data-sharing architecture based on the DAG blockchain is constructed to ensure the efficiency and security of model data sharing. Next, a DRL-based intelligent driving decision model is designed, incorporating a multi-objective reward function that optimizes decision-making by jointly considering safety, comfort, and efficiency. Additionally, an Improved Prioritized Experience Replay with Twin Delayed Deep Deterministic policy gradient (IPER-TD3) method is proposed to enhance training efficiency. Finally, braking and lane-changing scenarios are selected in the CARLA simulation platform to train Connected and Automated Vehicles (CAVs). Experimental results demonstrate that the proposed algorithm significantly improves model training efficiency in intelligent driving scenarios, while ensuring secure model-data sharing and enhancing the safety, comfort, and efficiency of intelligent driving.

Keywords:
- Intelligent driving /
- Data sharing /
- Deep reinforcement learning /
- Directed acyclic graph
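To make the multi-objective reward concrete, the sketch below combines weighted safety, comfort, and efficiency terms. Every name, term, and weight here (the `multi_objective_reward` function, the time-to-collision threshold, the jerk normalization, the 0.5/0.2/0.3 weights) is an illustrative assumption; the actual D-IDSO reward is defined by the paper's own equations, which are not reproduced in this excerpt.

```python
def multi_objective_reward(ttc, jerk, speed, target_speed,
                           w_safe=0.5, w_comf=0.2, w_eff=0.3):
    """Illustrative weighted sum of safety, comfort, and efficiency terms.
    All terms and weights are placeholder assumptions, not the paper's
    reward definition."""
    # Safety: penalize a short time-to-collision (TTC) with the lead vehicle
    r_safe = -1.0 if ttc < 2.0 else 0.0
    # Comfort: penalize large jerk (rate of change of acceleration, m/s^3)
    r_comf = -min(abs(jerk) / 10.0, 1.0)
    # Efficiency: penalize deviation from the target cruising speed
    r_eff = -abs(speed - target_speed) / max(target_speed, 1.0)
    return w_safe * r_safe + w_comf * r_comf + w_eff * r_eff
```

A weighted-sum design of this kind lets the relative importance of safety, comfort, and efficiency be tuned per scenario (e.g. weighting safety more heavily in braking than in lane changing).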
Algorithm 1  The DAG blockchain-assisted DRL intelligent driving strategy optimization (D-IDSO) algorithm

Input: initial Critic network parameters, initial Actor network parameters, number of local training episodes E, learning rate η, discount factor γ, and soft-update rate τ
Output: the optimal CAV intelligent driving decision
(1) The vehicle service provider publishes the training task
(2) RSU $m$ initializes the network parameters and uploads them to the DAG blockchain
(3) for CAV $v$ = 1 to V do
(4)  CAV $v$ sends the request vector $\boldsymbol{\sigma}_{v,m}^{\text{dw}}$
(5)  RSU $m$ returns the response vector $\boldsymbol{\sigma}_{m,v}^{\text{dw}}$ and the initial model
(6)  // Local DRL training
(7)  for episode $e$ = 1 to E do
(8)   for step $j$ = 1 to J do
(9)    CAV $v$ interacts with the environment
(10)    Store the 4-tuple training sample $\{\boldsymbol{s}_t, \boldsymbol{a}_t, r_t, \boldsymbol{s}_{t+1}\}$ in $B_1$
(11)    if the episode terminates at this step then
(12)     Compute $\bar{r}$ according to Eq. (20)
(13)     Store the 5-tuple training sample $\{\boldsymbol{s}_t, \boldsymbol{a}_t, r_t, \boldsymbol{s}_{t+1}, \bar{r}\}$ in $B_2$
(14)    end if
(15)    Update the sample priorities in replay buffer $B_1$ according to Eq. (21)
(16)    Update the sample priorities in replay buffer $B_2$ according to Eq. (22)
(17)    Sample $N_1$ and $N_2$ training samples from $B_1$ and $B_2$, respectively
(18)    Update the Critic networks by gradient descent
(19)    if the Critic networks have been updated twice then
(20)     Update the Actor network by gradient descent
(21)     Update the target networks by soft update
(22)    end if
(23)   end for
(24)   // Model upload
(25)   if the model quality $U_t \ge U_{\text{threshold}}$ then
(26)    CAV $v$ sends the new site, $\mathbf{TX}_{v,m}^{\text{dw}}$, and the request vector $\boldsymbol{\sigma}_{v,m'}^{\text{up}}$
(27)    RSU $m'$ packages the transaction vector and appends the new site to the DAG
(28)   end if
(29)  end for
(30) end for
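The per-step update in lines (15)-(17) rests on two prioritized replay buffers sampled jointly: $B_1$ holds ordinary transitions and $B_2$ holds the rarer end-of-episode samples. Below is a minimal, self-contained Python sketch of that dual-buffer draw. The proportional-priority scheme, the `alpha` exponent, the buffer capacities, and the batch sizes are illustrative assumptions; the paper's actual priorities come from Eqs. (21) and (22), which are not reproduced in this excerpt.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay buffer (illustrative only).
    In IPER-TD3 the priorities would come from Eqs. (21)/(22); here any
    positive float stands in for them."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                    # how strongly priority skews sampling
        self.data, self.weights = [], []

    def add(self, sample, priority=1.0):
        if len(self.data) >= self.capacity:   # FIFO eviction when full
            self.data.pop(0)
            self.weights.pop(0)
        self.data.append(sample)
        self.weights.append(priority ** self.alpha)

    def sample(self, n):
        if not self.data:
            return []
        k = min(n, len(self.data))
        return random.choices(self.data, weights=self.weights, k=k)

# B1 stores the 4-tuples {s, a, r, s'} of line (10); B2 stores the 5-tuple
# end-of-episode samples {s, a, r, s', r_bar} of line (13).
b1 = PrioritizedReplayBuffer(capacity=100_000)
b2 = PrioritizedReplayBuffer(capacity=10_000)
batch = b1.sample(64) + b2.sample(64)         # mixed N1 + N2 mini-batch, line (17)
```

Drawing each mini-batch from both buffers lets the scarce, high-value end-of-episode samples in $B_2$ be replayed far more often than uniform replay would allow, which is the intuition behind the training-efficiency gain attributed to IPER-TD3.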