An Intelligent Driving Strategy Optimization Algorithm Assisted by Directed Acyclic Graph Blockchain and Deep Reinforcement Learning
doi: 10.11999/JEIT240407 cstr: 32379.14.JEIT240407
School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Funds: The National Natural Science Foundation of China (62371082, 62001076), Guangxi Science and Technology Project (AB24010317), The Natural Science Foundation of Chongqing (CSTB2023NSCQ-MSX0726, cstc2020jcyj-msxmX0878)
Abstract: The application of Deep Reinforcement Learning (DRL) in intelligent driving decision-making is increasingly widespread, as it effectively enhances decision-making capabilities through continuous interaction with the environment. However, DRL faces challenges in practical applications due to low learning efficiency and poor data-sharing security. To address these issues, a Directed Acyclic Graph (DAG) blockchain-assisted deep reinforcement learning Intelligent Driving Strategy Optimization (D-IDSO) algorithm is proposed. First, a dual-layer secure data-sharing architecture based on the DAG blockchain is constructed to ensure the efficiency and security of model data sharing. Next, a DRL-based intelligent driving decision model is designed, incorporating a multi-objective reward function that optimizes decision-making by jointly considering safety, comfort, and efficiency. Additionally, an Improved Prioritized Experience Replay with Twin Delayed Deep Deterministic policy gradient (IPER-TD3) method is proposed to enhance training efficiency. Finally, braking and lane-changing scenarios are selected in the CARLA simulation platform to train Connected and Automated Vehicles (CAVs). Experimental results demonstrate that the proposed algorithm significantly improves model training efficiency in intelligent driving scenarios, while ensuring secure model-data sharing and enhancing the safety, comfort, and efficiency of intelligent driving.

Keywords:
- Intelligent driving /
- Data sharing /
- Deep reinforcement learning /
- Directed acyclic graph
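To make the multi-objective reward concrete, the sketch below combines weighted safety, comfort, and efficiency terms. Every name, term, and weight here (the `multi_objective_reward` function, the time-to-collision threshold, the jerk normalization, the 0.5/0.2/0.3 weights) is an illustrative assumption; the actual D-IDSO reward is defined by the paper's own equations, which are not reproduced in this excerpt.

```python
def multi_objective_reward(ttc, jerk, speed, target_speed,
                           w_safe=0.5, w_comf=0.2, w_eff=0.3):
    """Illustrative weighted sum of safety, comfort, and efficiency terms.
    All terms and weights are placeholder assumptions, not the paper's
    reward definition."""
    # Safety: penalize a short time-to-collision (TTC) with the lead vehicle
    r_safe = -1.0 if ttc < 2.0 else 0.0
    # Comfort: penalize large jerk (rate of change of acceleration, m/s^3)
    r_comf = -min(abs(jerk) / 10.0, 1.0)
    # Efficiency: penalize deviation from the target cruising speed
    r_eff = -abs(speed - target_speed) / max(target_speed, 1.0)
    return w_safe * r_safe + w_comf * r_comf + w_eff * r_eff
```

A weighted-sum design of this kind lets the relative importance of safety, comfort, and efficiency be tuned per scenario (e.g. weighting safety more heavily in braking than in lane changing).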
Algorithm 1  The DAG blockchain-assisted DRL intelligent driving strategy optimization (D-IDSO) algorithm

Input: initial Critic network parameters, initial Actor network parameters, number of local training episodes E, learning rate η, discount factor γ, and soft-update rate τ
Output: the optimal CAV intelligent driving decision
(1) The vehicle service provider publishes the training task
(2) RSU $m$ initializes the network parameters and uploads them to the DAG blockchain
(3) for CAV $v$ = 1 to V do
(4)  CAV $v$ sends the request vector $\boldsymbol{\sigma}_{v,m}^{\text{dw}}$
(5)  RSU $m$ returns the response vector $\boldsymbol{\sigma}_{m,v}^{\text{dw}}$ and the initial model
(6)  // Local DRL training
(7)  for episode $e$ = 1 to E do
(8)   for step $j$ = 1 to J do
(9)    CAV $v$ interacts with the environment
(10)    Store the 4-tuple training sample $\{\boldsymbol{s}_t, \boldsymbol{a}_t, r_t, \boldsymbol{s}_{t+1}\}$ in $B_1$
(11)    if the episode terminates at this step then
(12)     Compute $\bar{r}$ according to Eq. (20)
(13)     Store the 5-tuple training sample $\{\boldsymbol{s}_t, \boldsymbol{a}_t, r_t, \boldsymbol{s}_{t+1}, \bar{r}\}$ in $B_2$
(14)    end if
(15)    Update the sample priorities in replay buffer $B_1$ according to Eq. (21)
(16)    Update the sample priorities in replay buffer $B_2$ according to Eq. (22)
(17)    Sample $N_1$ and $N_2$ training samples from $B_1$ and $B_2$, respectively
(18)    Update the Critic networks by gradient descent
(19)    if the Critic networks have been updated twice then
(20)     Update the Actor network by gradient descent
(21)     Update the target networks by soft update
(22)    end if
(23)   end for
(24)   // Model upload
(25)   if the model quality $U_t \ge U_{\text{threshold}}$ then
(26)    CAV $v$ sends the new site, $\mathbf{TX}_{v,m}^{\text{dw}}$, and the request vector $\boldsymbol{\sigma}_{v,m'}^{\text{up}}$
(27)    RSU $m'$ packages the transaction vector and appends the new site to the DAG
(28)   end if
(29)  end for
(30) end for
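The per-step update in lines (15)-(17) rests on two prioritized replay buffers sampled jointly: $B_1$ holds ordinary transitions and $B_2$ holds the rarer end-of-episode samples. Below is a minimal, self-contained Python sketch of that dual-buffer draw. The proportional-priority scheme, the `alpha` exponent, the buffer capacities, and the batch sizes are illustrative assumptions; the paper's actual priorities come from Eqs. (21) and (22), which are not reproduced in this excerpt.

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay buffer (illustrative only).
    In IPER-TD3 the priorities would come from Eqs. (21)/(22); here any
    positive float stands in for them."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                    # how strongly priority skews sampling
        self.data, self.weights = [], []

    def add(self, sample, priority=1.0):
        if len(self.data) >= self.capacity:   # FIFO eviction when full
            self.data.pop(0)
            self.weights.pop(0)
        self.data.append(sample)
        self.weights.append(priority ** self.alpha)

    def sample(self, n):
        if not self.data:
            return []
        k = min(n, len(self.data))
        return random.choices(self.data, weights=self.weights, k=k)

# B1 stores the 4-tuples {s, a, r, s'} of line (10); B2 stores the 5-tuple
# end-of-episode samples {s, a, r, s', r_bar} of line (13).
b1 = PrioritizedReplayBuffer(capacity=100_000)
b2 = PrioritizedReplayBuffer(capacity=10_000)
batch = b1.sample(64) + b2.sample(64)         # mixed N1 + N2 mini-batch, line (17)
```

Drawing each mini-batch from both buffers lets the scarce, high-value end-of-episode samples in $B_2$ be replayed far more often than uniform replay would allow, which is the intuition behind the training-efficiency gain attributed to IPER-TD3.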