doi: 10.11999/JEIT170347 cstr: 32379.14.JEIT170347
Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning

1. Institute of Command Information System, PLA University of Science and Technology, Nanjing 210007, China
2. College of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China

Funding: Advanced Research Fund of China Electronics Technology Group Corporation (6141B08010101); China Postdoctoral Science Foundation (2015T81081, 2016M602974); Jiangsu Natural Science Foundation for Youths (BK20140075)
Abstract: Reinforcement learning acquires a decision policy for a task through interaction with the environment, which gives it self-learning and online-learning properties. However, its trial-and-error mechanism often leads to low learning efficiency and slow convergence. Knowledge encodes human experience and cognitive regularities about the world, and using knowledge to guide an agent's learning is an effective way to address these problems. This paper introduces qualitative rule knowledge into reinforcement learning: the rules are represented by a cloud reasoning model and used as an exploration strategy to guide the agent's action selection, reducing the blindness of exploration in the state-action space. Experiments on a customized CartPole-v2 task in the OpenAI Gym environment show that the proposed cloud-reasoning-based exploration strategy is effective, improving learning efficiency and accelerating convergence.
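The paper gives no code in this abstract, but the idea of using a cloud model as an exploration heuristic can be sketched roughly as follows. The sketch assumes Li Deyi's normal cloud generator with parameters (Ex, En, He); the specific rule ("if the pole leans right, push right"), the parameter values, and the function names are illustrative assumptions, not taken from the paper:

```python
import math
import random

def cloud_membership(x, Ex, En, He):
    # One drop of a normal cloud generator: perturb the entropy En by the
    # hyper-entropy He, then compute the (uncertain) membership of x.
    En_prime = random.gauss(En, He)
    if En_prime == 0:
        return 1.0 if x == Ex else 0.0
    return math.exp(-((x - Ex) ** 2) / (2 * En_prime ** 2))

def cloud_guided_action(pole_angle, q_values, epsilon=0.1):
    # Epsilon-greedy skeleton: exploit with probability 1 - epsilon, but
    # bias exploration by a qualitative rule instead of choosing uniformly.
    if random.random() > epsilon:
        return max(range(len(q_values)), key=q_values.__getitem__)
    # Rule activation: membership of the angle in a "leaning right" cloud
    # (Ex, En, He here are made-up illustrative values).
    mu = cloud_membership(pole_angle, Ex=0.2, En=0.1, He=0.01)
    if random.random() < mu:
        return 1  # action the rule suggests (push cart to the right)
    return random.randrange(len(q_values))  # otherwise explore at random
```

The design intent, as the abstract describes it, is that rule-consistent actions are sampled more often in states where the qualitative rule clearly applies, which cuts down blind random exploration early in training.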
-
Key words:
- Cloud reasoning
- Deep reinforcement learning
- Knowledge
- Exploration strategy