Academic News

07-11-2023

Seminar Series on Financial Mathematics and Financial Computing

Speaker: Prof. Xiaoli Wei (魏晓利), Harbin Institute of Technology

Time: Wednesday, November 8, 2023, 14:00-15:00

Tencent Meeting ID: 195899391

Title: Continuous Time q-Learning for McKean-Vlasov Control Problems

Abstract:

This paper studies q-learning, recently coined by Jia and Zhou (2023) as the continuous-time counterpart of Q-learning, for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (2023), the mean-field interaction of agents makes the definition of the q-function more subtle, and we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by q), the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learned via a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by q_e), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation under all test policies. Based on the weak martingale condition and our proposed method for searching test policies, we devise several model-free learning algorithms. In two examples, one within the LQ control framework and one beyond it, we obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments. This talk is based on joint work with Xiang Yu.

About the speaker:

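Xiaoli Wei is currently with the Institute for Advanced Study in Mathematics, Harbin Institute of Technology. Wei graduated in 2018 from Paris Diderot University (Paris 7, now Université Paris Cité), was a postdoctoral researcher at the University of California, Berkeley from 2019 to 2021, and served as an assistant professor at Tsinghua Shenzhen International Graduate School from 2021 to 2023. Wei's research focuses on stochastic control and its applications in financial mathematics, with work published in journals including SIAM Journal on Control and Optimization, Mathematical Finance, and Operations Research.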

