
October 14: Academic Talk by Professor Liu Weidong (School of Mathematics and Statistics)

Source: Mathematics Administration    Date: 2023-10-12

Speaker: Professor Liu Weidong

Title: Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning

Time: Saturday, October 14, 2023, 10:10 a.m.

Venue: Academic Lecture Hall (Room 1506, Jingyuan Building), School of Mathematics and Statistics, Jiangsu Normal University

Hosts: Research Institute of Mathematics, School of Mathematics and Statistics, Research Institute of Science and Technology

About the speaker:

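       Liu Weidong is a Distinguished Professor at Shanghai Jiao Tong University, a recipient of the National Science Fund for Distinguished Young Scholars, and a council member of the China Society for Industrial and Applied Mathematics. His main research interests are statistics and machine learning. He has published more than sixty papers in leading journals and conferences, including AOS, JASA, JRSSB, Biometrika, JMLR, ICML, IJCAI, and IEEE TSP. He has led one project under the National Key R&D Program of China, one National Science Fund for Distinguished Young Scholars grant, and one National Science Fund for Excellent Young Scholars grant.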

Abstract:

       Recently, reinforcement learning has gained prominence in modern statistics, with policy evaluation being a key component. Unlike traditional machine learning literature on this topic, our work places emphasis on statistical inference for the parameter estimates computed using reinforcement learning algorithms. While most existing analyses assume random rewards to follow standard distributions, limiting their applicability, we embrace the concept of robust statistics in reinforcement learning by simultaneously addressing issues of outlier contamination and heavy-tailed rewards within a unified framework. In this paper, we develop an online robust policy evaluation procedure, and establish the limiting distribution of our estimator, based on its Bahadur representation. Furthermore, we develop a fully-online procedure to efficiently conduct statistical inference based on the asymptotic distribution. This paper bridges the gap between robust statistics and statistical inference in reinforcement learning, offering a more versatile and reliable approach to policy evaluation. Finally, we validate the efficacy of our algorithm through numerical experiments in real-world reinforcement learning settings.


