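Speaker: Prof. Hansheng Wang (王漢生)
Title: Mixture Conditional Regression with Ultrahigh Dimensional Text Data for Estimating Extralegal Factor Effects
Time: Thursday, March 14, 2024, 10:00 a.m.
Venue: Lecture Hall 1506, Jingyuan Building (靜遠樓)
Hosts: School of Mathematics and Statistics; Institute of Mathematics; Institute of Science and Technology
About the speaker: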
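Hansheng Wang received his bachelor's degree from the Department of Probability and Statistics, School of Mathematics, Peking University, in 1998, and his Ph.D. from the Department of Statistics, University of Wisconsin-Madison, in 2001. He joined the Guanghua School of Management in 2003 and has remained there since, serving as deputy department chair (2007-2013) and department chair (2013-2021). He has published more than 100 papers in professional journals in China and abroad, co-authored one English-language monograph, and authored or co-authored three Chinese-language textbooks. He is a recipient of the National Science Fund for Distinguished Young Scholars, the founding president of the Young Statisticians Association of the National Industrial Statistics Teaching and Research Society, a Fellow of the Institute of Mathematical Statistics (IMS), a Fellow of the American Statistical Association (ASA), and an Elected Member of the International Statistical Institute (ISI). He has served as an associate editor or editor of nine international academic journals.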
Abstract:
Testing judicial impartiality is a problem of fundamental importance in empirical legal studies, and standard regression methods have been widely used to estimate extralegal factor effects. However, these methods cannot handle control variables of ultrahigh dimensionality, such as those found in judgment documents recorded in text format. To solve this problem, we develop a novel mixture conditional regression (MCR) approach, assuming that the whole sample can be classified into a number of latent classes. Within each latent class, a standard linear regression model describes the relationship between the response and a key feature vector, which is assumed to be of fixed dimension. Meanwhile, the ultrahigh dimensional control variables are used to determine latent class membership, with a naïve Bayes type model describing that relationship. Hence, the dimension of the control variables is allowed to be arbitrarily high. A novel expectation-maximization algorithm is developed for model estimation, with which we are able to estimate the key parameters of interest as efficiently as if the true class membership were known in advance. Simulation studies are presented to demonstrate the proposed MCR method, and a real dataset of Chinese burglary offenses is analyzed for illustration.
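To make the model structure in the abstract more concrete, here is a minimal Python/NumPy sketch of an EM algorithm for a model of this general shape. It is not the authors' MCR implementation. The Bernoulli naive Bayes model for binary word indicators `Z`, the Gaussian within-class linear regression of `y` on `X`, the Laplace smoothing parameter `alpha`, the random restarts, and the helper names `mcr_em` and `_em_once` are all assumptions made for illustration.

```python
import numpy as np


def _em_once(y, X, Z, K, n_iter, alpha, rng):
    """One EM run from a random soft start (hypothetical helper, not the MCR code)."""
    n, p = X.shape
    R = rng.dirichlet(np.ones(K), size=n)  # (n, K) initial class responsibilities
    for _ in range(n_iter):
        # M-step: mixing weights, naive Bayes word probabilities, class regressions.
        Nk = R.sum(axis=0)
        pi = Nk / n
        theta = (R.T @ Z + alpha) / (Nk[:, None] + 2 * alpha)  # (K, V), smoothed
        beta = np.empty((K, p))
        sigma2 = np.empty(K)
        for k in range(K):
            w = R[:, k]
            Xw = X * w[:, None]
            A = X.T @ Xw + 1e-8 * np.eye(p)  # tiny ridge for numerical safety
            beta[k] = np.linalg.solve(A, Xw.T @ y)  # weighted least squares
            resid = y - X @ beta[k]
            sigma2[k] = max(w @ resid**2 / Nk[k], 1e-12)
        # E-step: class posteriors from the text (naive Bayes) and the regression.
        log1m = np.log1p(-theta)
        log_text = Z @ (np.log(theta) - log1m).T + log1m.sum(axis=1)  # (n, K)
        resid = y[:, None] - X @ beta.T  # (n, K)
        log_reg = -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)
        log_joint = np.log(pi) + log_text + log_reg
        m = log_joint.max(axis=1, keepdims=True)  # log-sum-exp for stability
        log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        R = np.exp(log_joint - log_norm)
    return float(log_norm.sum()), beta, sigma2, pi, R


def mcr_em(y, X, Z, K, n_iter=100, alpha=1.0, n_starts=5, seed=0):
    """Run several random starts; keep the highest observed-data log-likelihood."""
    rng = np.random.default_rng(seed)
    fits = [_em_once(y, X, Z, K, n_iter, alpha, rng) for _ in range(n_starts)]
    return max(fits, key=lambda fit: fit[0])


# Toy usage on simulated data (not the burglary dataset): V = 1200 words > n = 800.
rng = np.random.default_rng(1)
n, V = 800, 1200
labels = rng.integers(0, 2, size=n)
theta_true = np.full((2, V), 0.05)
theta_true[0, :50] = 0.6  # words typical of latent class 0
theta_true[1, 50:100] = 0.6  # words typical of latent class 1
Z = (rng.random((n, V)) < theta_true[labels]).astype(float)
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one key feature
beta_true = np.array([[1.0, 2.0], [-1.0, 0.5]])
y = (X * beta_true[labels]).sum(axis=1) + 0.3 * rng.normal(size=n)

loglik, beta_hat, sigma2_hat, pi_hat, R = mcr_em(y, X, Z, K=2)
print(beta_hat)  # class order is arbitrary (label switching)
```

Because the naive Bayes factorization treats words as conditionally independent given the class, each word's probability is estimated separately from weighted counts, and the text enters the E-step only through one matrix product; this is what lets V exceed n in the toy run. The random restarts are a pragmatic guard against poor local optima, not something the abstract specifies.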