ntry-header

近日,清华大学2020-2021秋季学期教学评估结果出炉,邓婉璐老师的“初等概率论”课程入围“100人以上理论课程”前5%名单。

此外,统计学研究中心吴未迟老师开设的“高等数理统计I”课程也排名全校前5%行列(30人以下课堂)。

#post-12231
ntry-header

2021年3月5日,清华大学交叉信息研究院王禹皓助理教授访问我中心,并做学术报告,报告的题目是Debiased Inverse Propensity Score Weighting for Estimation of Average Treatment Effects with High-Dimensional Confounders。

王禹皓助理教授
与会教员合影
#post-12230
ntry-header

序贯蒙特卡洛方法作为一种重要的计算工具,被广泛地应用于各个领域中,其中重抽样是序贯蒙特卡洛方法中重要的一步。同时重抽样也是一把双刃剑:一方面,重抽样可以保证序列样本保持一定的有效样本量;另一方面,重抽样会引入新的随机性,使得估计的误差变大。重抽样有着很多种不同的选择,例如Bootstrap重抽样,分层重抽样等。清华大学统计学研究中心邓柯副教授团队与哈佛大学统计系刘军教授团队针对不同情形下的最优重抽样问题展开了进一步研究,相关成果已在统计学顶刊Biometrika发表。中心16级博士研究生李艺超及哈佛大学统计学博士生王文槊为文章的共同第一作者。

在重抽样最优化理论的研究上,本研究的主要贡献包括:

(1)在一维情形下,证明了将样本排序后,分层重抽样在条件方差、能量距离、最优传输等意义下均是最优的。(2)在多维情形下,通过希尔伯特曲线对样本进行排序,分层重抽样的条件方差可以得到最优上界。

结合前两个结论,在序列拟蒙特卡洛方法(SQMC)的框架下,研究团队将抽样和重抽样两个部分结合起来,提出了一种新的抽样方法(Stratified Multiple-Descendant Sampling),并证明了该方法在理论上可以得到已知的最优均方误差。

相关工作建立了序贯蒙特卡洛重抽样算法最优性的系统理论,并以此为基础提出了新的、效率更高的抽样算法,在统计计算理论和应用方面具有重要的原创性贡献。

#post-12229
ntry-header

近日,2021年QS世界大学学科排名发布,清华大学统计与运筹学学科保持第16名,国内排名第一。

#post-12228
ntry-header

近日,清华大学工业工程系2020年度表彰公告发布,我中心多名教职工荣获表彰,以鼓励过去一年中心教职工在科研、教学、人才引进与发展等方面做出的努力和贡献。

#post-12227
ntry-header

Talking with Great Minds—Howell Tong

Keywords: International view   Broad-minded   Practical need  Leadership

Prof. Tong in Tsinghua
  1. Childhood and study experience oversea.

Q: Can you talk about your childhood?

A: In my childhood, my family and I were faced with general tough life conditions, but eventually we went through the hardship and became stronger. I am also very lucky that I always have good teachers. One of the teachers that I remember particularly once told us a story about Hua Luogeng when I was very young. That was the time Hua Luogeng returned to China. My teacher told me how he became famous after studying, and that actually made a quite impression on me. I think it’s because of his story that I decided to study Mathematics. I moved to England in 1961 when my father worked there. The secondary school I attended was not a top school, but was very comprehensive. Under the support of the school and headmaster, I picked up English quickly. I was the only boy from that school who went to a university.

Q: In that period, did you encounter any challenge in your study or life?

A: Yes. First of all, I need to get used to the English way of schooling, for example moving across different classrooms to have lessons, and different dietary habit on campus. But luckily the students around were all very nice, and we became good friends. Because of this experience, I was able to know different culture and their way of speaking. It’s quite a big challenge to adapt from the Hong Kong school system to a working class environment in London at that time.

Q: Is there any particular reason that you choose Statistics as your major?

A: Well as I said I decided to study Mathematics, and I graduated in Mathematics in Manchester. We had good statistics teachers and received many statistics courses, which was unusual at that time in England. I also got a chance to listen to a lecture about probability theory given by an eminent probability researcher. I was impressed by his lecture and became interested in probability theory. Because of my family, I decided to suspend the post-graduate study and took a job. During that time, I started to read some papers, and I came across one paper about time series by my former teacher in Manchester. That’s how I became interested in time series. I wrote to him and went back to Manchester. Due to some reasons, I accidentally became a university teacher teaching statistics instead of a post-graduate student. I was very lucky.

  1. Early career as a statistician.

Q: How did you finally decide to become a statistics researcher?

A: Once I returned to Manchester, I became quite clear that statistics is the career I want to pursue. Thanks to my school, I had an opportunity to meet with other scientists and technologists, and became interested in control engineering and stochastic control. So, time series became quite a natural subject for me. My early career was mainly oriented in frequency domain, and I changed to time domain later on when I met Akaike. He visited us in Manchester for half a year, working on multivariate control system using multivariate linear AR model, as well as some aspect of AIC(Akaike Information Criterion). We became very good friends, and I wanted to learn more from him, so I applied a Royal Society Japan Fellowship, and went to japan for 6 months. During that visit, I read a number of papers he collected, and learned a lot from not only the papers, but also the marks and personal notes he made. It was very valuable for me. By talking to him, I learned the background of why he did certain research. He did research not in front of the desk, but went out and met other scientists. He did not publish many papers in the first ten years of his career, but did a lot of great works later. He spent time cultivating friendship with engineers and other people. Because of this, he was asked to solve a problem of selecting a suitable model from a number of models in the field of predicting. That’s the original problem behind AIC. So, I got a deep understanding of the whole idea of his research besides reading papers.

Q: We know that you published many great papers in your early career, so what’s your secret for this fantastic achievements?

A: I remember the words of Mr. Yang Zhenning. He said do something that you are really passionate about. My father never interfered in my study, and I never interfered in my children’s career either. Let the person choose what he or she is really interested in. My mentor is a time series analyst, but he never pushed me, so I had the chance to choose my own area. The reason why I choose statistics is because I want to produce something new, so I am lucky to be in the right environment where there is no pressure. I am also very lucky to have a good wife taking care of my family, and lucky to have the chance meeting with other scientists. I am a good learner, and I am able to pick up the things I want to learn. I think passion is very important rather than any secret. Remember to be observant and passionate.

  1. About the threshold model

Q: Now let’s talk about one of your most important work in non-linear time series, the threshold model. Where did the idea come from?

A: When I was visiting Akaike, I learned the way he produced the spectral density estimate. So, I used the approach on the lynx data, which I was very interested in. There was a session in the Royal Statistic Society and I presented this paper. During that discussion, there was one gentleman who made a very, very important comment. He said that the data is cyclical, but the cycle is not symmetric. The lynx population would rise slowly but fall rapidly. If you use a linear Gaussian model, you would never be able to capture it. Also, he said that from the point view of dynamical system, the cycle should be considered limit cycle. So, if you can produce a model that leads to limit cycle, it would be ideal. And David Cox and Akaike also made some similar comments. However, it is very difficult and is a big challenge.So, I decided to work on the problem. But my entire education up to that time was all in linear. So, I need to teach myself nonlinear dynamical systems.

Then, one day I was in my garden and mowing the lawn. When you mow the lawn, you go strip by strip. Suddenly, the idea of piecewise linearity came into my mind. This is because I was subconsciously thinking of the problem all the time.

Then I started working on the idea and a student did programs. One day she brought me some results which were too perfectly periodic to be possible. Then I found that she forgot the noise. This was the first time I saw limit cycle. Then, I said we could also see whether this model can produce other nonlinear phenomena, such as subharmonics, higher harmonics, amplitude-frequency dependency and so on. And it turned out that the model could do that.

Q: Did you encounter difficult times with the model?

A: Yes. A lot of people discussed the paper but I could not say everybody liked it, maybe because the idea was so new. I also got one or two people attacking. The model was invented in 1980s but has remained fairly quiet for 10 years. It was in about the 1990s that the model attracted a lot of attention. So, the beginning was not easy.

Q: From your experience, how to find a good research problem?

A: First of all, you have to be social. To me, statisticians are toolmakers. What tool you want to invent must be dictated by practical needs from people on the ground. So, we should go out, interact and collaborate with other scientists. We should be members of scientific teams. Don’t follow fashion blindly. I never want to follow fashion. When I did nonlinear time series, almost none of the leaders in time series worked on that.

There are probably two types of research. One is the run-of-mill research, which means you have an incremental improvement. Those things do not take us long and you can publish these very quickly. The other one is the revolutionary research. Of course, in one’s lifetime, one would probably not have more than a couple of such revolutions. But you must always keep them in mind, work on them in any spare time.

  1. About the leadership

Q: You have been Chair of Statistics at several universities. How can you do good jobs in both academic and management? What’s your secret?

A: I adopted the principle I learned from Lao Tzu (老子) and Sun Tzu’s “Art of War” (孙子兵法). I cannot micromanage, so if there is any big job I will identify a suitable person. Then I will give the person my full support. So if you use one person you need to trust him (用人不疑,疑人不用).

  1. About statistics in the future

Q: Do you worry about the future of statistics given the competition from Machine Learning and AI?

A: As Lao Tzu has said, behind every good fortune there is a misfortune, and misfortune leads to good fortune (祸兮福之所倚,福兮祸之所伏). I think the two aspects are certainly true for what challenge statistics is facing in the domain of data science. But if we sensibly steer our ship of statistics, we can benefit. Machine learning is certainly a powerful tool, but some of the ideas are not unknown or uncommon in statistics. Because in statistics, the basic training is how to handle randomness, and for anything that requires that, statistics has advantages. But on the other hand, we have to be fully prepared and liberate our minds. Some of the old ideas may be too restrictive. We used to deal with small data set in days of Fisher, but now we have to deal with large data sets. To defeat the new challenge, we have to adopt the attitude in Chinese culture: when foreigners come, we absorb them.

So, I don’t worry. As long as we are broad-minded and ready to adapt, we can survive and grow.

 

#post-12226
ntry-header

#post-12225
ntry-header

2020年12月27-29日,“世界华人数学家联盟年会”在安徽合肥举行,清华大学统计学研究中心邓柯副教授作为第一作者的学术论文“On the unsupervised analysis of domain-specific Chinese texts”荣获“2020世界华人数学家联盟最佳论文奖-银奖”。该论文是邓柯副教授与美国哈佛大学Peter Bol教授、哈佛大学刘军教授和萨福克大学李佳漪副教授共同完成,论文发表于美国科学院院刊PNAS杂志。

文章提出运用统计学模型和原理进行无指导中文文本分析的新方法-TopWORDS,可对特定领域中文文本进行词语发现和中文分词。此方法还可以结合其他文本分析工具,如词嵌入、主题模型、关联规则挖掘等,可提取文本中的主要特征和信息,是中文文本挖掘领域的重要突破。

丘成桐先生(右)和林勇教授(左)给邓柯副教授颁奖​

 

#post-12224
ntry-header

2020年12月21日,清华大学统计学论坛特邀报告在华业大厦3702成功举办。本次论坛邀请到中国人民大学统计学院李扬教授。本次报告由王天颖助理教授主持。报告题目是Adaptive Randomization via Mahalanobis Distance.

#post-12223
ntry-header

近日,中心五年级博士生蒋斐宇以第一作者的身份撰写的论文“Adaptive Inference for a Semiparametric Generalized Autoregressive Conditional Heteroskedasticity Model”被计量经济学顶尖期刊Journal of Econometrics接受并在线发表。此文是蒋斐宇同学发表的第3篇JOE论文。

该论文是蒋斐宇与我中心李东副教授和香港大学统计与精算系朱柯助理教授合作完成的,主要研究了一类半参数的广义条件异方差模型(简记为S-GARCH模型)的参数估计、检验和模型诊断等统计推断问题。

传统的GARCH模型由Engle (1982)以及Bollerslev(1986)提出,是应用最为广泛的金融时间序列模型之一,但在使用该模型时通常需要假定数据是平稳的。为此,该论文基于现有的文献方法,在GARCH模型中引入了一般的非参数趋势项,而使得拓展后的S-GARCH模型可以处理非平稳的金融时间序列,拓宽了GARCH模型的适用范围。       该论文提出的两步估计方法简单有效,在一定的假设下,S-GARCH模型的模型参数估计、参数检验和模型诊断等极限理论不依赖于非参数趋势项的估计,并且拥有自适应性和有效性等特点,非常便于使用。

#post-12222