Associate Professor WANG Haiying gave a speech titled “Nonuniform Negative Sampling and Log Odds Correction with Rare Events Data” at AMSS on June 14, 2021.
In the speech, he investigated parameter estimation with nonuniform negative sampling for imbalanced data. He first proved that, with imbalanced data, the available information about the unknown parameters is tied only to the relatively small number of positive instances, which justifies the use of negative sampling. However, if the negative instances are subsampled down to the same level as the positive cases, information is lost. To retain more information, he derived the asymptotic distribution of a general inverse probability weighted (IPW) estimator and obtained the optimal sampling probability that minimizes its variance. To further improve estimation efficiency over the IPW method, he proposed a likelihood-based estimator that corrects the log odds for the sampled data, and proved that this improved estimator has the smallest asymptotic variance among a large class of estimators. It is also more robust to pilot misspecification. He validated the approach on simulated data as well as on a real click-through rate dataset with more than 0.3 trillion instances, collected over a period of a month. Both the theoretical and the empirical results demonstrated the effectiveness of the method.
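As a rough illustration of the log odds correction idea, the following minimal sketch keeps every positive instance, keeps each negative instance with a probability pi(x), and then fits a logistic regression on the subsample with the offset -log(pi(x)). This is only a toy simulation under stated assumptions, not the estimator or the optimal sampling probability presented in the talk: the logistic model, the parameter values, the pilot coefficients, the form of pi(x), and all function names (`fit_logistic_with_offset`, `run_demo`) are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(eta):
    # Numerically stable logistic function: 1 / (1 + exp(-eta)).
    return np.exp(-np.logaddexp(0.0, -eta))


def neg_loglik(beta, X, y, offset):
    # Negative Bernoulli log-likelihood for logit P(y=1) = X @ beta + offset.
    eta = X @ beta + offset
    return np.sum(np.logaddexp(0.0, eta) - y * eta)


def fit_logistic_with_offset(X, y, offset, n_iter=100, tol=1e-8):
    """Newton's method with step halving; X[:, 0] must be the intercept column."""
    beta = np.zeros(X.shape[1])
    ybar = y.mean()
    # Start the intercept so the average linear predictor matches the sample rate.
    beta[0] = np.log(ybar / (1.0 - ybar)) - offset.mean()
    for _ in range(n_iter):
        p = sigmoid(X @ beta + offset)
        grad = X.T @ (p - y)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        loss, t = neg_loglik(beta, X, y, offset), 1.0
        while neg_loglik(beta - t * step, X, y, offset) > loss and t > 1e-8:
            t *= 0.5  # back off if the full Newton step overshoots
        beta = beta - t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta


def run_demo(seed=0, n=500_000, rho=0.02):
    rng = np.random.default_rng(seed)
    # Illustrative true parameters; the large negative intercept makes positives rare.
    beta_true = np.array([-6.0, 1.0, -0.5])
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)

    # Hypothetical, deliberately imperfect pilot, used only to shape pi(x).
    beta_pilot = np.array([-6.0, 0.8, -0.3])
    p_pilot = sigmoid(X @ beta_pilot)
    pi = np.minimum(1.0, rho * p_pilot / p_pilot.mean())

    # Keep every positive; keep negative i with probability pi[i].
    keep = (y == 1.0) | (rng.random(n) < pi)
    # Log odds correction: in the subsample, logit P(y=1 | x) = x'beta - log(pi(x)).
    beta_hat = fit_logistic_with_offset(X[keep], y[keep], -np.log(pi[keep]))
    return beta_true, beta_hat, int(keep.sum())


if __name__ == "__main__":
    beta_true, beta_hat, n_kept = run_demo()
    print("true:", beta_true)
    print("estimate:", beta_hat)
    print("instances kept:", n_kept)
```

The offset follows from Bayes' rule: since positives are always kept and a negative with covariates x is kept with probability pi(x), the odds of a positive within the subsample are the original odds divided by pi(x). When pi(x) is a constant rho, the offset is the same for every instance and the correction amounts to adding log(rho) to the intercept of an uncorrected fit on the subsample.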
WANG Haiying is an Associate Professor in the Department of Statistics at the University of Connecticut. He was an Assistant Professor in the Department of Mathematics and Statistics at the University of New Hampshire from 2013 to 2017. He obtained his Ph.D. from the Department of Statistics at the University of Missouri in 2013, and his M.S. from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences in 2006. His research interests include informative subdata selection for big data, model selection, model averaging, measurement error models, and semi-parametric regression.