Data-driven smoothing parameter selection in density estimation
Abstract
Kernel density estimation (KDE) is a well-established approach to nonparametric density estimation. The accuracy of a KDE depends on both the shape of the kernel and its bandwidth. The shape of the kernel has only a minor influence on the estimate, whereas selecting a proper smoothing parameter (bandwidth) is critical. If the bandwidth is too small, spurious features become visible; if it is too large, important features disappear. Many bandwidth selection methods have been developed over the years, each with its own characteristics. A few bandwidth selection methods are selected systematically from the recent research literature and verified using simulations in R on a sample dataset. The strengths and limitations of each method are identified and discussed.
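The effect of the bandwidth can be illustrated with a minimal sketch. It is written in Python with the standard library only, not the R code used for the simulations, and it is not one of the selection methods compared in this work. The simulated bimodal sample, the Gaussian kernel, the use of Silverman's rule of thumb as a reference bandwidth, and the helper names (gaussian_kde, rule_of_thumb_bandwidth, count_modes) are all assumptions made for illustration.

```python
import math
import random
import statistics


def gaussian_kde(grid, sample, h):
    """Evaluate a Gaussian-kernel density estimate with bandwidth h on grid."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) / norm
        for x in grid
    ]


def rule_of_thumb_bandwidth(sample):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)


def count_modes(density):
    """Count local maxima of a density evaluated on an ordered grid."""
    return sum(
        1
        for a, b, c in zip(density, density[1:], density[2:])
        if b > a and b >= c
    )


# Assumed test case: equal mixture of N(-2, 1) and N(2, 1), 400 observations.
random.seed(0)
sample = [random.gauss(-2, 1) for _ in range(200)]
sample += [random.gauss(2, 1) for _ in range(200)]

step = 0.02
grid = [-10 + step * i for i in range(1001)]

h_rot = rule_of_thumb_bandwidth(sample)
modes_small = count_modes(gaussian_kde(grid, sample, 0.05))  # undersmoothed
modes_rot = count_modes(gaussian_kde(grid, sample, h_rot))   # reference choice
modes_large = count_modes(gaussian_kde(grid, sample, 3.0))   # oversmoothed

print("rule-of-thumb bandwidth:", round(h_rot, 3))
print("modes (h=0.05, h=rule of thumb, h=3.0):",
      modes_small, modes_rot, modes_large)
```

In this sketch the number of local maxima serves as a crude proxy for visible features: a very small bandwidth produces many spurious bumps, while a very large one merges the two true modes into one.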
Similarly, Bernstein density estimation (BDE) methods for nonparametric density estimation have recently been gaining interest. BDEs have an advantage over KDEs when the underlying density is supported on the unit interval. BDEs are inherently stable at the boundaries and have very low boundary bias, but they also introduce considerable variance compared to KDEs. Like bandwidth selection in KDE, accurate order selection is critical in BDE. The order selection criteria of existing BDEs are then discussed. Based on the limitations identified in KDEs and existing BDEs, a few data-driven order selection methods are introduced for Bernstein polynomial estimators of density functions on the unit interval. These methods are also verified with a simulation in R, and the respective error criteria are compared to assess the effectiveness of the new order selection methods. Finally, a bootstrapped order selection method is identified as a potential candidate for further investigation, and its desirable features are identified.
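The role of the order in a Bernstein estimator can be sketched in the same way. The code below assumes the classical Vitale-type form, in which the estimate of order m is a mixture of Beta(k+1, m-k) densities weighted by the fraction of observations falling in ((k/m), (k+1)/m]. It does not implement the order selection methods introduced in this work, bootstrapped or otherwise. The Beta(2, 5) test density, the sample size, the candidate orders, and the helper names (bernstein_weights, bernstein_density, ise) are assumptions chosen for illustration, and the integrated squared error (ISE) used here is computable only because the true density is known in a simulation.

```python
import math
import random


def bernstein_weights(sample, m):
    """Fraction of observations in each bin (k/m, (k+1)/m], k = 0..m-1."""
    counts = [0] * m
    for xi in sample:
        k = min(max(math.ceil(xi * m) - 1, 0), m - 1)
        counts[k] += 1
    return [c / len(sample) for c in counts]


def bernstein_density(x, weights):
    """Bernstein estimate at x in [0, 1]: a mixture of Beta(k+1, m-k) densities."""
    m = len(weights)
    return sum(
        w * m * math.comb(m - 1, k) * x ** k * (1 - x) ** (m - 1 - k)
        for k, w in enumerate(weights)
        if w > 0
    )


def ise(sample, m, true_pdf, grid_size=400):
    """Midpoint-rule approximation of the integrated squared error on [0, 1]."""
    weights = bernstein_weights(sample, m)
    xs = [(i + 0.5) / grid_size for i in range(grid_size)]
    return sum(
        (bernstein_density(x, weights) - true_pdf(x)) ** 2 for x in xs
    ) / grid_size


def true_pdf(x):
    """Assumed test density: Beta(2, 5), i.e. 30 * x * (1 - x)^4."""
    return 30.0 * x * (1.0 - x) ** 4


random.seed(1)
sample = [random.betavariate(2, 5) for _ in range(200)]

# Compare a very low, a moderate, and a high order on the same sample.
errors = {m: ise(sample, m, true_pdf) for m in (2, 20, 100)}
for m, err in errors.items():
    print(f"order m={m}: approximate ISE = {err:.4f}")
```

A very low order forces the estimate toward a nearly linear shape and so carries large bias, while a high order follows the sample more closely at the cost of variance. A data-driven rule has to balance the two without access to the true density.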