Regularized regression in generalized linear measurement error models with instrumental variables - variable selection and parameter estimation

Date
2020
Authors
Xue, Lin
Abstract
Regularization is a commonly used technique in high-dimensional data analysis. With a properly chosen tuning parameter for certain penalty functions, the resulting estimator is consistent in both variable selection and parameter estimation. Most regularization methods assume that the data can be observed and measured precisely. However, it is well known that measurement error (ME) is ubiquitous in real-world datasets. In many situations, some or all covariates cannot be observed directly or are measured with error. For example, in studies of cardiovascular disease, the goal is to identify important risk factors such as blood pressure, cholesterol level, and body mass index, which cannot be measured precisely. Instead, the corresponding proxies are used in the analysis. If the ME is ignored in regularized regression, the resulting naive estimator can have high selection and estimation bias. As a result, important covariates may be falsely dropped from the model and redundant covariates incorrectly retained. We illustrate how ME affects variable selection and parameter estimation through theoretical analysis and several numerical examples. To correct for the ME effects, we propose an instrumental variable assisted regularization method for linear and generalized linear models. We show that the proposed estimator has the oracle property, in the sense that it is consistent in both variable selection and parameter estimation. The asymptotic distribution of the estimator is derived. In addition, we show that under linear models the implementation of the proposed method is equivalent to the plug-in approach, and that the asymptotic variance-covariance matrix has a compact form. Extensive simulation studies in linear, logistic, and Poisson log-linear regression show that the proposed estimator outperforms the naive estimator in both linear and generalized linear models.
Although the focus of this study is classical ME, we also discuss variable selection and estimation in the setting of Berkson ME. In particular, our finite-sample simulation studies show that, in contrast to estimation in linear regression, Berkson ME may cause bias in variable selection and estimation. Finally, the proposed method is applied to a diabetes dataset and to data from the Framingham Heart Study.
Keywords
Measurement error, Regularization, Instrumental variable