Mixture model analysis with rank-based samples

Date
2014, 2013
Authors
Hatefi, Armin
Publisher
Statistica Sinica
Journal of Multivariate Analysis
Abstract
Simple random sampling (SRS) is the most commonly used sampling design in data collection. In many applications (e.g., in fisheries and medical research), quantification of the variable of interest is either time-consuming or expensive, but ranking a number of sampling units, without actual measurement on them, can be done relatively easily and at low cost. In these situations, one may use rank-based sampling (RBS) designs to obtain more representative samples from the underlying population and improve the efficiency of statistical inference. In this thesis, we study the theory and application of finite mixture models (FMMs) under RBS designs. In Chapter 2, we study the problems of Maximum Likelihood (ML) estimation and classification in a general class of FMMs under different ranked set sampling (RSS) designs. In Chapter 3, we derive the Fisher information (FI) content of different RSS data structures, including complete and incomplete RSS data, and show that the FI contained in each variation of the RSS data about different features of FMMs is larger than the FI contained in its SRS counterpart. There are situations where it is difficult to rank all the sampling units in a set with high confidence. Forcing rankers to assign unique ranks to the units (as in RSS) can lead to substantial ranking error and consequently to poor statistical inference. We hence focus on the partially rank-ordered set (PROS) sampling design, which aims to reduce the ranking error and the burden on rankers by allowing them to declare ties (partially ordered subsets) among the sampling units. In Chapter 4, we study the information and uncertainty structures of PROS data in a general class of distributions and show the superiority of the PROS design in data analysis over the RSS and SRS schemes. In Chapter 5, we also investigate the ML estimation and classification problems of FMMs under the PROS design.
Finally, we apply our results to estimate the age structure of a short-lived fish species based on the length frequency data, using SRS, RSS and PROS designs.
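As an illustration of the sampling idea behind RSS, the following is a minimal sketch of the standard balanced RSS scheme under perfect ranking, applied to draws from a two-component normal mixture. It is not taken from the thesis: the function names, the mixture parameters, the set size and the number of cycles are all hypothetical, and ranking is simulated by sorting the true values, whereas in practice ranking is done by judgment without measuring the units.

```python
import random


def draw_mixture(rng, weights, means, sds):
    """Draw one value from a finite normal mixture (illustrative parameters)."""
    j = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[j], sds[j])


def ranked_set_sample(draw, set_size, cycles):
    """Balanced RSS under perfect ranking (assumption: ranking has no error).

    In each cycle, for r = 1..set_size, a fresh set of `set_size` units is
    drawn and ranked, and only the unit with rank r is measured.
    Returns a list of (rank, measured value) pairs.
    """
    sample = []
    for _ in range(cycles):
        for r in range(set_size):
            # Sorting the true values stands in for a perfect judgment ranking.
            units = sorted(draw() for _ in range(set_size))
            sample.append((r + 1, units[r]))
    return sample


rng = random.Random(0)
# Hypothetical two-component mixture, e.g. lengths of two age classes.
draw = lambda: draw_mixture(rng, [0.6, 0.4], [10.0, 15.0], [1.0, 1.5])
rss = ranked_set_sample(draw, set_size=3, cycles=4)
```

With a set size of 3 and 4 cycles, 36 units are ranked but only 12 are measured, which is the trade-off that makes RSS attractive when measurement is costly and ranking is cheap.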
Keywords
Finite mixture models, Ranked set sampling, Partial ranking, Latent variables, Expectation-Maximization algorithm, Classification, Fisher information, Entropy, Age structures of Spot fish
Citation
Hatefi, A., Jafari Jozani, M. and Ziou, D. (2014). Estimation and classification for finite mixture models under ranked set sampling. Statistica Sinica, 24, 675--698.
Hatefi, A. and Jafari Jozani, M. (2013). Fisher information in different types of perfect and imperfect ranked set samples from finite mixture models. Journal of Multivariate Analysis, 119, 16--31.