Fast and scalable MapReduce-based vertical mining
Abstract
Mining uncertain data is challenging because uncertainty is usually represented
as real numbers, which can take infinitely many values (cf. the finite occurrence
counts used when mining precise data). As a result, these values are not easy to
store in a data structure.
Although some data mining algorithms for handling uncertain data exist, these
algorithms become inefficient when the data are very large. Vertical data
mining algorithms have the advantage that they run fast and require little
memory. Hence, for my M.Sc. thesis, I propose two vertical mining algorithms that
mine big uncertain data. Analytical and experimental evaluation results show that,
of these two MapReduce-based vertical mining algorithms, MR-UV-Eclat is
fast and scalable.
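
To make the idea of vertical mining over uncertain data concrete, the following is a minimal single-machine sketch in the general style of Eclat. It is not the MR-UV-Eclat algorithm proposed in the thesis and it does not use MapReduce. It assumes that each item carries an independent existential probability per transaction and that an itemset is frequent when its expected support (the sum over transactions of the product of its items' probabilities) reaches a threshold. The names to_vertical, join, expected_support, and mine, as well as the sample data, are hypothetical.

```python
def to_vertical(transactions):
    """Convert horizontal data (one dict of item -> probability per
    transaction) into a vertical layout: item -> {tid: probability}."""
    vertical = {}
    for tid, items in enumerate(transactions):
        for item, prob in items.items():
            vertical.setdefault(item, {})[tid] = prob
    return vertical


def join(itemset_probs, item_probs):
    """Intersect two tid lists and multiply the probabilities,
    assuming items occur independently within a transaction."""
    return {tid: p * item_probs[tid]
            for tid, p in itemset_probs.items() if tid in item_probs}


def expected_support(tid_probs):
    """Expected support = sum of per-transaction probabilities."""
    return sum(tid_probs.values())


def mine(vertical, minsup):
    """Depth-first search over itemsets, extending each frequent
    itemset with later items only (Eclat-style prefix ordering)."""
    items = sorted(i for i, tp in vertical.items()
                   if expected_support(tp) >= minsup)
    frequent = {}

    def extend(itemset, tid_probs, tail):
        frequent[itemset] = expected_support(tid_probs)
        for k, item in enumerate(tail):
            # Join with the single-item list so that no probability
            # is multiplied in twice.
            joined = join(tid_probs, vertical[item])
            if expected_support(joined) >= minsup:
                extend(itemset + (item,), joined, tail[k + 1:])

    for k, item in enumerate(items):
        extend((item,), vertical[item], items[k + 1:])
    return frequent


# Hypothetical sample data: three transactions with item probabilities.
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.8, "b": 0.6, "c": 0.4},
    {"b": 0.7, "c": 0.9},
]
frequent = mine(to_vertical(transactions), minsup=0.9)
```

In this sketch each itemset is represented only by the transactions that may contain it, so support counting reduces to intersecting tid lists rather than rescanning the data, which is the property the abstract attributes to vertical mining.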