Scalable vertical mining for big data analytics

Zhang, Hao

Files

zhang_hao.pdf (644.29 KB)

Date

2016

Authors

Zhang, Hao

Abstract

The increasing size of modern applications produces huge amounts of data, which in turn poses a new challenge for data mining and big data analytics. Researchers often use the five V’s (Volume, Velocity, Variety, Veracity, and Value) to describe the characteristics of big data. Interest in discovering patterns from large collections of data has risen in both academia and industry. Rich sources of big data include online social networks such as Facebook and Twitter, where useful information and knowledge are embedded in users' social activities. Although some algorithms have recently been proposed to mine data at large scale, they mostly focus on the volume aspect. Few approaches address data variety, which is also a critical criterion for the mining process. A dataset may be sparse, dense, or unevenly distributed. For example, a list of common friends in an online social network is dense if two people share many friends and sparse otherwise. For my MSc thesis, I design and implement a big data analytics algorithm that tackles both the volume and variety aspects of big data.

Keywords

Data mining, Frequent pattern mining, Big data, Data analytics

URI

http://hdl.handle.net/1993/31951

Collections

FGS - Electronic Theses and Practica
