Mining frequent patterns from uncertain data with MapReduce
Frequent pattern mining from uncertain data allows data analysts to mine frequent patterns from probabilistic databases, in which each item in a transaction is associated with an existential probability representing the likelihood that the item is present in that transaction. Compared with precise data, the solution space for mining uncertain data is often much larger because of the probabilistic nature of uncertain databases. Thus, uncertain data mining algorithms usually take substantially more time to execute. Recent studies show that the MapReduce programming model yields significant performance gains for data mining algorithms that can be mapped to the map and reduce execution phases of MapReduce. An attractive feature of MapReduce is fault tolerance, which permits detecting failed jobs and restarting them on working machines. In this M.Sc. thesis, I explore the feasibility of applying MapReduce to frequent pattern mining of uncertain data. Specifically, I propose two algorithms for mining frequent patterns from uncertain data with MapReduce.
Data mining, Databases