Rare pattern mining from noisy data

Connor, Hryhoruk
Data mining is the process of discovering previously unknown, yet useful, information from data. Datasets are often large, so automated analysis is needed to extract interesting patterns and relationships. Compounding this problem, interesting information may be lost when noise is introduced into the data. Existing approaches relax the support requirement (support lenience) to accommodate the impact of noise on frequent patterns. Whilst past research has improved mining within noisy datasets, little or no work has considered support lenience in relation to rare pattern mining. Rare patterns, as the name suggests, occur infrequently. Discovering them has a variety of applications in areas such as medicine, market analysis, and outlier detection, where commonly occurring events are often uninteresting because they are already well known and thus provide no novel insight. I introduce the mining of approximately rare itemsets to discover rare patterns in noisy data. To mine approximately rare itemsets, an ARI-growth algorithm is proposed. Computational results show that ARI-growth is an order of magnitude faster than an Apriori approach. Memory consumption shows a similar trend, although Apriori outperforms ARI-growth in one tested case. Approximation with ARI-growth is applicable to the recovery of rare core patterns.
rare pattern mining, data mining, core pattern, approximate, ARI-growth, support lenience, noise, association rules
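
To make the idea of support lenience for rare itemsets concrete, the following is a minimal brute-force sketch in Python. It is not the ARI-growth algorithm, whose details the abstract does not give. It assumes one common form of lenience from approximate frequent itemset mining: a transaction counts towards an itemset's support if it is missing at most `max_missing` of the itemset's items. The function names, the `min_sup` and `max_sup` thresholds, and the toy transactions are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def approx_support(itemset, transactions, max_missing=1):
    """Count transactions missing at most `max_missing` items of `itemset`.

    Assumed lenience rule (not taken from the thesis): a transaction
    approximately supports an itemset if it contains all but a few items.
    """
    items = set(itemset)
    return sum(1 for t in transactions if len(items - set(t)) <= max_missing)


def exact_support(itemset, transactions):
    """Classical support: the transaction must contain every item."""
    return approx_support(itemset, transactions, max_missing=0)


def approx_rare_itemsets(transactions, size, min_sup, max_sup, max_missing=1):
    """Enumerate itemsets of a given size whose approximate support lies
    in [min_sup, max_sup). Brute force; only suitable for tiny examples."""
    universe = sorted({item for t in transactions for item in t})
    rare = {}
    for candidate in combinations(universe, size):
        support = approx_support(candidate, transactions, max_missing)
        if min_sup <= support < max_sup:
            rare[candidate] = support
    return rare


# Hypothetical toy data: the second transaction is a noisy copy of the
# first in which item "c" was lost.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"d", "e"},
    {"d", "e"},
    {"d", "e"},
    {"d", "e", "a"},
]

pattern = ("a", "b", "c")
print(exact_support(pattern, transactions))
print(approx_support(pattern, transactions, max_missing=1))
print(approx_rare_itemsets(transactions, size=3, min_sup=2, max_sup=4))
```

With exact counting, the noisy transaction does not contribute to the support of `("a", "b", "c")`, so the pattern could fall below a minimum rare-support threshold and be discarded as noise. Under the assumed lenience rule it is counted. The same lenience also raises the support of other candidates, which is why an upper threshold such as `max_sup` is still needed to keep the result set rare.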