Rare pattern mining from noisy data

dc.contributor.author: Connor, Hryhoruk
dc.contributor.examiningcommittee: Lix, Lisa (Community Health Sciences)
dc.contributor.examiningcommittee: Wang, Shaowei (Computer Science)
dc.contributor.supervisor: Leung, Carson K
dc.date.accessioned: 2023-09-08T14:29:34Z
dc.date.available: 2023-09-08T14:29:34Z
dc.date.issued: 2023-08-24
dc.date.submitted: 2023-08-24T06:39:04Z (en_US)
dc.date.submitted: 2023-08-24T17:05:31Z (en_US)
dc.date.submitted: 2023-09-08T03:45:24Z (en_US)
dc.degree.discipline: Computer Science (en_US)
dc.degree.level: Master of Science (M.Sc.)
dc.description.abstract: Data mining is the process of discovering previously unknown, yet useful, information from data. Datasets are often large, requiring automated forms of analysis to extract interesting patterns and relationships. Compounding this problem, interesting information may be lost when noise is introduced into the data. Existing approaches apply lenience to the support measure to accommodate the impact of noise on frequent patterns. Whilst past research has improved mining within noisy datasets, limited or no work has considered support lenience in relation to rare pattern mining. Rare patterns, as their name suggests, do not commonly occur. Discovering rare patterns has a variety of applications in areas such as medicine, market analysis, and outlier detection, where commonly occurring events are often uninteresting because they are already well known and thus do not provide novel insight. I introduce the mining of approximately rare itemsets to discover rare patterns in noisy data. To mine approximately rare itemsets, an ARI-growth algorithm is proposed. Computational results show that ARI-growth is an order of magnitude faster than an Apriori approach. Memory consumption shows a similar trend, although Apriori uses less memory in one tested case. Approximation with ARI-growth is applicable to rare core pattern recovery.
dc.description.note: October 2023
dc.identifier.uri: http://hdl.handle.net/1993/37620
dc.language.iso: eng
dc.rights: open access (en_US)
dc.subject: rare pattern mining
dc.subject: data mining
dc.subject: core pattern
dc.subject: approximate
dc.subject: ARI-growth
dc.subject: support lenience
dc.subject: noise
dc.subject: association rules
dc.title: Rare pattern mining from noisy data
dc.type: master thesis (en_US)
local.subject.manitoba: no
Files
Original bundle
Name: ConnorH_Thesis_resub.pdf
Size: 1.17 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 770 B
Format: Item-specific license agreed to upon submission