Bio-inspired constrained clustering: A case study on aspect-based sentiment analysis
Abstract
Clustering is an important problem in the era of big data. Exact clustering algorithms are computationally unaffordable for many real-world applications (RWAs), which calls for innovative approximation algorithms. Among these are bio- or nature-inspired techniques such as the “ant brood clustering algorithm” (ACA), inspired by how real ants sort the brood in their nests. ACA's mathematical model assumes a static radius of perception, which does not adapt well to RWAs. I address this issue by developing an adaptive clustering algorithm, called “ACA with Adaptive Radius” (ACA-AR), which uses kernel density estimation, a non-parametric statistical model, to measure the average dissimilarity of data objects in an ant’s neighborhood. I extend this algorithm to a search-based, semi-supervised constrained clustering algorithm (CACA-AR) that incorporates supervisory information to guide the clustering towards solutions in which constraints are minimally violated. I evaluate the accuracy of CACA-AR on benchmark datasets and provide a feasibility study on one RWA, aspect-based sentiment analysis. The F1-score results show that CACA-AR outperforms baseline techniques, multi-class logistic regression, and lexicon-based approaches by 20%.
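To make the "static radius of perception" concrete, the following is a minimal sketch of the neighborhood density and pick/drop rules commonly used in ant-based clustering (the Lumer-Faieta style formulation). It is an assumption that this matches the ACA model referred to above; the function names, the square neighborhood, the Euclidean dissimilarity, and the constants `alpha`, `k1`, and `k2` are illustrative choices, not taken from this work. The sketch covers only the fixed-radius baseline and does not attempt to reproduce ACA-AR or CACA-AR.

```python
import math

def neighborhood_density(i, positions, data, radius, alpha):
    """Local density f(o_i) of object i under a fixed perception radius.

    positions: dict mapping object index -> (x, y) grid cell (hypothetical layout)
    data:      sequence of feature vectors, indexed by object index
    radius:    static perception radius in grid cells (square neighborhood)
    alpha:     dissimilarity scale (assumed constant)
    """
    xi, yi = positions[i]
    side = 2 * radius + 1  # side length of the perceived square area
    total = 0.0
    for j, (xj, yj) in positions.items():
        if j == i:
            continue
        # Only objects inside the fixed square neighborhood are perceived.
        if max(abs(xi - xj), abs(yi - yj)) <= radius:
            dissimilarity = math.dist(data[i], data[j])
            total += 1.0 - dissimilarity / alpha
    # Negative sums (very dissimilar neighbors) are clipped to zero.
    return max(0.0, total / (side * side))

def pick_probability(f, k1=0.1):
    """Probability that an unladen ant picks up an object with density f."""
    return (k1 / (k1 + f)) ** 2

def drop_probability(f, k2=0.15):
    """Probability that a laden ant drops its object at a site with density f."""
    return (f / (k2 + f)) ** 2
```

Because `radius` is fixed for the whole run, the same neighborhood size is applied in dense and sparse regions of the grid alike, which is the limitation that motivates an adaptive radius.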