Adaptive algorithms for hypertext clustering
Vlajic, Natalija J.
Artificial neural networks (ANNs) based on unsupervised learning can organize themselves to learn categories of patterns and then recognize subsequent patterns in terms of the learned categories. However, results presented in this work show that some ANN algorithms, such as the self-organizing map (SOM) algorithm and hard competitive learning (HCL), produce results that depend on the density of the input data distribution, and therefore may not be appropriate for document clustering tasks. In contrast, a modified adaptive resonance theory (ART2) algorithm is shown to overcome the main drawbacks of the SOM and HCL and to provide perfectly stable multi-hierarchical clustering. Moreover, ART2 in conjunction with competitive Hebbian learning (CHL) is able to preserve the topology of the input data, which enables the retrieval of related or relevant groups of documents. The main problem in combined hypertext clustering is the requirement for a multi-space representation of Web documents. The adaptive hypertext clustering (AHC) algorithm, based upon the modified ART2, is shown to cope successfully with this problem and, depending on the required mode of operation, may produce either pure text-based, hyper dimension-based, or combined hypertext clustering. (Abstract shortened by UMI.)
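
The topology-preserving role of competitive Hebbian learning mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes the commonly described form of CHL, in which each input creates a link between its two nearest cluster prototypes. The function name `competitive_hebbian_edges` and the toy prototypes and inputs are hypothetical, chosen only for illustration.

```python
import math

def competitive_hebbian_edges(prototypes, inputs):
    """Link the two prototypes closest to each input (assumed CHL rule).

    Returns a set of index pairs (i, j) with i < j.
    """
    edges = set()
    for x in inputs:
        # Rank prototype indices by Euclidean distance to the input.
        ranked = sorted(range(len(prototypes)),
                        key=lambda i: math.dist(prototypes[i], x))
        first, second = ranked[0], ranked[1]
        edges.add((min(first, second), max(first, second)))
    return edges

# Hypothetical data: four cluster prototypes along a line, three inputs.
prototypes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
inputs = [(0.4, 0.0), (1.3, 0.1), (2.6, -0.1)]
edges = competitive_hebbian_edges(prototypes, inputs)
print(sorted(edges))
```

In this toy setting only neighbouring prototypes become linked, so the resulting graph follows the layout of the data. In a document-clustering context, such links could be followed from one cluster to nearby clusters to retrieve related groups of documents.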