CLUSTERING DOCUMENT TREES BASED ON SIMILARITY MEASURE

Manasa Sudha Akula*, Prajna Bodapati, Shashi Mogalla


Abstract

With rapid changes in technology, data mining and warehousing are gaining prominence in computing. Retrieving information within large organizations is becoming a tedious task, and data mining now offers many powerful and innovative techniques for solving this information retrieval problem. This paper introduces a novel approach for clustering text documents based on frequent subtrees. Document trees are constructed by extracting the noun hypernym relationships for every word in a text document using the WordNet 2.1 lexical reference. This technique departs from traditional text mining approaches, which are based on frequent keyword occurrences. Its aim is to cluster documents even when they have no words in common. The key idea is to automate the clustering mechanism by discovering frequent subtrees across the document trees. To identify frequent subtree occurrences in the constructed document trees, a closed frequent substructure mining approach is employed. This approach uses depth-first search in frequent subtree mining to discover all frequent subtrees without candidate generation and false-positive pruning, thereby accelerating the mining process. Clusters are then formed based on a similarity measure. The paper thus focuses on frequent subtree mining over noun hypernyms as a basis for forming clusters, yielding an automated system for easily searching, organizing, and maintaining voluminous text documents.
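The pipeline outlined in the abstract (build a hypernym tree per document, compare trees, group similar documents) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `TOY_HYPERNYMS` is a hypothetical hand-written map standing in for WordNet lookups, the Jaccard overlap of tree edges is a simple substitute for the paper's closed frequent subtree mining, and the greedy threshold grouping is not the authors' clustering procedure.

```python
# Hypothetical toy hypernym map (child -> parent). A real system would
# obtain noun hypernyms from WordNet rather than from a fixed dictionary.
TOY_HYPERNYMS = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "carnivore": "animal",
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "artifact",
}


def hypernym_edges(word):
    """Return the (parent, child) edges on the path from word up to its root."""
    edges = set()
    while word in TOY_HYPERNYMS:
        parent = TOY_HYPERNYMS[word]
        edges.add((parent, word))
        word = parent
    return edges


def document_tree(words):
    """Represent a document tree as the union of its words' hypernym edges."""
    tree = set()
    for word in words:
        tree |= hypernym_edges(word)
    return tree


def similarity(tree_a, tree_b):
    """Jaccard overlap of edges; a stand-in for a subtree-based measure."""
    if not tree_a and not tree_b:
        return 0.0
    return len(tree_a & tree_b) / len(tree_a | tree_b)


def cluster(docs, threshold=0.3):
    """Greedy grouping: a document joins the first cluster that contains
    a member whose similarity to it reaches the threshold."""
    trees = {name: document_tree(words) for name, words in docs.items()}
    clusters = []
    for name in docs:
        for members in clusters:
            if any(similarity(trees[name], trees[m]) >= threshold for m in members):
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Four one-word documents; no two documents share a word.
docs = {"d1": ["dog"], "d2": ["wolf"], "d3": ["car"], "d4": ["truck"]}
print(cluster(docs))
```

In this toy setting, "dog" and "wolf" never co-occur, yet their documents share the upper part of the hypernym tree (canine, carnivore, animal) and so end up in the same group, which is the property the abstract highlights over keyword-based approaches.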


Published

2013-10-12

Issue

Vol. 2 No. 6 (2012): Asian Journal of Computer Science and Information Technology

Section

Articles

License

COPYRIGHT AGREEMENT AND AUTHORSHIP RESPONSIBILITY

All paper submissions must carry the following, duly signed by all the authors:

“I certify that I have participated sufficiently in the conception and design of this work and the analysis of the data (wherever applicable), as well as the writing of the manuscript, to take public responsibility for it. I believe the manuscript represents valid work. I have reviewed the final version of the manuscript and approve it for publication. Neither has the manuscript nor one with substantially similar content under my authorship been published nor is being considered for publication elsewhere, except as described in an attachment. Furthermore I attest that I shall produce the data upon which the manuscript is based for examination by the editors or their assignees, if requested.”