Project Abstract

This project belongs to the field of automatic text document classification and aims to improve classification results by developing strategies that combine the outputs of several classifiers.

The number of online documents keeps growing, which makes automated document classification an important challenge. Programs that automatically organize such documents into classes are essential for facilitating document retrieval, analysis, and processing. Because classifiers must operate over a very broad domain, it is difficult to build a single classifier that performs well everywhere. Current approaches either combine multiple classifiers of different types into a meta-classifier or implement a hybrid classification. Hybrid classification predicts the best classifier for a particular problem from the characteristics of the input vector and the history of each classifier. Given many classifiers of different types (Support Vector Machines (SVM), Bayes classifiers, neural networks, etc.), the approach is to train a meta-classifier that predicts the correctness of each classifier. The meta-label of an instance indicates the reliability of the classification, that is, whether the instance is classified correctly by a classifier selected from those in use. The classification rule of the combined classifiers is that each classifier in use assigns a class to the current instance, and the meta-classifier then decides whether that classification is reliable. Metaclassification is effective only if the synergies among its components can be exploited to increase classification accuracy.
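
The classification rule described above can be illustrated with a minimal Python sketch. It is not the project's implementation. The keyword-based classifiers stand in for real SVM, Bayes, or neural-network models, and the toy documents are invented for illustration. The names make_keyword_classifier, ReliabilityModel, and meta_classify are hypothetical. The 1-nearest-neighbour reliability test and the majority vote among trusted classifiers are assumptions, since the abstract does not fix how the meta-classifier is learned or how reliable votes are merged.

```python
from collections import Counter


def make_keyword_classifier(keywords):
    """Toy stand-in for a real classifier: picks the class whose
    keyword set shares the most tokens with the document."""
    def classify(tokens):
        scores = {label: len(words & tokens) for label, words in keywords.items()}
        # sorted() makes ties deterministic (alphabetically first label wins).
        return max(sorted(scores), key=scores.get)
    return classify


def jaccard(a, b):
    """Similarity between two token sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ReliabilityModel:
    """Meta-level model for one classifier (assumed here to be 1-nearest-neighbour).

    It stores the classifier's history as (tokens, was_correct) pairs and
    predicts reliability on a new instance from the most similar past one.
    """

    def __init__(self):
        self.history = []

    def fit(self, classifier, documents, labels):
        self.history = [(doc, classifier(doc) == label)
                        for doc, label in zip(documents, labels)]
        return self

    def is_reliable(self, tokens):
        if not self.history:
            return True  # assumption: trust a classifier with no history
        _, was_correct = max(self.history, key=lambda h: jaccard(h[0], tokens))
        return was_correct


def meta_classify(tokens, classifiers, meta_models):
    """Each classifier assigns a class; only votes judged reliable are counted."""
    votes = [clf(tokens) for clf in classifiers]
    trusted = [v for v, m in zip(votes, meta_models) if m.is_reliable(tokens)]
    # Assumed fallback: if no vote is judged reliable, count all votes.
    return Counter(trusted or votes).most_common(1)[0][0]


# Invented toy data: each document is a set of tokens.
train_docs = [
    {"goal", "match", "team"},
    {"election", "vote", "party"},
    {"team", "vote", "captain"},
    {"party", "goal", "budget"},
]
train_labels = ["sport", "politics", "sport", "politics"]

clf_a = make_keyword_classifier({"sport": {"goal", "match"},
                                 "politics": {"election", "vote"}})
clf_b = make_keyword_classifier({"sport": {"team", "captain"},
                                 "politics": {"party", "budget"}})
classifiers = [clf_a, clf_b]
meta_models = [ReliabilityModel().fit(clf, train_docs, train_labels)
               for clf in classifiers]

new_doc = {"team", "vote", "coach"}
print(meta_classify(new_doc, classifiers, meta_models))  # prints "sport"
```

In this toy run the two classifiers disagree on the new document. The first one was wrong on the most similar training document, so its vote is discarded and the second classifier's answer is kept. This is the kind of synergy between components that metaclassification relies on.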