ISBN-13: 9786200282323 / English
The number of textual documents is increasing at an incredible rate, and there is very often a need to classify those documents into fixed, predefined categories. Text mining and machine learning help considerably with this task of automated document classification. Because the classification is done automatically, the classifier must produce as few misclassifications as possible, so classification accuracy is very important. Various factors can affect the classification accuracy of a classifier. One of them is the feature selection method used to reduce the number of features in the documents. Information Gain (IG) is one of the most popular methods employed for this task, but it has a few shortcomings in how it evaluates words. In our work, we have devised a new formula for evaluating the words in the documents and thus finding the words that are more useful for the classification task. Our method aims to find the words that have more discriminating power than others, and we have therefore named our formula Discriminating Power (DP).
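For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of Information Gain used as a term-scoring function, based on the common textbook definition IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)]. It does not show the Discriminating Power formula, which is not given in this description. The toy corpus, the labels, and the function names `entropy` and `information_gain` are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Textbook IG of a term: class entropy minus the entropy that
    remains after splitting documents on presence/absence of the term."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"goal", "match", "team"},
    {"goal", "league", "team"},
    {"stock", "market", "team"},
    {"stock", "shares", "market"},
]
labels = ["sport", "sport", "finance", "finance"]

# Rank the vocabulary by IG; feature selection would keep the top-k terms.
vocabulary = sorted(set().union(*docs))
ranked = sorted(
    vocabulary,
    key=lambda t: information_gain(docs, labels, t),
    reverse=True,
)
print(ranked)
```

In this toy data, words that appear in only one category (such as "goal" or "stock") score highest, while "team", which occurs in both categories, scores lower. A feature selection step would keep only the highest-ranked words as features.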