Rule Extraction for Infrequent Class






This thesis describes classification rules extracted for the infrequent (minority) class to support decisions about future actions. Rules are among the most expressive and human-readable representations of the hypotheses produced by a predictive model. On imbalanced datasets, it is commonly reported that standard classification algorithms are biased towards the majority class and therefore yield more rules for the majority class than for the infrequent class. This happens because the loss functions of conventional algorithms optimize quantities such as the error rate and do not take the data distribution into account. The importance of the infrequent class becomes clear only through the rules derived from it. The thesis applies undersampling, one of the naïve methods for balancing data, followed by a clustering algorithm that groups data with similar features according to two distance measures, Euclidean distance and Manhattan distance. The clusters generated with the Euclidean distance contribute to the majority class, and those generated with the Manhattan distance contribute to the minority class. This increases the minority-class count compared with the original dataset. A conventional classification algorithm is then applied to the resulting dataset to obtain more rules for the minority class, which helps in further predictions. The proposed algorithm generates more readable and understandable rules, with increased coverage of the minority class, than previously published work.



Imbalanced class, distance measures, rule extraction