Optimal Supervised Feature Extraction in Internet Traffic Classification
Internet traffic classification is important in many aspects of network management, such as data exploitation detection, malicious user identification, and restricting application traffic. Previously, features such as port and protocol numbers were used to classify traffic, but these features can now be changed easily, which makes them inadequate for traffic classification. Consequently, traffic classification based on machine learning (ML) is now employed. The number of features used in an ML algorithm has a significant impact on performance, in particular on accuracy. In this paper, a minimum best feature set is chosen using a supervised method to obtain uncorrelated features. Outlier removal and data normalization are used to reduce the dimensionality. The data projected into the resulting space are then used to construct the classifier input. Finally, the decision tree, artificial neural network, and naïve Bayesian single-classifier algorithms, together with the bagging and boosting ensemble algorithms, are used for traffic classification. The results show that the dimension of the feature space can be reduced to M - 1, where M is the number of classes, with no loss in class separability.
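The M - 1 bound can be illustrated with a small sketch. The abstract does not name the supervised method, so the example below assumes a scatter-matrix criterion in the style of Fisher discriminant analysis, where the between-class scatter matrix has rank at most M - 1 and therefore supports at most M - 1 discriminant directions. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def between_class_scatter(samples, labels):
    """Return S_B = sum_c n_c (mu_c - mu)(mu_c - mu)^T as a list of rows."""
    d = len(samples[0])
    n = len(samples)
    # Overall mean of all samples, kept exact with rationals.
    overall = [sum(Fraction(x[i]) for x in samples) / n for i in range(d)]
    # Group the samples by class label.
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    s_b = [[Fraction(0)] * d for _ in range(d)]
    for rows in groups.values():
        n_c = len(rows)
        mu_c = [sum(Fraction(r[i]) for r in rows) / n_c for i in range(d)]
        diff = [mu_c[i] - overall[i] for i in range(d)]
        # Accumulate the weighted outer product of the centred class mean.
        for i in range(d):
            for j in range(d):
                s_b[i][j] += n_c * diff[i] * diff[j]
    return s_b


def matrix_rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next(
            (r for r in range(rank, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


# Hypothetical toy data: 4 features, M = 3 classes (not real traffic flows).
samples = [
    (1, 0, 0, 2), (3, 0, 0, 2),
    (0, 4, 1, 0), (0, 6, 1, 0),
    (5, 5, 3, 1), (7, 5, 5, 1),
]
labels = [0, 0, 1, 1, 2, 2]

num_classes = len(set(labels))
rank = matrix_rank(between_class_scatter(samples, labels))
print(f"rank(S_B) = {rank}, M - 1 = {num_classes - 1}")
```

Because the class-mean deviations, weighted by class size, sum to zero, they span at most M - 1 dimensions, whatever the number of original features. Under the stated assumption, a projection built from this matrix cannot use more than M - 1 informative directions.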