CONFUSION MATRIX AND MACHINE LEARNING IN CYBER SECURITY
What is a Confusion Matrix?
A confusion matrix is a performance measurement technique for machine learning classification. It is a table that shows how a classification model performs on a set of test data for which the true values are known. The idea behind a confusion matrix is simple, but its related terminology can be a little confusing.
For binary classification, where the negative class is 0 and the positive class is 1, the confusion matrix is a 2x2 table in which the columns are the actual values of the data and the rows are the values predicted by the model. It therefore holds the 4 possible combinations of predicted and actual values.
Here is what each of the four boxes in the confusion matrix represents:
True Positive: The model predicted positive and the label was actually positive.
True Negative: The model predicted negative and the label was actually negative.
False Positive: The model predicted positive and the label was actually negative. Also known as a Type 1 error.
False Negative: The model predicted negative and the label was actually positive. Also known as a Type 2 error.
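As a minimal sketch of how the four boxes are filled in, the snippet below counts each outcome from a list of actual labels and a list of predicted labels. The function name `confusion_counts` and the sample labels are hypothetical, made up only for illustration, and placing the positive class first in the grid is an assumption about ordering.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (0 = negative, 1 = positive)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # predicted positive, actually positive
        elif p == 0 and a == 0:
            tn += 1  # predicted negative, actually negative
        elif p == 1 and a == 0:
            fp += 1  # predicted positive, actually negative (Type 1 error)
        else:
            fn += 1  # predicted negative, actually positive (Type 2 error)
    return tp, tn, fp, fn


# Hypothetical labels, invented for this example.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)

# Rows are predicted values, columns are actual values.
# Assumed ordering: positive class first, then negative class.
matrix = [
    [tp, fp],  # predicted positive
    [fn, tn],  # predicted negative
]
print(matrix)
```

Every sample lands in exactly one box, so the four counts always add up to the size of the test set.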
We can use a confusion matrix to calculate various metrics:
- Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)
- Misclassification (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
- Precision (true positives / predicted positives) = TP / (TP + FP)
- Sensitivity aka Recall (true positives / all actual positives) = TP / (TP + FN)
- Specificity (true negatives / all actual negatives) = TN / (TN + FP)
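These metrics translate almost directly into code. The sketch below is one possible way to compute them from the four counts. The helper names and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the definitions prescribe. Note that the numerator and denominator are each grouped in parentheses, since division binds tighter than addition.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics listed above from the four confusion matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": safe_div(tp + tn, total),
        "misclassification": safe_div(fp + fn, total),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),  # also called sensitivity
        "specificity": safe_div(tn, tn + fp),
    }


# Hypothetical counts, invented for this example.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and misclassification always sum to 1, which is a quick sanity check on any hand calculation.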
Errors in Confusion Matrix
False Positive: (Type 1 Error)
Interpretation: You predicted positive, but the prediction was wrong.
You predicted that a man is pregnant but he actually is not.
False Negative: (Type 2 Error)
Interpretation: You predicted negative, but the prediction was wrong.
You predicted that a woman is not pregnant but she actually is.
Applications of Confusion Matrix
Cyber Attack Detection and Classification using Parallel Support Vector Machine
Support Vector Machines (SVMs) are classifiers that were originally designed for binary classification, but classification applications built on them can also solve multi-class problems. The results show that pSVM gives higher detection accuracy across the classes while keeping a comparable false alarm rate.
Cyberattack detection is a classification problem in which we separate the normal pattern of the system from the abnormal pattern (an attack).
SDF is a very powerful and popular data mining algorithm for decision-making and classification problems. It has been used in many real-life applications, such as medical diagnosis, radar signal classification, weather prediction, credit approval, and fraud detection.
A parallel Support Vector Machine (pSVM) algorithm was proposed for the detection and classification of cyber attack datasets.
The performance of a support vector machine depends heavily on the kernel function it uses. Therefore, the Gaussian kernel function was modified in a data-dependent way in order to improve the efficiency of the classifiers. Comparative results for both classifiers were also obtained to check the theoretical aspects. The analysis also shows that pSVM performs better than SDF.
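For reference, the standard (unmodified) Gaussian kernel measures how similar two feature vectors are as exp(-gamma * ||x - y||^2). The sketch below shows only this standard form. The data-dependent modification is not specified here, so it is not reproduced, and the function name, the `gamma` parameter, and its default value are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Standard Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance).

    This is the textbook form, not the data-dependent variant mentioned above.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Identical points have similarity 1; similarity decays toward 0 with distance.
same = gaussian_kernel([1.0, 2.0], [1.0, 2.0])
near = gaussian_kernel([0.0, 0.0], [1.0, 1.0])
far = gaussian_kernel([0.0, 0.0], [5.0, 5.0])
print(same, near, far)
```

A larger `gamma` makes the similarity fall off faster, which is why the choice of kernel and its parameters has such a strong effect on what an SVM learns.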
The classification accuracy of pSVM improves remarkably (accuracy for the Normal class as well as the DOS class is almost 100%), with a comparable false alarm rate and comparable training and testing times.
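To make per-class accuracy and false alarm rate concrete, here is a small sketch that derives both from actual and predicted labels in a multi-class setting. It assumes the common definitions: the detection rate of a class is the share of its samples that were labelled correctly, and the false alarm rate is the share of Normal traffic that was flagged as some attack. The function name, the "Probe" class, and all labels are hypothetical and are not results from the study.

```python
from collections import Counter


def per_class_rates(actual, predicted, normal_label="Normal"):
    """Return (detection rate per class, false alarm rate on normal traffic)."""
    pair_counts = Counter(zip(actual, predicted))
    actual_totals = Counter(actual)

    # Detection rate: correctly labelled samples of a class / all samples of it.
    detection = {
        label: pair_counts[(label, label)] / count
        for label, count in actual_totals.items()
    }

    # False alarm rate: normal samples labelled as any attack / all normal samples.
    normal_total = actual_totals[normal_label]
    false_alarms = normal_total - pair_counts[(normal_label, normal_label)]
    false_alarm_rate = false_alarms / normal_total if normal_total else 0.0
    return detection, false_alarm_rate


# Hypothetical labels, invented for this example.
actual = ["Normal", "Normal", "Normal", "Normal", "DOS", "DOS", "DOS",
          "Probe", "Probe", "Probe"]
predicted = ["Normal", "Normal", "Normal", "DOS", "DOS", "DOS", "DOS",
             "Probe", "Normal", "Probe"]

detection, false_alarm_rate = per_class_rates(actual, predicted)
print(detection)
print(false_alarm_rate)
```

In this made-up data every DOS sample is caught, but one Normal sample in four raises a false alarm, which shows why both numbers need to be reported together.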
Thank you.