Internet-of-Things (IoT) connects various physical objects through the Internet and it has a wide application, such as in transportation, military, healthcare, agriculture, and many more. Those applications are increasingly popular because they address real-time problems. In contrast, the use of transmission and communication protocols has raised serious security concerns for IoT devices, and traditional methods such as signature and rule-based methods are inefficient for securing these devices. Hence, identifying network traffic behavior and mitigating cyber attacks are important in IoT to provide guaranteed network security. Therefore, we develop an Intrusion Detection System (IDS) based on a deep learning model called Pearson-Correlation Coefficient - Convolutional Neural Networks (PCC-CNN) to detect network anomalies. The PCC-CNN model combines the important features obtained from the linear-based extractions followed by the Convolutional Neural Network. It performs a binary classification for anomaly detection and also a multiclass classification for various types of attacks. The model is evaluated on three publicly available datasets: NSL-KDD, CICIDS-2017, and IOTID20. We first train and test five different (Logistic Regression, Linear Discriminant Analysis, K Nearest Neighbour, Classification and Regression Tree,& Support Vector Machine) PCC-based Machine Learning models to evaluate the model performance. We achieve the best similar accuracy from the KNN and CART model of 98%, 99%, and 98%, respectively, on the three datasets. On the other hand, we achieve a promising performance with a better detection accuracy of 99.89% and with a low misclassification rate of 0.001 with our proposed PCC-CNN model. The integrated model is promising, with a misclassification rate (or False alarm rate) of 0.02, 0.02, and 0.00 with Binary and Multiclass intrusion detection classifiers. Finally, we compare and discuss our PCC-CNN model in comparison to five traditional PCC-ML models. Our proposed Deep Learning (DL)-based IDS outperforms traditional methods.
Avoid common mistakes on your manuscript.
The Internet of Things, or IoT, is an interconnected system based on approved protocols that exchange information. Recent developments in the industry contribute to the development of intelligent smart cities [1], smart devices [2], smart homes [3], smart transportation [4], healthcare [5], agriculture [6], smart grid [7], military [8] and much more[9]. IoT provides the platform to interact with real-world applications via the Internet in the IoT domain [10,11,12]. Moreover, The Industrial Internet of Things (IIoT) is empowered by the rapid growth of integration between sensors and devices[13,14,15]. Therefore, to provide an intelligent industrial edge computing solution, the use of machine-learning, deep-learning, and artificial intelligence is paramount [16]. From statistics [17, 18], the number of smart devices for IoT applications is expected to grow by 21% in the year 2030 compared to 2019, which depicts that IoT will play a crucial role soon. The automotive market will grow to 195.7 billion endpoints during that period [17].
As the number of connected devices across the globe increases, multiple sensors are used to allow them to collect real-time data from physical objects remotely [19]. This data helps us build intelligent decision-making algorithms and effectively manage IoT settings. In parallel, the wide usage of real-world devices raises the risk of cybersecurity threats [18]. Malicious devices can spy on others without notice, manipulate traffic signals remotely and disrupt the networks [20,21,22]. The most notable real-time attacks are DDoS (Distributed Denial of Service)[23], Mirai botnet [24], DoS (Denial of Service)[25], Port Scan, and website crippling from botnet creators who also offer mitigation services at the victim’s expense [26,27,28]. IoT device protection against such intrusions is significant in the field of security. There is an increasing need to take the necessary actions to provide physical and cybersecurity against such powerful attacks. Hence, a detailed analysis of network protection is crucial [29].
Traditional intrusion detection technologies do not provide guaranteed security in IoT applications because of their limited bandwidth capacity and global connectivity [30,31,32]. This results in building an advanced Intrusion Detection System (IDS) to protect IoT devices against intrusions [33]. The IDS can alert the system administrator to suspicious activities or anomalies [34, 35]. The IDS is known as adaptive network security, which can offer valuable feedback to the network administrator about new attack types. It is also essential to update the network with an instance of the attack type before it can detect it [36]. IDS can be classified into three major categories signature-based, anomaly-based, and specification-based [37, 38]. Signature-based approaches can detect attacks based on known attack patterns and based on signatures [39]. An anomaly-based detection system detects a deviation from defined normal behaviors [40]. Similarly, A specification-based system uses the rules defined by the administrator [41]. It is challenging to keep IDS up-to-date due to the complex network and changing environment [42]. To enhance their benefits, researchers use adaptable methods such as machine learning (ML) and deep learning (DL) [43]. The machine learning model can be based on a single classifier, with one classifying model, or a multi-classifiers model, using multiple classification models in parallel [44,45,46]. IDS model can be further divided into binary and multiclass classification models. In binary classification, the traffic divides into either 0 or 1, which means normal (0) or abnormal (1). In contrast, multiclass classification discovers the type of attacks. The multiclass model is more complex than the binary model, leading to lower accuracy for unknown attacks when the data does not contain enough attack instances while training the model [47,48,49].
Furthermore, security attacks are categorized as active or passive attacks [50]. An active attack appears during the run-time conditions. It can disrupt and create damage to the physical device. It is difficult to perform and detect active attacks compared to passive attacks. Denial of Service is the most common example of an active attack [51]. Packet replay, spoofing, and message modification are also examples of active attacks [52,53,54]. Passive attack observes and monitors the information for the specific target [55]. The attackers hide and keep the communication line open to collect information, but the data remains unchanged. The most common threats are eavesdropping network mapping and traffic analysis [56, 57]. To reduce the impact of these attacks on IoT devices and the consumer, a real-time network Anomaly-based intrusion detection system is necessary to block adversaries.
In our previous study [58], we proposed a machine learning-based IDS for detecting DDoS attacks in the context of the CAV (connected and autonomous vehicle) setup, which represents IoT devices. We achieved 94% to 98% accuracy for five different ML models (LR (Logistic Regression), LDA (Linear Discriminant Analysis), CART (Classification and Regression Tree), KNN (K Nearest Neighbour), & SVM (Support Vector Machine)) for binary and multiclass classification on NSL-KDD dataset. Our previous study obtained promising results in binary classification. However, the model in our previous work did not perform well in multiclass classification because we did not use any outlier handling techniques for imbalanced attack instances. Moreover, we only used traditional machine learning approaches, which did not give the best model accuracy.
Therefore, in this current study, we explore different attacks against network layers of IoT devices. We review the current state-of-the-art methods for securing IoT devices against such attacks. We propose an advanced approach for anomaly-based IDS using a deep learning model named Pearson Correlation Coefficient - Convolution Neural Network (PCC-CNN) for anomaly detection with low misclassification rates. We experiment and evaluate binary classification and multiclass classification models. Finally, we discuss the results through a comparative analysis with five different ML algorithms, LR, LDA, CART, KNN, & SVM, using three benchmarking datasets, NSL-KDD, CICIDS2017, and IOTID20, to check the feasibility of binary and multiclass classification approaches. The PCC-CNN model is efficient and lightweight and can detect different cyber attacks.
This research is organized as follows: Sect. 2 presents the background and prior related work. Section 3 details the datasets and methods of ML/DL techniques used. Section 4 exhibits the experimental results with the comparative evaluation of the performance metrics, followed by the concluding research in Sect. 5.
IoT is an interconnection of billions of heterogeneous objects through the Internet. The growth and usage of intelligent IoT devices have surpassed the human population. The application of IoT devices is vast, such as in healthcare, military, transportation, and agriculture [59]. It has become essential to know how IoT works and communicates. Mainly, IoT devices operate in three phases: collection, transmission, and utilization [60]. The collection phase starts with data collection from the physical devices. Then, the transmission phase transmits the data to the end-user or specific communication applications. Lastly, the utilization phase processes the received data to retrieve the environmental information. These phases need protection against the emergence of various threats [61]. The following sections provide current security attacks against IoT structures.
Typically, IoT has three layers, four layers, and five layers of architecture [62,63,64]. Although, there is no standard model of the IoT architecture. This research assumes a three-layer architecture because many professionals accept it [65]. The layers are perception, transport, and network.
The perception Layer is also known as the control layer. It is the lowest layer in the three layers of the architecture of IoT. IoT devices, physical sensors, and actuators sense the environment and collect information about surroundings like temperature, humidity, force, etc. Once the information collection is finished, the perception layer completes the primary processing and packaging of this information. It will also receive information from the network layer for control operations. [66] The most common attacks in this layer are jamming and tampering. These attacks corrupt the network communication using high-frequency radio signals [67].
The transport layer is a top layer in IoT architecture. It is also known as the application layer. This layer analyzes and processes the information from the network and perception layer. It controls the end-to-end links [68]. Different network protocols, such as LAN, Bluetooth, and 3 G [69], transmit sensor data from the perception layer to the processing layer. It typically deals with three types of attacks, flooding, De-synchronization, and man-in-the-middle attack (MITM). The flooding attack takes the device’s memory resources and repeatedly drains them to mitigate the control signal [70]. In the De-synchronization attack, an intruder tries to interrupt the communication and exhaust the network resources [71]. The MITM attack appears when the attacker taps to manipulate or delete information. The transport layer establishes a communication link between the source and the destination. It assures the communication authority to the end side [72].
The data link layer is the middle layer in IoT architecture. It is a layer that uses various techniques, such as 3 G and WiFi, for communicating with physical devices. These wireless communication media use standard, well-known protocols and are prone to different network attacks, also known as routing attacks [73]. These attacks involve eavesdropping, denial of service (DoS), spoofing, network mapping, and traffic analysis. In eavesdropping, an attacker can listen to data and tries to alter it from the wireless channel [74]. DoS is the most common attack nowadays. It can affect the whole network’s data, network performance, and reliability. In spoofing, the attacker checks the actual sender information [75]. Network mapping is placed by defining the software on the system, and last, traffic analysis learns the model from the traffic patterns [76].
The attacks discussed earlier are brutal and hard to mitigate with traditional techniques. To identify and minimize cyber-attacks, robust defense mechanisms are required. Moreover, IDS can help us interact with such attacks in perception, transport, and network layers. An IDS aims to find the occurrence of malicious activity by analyzing network traffic and raising an alert if it does [77]. Several methods and frameworks alleviate various layer attacks using ML and DL techniques [78,79,80]. Many kinds of literature have been published to classify anomalies using ML methods in the IoT infrastructure [81,82,83,84]. Depending upon the methods, we surveyed the literature to learn the recent algorithms used. An IDS system must differentiate anomalies accurately with a low detection time. Hence, it becomes more challenging to develop such a system that can handle complex data and make fast decisions for real-time detection with a low false alarm rate [85].
Depending upon the type of IDS, it can be classified broadly into three types: signature, specification, and anomaly. Signature-based approaches look for pattern similarities between the gathered data and the existing attributes [86, 87]. The signatures of the current activities are extracted and compared with the database signatures by using matching methods. This method is helpful in accurately deciding the type of attacks and preventing a false alarm, but it is challenging to detect unknown attacks [88,89,90]. This class of IDS can also be named misuse detection or knowledge-based detection [91]. The specification-based method uses predefined rules by the administrator. With new episodes, the administrator has the hurdle of updating old laws, which is the major problem with using this method. Anomaly-based methods detect unusual changes in network behavior [92]. This type of system can detect new intrusions, but on the other hand, it creates many false positives. Thus, it is crucial to promptly update the IDS in a complex network and changing environment. Anomaly-based IDS is difficult to exploit because target interaction would raise an alert [93, 94]. Current IDS methods mainly use ML and DL models to detect anomalies [95].
Many researchers have shown promising results in detecting network intrusions [96, 97]. However, there have been fewer targets for their research on IoT network datasets [98]. The commonly used datasets to design new IDS are the KDDCUP99 [99] and DARPA datasets [100]. The problem with these two datasets is that they were created a long time back, as well as the issue of redundancy of features between the classes [101]. The research studies [102] claim that ML algorithms are not producing good results in signature-based intrusion detection. On the other hand, some studies also claim ML algorithms have promising results for anomaly-based intrusion detection in IoT networks [103, 104]. Weka Data Mining Package performs ML techniques on network datasets [105]. Weka is a tool that has a collection of different ML and DL classifiers to predict and compare how different algorithms would perform. The research work in [106] represented an intrusion detection-based approach using the KDDCUP99 datasets to classify anomalies and types of attacks via Weka 3.6 software. The authors constructed the system with an accurate, flexible, and effective performance compared to other methods, but the only drawback is the high incorrectly predicted instances of 13%.
Anish Halimaa et al. [107] developed another ML approach for network IDS to classify the anomalies on NSL-KDD datasets. They used the support vector machine (SVM) and Naive Bayes (NB) techniques to calculate the performance based on accuracy and misclassification rate. However, they used three methods to compare the results: without prepossessed datasets, with normalized datasets, and with the reduced feature using cfsSubsetEval. The methods perform better with the SVM algorithm, with the highest accuracy of 97% achieved using the first method (without prepossessed datasets). NB algorithm has comparatively lower inaccuracies with a high misclassification rate varying from 32 to 44.
Similarly, the authors in [108] proposed supervised ML-based IDS for detecting network attacks in IoT devices. They selected the concept of normalization on the UNSW-NB15 dataset [109] to overcome the information leakage in the test data. The authors applied principal component analysis (PCA) for dimensionality reduction, followed by six different ML classifiers to evaluate the performance metrics: accuracy, precision, F1 Score, and Mathew correlation coefficient. Their findings are competitive with current works. However, they overlooked model overfitting. Biesiada et al. [110] propose an algorithm for extracting non-redundant features from Pearson’s correlation. This method works well on a high-dimensional biomedical dataset used to verify the results.
M.A.Ferrag et al. [114] compared the binary classification performance using the decision tree ML model on commonly used datasets, such as CICIDS and BOT-IOT. They achieved 96% detection accuracy for anomaly detection. However, they did not mention the false alarm rate of the model. Similarly, a research paper by I. Sharafaldin [115] discussed the selection of appropriate ML algorithms. They used the CICIDS-2017 dataset with seven different ML algorithms but only focused on anomaly detection. To minimize the computational complexity, N. Kunhare [116] proposed a swarm optimization technique to select features of importance for the ML algorithm. The model was tested with the random forest classifier for binary classification and achieved 99% accuracy. However, one major drawback of ML approaches for anomaly detection is the high false alarm rate, which can not be neglected in real-time.
Deep Learning techniques are widely used for anomaly-based IDS nowadays because the deep network can, on its own, learn valuable features without any feature selection techniques. Recurrent Neural Network (RNN)-based IDS was developed by Yin et al. [117] for binary and multiclass classification on the NSL-KDD dataset. The model achieved the best accuracy using 80 hidden nodes with 0.1 and 0.5 learning rates for the binary, and multiclass classification, respectively. Their results also depict that different hidden neurons and learning rates affect the model’s accuracy. However, the main disadvantage of this model is that it increases computational complexity, resulting in high training time and low detection rate. Table 1. shows published papers reviewed based on the Network IDS with different publicly available datasets.
Figure 2 shows the step-by-step process that followed to implement the model. For the classification phase, either CNN or ML models were used to predict the binary and multiclass classification. Details of all the steps are discussed in subsequent subsections.
It is also known as cleaning the data or data massaging. It includes removing redundant features. Null/Nan values must be deleted or replaced with substitute values, and the sub-attack instances must be merged into respected attack instances. For each dataset, we divided the output into binary and multiclass classifications. For binary classification, we used two classes, whereas for multiclass classification, we used five classes, including the regular class. Then, we normalize the data between 0 and 1. The critical dimension for this stage is that the data should be compatible with more than one algorithm for consistency and to reduce computation complexity.
Feature extraction is essential while applying ML or DL models as it influences prediction accuracy [134]. With the correct feature selection, we can reduce the over-fitting of the model, improve the accuracy, reduce the model cost, and also help to reduce the training time [135]. Feature extraction is needed to improve learning accuracy by removing irrelevant features and applying feature transformation. Our model uses the feature importance technique based on our proposed Pearson Correlation Coefficient method [136]. The proposed method evaluates the values of an attribute by measuring the correlation between the instances and class [137]. To reduce overfitting, we selected features contributing most to the classification. This method followed the three-stage approach mentioned in [138]. The method first calculates Pearson’s correlation, correlation attribute ranking, and lastly selects features of a new dataset. This method has mainly been used when two variables are normally distributed. We choose the cut-off value to reduce irrelevant features. The cut-off value is evaluated to get the best threshold. The threshold of feature importance is a correlation for this approach, which varies from 0.02 to 0.5 depending on the dataset. The threshold values for each dataset are shown below in Table 5.
CNN is a state-of-the-art model for classification tasks. It combines multiple filters to the data to learn the features and use them for classification. CNN contains input, hidden, and output layers. The input layer starts with the convolutional layers that apply an activation function. The hidden layer has pooling layers that scale down the data to reduce the feature dimensionality, and at last, fully connected layers perform classification.
The designed CNN model contains three convolution layers with respective sizes of 96 multiplied by 4, 64 multiplied by 3, 32 multiplied by 2, and a rectified linear unit activation function (RELU). RELU is a linear function that outputs the input directly if it is positive, or it will output zero if otherwise. The flattening layer reshapes the values from the previous layer into one-dimensional. Three dense layers with 512,128, 32 and 2 were applied with RELU activation in the dense layers except the last, which uses the Softmax activation function. Softmax will help normalize the output in probabilistic form. The dense layers see the values in non-linear form. The adaptive moment estimation (Adam) optimizer tunes the parameter values in the final layer. The number class parameter is set to 2 or 5 depending on whether the expected outcomes are binary or multiclass. Sparse categorical cross-entropy was used for the loss function because the output is in categorical labels. The model was trained over five epochs for binary classification and multiclass classification with 64 batch sizes.
Evaluating a model is an integral part of any effective model. The criteria are different for each model depending upon the application and datasets used. The evaluation metrics help us decide which technique is best suited for a particular job. Following are some list criteria we used to evaluate the model’s performance.