
Conference

2023

On Phishing: Proposing a Traffic Behavior-Based Model to Detect, Prevent, and Classify Webpage Suspicious and Malicious Activities

2023-02
2023 IEEE 2nd International Conference on AI in Cybersecurity (ICAIC)
Phishing is a criminal act in which a Phisher sends a well-counterfeited Webpage, crafted using its Lexical, Host-Based, and Content (LHC) features and containing unseen security threats and stealthy attacks, to unwary victims, tricking them into disclosing sensitive credentials such as financial data and addresses. Such a Webpage will probably pass anti-phishing techniques (APT) undetected, because they mainly focus on detecting and classifying Webpages as either Malicious or Benign and neglect Webpage traffic behavior (TB). In this research, we propose the detection, prevention, and classification (DPC) of Webpages' (W) suspicious and malicious activities based on their TB, namely the DPC based on the B - WTB or L2 model, as a second line of defense against a Webpage that was classified as Benign and has passed the APT line of defense undetected. The L2 model is encapsulated in a sandbox to avoid system failure and keep attacks from spreading around the network; it classifies L1 Webpages as Benign, Suspicious, or Malicious based on their TB when they attempt to access unauthorized resources. Using 10369 records from the ISCX-URL2016 dataset, the L2 model achieves accuracies of 90.07%, 91.85%, and 92.62% using the KNN, LR, and SVM machine learning algorithms, respectively. In addition, the implementation of the proposed L2 model yields a significant observation about classified Webpages' attempts to access restricted resources, based on the maximum number of access violation attempts for each restricted resource and the cumulative number of access attempts over time for each violation on the restricted resources. The experimental results report the precision, recall, and F1 scores for each model.
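The precision, recall, and F1 scores mentioned in the abstract can be computed per class for a three-way Benign / Suspicious / Malicious classifier. The following is a minimal sketch using only the Python standard library; the function names and the toy label lists are illustrative assumptions and are not taken from the paper or its dataset.

```python
from collections import Counter

# Class names follow the abstract; everything else here is illustrative.
LABELS = ("Benign", "Suspicious", "Malicious")


def per_class_scores(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    pairs = Counter(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = pairs[(label, label)]
        fp = sum(c for (t, p), c in pairs.items() if p == label and t != label)
        fn = sum(c for (t, p), c in pairs.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for this example only.
y_true = ["Benign", "Benign", "Benign", "Suspicious",
          "Suspicious", "Malicious", "Malicious", "Malicious"]
y_pred = ["Benign", "Benign", "Suspicious", "Suspicious",
          "Malicious", "Malicious", "Malicious", "Benign"]

for label, (p, r, f) in per_class_scores(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy(y_true, y_pred):.3f}")
```

Reporting the scores per class, rather than a single accuracy figure, shows whether the Suspicious class is being confused with either of its neighbours.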
2022

On Phishing: URL Lexical and Network Traffic Features Analysis and Knowledge Extraction using Machine Learning Algorithms (A Comparison Study)

2022-11
2022 5th International Conference on Data Science and Information Technology (DSIT)
Phishing is a criminal act in which a Phisher creates almost identical website connections, exploiting URL Lexical characteristics to dupe unsuspecting users into exposing sensitive information such as financial data, addresses, and other personal information. Phishers have recently sought to trick security experts by masking malicious URLs with obfuscation techniques to make them appear legitimate. This leads us to conclude that approaches based solely on URL Lexical feature analysis are obsolete and that extra analysis techniques are required. This research is a first step toward designing and developing a decision-making system that uses a combination of URL Lexical and Network Traffic features to detect and classify malicious URLs, rather than relying solely on URL Lexical or Network Traffic features. To achieve our goal, we examined and assessed the usage of URL Lexical and Network Traffic features to detect malicious URLs. The study uses three methodologies: Complete Features, the KMO test as a feature selection method, and PCA as a dimensionality reduction method, each tested with the LR, SVM, and KNN classification algorithms and evaluated by the accuracy measure derived from the confusion matrix. Using Network Traffic features (ISCXURL dataset), the W/O (Complete Features) approach achieves 92%, 94%, and 93% accuracy with LR, SVM, and KNN, respectively; the KMO approach achieves 91% accuracy with SVM; and the PCA approach achieves 92% and 94% accuracy with LR and SVM, surpassing the use of Lexical features (UCI dataset). In contrast, using Lexical features (UCI dataset), the KMO approach achieves 90% and 94% accuracy with LR and KNN, and the PCA approach achieves 95% accuracy with KNN, surpassing the use of Network Traffic features (ISCXURL dataset). As a result, we are confident in proceeding with the next step of designing and developing a decision-making application that detects and classifies malicious URLs using URL Lexical and Network Traffic features.
2018

PHeDHA: Protecting Healthcare Data in Health Information Exchanges with Active Data Bundles

2018-08
2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE)
Health Information Exchanges (HIEs) collect and disseminate electronic patient healthcare data (EHRs/EMRs) among different healthcare providers to improve the quality and reduce the cost of healthcare services. However, the dissemination of patient data raises privacy and security concerns due to the ease of copying and unauthorized dissemination of electronic data. This paper proposes an HIE system called PHeDHA (Protecting Healthcare Data in HIEs with Active Data Bundles), which provides privacy and security protection for patient data during their transmission via an HIE among different healthcare providers. PHeDHA uses as its basis the scheme named Active Data Bundles with Trusted Third Party (ADB-TTP). As the name suggests, ADB-TTP is based on an integration of a trusted third party (TTP) with Active Data Bundles (ADBs). An ADB is a software object that keeps patient healthcare data as sensitive data; includes metadata describing these sensitive data and prescribing their use (via data access and privacy policies specified within metadata); and encompasses a policy enforcement engine (called a virtual machine or VM), which controls and manages how the ADB behaves. In particular, the VM assures the ADB's data integrity and enforces its policies specified as a part of metadata. We describe and discuss the conceptual model for PHeDHA, based on ADB-TTP. We are currently evaluating PHeDHA via simulation experiments.
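The three parts of an ADB named in the abstract (sensitive data, policy metadata, and an enforcement engine) can be pictured with a small toy object. This is only a conceptual sketch under stated assumptions: the class name, the role-to-fields policy format, and the SHA-256 fingerprint are hypothetical choices made for illustration, the trusted third party is not modeled, and nothing here reflects the actual PHeDHA or ADB-TTP implementation.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ActiveDataBundle:
    """Toy bundle: data + policy metadata + a tiny enforcement step."""
    sensitive_data: dict             # patient data carried by the bundle
    policies: dict                   # assumed format: role -> set of readable fields
    digest: str = field(init=False)  # fingerprint recorded at creation time

    def __post_init__(self):
        self.digest = self._fingerprint()

    def _fingerprint(self):
        # Hash a canonical JSON form so key order does not matter.
        payload = json.dumps(self.sensitive_data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def request(self, role, field_name):
        """Stand-in for the enforcement engine: verify integrity, then policy."""
        if self._fingerprint() != self.digest:
            raise ValueError("integrity check failed: data was modified")
        if field_name not in self.policies.get(role, set()):
            raise PermissionError(f"role {role!r} may not read {field_name!r}")
        return self.sensitive_data[field_name]


# Hypothetical record and roles, invented for this example.
bundle = ActiveDataBundle(
    sensitive_data={"diagnosis": "influenza", "insurance_id": "X-0001"},
    policies={"physician": {"diagnosis", "insurance_id"},
              "billing": {"insurance_id"}},
)
print(bundle.request("physician", "diagnosis"))
```

Because every read goes through `request`, the policy check travels with the data instead of living only on the server that first stored it, which is the property the abstract attributes to ADBs.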
2017

On Feature Selection for the Prediction of Phishing Websites

2017-11
2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech)
With the rise of the big data paradigm, large data sets are being made available for knowledge mining. While this opens up possibilities for new insights being gained every day, it also exposes data consumers to an increase in low-quality, unreliable, redundant, or noisy portions of the data. This negatively affects the process of harvesting knowledge and recognizing patterns. Therefore, efficient feature selection methods are needed to empower real-time prediction or classification systems. Feature selection is the process of identifying the most relevant attributes and removing the redundant and irrelevant ones. In this study, we implemented the Kaiser-Meyer-Olkin (KMO) test as a feature selection method and applied it to a publicly available phishing dataset, namely the UCI phishing websites dataset. Furthermore, we used Logistic Regression and Support Vector Machine as classification methods to validate the feature selection method. Our results show only a slight difference in accuracy between the implementation using the full set of dataset features and the proposed much smaller dataset (almost 63% of the original feature set). This reduction in dimensionality is significant for real-time systems, especially when the reduction in accuracy is slight. From there, we present a framework enabling a significant reduction in features. This opens the door for future work in which a wider set of classification algorithms will be tested in order to achieve dimensionality reduction together with an increase in accuracy.
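One common way to turn the KMO test into a feature filter is to compute the per-feature measure of sampling adequacy (MSA) from the correlation matrix and its inverse, then keep the features whose MSA reaches a cutoff. The sketch below uses only the Python standard library. The 0.5 cutoff is a frequently cited convention and is an assumption here, as are the function names and the synthetic data; the abstract does not state the threshold or procedure the authors used.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    norms = [math.sqrt(sum(c[j] ** 2 for c in centered)) for j in range(k)]
    return [[sum(c[i] * c[j] for c in centered) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo_per_feature(rows):
    """MSA_j = sum r_ij^2 / (sum r_ij^2 + sum p_ij^2) over i != j,
    where p_ij is the partial correlation from the inverse correlation matrix."""
    corr = correlation_matrix(rows)
    inv = invert(corr)
    k = len(corr)
    msa = []
    for j in range(k):
        r2 = sum(corr[i][j] ** 2 for i in range(k) if i != j)
        p2 = sum((-inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])) ** 2
                 for i in range(k) if i != j)
        msa.append(r2 / (r2 + p2))
    return msa


def select_features(rows, threshold=0.5):
    """Indices of features whose MSA is at least `threshold` (assumed cutoff)."""
    return [j for j, m in enumerate(kmo_per_feature(rows)) if m >= threshold]


def demo_rows(n=500, seed=0):
    """Synthetic data: three features share a latent factor, one is pure noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        latent = rng.gauss(0, 1)
        rows.append([latent + 0.5 * rng.gauss(0, 1) for _ in range(3)]
                    + [rng.gauss(0, 1)])
    return rows


print([round(m, 3) for m in kmo_per_feature(demo_rows())])
print(select_features(demo_rows()))
```

Features that share variance with the rest of the set score well above the cutoff, while the score of an unrelated feature is driven by sampling noise, so in practice the filter is usually paired with a check of classifier accuracy on the reduced set, as the abstract describes.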
