Phishing websites are fraudulent sites that impersonate a trusted party to gain access to the sensitive information of an individual or organization. Traditionally, phishing websites are detected using blacklist databases. However, with the rapid growth of global networking and communication technologies, new websites are created every second, and blacklist-based methods can no longer keep up. In this paper, we propose a real-time anti-phishing system. In the first step, we extract the lexical and host-based properties of a website. In the second step, we combine URL (Uniform Resource Locator) features, NLP features, and host-based properties to train machine learning and deep learning models. Our detection model detects phishing URLs with a detection rate of 94.89%.
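To make the first step more concrete, here is a minimal sketch of lexical feature extraction from a URL. It assumes a handful of commonly used lexical features (lengths, dot and digit counts, presence of `@`, scheme, path depth); the function name `lexical_features` and this particular feature set are illustrative assumptions, not the exact features used in the paper. Host-based properties are not covered here because they require network lookups.

```python
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Return a small, illustrative set of lexical features for a URL.

    The feature set is an assumption made for this sketch, not the
    paper's actual feature list.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        # Number of dots in the hostname (a rough proxy for subdomain depth).
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        # '@' in a URL can hide the real host from the user.
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Number of non-empty path segments.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
    }


features = lexical_features("http://login.example.com/a/b?x=1")
```

A dictionary like this can be turned into a numeric vector and combined with other feature groups before training a classifier.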
If you use this code or idea in your research, please cite our paper:
@inproceedings{10.1145/3388142.3388170,
author = {Sirigineedi, Surya Srikar and Soni, Jayesh and Upadhyay, Himanshu},
title = {Learning-Based Models to Detect Runtime Phishing Activities Using URLs},
year = {2020},
isbn = {9781450376440},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3388142.3388170},
doi = {10.1145/3388142.3388170},
booktitle = {Proceedings of the 2020 the 4th International Conference on Compute and Data Analysis},
pages = {102--106},
numpages = {5},
keywords = {NLP, Phishing URLs, Deep Learning, Machine learning},
location = {Silicon Valley, CA, USA},
series = {ICCDA 2020}
}