Research Article


DOI: 10.26650/acin.1521835   IUP: 10.26650/acin.1521835

Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset

Taha Etem, Mustafa Teke

Phishing attacks continue to pose a major challenge in today’s digital world, and their constantly changing tactics call for sophisticated detection techniques. In this paper, we propose a method for identifying phishing attempts using the extensive PhiUSIIL dataset, which comprises 134,850 legitimate URLs and 100,945 phishing URLs (235,795 in total) and provides a robust foundation for analysis. We applied t-SNE for feature extraction, condensing the original 51 features into only 2 while preserving high detection accuracy. We evaluated Logistic Regression, Naive Bayes, k-Nearest Neighbors (kNN), Decision Tree, and Random Forest classifiers on both the full and the reduced datasets. The Decision Tree algorithm performed best on the original dataset, achieving 99.7% accuracy. Notably, kNN performed remarkably well on the feature-extracted data, achieving 99.2% accuracy. Logistic Regression and Random Forest also improved significantly on the feature-extracted dataset. The proposed method offers substantial gains in computational efficiency: the feature-extracted dataset requires less processing power, which makes it well suited to systems with limited resources. These findings pave the way for more powerful and flexible phishing detection systems that can identify and neutralize emerging threats in real time.
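The abstract does not specify how t-SNE or the classifiers were configured. The following is a minimal pure-Python sketch, under stated assumptions, of the general pipeline it describes: exact t-SNE (Gaussian input affinities calibrated to a target perplexity, a Student-t kernel in the 2-D output space, and gradient descent on KL(P || Q) with early exaggeration and momentum), followed by a kNN classifier scored by leave-one-out accuracy on the resulting 2-D coordinates. The input is a hypothetical toy set of two Gaussian clusters standing in for legitimate (0) and phishing (1) feature rows, not PhiUSIIL; the function names, perplexity, learning rate, iteration schedule, and k are illustrative assumptions rather than the authors' settings.

```python
import math
import random


def make_toy_data(n_per_class=20, n_features=5, seed=1):
    """Hypothetical stand-in data (NOT PhiUSIIL): two Gaussian clusters, labels 0/1."""
    rng = random.Random(seed)
    X, labels = [], []
    for label, centre in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            labels.append(label)
    return X, labels


def conditional_probs(dist_row, i, beta):
    """p_{j|i} under a Gaussian with precision beta, plus its entropy in nats."""
    d_min = min(d for j, d in enumerate(dist_row) if j != i)  # shift avoids underflow
    w = [0.0 if j == i else math.exp(-beta * (d - d_min)) for j, d in enumerate(dist_row)]
    total = sum(w)
    p = [v / total for v in w]
    return p, -sum(v * math.log(v) for v in p if v > 0)


def joint_probabilities(X, perplexity):
    """Symmetric input affinities P, each row calibrated by bisection on beta."""
    n = len(X)
    D = [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]
    target = math.log(perplexity)  # entropy corresponding to the chosen perplexity
    rows = []
    for i in range(n):
        beta, lo, hi = 1.0, 0.0, math.inf
        for _ in range(50):
            p, h = conditional_probs(D[i], i, beta)
            if abs(h - target) < 1e-5:
                break
            if h > target:  # distribution too flat: sharpen it
                lo = beta
                beta = beta * 2 if hi == math.inf else (beta + hi) / 2
            else:  # distribution too peaked: widen it
                hi = beta
                beta = (beta + lo) / 2
        rows.append(p)
    # p_ij = (p_{j|i} + p_{i|j}) / 2n, so all entries sum to 1
    return [[(rows[i][j] + rows[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def tsne(X, perplexity=10.0, n_iter=300, learning_rate=2.0, seed=0):
    """Exact t-SNE to 2-D; defaults are illustrative assumptions, not the paper's."""
    n = len(X)
    P = joint_probabilities(X, perplexity)
    rng = random.Random(seed)
    Y = [[rng.gauss(0.0, 1e-4), rng.gauss(0.0, 1e-4)] for _ in range(n)]
    vel = [[0.0, 0.0] for _ in range(n)]
    for it in range(n_iter):
        exaggeration = 4.0 if it < 100 else 1.0  # early exaggeration phase
        momentum = 0.5 if it < 100 else 0.8
        # Student-t kernel (1 + ||y_i - y_j||^2)^-1; Q is this matrix normalised by z
        num = [[0.0 if i == j else
                1.0 / (1.0 + (Y[i][0] - Y[j][0]) ** 2 + (Y[i][1] - Y[j][1]) ** 2)
                for j in range(n)] for i in range(n)]
        z = sum(map(sum, num))
        for i in range(n):
            grad = [0.0, 0.0]
            for j in range(n):
                if i != j:
                    # dC/dy_i = 4 * sum_j (p_ij - q_ij) * num_ij * (y_i - y_j)
                    c = 4.0 * (exaggeration * P[i][j] - num[i][j] / z) * num[i][j]
                    grad[0] += c * (Y[i][0] - Y[j][0])
                    grad[1] += c * (Y[i][1] - Y[j][1])
            for k in range(2):
                vel[i][k] = momentum * vel[i][k] - learning_rate * grad[k]
        for i in range(n):  # apply updates only after every gradient is computed
            Y[i][0] += vel[i][0]
            Y[i][1] += vel[i][1]
    return Y


def knn_loo_accuracy(Y, labels, k=3):
    """Leave-one-out accuracy of a majority-vote kNN classifier on 2-D points Y."""
    correct = 0
    for i, yi in enumerate(Y):
        others = [j for j in range(len(Y)) if j != i]
        others.sort(key=lambda j: (Y[j][0] - yi[0]) ** 2 + (Y[j][1] - yi[1]) ** 2)
        votes = [labels[j] for j in others[:k]]
        correct += max(set(votes), key=votes.count) == labels[i]
    return correct / len(Y)


if __name__ == "__main__":
    X, labels = make_toy_data()
    Y = tsne(X)
    print(f"kNN leave-one-out accuracy on 2-D embedding: {knn_loo_accuracy(Y, labels):.2f}")
```

Exact t-SNE needs O(n²) time and memory per iteration, so at PhiUSIIL's scale an approximation such as Barnes-Hut (the default `method` of scikit-learn's `TSNE`) would be needed. t-SNE also produces coordinates for the points it was fitted on rather than a reusable mapping (scikit-learn's `TSNE` provides `fit_transform` but no `transform`), so scoring unseen URLs requires re-embedding or a separate out-of-sample approximation.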



References

  • Aburrous, M., Hossain, M. A., Dahal, K., & Thabtah, F. (2010). Intelligent phishing detection system for e-banking using fuzzy data mining. Expert Systems with Applications, 37(12), 7913-7921. doi:10.1016/J.ESWA.2010.04.044
  • Adebowale, M. A., Lwin, K. T., & Hossain, M. A. (2019). Deep learning with convolutional neural network and long short-term memory for phishing detection. 2019 13th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2019). doi:10.1109/SKIMA47702.2019.8982427
  • Alam, M. N., Sarma, D., Lima, F. F., Saha, I., Ulfath, R. E., & Hossain, S. (2020). Phishing Attacks Detection using Machine Learning Approach. 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), 1173-1179. doi:10.1109/ICSSIT48917.2020.9214225
  • Alhudhaif, A., Almaslukh, B., Aseeri, A. O., Guler, O., & Polat, K. (2023). A novel nonlinear automated multi-class skin lesion detection system using soft-attention based convolutional neural networks. Chaos, Solitons & Fractals, 170, 113409. doi:10.1016/J.CHAOS.2023.113409
  • Alsaç, A., Yenisey, M. M., Ganiz, M., Dagtekin, M., & Ulusinan, T. (2023). The Efficiency of Regularization Method on Model Success in Issue Type Prediction Problem. Acta Infologica, 7(2), 360-383. doi:10.26650/ACIN.1394019
  • Atawneh, S., & Aljehani, H. (2023). Phishing Email Detection Model Using Deep Learning. Electronics, 12(20), 4261. doi:10.3390/ELECTRONICS12204261
  • Bergholz, A., De Beer, J., Glahn, S., Moens, M. F., Paaß, G., & Strobel, S. (2010). New filtering approaches for phishing email. Journal of Computer Security, 18(1), 7-35. doi:10.3233/JCS-2010-0371
  • Bibal, A., Delchevalerie, V., & Frénay, B. (2023). DT-SNE: t-SNE discrete visualizations as decision tree structures. Neurocomputing, 529, 101-112. doi:10.1016/J.NEUCOM.2023.01.073
  • Bibi, H., Shah, S. R., Baig, M. M., Sharif, M. I., Mehmood, M., Akhtar, Z., & Siddique, K. (2024). Phishing Website Detection Using Improved Multilayered Convolutional Neural Networks. Journal of Computer Science, 20(9), 1069-1079. doi:10.3844/JCSSP.2024.1069.1079
  • Buyrukoğlu, S., & Savaş, S. (2023). Stacked-Based Ensemble Machine Learning Model for Positioning Footballer. Arabian Journal for Science and Engineering, 48(2), 1371-1383. doi:10.1007/s13369-022-06857-8
  • Divakaran, D. M., & Oest, A. (2022). Phishing Detection Leveraging Machine Learning and Deep Learning: A Review. IEEE Security & Privacy, 20(5), 86-95. doi:10.1109/MSEC.2022.3175225
  • Doğruel, M., & Soner Kara, S. (2023). Determining the Happiness Class of Countries with Tree-Based Algorithms in Machine Learning. Acta Infologica, 7(2). doi:10.26650/ACIN.1251650
  • Efeoğlu, E. (2022). Kablosuz Sinyal Gücünü Kullanarak İç Mekan Kullanıcı Lokalizasyonu için Karar Ağacı Algoritmalarının Karşılaştırılması [Comparison of decision tree algorithms for indoor user localization using wireless signal strength]. Acta Infologica. doi:10.26650/ACIN.1076352
  • Etem, T., & Teke, M. (2024). Enhanced deep learning based decision support system for kidney tumour detection. BenchCouncil Transactions on Benchmarks, Standards and Evaluations, 4(2), 100174. doi:10.1016/J.TBENCH.2024.100174
  • Garera, S., Provos, N., Chew, M., & Rubin, A. D. (2007). A framework for detection and measurement of phishing attacks. WORM ’07: Proceedings of the 2007 ACM Workshop on Recurring Malcode, 1-8. doi:10.1145/1314389.1314391
  • GitHub - judger90/phishing_detection_tsne. (n.d.). Retrieved 19 September 2024, from https://github.com/judger90/phishing_detection_tsne
  • Gopali, S., Namin, A. S., Abri, F., & Jones, K. S. (2024). The Performance of Sequential Deep Learning Models in Detecting Phishing Websites Using Contextual Features of URLs. In SAC ’24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (pp. 1064-1066). Association for Computing Machinery (ACM). doi:10.1145/3605098.3636164
  • Güler, O., & Yücedağ, İ. (2022). Hand Gesture Recognition from 2D Images by Using Convolutional Capsule Neural Networks. Arabian Journal for Science and Engineering, 47(2), 1211-1225. doi:10.1007/S13369-021-05867-2
  • Jain, A. K., & Gupta, B. B. (2022). A survey of phishing attack techniques, defence mechanisms and open research challenges. Enterprise Information Systems, 16(4), 527-565. doi:10.1080/17517575.2021.1896786
  • Jiang, D., Shi, X., Liang, Y., & Liu, H. (2024). Feature extraction technique based on Shapley value method and improved mRMR algorithm. Measurement, 237, 115190. doi:10.1016/J.MEASUREMENT.2024.115190
  • Jishnu, K. S., & Arthi, B. (2023). Review of the effectiveness of machine learning based phishing prevention systems. AIP Conference Proceedings, 2917(1). doi:10.1063/5.0175593
  • Prasad, A., & Chandra, S. (2024). PhiUSIIL: A diverse security profile empowered phishing URL detection framework based on similarity index and incremental learning. Computers & Security, 136, 103545. doi:10.1016/J.COSE.2023.103545
  • Thakur, K., Ali, M. L., Obaidat, M. A., & Kamruzzaman, A. (2023). A Systematic Review on Deep-Learning-Based Phishing Email Detection. Electronics, 12(21), 4545. doi:10.3390/ELECTRONICS12214545
  • Tülay, E. (2023). Detection of Orienting Response to Novel Sounds in Healthy Elderly Subjects: A Machine Learning Approach Using EEG Features. Acta Infologica. doi:10.26650/ACIN.1234106
  • Türk, F., Lüy, M., & Barışçı, N. (2020). Kidney and Renal Tumor Segmentation Using a Hybrid V-Net-Based Model. Mathematics, 8(10), 1772. doi:10.3390/MATH8101772
  • Yaman, O., & Tuncer, T. (2023). Plant Classification Method Using Histogram and Machine Learning for Smart Agriculture Applications. Acta Infologica. doi:10.26650/ACIN.1070261
  • Yang, L., Zhang, J., Wang, X., Li, Z., Li, Z., & He, Y. (2021). An improved ELM-based and data preprocessing integrated approach for phishing detection considering comprehensive features. Expert Systems with Applications, 165, 113863. doi:10.1016/J.ESWA.2020.113863

Citations

APA

Etem, T., & Teke, M. (2024). Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. Acta Infologica, 8(2), 213-221. https://doi.org/10.26650/acin.1521835


AMA

Etem T, Teke M. Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. Acta Infologica. 2024;8(2):213-221. https://doi.org/10.26650/acin.1521835


ABNT

Etem, T.; Teke, M. Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. Acta Infologica, v. 8, n. 2, p. 213-221, 2024.


Chicago: Author-Date Style

Etem, Taha, and Mustafa Teke. 2024. “Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset.” Acta Infologica 8, no. 2: 213-221. https://doi.org/10.26650/acin.1521835


Chicago: Humanities Style

Etem, Taha, and Mustafa Teke. “Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset.” Acta Infologica 8, no. 2 (2024): 213-221. https://doi.org/10.26650/acin.1521835


Harvard: Australian Style

Etem, T & Teke, M 2024, 'Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset', Acta Infologica, vol. 8, no. 2, pp. 213-221, viewed 10 Mar. 2025, https://doi.org/10.26650/acin.1521835


Harvard: Author-Date Style

Etem, T. and Teke, M. (2024) ‘Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset’, Acta Infologica, 8(2), pp. 213-221. https://doi.org/10.26650/acin.1521835 (10 Mar. 2025).


MLA

Etem, Taha, and Mustafa Teke. “Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset.” Acta Infologica, vol. 8, no. 2, 2024, pp. 213-221, https://doi.org/10.26650/acin.1521835.


Vancouver

Etem T, Teke M. Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset. Acta Infologica [Internet]. 2024 [cited 2025 Mar 10];8(2):213-221. Available from: https://doi.org/10.26650/acin.1521835 doi: 10.26650/acin.1521835


ISNAD

Etem, Taha - Teke, Mustafa. “Advanced Phishing Detection: Leveraging t-SNE Feature Extraction and Machine Learning on a Comprehensive URL Dataset”. Acta Infologica 8/2 (2024): 213-221. https://doi.org/10.26650/acin.1521835



TIMELINE


Submitted: 24.07.2024
Accepted: 11.12.2024
Published Online: 13.12.2024

LICENCE


Attribution-NonCommercial (CC BY-NC)

This license lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.



Istanbul University Press aims to contribute to the dissemination of ever-growing scientific knowledge through the publication of high-quality scientific journals and books in accordance with international publishing standards and ethics. Istanbul University Press follows an open-access, non-commercial, scholarly publishing model.