K-Means Clustering of Attack Tools for Threat Attribution in Network Security Incidents
DOI:
https://doi.org/10.22399/ijcesen.3948
Keywords:
K-means clustering, Cybersecurity, Attack attribution, Machine learning, Attack tools, Network security
Abstract
This research investigates the application of K-means clustering to attack tool attribution in network security incidents, using a large-scale open-source dataset of 14,133 records. The dataset includes attributes such as attack type, tools, vulnerabilities, MITRE techniques, impacts, and detection methods, providing a comprehensive foundation for clustering analysis. Unlike previous attribution studies, which emphasize malware signatures, anomaly detection, or insider behaviors, this research focuses on tool-based clustering to uncover consistent adversarial tradecraft across incidents. The methodology comprised data preprocessing with tokenization and TF-IDF encoding of tools, dimensionality reduction through Principal Component Analysis, and K-means clustering, with the number of clusters selected using the Elbow Method and cluster quality evaluated using silhouette scores, entropy, and purity metrics. Results indicated an optimal clustering configuration at k = 6, with a silhouette score of 0.71, reflecting well-defined separation. Frequently recurring tools such as BurpSuite, Wireshark, curl, Python, and PowerShell formed distinct clusters that align closely with MITRE ATT&CK tactics, including reconnaissance, exploitation, persistence, and exfiltration. Statistical tests confirmed significant relationships between clustered tools and vulnerabilities, validating the approach for attribution. The novelty of this research lies in demonstrating that attack tool clustering provides a resilient and interpretable forensic perspective, offering defenders actionable intelligence. By shifting focus from payloads to persistent tool usage, this study contributes a scalable, explainable, and data-driven framework for advancing threat attribution and proactive defense.
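The abstract names purity and entropy among its evaluation metrics but does not define them. The following is a minimal sketch of the standard definitions, not the authors' implementation: purity is the fraction of records that carry the majority label of their cluster, and entropy is the size-weighted Shannon entropy of the label distribution within each cluster. The function name `purity_and_entropy`, the use of a tactic as the reference label, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def purity_and_entropy(cluster_ids, labels):
    """Return (purity, weighted entropy in bits) for a clustering.

    cluster_ids: cluster assignment per record (e.g. K-means output).
    labels: reference label per record (hypothetical choice: a tactic name).
    """
    if len(cluster_ids) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group the reference labels by the cluster each record was assigned to.
    by_cluster = defaultdict(Counter)
    for cid, label in zip(cluster_ids, labels):
        by_cluster[cid][label] += 1

    n = len(labels)
    majority_total = 0
    weighted_entropy = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        # Purity counts the records that match their cluster's majority label.
        majority_total += max(counts.values())
        # Shannon entropy of the label distribution inside this cluster.
        h = -sum((c / size) * log2(c / size) for c in counts.values())
        weighted_entropy += (size / n) * h

    return majority_total / n, weighted_entropy


# Toy data invented for illustration; it is not drawn from the dataset.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
tactics = ["recon", "recon", "exploit", "exploit", "exploit", "exploit",
           "exfil", "exfil"]
purity, entropy = purity_and_entropy(clusters, tactics)
print(f"purity={purity:.3f} entropy={entropy:.3f} bits")
```

Higher purity and lower entropy both indicate clusters dominated by a single reference label. Purity alone can be inflated by choosing a large k, which is one reason to report it alongside entropy and the silhouette score.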
License
Copyright (c) 2025 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.