A Compliance-Driven Framework for Data Curation and Gating in Machine Learning Training for Enterprise Privacy Infrastructure
DOI:
https://doi.org/10.22399/ijcesen.4805
Keywords:
Federated Learning, Privacy-Preserving Machine Learning, Data Governance, Usage Control, Anomaly Detection, Blockchain Provenance
Abstract
Large-scale machine learning systems deployed across distributed infrastructure must manage sensitive data while complying with regulatory requirements. Training pipelines generally lack a means of monitoring how protected data enters model development, which creates significant privacy risks in decentralized environments. Limited visibility also hinders an organization's ability to trace data sources, understand how information moves between systems, and verify conformance with compliance requirements. In many cases, sensitive data is inadequately safeguarded and can be used beyond its authorized purposes during both training and inference. In the framework presented here, automated classification systems detect sensitivity indicators within datasets and attach metadata tags specifying permissible uses at the moment information enters training operations. Gating mechanisms act as policy enforcement points that validate access requests against predefined rules, so that models access only data appropriate for their declared purposes. Attribute-based access control evaluates subject attributes, resource classifications, and environmental conditions to make authorization decisions dynamically. Machine-learning-based anomaly detection continuously monitors access patterns and uses behavioral analysis to flag deviations from established compliance baselines. Blockchain-backed distributed logging maintains detailed, secure audit trails that allow data usage to be verified later across the model lifecycle.
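As an illustration of how the tagging, gating, and audit steps described in the abstract might fit together, the following is a minimal Python sketch and not the authors' implementation. Every name in it (`tag_record`, `gate`, `AuditLog`, the attribute keys, the purposes, and the regular expressions) is a hypothetical choice made for this example. A simple SHA-256 hash chain stands in for the blockchain-backed log, and the anomaly detection component is not covered.

```python
import hashlib
import json
import re
from dataclasses import dataclass, field

# Hypothetical sensitivity indicators; a real classifier would be far richer.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def tag_record(text: str) -> dict:
    """Attach a sensitivity label and permitted purposes at ingestion time."""
    sensitive = bool(EMAIL.search(text) or SSN.search(text))
    # Assumed policy: restricted data may only feed fraud detection.
    purposes = ["fraud_detection"] if sensitive else ["fraud_detection", "recommendation"]
    return {
        "text": text,
        "sensitivity": "restricted" if sensitive else "public",
        "allowed_purposes": purposes,
    }


@dataclass
class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""
    entries: list = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = "0" * 64
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


def gate(subject: dict, record: dict, env: dict, log: AuditLog) -> bool:
    """ABAC-style decision over subject, resource, and environment attributes."""
    allowed = (
        subject["purpose"] in record["allowed_purposes"]
        and (record["sensitivity"] == "public" or subject["clearance"] == "high")
        and env["region"] in subject["approved_regions"]
    )
    # Every decision, allowed or denied, is recorded for later audit.
    log.append({
        "job": subject["job_id"],
        "purpose": subject["purpose"],
        "sensitivity": record["sensitivity"],
        "allowed": allowed,
    })
    return allowed
```

In this sketch, a training job that declares the purpose `recommendation` would be denied a record containing an email address, while a job with high clearance that declares `fraud_detection` would be admitted. Both decisions land in the log, and `AuditLog.verify()` reports whether any stored entry has been altered since it was written.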
License
Copyright (c) 2025 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.