Most enterprises are moving their ML/AI use cases from innovation to industrialization with proper governance. MLOps platforms from cloud providers such as Azure and GCP are good but still maturing; the security of the models themselves, however, is currently an afterthought on most platforms. Enterprise security architectures will need to have ML/AI security on their radar as well. In this post, I introduce 'MLSecOps', focusing on the security side of ML/AI model operationalization for models trained on data in general rather than images alone (there are already some good papers on attacks against AI models that use images as training data). ML/AI models can be attacked mainly via evasion and poisoning methods. It is also possible to replace a deployed model with another one planted by the attacker.
Below is a simplified, suggested MLSecOps framework which will be used as the basis…
Model development – Data Preparation: Data scientists need to be aware that their models can be compromised. As a countermeasure, they should make sure the decision boundaries are smooth. They may also introduce adversarial examples (noisy/perturbed inputs) into the dataset per model. Another method is to apply filtering to the training dataset so that poisoned data coming from the live production inferencing platform is prevented from being blindly included in training. Besides, the model should comply with data privacy regulations when the training data sets are defined.
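To illustrate the data-preparation defence above, here is a minimal Python sketch that augments a training set with perturbed copies of existing samples. The function name and parameters are my own assumptions, and bounded random noise is used as a simplified stand-in for proper adversarial-example generation (e.g. gradient-based methods such as FGSM):

```python
import numpy as np

def augment_with_perturbed_inputs(X, y, epsilon=0.1, fraction=0.2, seed=0):
    """Append noisy copies of a random subset of training samples.

    A simplified stand-in for true adversarial-example generation:
    bounded uniform noise exposes the model to perturbed inputs near
    its decision boundary during training.
    """
    rng = np.random.default_rng(seed)
    n = int(len(X) * fraction)
    idx = rng.choice(len(X), size=n, replace=False)
    noise = rng.uniform(-epsilon, epsilon, size=X[idx].shape)
    # Keep perturbed features inside the valid [0, 1] range.
    X_adv = np.clip(X[idx] + noise, 0.0, 1.0)
    return np.vstack([X, X_adv]), np.concatenate([y, y[idx]])
```

In a real pipeline, this step would run before each retraining cycle, with the perturbation budget (`epsilon`) tuned per model.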
During training, test cases should be written for each model to cover adversarial scenarios. Additionally, penetration testing could be introduced into the training pipeline; several flavours of automated penetration-testing tools are available as open-source projects. Before the model is released to production, make sure the model is explainable (how it makes its decisions), the training data complies with data privacy regulations, and the model has been tested against both adversarial attacks and penetration attempts.
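One way to express such an adversarial test case in the training pipeline is to measure worst-case accuracy under bounded perturbations and fail the build when it drops below a threshold. The sketch below is an assumption of mine, not taken from any specific tool, and uses a toy classifier for demonstration:

```python
import numpy as np

def adversarial_accuracy(predict, X, y, epsilon=0.05, n_trials=10, seed=0):
    """Worst-case accuracy of `predict` over random bounded perturbations.

    A lightweight adversarial test: perturb each input within an
    epsilon ball several times and keep the lowest observed accuracy.
    A CI gate can then assert this stays above a minimum threshold.
    """
    rng = np.random.default_rng(seed)
    worst = 1.0
    for _ in range(n_trials):
        noise = rng.uniform(-epsilon, epsilon, size=X.shape)
        acc = float(np.mean(predict(X + noise) == y))
        worst = min(worst, acc)
    return worst

# Toy stand-in for a trained model: classify on a single feature threshold.
def toy_predict(X):
    return (X[:, 0] > 0.5).astype(int)
```

Dedicated libraries generate far stronger, gradient-based attacks; the point here is only the shape of the pipeline check.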
When the model is deployed to production, there needs to be rule-based input data validation. Lightweight data validation frameworks can be used to filter or alert on input data that violates the model's expectations. That also helps identify anomalies in the model's input data before it is too late. There are also other non-invasive ways to detect input data drift; especially during these COVID days, a lot of models failed to deliver the expected accuracy rates because of such drift. Validation is also helpful to prevent, filter, or alert on poisoned data sets reaching the model so that action can be taken before the situation gets out of control. Automated monitoring approaches/solutions are needed to remedy such scenarios. Additionally, the models need to be protected with hashes acting as 'model watermarks', which can be validated during inferencing.
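The two production-time checks above, rule-based input validation and a hash-based 'model watermark', can be sketched as follows. The field names and ranges in `RULES` are hypothetical; in practice they would be derived from the training-data profile (frameworks such as Great Expectations serve this purpose):

```python
import hashlib

# Hypothetical validation rules derived from the training-data profile.
RULES = {"age": (0, 120), "income": (0, 1_000_000)}

def validate_request(record):
    """Return a list of rule violations for one inference request.

    A non-empty result can be used to reject the request, or to raise
    an alert so poisoned/drifting inputs are caught early.
    """
    violations = []
    for field, (lo, hi) in RULES.items():
        value = record.get(field)
        if value is None:
            violations.append(f"{field}: missing")
        elif not (lo <= value <= hi):
            violations.append(f"{field}: {value} outside [{lo}, {hi}]")
    return violations

def model_fingerprint(path):
    """SHA-256 hash of the serialized model file.

    Computed at release time and re-checked when the serving layer
    loads the model, so a swapped model file fails verification.
    """
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```

A serving wrapper would call `validate_request` on every payload and compare `model_fingerprint` against the hash recorded in the model registry at release.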
Moreover, attackers might use several techniques, such as bots that learn the behaviour of these models. Online inferencing models exposed through REST APIs can be protected using well-thought-out web service security techniques such as authentication, authorization using OAuth, risk-based authorization policies, TLS, rate limiting, DDoS attack prevention, etc. Even then, an attacker can pass these measures with legitimate credentials and still send requests to ML/AI models to figure out their behaviour; in those cases, it should be possible to detect such anomalies through automated monitoring solutions.
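Of the API-level defences listed above, rate limiting is simple enough to sketch. The class below is a minimal sliding-window limiter of my own design (real deployments would use an API gateway or a shared store such as Redis rather than in-process state); throttled clients are also natural candidates for anomaly alerting:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window rate limiter for a model-serving endpoint.

    Throttles clients that probe the model with high-volume queries,
    e.g. bots trying to map its decision boundary.
    """

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)  # client_id -> request timestamps

    def allow(self, client_id, now=None):
        """Record one request; return False if the client is over budget."""
        now = time.monotonic() if now is None else now
        q = self.history[client_id]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # throttle, and optionally raise an anomaly alert
        q.append(now)
        return True
```

The `now` parameter exists only to make the limiter testable; production callers would omit it.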
ML/AI adoption strategies need to consider model security and automated monitoring solutions that detect such unexpected model usage scenarios, so that models remain healthy in production environments. We need more tools and solutions in this space, integrated into well-known MLOps platforms...