Security in Machine Learning by Plausible Deniability


When a machine learning model is trained from data, the data may be subject to security requirements and even be classified as sensitive. If the trained model is intended for use by untrusted parties, this raises the question of how much information about the training data is extractable from the machine learning model, once it is given away. The talk presents two results in this regard, based on the security notion of plausible deniability. First, we look at supervised learning and show how the training can be manipulated to produce any desired, and hence deniable, result. We illustrate the method on examples from normal and logistic regression and some examples of neural networks and discuss the practical implications. Second, we look at unsupervised learning, especially clustering algorithms, and demonstrate how to manipulate the space topology towards producing any desired, and hence deniable, classifications. Both attacks work without touching the input data, and hence succeed while respecting data integrity constraints.
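As a toy illustration of the two ideas (not the constructions presented in the talk), the sketch below leaves the data untouched and changes only the training configuration. For the supervised case it assumes a one-parameter least-squares fit with an added linear penalty term, whose coefficient is chosen so that an arbitrary target slope becomes the minimizer. For the unsupervised case it assumes nearest-centroid assignment under an axis-scaled distance. All function names and numbers are hypothetical.

```python
# Illustrative sketch only: hypothetical helpers, not the method from the talk.

def fit_slope(xs, ys, linear_penalty=0.0):
    """Closed-form minimizer of sum((y - w*x)^2) + linear_penalty * w over w."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (sxy - linear_penalty / 2.0) / sxx


def penalty_for_target(xs, ys, target_w):
    """Penalty coefficient that makes target_w the minimizer in fit_slope."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return 2.0 * (sxy - target_w * sxx)


def assign(point, centroids, scale=(1.0, 1.0)):
    """Index of the nearest centroid under an axis-scaled squared distance."""
    def dist(c):
        return sum(s * (p - q) ** 2 for s, p, q in zip(scale, point, c))
    return min(range(len(centroids)), key=lambda i: dist(centroids[i]))


# Supervised case: the data follow y = 2x and are never modified.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
honest_w = fit_slope(xs, ys)
forged_w = fit_slope(xs, ys, penalty_for_target(xs, ys, -1.0))

# Unsupervised case: same point, same centroids, different metric.
centroids = [(0.0, 0.0), (4.0, 1.0)]
point = (1.0, 3.0)
label_plain = assign(point, centroids)
label_scaled = assign(point, centroids, scale=(0.01, 1.0))

print(honest_w, forged_w, label_plain, label_scaled)
```

In both halves the input data pass any integrity check, yet the outcome is determined by a configuration choice that an observer of the result alone cannot see.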

These attack possibilities present both opportunities and dangers. Either way, they call for extending security beyond sanitizing data to verifiably committing, by cryptographic means, to algorithms and their configurations. Without a “Kerckhoffs’ principle” in AI, we get “obscure AI” and hence manipulable AI.


Stefan Rass

Stefan Rass (Dr. Dipl.Ing Dipl.Ing) is Full Professor of Secure Systems at Johannes Kepler University Linz and Associate Professor at the Institute for Artificial Intelligence and Cybersecurity at the University of Klagenfurt.

Stefan Rass heads the Secure Systems research group in the Secure and Correct Systems Lab of the Linz Institute of Technology. His research covers model-based security, IT risk management, and applied cryptography. In addition, he is one of the leaders of the “Critical Infrastructures” working group of the Disaster Competence Network Austria.