
Exploring and Exploiting Data-Free Model Stealing


Abstract

Deep machine learning models, e.g., image classifiers, are increasingly deployed in the wild to provide services to users. Adversaries have been shown capable of stealing the knowledge of these models by sending inference queries and then training substitute models on the query results. The availability and quality of adversarial query inputs are undoubtedly crucial to the stealing process. Recent prior art demonstrates the feasibility of replacing real data with synthetic adversarial queries, so-called data-free attacks, under strong adversarial assumptions, i.e., the deployed classifier returns not only class labels but also class probabilities. In this paper, we consider a general adversarial model and propose an effective data-free stealing algorithm, TandemGAN, which not only explores synthetic queries but also explicitly exploits the high-quality ones. The core of TandemGAN is composed of (i) a substitute model which imitates the target model through synthetic queries and their inferred labels; and (ii) a tandem generator consisting of two networks, $\mathcal{G}_x$ and $\mathcal{G}_e$, which first explores the synthetic data space via $\mathcal{G}_x$ and then exploits high-quality examples via $\mathcal{G}_e$ to maximize the knowledge transfer from the target to the substitute model. Our results on four datasets show that the accuracy of our trained substitute model reaches between 67% and 96% of the target model's, outperforming the existing state-of-the-art data-free model stealing approach by up to 2.5×.
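The label-only, explore-then-exploit loop described above can be sketched in miniature. All names here are illustrative: a toy linear "target" stands in for the deployed classifier, random sampling stands in for the generator $\mathcal{G}_x$, and a simple disagreement weight stands in for the exploitation role of $\mathcal{G}_e$; this is not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the deployed target model: a fixed linear
# classifier that, per the label-only threat model, returns only hard labels.
W_target = rng.normal(size=(2, 5))

def target_label(x):
    return np.argmax(x @ W_target.T, axis=1)

def substitute_probs(x, W):
    """Softmax outputs of a linear substitute model."""
    z = x @ W.T
    z -= z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Substitute model trained only from query results.
W_sub = np.zeros((2, 5))
lr = 0.5
for step in range(300):
    # Explore: synthesize a batch of query inputs (random noise here;
    # TandemGAN uses a generator network G_x instead).
    x = rng.normal(size=(64, 5))
    y = target_label(x)  # query the target; only labels come back
    # Exploit: upweight examples where the substitute still disagrees with
    # the target -- a crude proxy for G_e focusing on high-quality examples.
    p = substitute_probs(x, W_sub)
    weight = (np.argmax(p, axis=1) != y).astype(float) + 0.5
    # One SGD step on the cross-entropy between substitute and queried labels.
    onehot = np.eye(2)[y]
    grad = (weight[:, None] * (p - onehot)).T @ x / len(x)
    W_sub -= lr * grad

# Agreement between substitute and target on fresh inputs.
x_test = rng.normal(size=(1000, 5))
agreement = np.mean(target_label(x_test) == np.argmax(x_test @ W_sub.T, axis=1))
```

In this toy setting the substitute recovers most of the target's decision boundary from hard labels alone; the paper's contribution is making this work for deep image classifiers, where the quality of synthesized queries dominates.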
Year: 2023
Venue: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD, combined from 2008)
Location: Turin, Italy
Dates: 18-22 Sep 2023
Publisher: Springer
Pages: 20-35
ISBN: 978-3-031-43423-5
Authors: Chi Hong, Jiyue Huang, Robert Birke, Lydia Y. Chen
Files in this record:
  • Data-free Model Stealing.pdf — description: preprint; file type: preprint (first draft); format: Adobe PDF; size: 877.19 kB; open access
  • 978-3-031-43424-2_2.pdf — file type: publisher's PDF; format: Adobe PDF; size: 1.03 MB; restricted access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1923515
Citations
  • PMC: n/a
  • Scopus: 3
  • Web of Science: 1