Learning Plaintext-Ciphertext Cryptographic Problems via ANF-based SAT Instance Representation

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track

Bibtex Paper

Authors

Xinhao Zheng, Yang Li, Cunxin Fan, Huaijin Wu, Xinhao Song, Junchi Yan

Abstract

Cryptographic problems, operating within binary variable spaces, can be routinely transformed into Boolean Satisfiability (SAT) problems regarding specific cryptographic conditions like plaintext-ciphertext matching. With the fast development of learning for discrete data, this SAT representation also facilitates the utilization of machine-learning approaches with the hope of automatically capturing patterns and strategies inherent in cryptographic structures in a data-driven manner. Existing neural SAT solvers consistently adopt conjunctive normal form (CNF) for instance representation, which in the cryptographic context can lead to scale explosion and a loss of high-level semantics. In particular, extensively used XOR operations in cryptographic problems can incur an exponential number of clauses. In this paper, we propose a graph structure based on Arithmetic Normal Form (ANF) to efficiently handle the XOR operation bottleneck. Additionally, we design an encoding method for AND operations in these ANF-based graphs, demonstrating improved efficiency over alternative general graph forms for SAT. We then propose CryptoANFNet, a graph learning approach that trains a classifier based on a message-passing scheme to predict plaintext-ciphertext satisfiability. Using ANF-based SAT instances, CryptoANFNet demonstrates superior scalability and can naturally capture higher-order operational information. Empirically, CryptoANFNet achieves a 50x speedup over heuristic solvers and outperforms SOTA learning-based SAT solver NeuroSAT, with 96\% vs. 91\% accuracy on small-scale and 72\% vs. 55\% on large-scale datasets from real encryption algorithms. We also introduce a key-solving algorithm that simplifies ANF-based SAT instances from plaintext and ciphertext, enhancing key decryption accuracy from 76.5\% to 82\% and from 72\% to 75\% for datasets generated from two real encryption algorithms.