Microsoft researchers train AI to use hide-and-seek to find software bugs
Microsoft researchers have been working on deep learning models that are trained to find software errors without learning from any real-world errors.
Although dozens of static analysis tools exist for finding security vulnerabilities in code written in various languages, researchers have been exploring machine learning techniques to improve the detection and repair of defects. That is because finding and fixing errors in code is difficult and expensive, even when AI is used to find them.
Researchers at Microsoft Research in Cambridge, UK, have detailed their work on BugLab, a Python implementation of "a self-supervised learning error detection and repair method". It is "self-supervised" because the two models behind BugLab are trained without labeled data.
The ambition to train without labels stems from the lack of annotated real-world errors with which to train error-detecting deep learning models. Although a large amount of source code is available for this kind of training, most of it is not annotated.
BugLab aims to find hard-to-detect errors rather than the critical errors that traditional program analysis can already find. The method promises to avoid the expensive process of manually coding a model to find these errors.
The group claims to have found 19 previously unknown errors in open-source Python packages from PyPI, as described in the paper "Self-Supervised Bug Detection and Repair", presented this week at the Neural Information Processing Systems (NeurIPS) 2021 conference.
Microsoft Research Principal Researcher Miltos Allamanis and Microsoft Senior Principal Research Manager Marc Brockschmidt explained: "BugLab can be taught to detect and fix errors without using labeled data, through the game of 'hide and seek'." Both are authors of the paper.
In addition to reasoning about the structure of a piece of code, they believe that errors can be found "by understanding the vague natural language hints left by software developers in code comments, variable names, etc.".
Their approach in BugLab uses two competing models and builds on existing self-supervised learning work in deep learning, computer vision, and natural language processing (NLP). It also resembles, or is "inspired" by, GANs (generative adversarial networks), the neural networks sometimes used to create deepfakes.
"In our case, our goal is to train the error detection model without using training data from real errors," they pointed out in the paper.
BugLab's two models are an error selector and an error detector: "Given some existing code, presumed to be correct, the error selector model decides whether it should introduce an error, where to introduce it, and its exact form (for example, replacing a specific '+' with a '-'). Given the selector's choice, the code is edited to introduce the error. Then another model, the error detector, tries to determine whether an error was introduced in the code and, if so, to locate it and fix it."
Their model is not a GAN, because BugLab's "error selector does not generate a new piece of code from scratch, but rewrites an existing piece of code (assuming it is correct)."
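To make the "rewrite, don't generate" idea concrete, here is a minimal sketch of the kind of edit the error selector might apply: swapping one "+" for a "-" in existing code. It is not BugLab's implementation. The names `SWAPS`, `OperatorSwapper`, `count_candidates`, `introduce_error`, and `random_selector` are hypothetical, and a random choice stands in for the learned selector model. It assumes Python 3.9 or later for `ast.unparse`.

```python
import ast
import random

# Hypothetical rewrite rules: each operator maps to the one it is swapped with.
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add}


class OperatorSwapper(ast.NodeTransformer):
    """Swap the operator of one eligible binary operation, chosen by index."""

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0  # eligible operations visited so far (inner ones first)

    def visit_BinOp(self, node):
        self.generic_visit(node)  # visit nested operations before this one
        if type(node.op) in SWAPS:
            if self.seen == self.target_index:
                node.op = SWAPS[type(node.op)]()
            self.seen += 1
        return node


def count_candidates(source):
    """Count the '+' and '-' operations that could be rewritten."""
    tree = ast.parse(source)
    return sum(
        isinstance(n, ast.BinOp) and type(n.op) in SWAPS for n in ast.walk(tree)
    )


def introduce_error(source, index):
    """Rewrite existing code by swapping the operator at the given index."""
    tree = OperatorSwapper(index).visit(ast.parse(source))
    return ast.unparse(tree)


def random_selector(source, rng):
    """Stand-in for the learned selector: pick whether and where to edit."""
    n = count_candidates(source)
    if n == 0 or rng.random() < 0.5:
        return ast.unparse(ast.parse(source)), None  # leave the code unchanged
    index = rng.randrange(n)
    return introduce_error(source, index), index


original = "def total(a, b, c):\n    return a + b - c\n"
print(introduce_error(original, 0))
print(random_selector(original, random.Random(0)))
```

Because each swap in this sketch is its own inverse, applying the same edit at the same index restores the original code, which is roughly what a successful "locate and fix" by the detector would amount to in this toy setting.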
On the researchers' test dataset of 2,374 real errors from Python packages, they showed that 26% of the errors can be found and repaired automatically.
However, their technique also flags too many false positives, or warnings that do not point to real errors. For example, although it detects some known errors, only 19 of the 1,000 warnings reported by BugLab turned out to be real errors.
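As a quick back-of-the-envelope check of what those figures imply, the share of reported warnings that were real errors (the precision) can be computed directly. The function and variable names below are illustrative only.

```python
def precision(real_errors, warnings_reported):
    """Fraction of reported warnings that turned out to be real errors."""
    return real_errors / warnings_reported


warnings_reported = 1000  # warnings reported, per the article
real_errors = 19          # warnings that were real errors, per the article

false_positives = warnings_reported - real_errors
print(f"precision: {precision(real_errors, warnings_reported):.1%}")  # precision: 1.9%
print(f"false positives: {false_positives}")  # false positives: 981
```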
Training a neural network without training data from real errors sounds like a difficult problem to crack. For example, some warnings clearly did not point to errors, yet the neural model flagged them anyway.
They pointed out in the paper: "Some of the reported issues are so complicated that it took us a few minutes to conclude that the warning was false."
"At the same time, there are some warnings that are ‘obviously’ wrong to us, but the reason for the warnings raised by the neural model is unclear."
As for the 19 previously unknown errors they discovered, they reported 11 on GitHub, of which 6 have been merged and 5 are awaiting approval. Some of the 19 were too minor to report.