At the Black Hat 2022 security conference, professors Hammond Pearce and Benjamin Tan from New York University, together with colleagues, presented their study of how secure the code suggestions produced by GitHub Copilot are.
GitHub Copilot, a paid AI-based code-completion extension for programming environments, has been generally available for around six weeks.
According to the study, on average roughly 40 percent of the suggestions that Copilot presents first, and which a programmer is therefore most likely to accept, introduce serious security vulnerabilities into the code.
The quality of the suggestions depends on several factors, including the programming language: prompts written in C produced more insecure suggestions than Python code. The quality of the code the programmer has already written, and even the developer's name, also have an influence.
The reason for the sometimes glaringly insecure suggestions, the researchers believe, lies in how Copilot was trained. It is based on OpenAI Codex, a descendant of GPT-3 optimized for program code: the software suggests the most likely completion of the current input. It calculates that probability from the material Copilot saw during training. The AI model therefore does not generate the fastest, most elegant, or most secure code, but the code that best fits what is already there.
This also explains why Copilot suggested the MD5 hash algorithm, long known to be insecure, for hashing passwords. Copilot likely encountered MD5 more frequently in its training data than more secure alternatives such as SHA-256 or SHA-3.
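To illustrate the difference in practice, here is a minimal sketch using Python's standard `hashlib` module. The password, salt, and iteration count are illustrative values, not from the study; the point is that MD5 (and even SHA-256 on its own) is a fast hash, whereas passwords call for a deliberately slow, salted key-derivation function such as PBKDF2:

```python
import hashlib

password = b"correct horse battery staple"

# Insecure: MD5 is fast and collision-broken; the kind of
# suggestion the study flagged.
weak = hashlib.md5(password).hexdigest()

# A stronger general-purpose hash, but still too fast to store
# passwords with on its own:
fast = hashlib.sha256(password).hexdigest()

# For passwords: a salted, iterated KDF. In real code the salt
# would come from os.urandom(16) and be stored per user.
salt = b"example-salt"
kdf = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000).hex()

print(weak)  # 128-bit MD5 digest
print(kdf)   # 256-bit PBKDF2-HMAC-SHA256 digest
```

The 600,000-iteration count is an assumption for the sketch; the right value depends on the hardware budget and should be tuned upward over time.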
The worst possible accidents
To assess the security of the proposed code components, the researchers examined the programs generated with Copilot's help both automatically, using GitHub's CodeQL, and manually. 18 entries from the 2021 list of the top 25 Common Weakness Enumerations (CWEs) served as the benchmark. For these 18 CWEs, the researchers first wrote 54 so-called scenarios: incomplete code snippets that prompt Copilot to generate code that might contain the respective CWE. The scenarios themselves do not contain any insecure code.
The researchers let Copilot contribute up to 25 suggestions per scenario and first checked the resulting code for functionality. They then subjected the actually executable programs to the security assessment.
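The study's own scenarios are not reproduced here, but a hypothetical sketch shows the idea for one of the benchmarked weaknesses, CWE-89 (SQL injection): the same incomplete lookup function can be completed insecurely via string formatting or safely via a parameterized query. All names and data below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", 30), ("bob", 25)])

def get_user_unsafe(name):
    # Insecure completion: interpolating user input into the SQL
    # string allows injection (CWE-89).
    query = f"SELECT * FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def get_user_safe(name):
    # Safe completion: a parameterized query treats the input as
    # data, never as SQL.
    return conn.execute("SELECT * FROM users WHERE name = ?",
                        (name,)).fetchall()

# A classic injection payload dumps every row via the unsafe
# variant, while the safe variant matches nothing:
payload = "' OR '1'='1"
print(get_user_unsafe(payload))  # all rows leak
print(get_user_safe(payload))    # no match
```

An automated check like CodeQL flags the string-formatted query as a tainted-data flow into a SQL sink, which matches the kind of automatic assessment the study describes.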
Is C less secure than Python?
Of the 1,084 valid programs, 44 percent contained a CWE. For 24 of the 54 scenarios, the very first snippet Copilot suggested produced vulnerable code. During their presentation, the researchers said they consider these top suggestions more dangerous than copy-and-paste code from Stack Overflow, since copying from Stack Overflow is a bigger hurdle than simply accepting a Copilot suggestion.
There were clear differences between the programming languages: while 50 percent of the 513 programs written in C contained vulnerabilities, the figure for Python was about 38 percent. The researchers have recorded all results in detail in a white paper, including those in which Copilot tried its hand at designing hardware components by completing Verilog code.