Finding bugs and fixing code without humans
Tech
#LLM #AIxCC #AutomatedVulnerabilityAnalysis
Andrew Wesie · CTO, Theori
Andrew Wesie is a cybersecurity expert specializing in offensive security research and development. He has been a member of the Plaid Parliament of Pwning CTF team since 2009, with whom he has won DEF CON CTF seven times. He is currently CTO at Theori, with a recent focus on applying AI to cybersecurity.
Large Language Models (LLMs) have been unavoidable in the news for advances in tasks such as generating code and solving math problems. Large software companies such as Google and Meta report using LLMs extensively to write unit tests and to assist in writing production code. The US government is now holding a competition, AIxCC, to use LLMs to find and fix bugs in open-source software without human intervention.

We will discuss the approach our team took to automating security work, which led us to win the first round of this competition: we identified and produced proofs of concept (PoCs) for 13 bugs across 4 of the 5 given projects, and generated patches for 11 of those bugs. We will also discuss the areas where LLMs continue to struggle. Key areas of research include using agents to overcome LLM hallucinations, generating exploits with LLMs, and limiting analysis costs. This work will be open-sourced after the final round of AIxCC in August 2025.
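To make the agent idea concrete, here is a minimal, purely illustrative sketch (this is not Theori's actual system; the toy target, the mock model, and the attempt budget are all invented for this example). The key point it shows is that every LLM-proposed PoC is verified by actually running the target, so hallucinated "findings" are filtered out, and a fixed attempt budget bounds the analysis cost:

```python
def vulnerable_parse(data: bytes) -> None:
    """Toy target: crashes on inputs longer than 8 bytes (stand-in for a real bug)."""
    if len(data) > 8:
        raise MemoryError("buffer overflow (simulated)")

def mock_llm_propose(attempt: int) -> bytes:
    """Stand-in for an LLM call; early guesses are wrong (i.e., hallucinations)."""
    guesses = [b"hello", b"\x00" * 4, b"A" * 16]
    return guesses[min(attempt, len(guesses) - 1)]

def find_poc(target, propose, max_attempts: int = 5):
    """Agent loop: propose a candidate input, then verify it by execution.
    Only a reproducible crash is accepted, so hallucinated bugs never
    reach the final report."""
    for attempt in range(max_attempts):  # fixed budget bounds analysis cost
        candidate = propose(attempt)
        try:
            target(candidate)
        except Exception:
            return candidate  # verified PoC: the target demonstrably crashed
    return None  # budget exhausted without a verified crash

if __name__ == "__main__":
    poc = find_poc(vulnerable_parse, mock_llm_propose)
    print(poc)  # the first input that actually crashes the toy target
```

In a real system the verification step would run an instrumented build of the project under a sanitizer rather than a Python function, but the loop structure, ground-truth check by execution, bounded retries, is the same.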