Organizations across both the private and public sector are pondering the best ways to manage emerging security risks in generative AI models, and independent evaluation processes have repeatedly been at the heart of the strategies proposed so far by the DHS, White House and more. But it’s not so simple.
These types of assessments aim to provide accountability by measuring AI systems against various risks, such as levels of data protection, privacy, potential biases or others. But the ecosystem of AI third-party auditors, and even the frameworks looking at how to best evaluate systems, are all still nascent. Many technology companies already conduct internal or contract-based AI assessments or offer bug bounty programs, but top generative AI companies are self-selective when it comes to the external research teams that they work with. The teams that are able to perform independent research on AI models are running into issues around a lack of transparency and understanding of the large language models (LLMs) behind AI systems that add further complexity to these assessments.
“The ecosystem for assessing and auditing AI models is still in its formative stages, but is growing rapidly,” said Casey Ellis, founder and chief strategy officer at Bugcrowd. “We're seeing a mix of traditional cybersecurity firms expanding their services to include AI security, as well as new startups specifically focused on AI risk management.”
When evaluating AI, there are several different assessment factors, including biases and ethical principles, transparency, accountability and explainability; as well as areas that fall more solidly into the cybersecurity bucket like data protection, privacy and consent. NIST has developed various frameworks that can help organizations better implement security into the development and use of AI systems, including its AI risk management framework released in January 2023 (with a draft version for generative AI released more recently in April). In May, NIST released an Assessing Risks and Impacts of AI program aiming to help organizations better understand how an AI technology would be “valid, reliable, safe, secure, private and fair once deployed.”
However, these frameworks and standards for helping third-party companies better assess AI systems are still fairly new or in development. A report from last year by the United Nations Educational, Scientific and Cultural Organization (Unesco) and Montreal-based artificial intelligence research institute Mila titled “Missing Links in AI Governance,” found that “despite recent policy developments in AI accountability, we are still a far cry from an AI policy ecosystem that enables the effective participation of third-party auditors.”
“We do not yet have the standards and regulatory framework that we need to ensure that third-party auditors are accredited, protected and supported to play their part,” according to the report. “To ensure equity and accountability in the deployment of AI systems, the communities that are most likely to be harmed by these systems must be better represented in the audit, assessment or evaluation process. Third-party auditors, who can play that role, need to be accredited and supported within a policy ecosystem that ensures their independence, integrity, and effectiveness.”
The Emerging Third-Party Ecosystem
Some companies have already set up internal teams to evaluate the safety and security of their AI efforts. For example, Google’s Responsible AI and Human Centered Technology team aims to create AI principles and make sure that products are built on those principles, as well as improve “consistent access, control, and explainability” of AI models. However, the third-party assessment ecosystem has no contractual obligation with technology companies and instead provides important and independent assessments of potential weaknesses with AI systems - including potential cybersecurity issues. A security research ecosystem is emerging around AI, and companies like Bugcrowd are looking at ways to integrate this into their existing platforms by accommodating AI-specific flaws and encouraging crowd-sourced security research.
“Interest from security researchers in AI is very high and it continues to grow,” said Bugcrowd’s Ellis. “There's a lot of curiosity and drive to understand how AI models can be exploited and protected, and to taxonomize the understanding of AI-specific flaws. Hackers are quickly adapting traditional security principles to the AI context and developing new techniques to uncover AI-specific vulnerabilities. Meanwhile, there’s an enormous amount of research and development going into leveraging AI to accelerate attacker workflows.”
AI can be assessed both as a threat - including taking a close look at the risks that could develop in an organization should it be deployed in a hasty or poorly secured manner - and as a target. Assessments in the latter category, relating to cybersecurity, share some similarities with traditional security audits, including researchers’ focus on data security, integrity and the potential for exploitation, said Ellis.
“That said, they are also very different - while it’s relatively easy to draw a bright line around the presence or absence of a vulnerability like IDOR or SQL Injection, generative AI is fuzzy by design so the definition of ‘vulnerable’ becomes more difficult to define,” said Ellis.
“Security researchers assess how well models can withstand attempts to manipulate inputs to produce erroneous or harmful outputs,” said Ellis. “They also examine data poisoning risks, model inversion attacks, and the confidentiality of training data. Additionally, the security of the deployment environment and the potential for API abuse are critical factors.”
Transparency and Terms of Service Roadblocks
Shayne Longpre, PhD candidate at the MIT Media Lab, has talked to dozens of other researchers from different teams, and said that one major concern is how terms of service conditions of popular AI models often prohibit research into vulnerabilities related to things like bypassing safety measures or jailbreaks.
“While some elite research teams that already had connections to OpenAI or Google or elsewhere were more comfortable doing the initial research, other teams were experiencing a lot of chilling effects… there’s some chilling effects on disclosing the results of the research, there are disincentives to tackle certain problems, so people tend to choose less sensitive ones over more sensitive ones, there’s limited transparency into the closed corporations’ systems, so it’s hard to be able to do good research in many cases,” said Longpre.
Longpre in a March open letter to companies like OpenAI and Meta, along with 350 other independent researchers (as well as professors, executives and analysts), urged generative AI companies to make voluntary commitments for both legal safe harbor, which would protect good-faith evaluation research that is conducted via established security vulnerability disclosure practices, and technical safe harbor, which would protect evaluation research from account termination.
“Whereas security research on traditional software has established voluntary protections from companies (“safe harbors”), clear norms from vulnerability disclosure policies, and legal protections from the DOJ, trustworthiness and safety research on AI systems has few such protections,” according to the letter. “Independent evaluators fear account suspension (without an opportunity for appeal) and legal risks, both of which can have chilling effects on research.”
Overall, a lack of transparency for third-party researchers into data limits their ability to uncover the problems. For instance, AI models have safeguards and moderation systems attached, which look at both inputs - to prevent misuse from users - and revise the outputs on the fly if the model seems to be saying anything inappropriate, said Longpre. However, for researchers that don’t have a good level of transparency into the system, it’s difficult to know if an error was caught by the input or output detector, said Longpre.
“If you know the training data, it’s easier to search through that training data to try to understand what kind of risks there might be, [such as] does it have knowledge about very sensitive [information],” said Longpre. “These are the sorts of things that are very important to be able to diagnose and fix these systems… and improve the safety in the long run.”
The Future of AI Accountability
Third-party assessments are key to AI accountability, and various U.S. government agencies have recognized that in their guidelines around AI over the last year. The DHS in its roadmap for AI cybersecurity initiatives, for instance, said it plans to create a number of independent evaluation processes for AI systems used by the department, which will include a test facility that will look at pilots, algorithm training and use cases. It also plans to hold a HackDHS for AI Systems assessment where vetted researchers will be asked to hunt for security flaws in DHS systems that leverage AI. On the defense side, the DHS said it plans to evaluate AI-enabled vulnerability discovery and remediation tactics that can be used for federal civilian government systems.
As AI’s role in the cybersecurity and broader tech industry continue to evolve, Unesco and Mila in their “Missing Links in AI Governance” report said that third-party audits across the AI systems lifecycle are necessary accountability measures, particularly as they represent a wider range of perspectives that could help identify key issues.
“Third-party auditors can shine a light on problems that are unforeseen, deprioritized, or ignored by those who develop, purchase, deploy, or maintain AI systems,” according to the report. “Third-party audits may also be used to focus attention on disparate impacts against various marginalized stakeholders who are too often excluded from consideration. As they have no contractual relationship with the audit target, third-party auditors are less likely to be influenced by the preferences, expectations or priorities of the audit target.”