We have no science of safe AI
David Janků, Max Reddel, Roman Yampolskiy, Jason Hausenloy | November 2024
The rapid development of advanced AI capabilities, coupled with inadequate safety measures, poses significant risks to individuals, society, and humanity at large. Racing dynamics among companies and nations striving for AI dominance push developers to prioritise speed over safety – despite lofty statements about AI safety from the biggest players. The desire to be first amplifies the risk of deploying systems without adequate safeguards and increases the potential for unforeseen negative impacts.
The reality is, right now, there is no ‘safe AI’ to aim for – it is an unsolved scientific problem. If it remains unsolved while AI capabilities continue to skyrocket, society will be increasingly exposed to widespread and systemic risks.
Policymakers must take a proactive role in establishing a science of safe AI, as the industry is unlikely to prioritise it on its own and has shown few signs of doing so to date. Current AI development is overwhelmingly focused on enhancing capabilities, with minimal investment in comprehensive safety research, leaving critical gaps in understanding and managing the risks associated with advanced AI. To bridge these gaps, policymakers should lead efforts to create rigorous safety standards, support independent safety research, and establish clear accountability for AI risks.
By investing in a dedicated science of safe AI, governments can ensure that the rapid evolution of AI is balanced by responsible oversight, safeguarding both public interests and the long-term viability of AI advancements.
Key findings include:
- Undefined Safety Standards: Unlike in other safety-critical industries, there is no established science or framework guiding AI risk management. Traditional safety standards are inadequate for AI because of its general-purpose nature, which allows models to operate across multiple high-stakes environments simultaneously. Setting safety standards proportional to AI's capabilities – rather than its specific use cases – is crucial. The current lack of rigorous safety metrics and assessment tools means we are unprepared to manage AI's complex, dynamic risks.
- Insufficient Safety Techniques: The safety measures widely used by leading AI companies are inadequate. Reinforcement learning from human feedback (RLHF), capability evaluations, and interpretability research, while useful and meaningful, all face fundamental limitations that prevent them from providing strong assurances against advanced AI-related harms. Current methods often suppress rather than eliminate dangerous capabilities, leaving systems vulnerable to exploitation.
- Low Investment in Safety: Safety research in AI receives only a fraction of the resources devoted to capability development – a stark contrast to high-stakes industries like pharmaceuticals or nuclear power, where safety investments often exceed those for performance and capability. This mismatch shows that AI safety is not prioritised despite the large-scale risks AI models may pose in the not-too-distant future. The situation is worsened by safety-washing, where capability improvements are misleadingly presented as safety progress. This conflation blurs the line between genuine safety advances and capability growth, making it difficult to assess how much real progress is being made toward safer AI systems.
The report concludes with a call for developing a dedicated and well-funded science of safe AI. Without robust, evidence-based safety research, we risk advancing AI technologies without the necessary safeguards, with potentially irreversible consequences for our future.
Read the full report here (PDF)