Submission to the OECD AI Risk Thresholds Public Consultation

7 October 2024

1. What publications and/or other resources have you found useful on the topic of AI risk thresholds?

Koessler et al.’s Risk Thresholds for Frontier AI (https://arxiv.org/abs/2406.14713), 2024: Provides a comprehensive overview of risk threshold regimes common in other high-risk industries and discusses the potential use of risk thresholds and capability thresholds for AI safety.

U.S. NRC’s Safety Goals for Nuclear Power Plant Operation (https://www.nrc.gov/docs/ML0717/ML071770230.pdf), 1983, pages 24-25: Describes the NRC’s risk threshold regime consisting of qualitative safety goals with quantitative design objectives, which serves as an illuminating example for defining safety goals for advanced AI technology.

2. To what extent do you believe AI risk thresholds based on compute power are appropriate to mitigate risks from advanced AI systems?

Governing advanced AI (AAI) via compute thresholds is one of the best AI governance tools available at this time, as researchers have observed a clear correlation between training compute, AI system size, and capabilities, and the compute supply chain is narrow and highly specialised. The more powerful AAI systems are, the more risks they pose. Additionally, compute is a physical resource, which makes it easier to govern than the non-physical software it underpins.

However, basing AI risk thresholds on compute does not resolve the crucial issue of defining clear AI safety goals and concrete risk thresholds that delineate acceptable from unacceptable risk levels. This will be necessary regardless of whether risk threshold implementation and measurement rely on compute power, AI capabilities, or other measures. Similarly, an effective risk assessment framework should not rely on only one metric, but should be a hybrid framework that takes both compute and other metrics, such as the presence of certain capabilities, into account.

3. To what extent do you believe that other types of AI risk thresholds (i.e., thresholds not explicitly tied to compute) would be valuable, and what are they?

Compute is a quantifiable and trackable resource and a rare, workable proxy for now, but it is also a relatively blunt tool that does not answer crucial questions about safety goals and acceptable risk levels. Therefore, once clear AI safety goals have been set, additional, more precise, well-defined risk thresholds will be needed to delineate AI systems that pose acceptable risks from those that pose unacceptable risks. At present, however, we lack the scientific understanding of how advanced AI functions that would be necessary to define such precise risk thresholds.

Until our scientific understanding of advanced AI has improved sufficiently, we can use certain likely dangerous AAI capabilities as additional proxies to distinguish low-risk from high-risk AAI systems, chief among them self-replication and self-improvement. Self-replication refers to the ability of an AAI system to create copies of itself. This capability is considered particularly dangerous because it could lead to the rapid spread of such AAI systems, making containment nearly impossible, whether or not the system is safe. Self-improvement refers to an AAI system's capability to edit and improve its own source code. It is considered dangerous because it could trigger a runaway process in which an AAI system creates a more capable version of itself, which in turn creates a still more capable version, and so on, quickly leading to the emergence of AAI systems that are beyond human control, no matter what other safeguards are in place.

In summary, interim risk assessment regimes could use the self-replication and self-improvement capabilities of AAI systems in addition to compute thresholds to distinguish high-risk from low-risk AI systems, as these capabilities are generally considered to denote AI systems that may pose catastrophic risks to public safety.

4. What strategies and approaches can governments or companies use to identify and set out specific thresholds and measure real-world systems against those thresholds?

Before specific risk thresholds can be defined, AI safety goals need to be chosen, and precise qualitative or quantitative parameters and safety standards need to be defined to measure whether those goals are met. Choosing safety goals and defining the levels of risk the public can acceptably be exposed to is a value-based judgement that should be made in the public interest. Hence, governments, not private companies, should choose safety goals and set specific, binding risk thresholds. In other high-risk industries, acceptable levels of risk are, for example, benchmarked against what the public considers acceptable in everyday life, or against the risks posed by existing, similar products or industry sectors. Carrying out public consultations, or using the UK Health and Safety Executive's general threshold of 10⁻⁴ deaths per person per year (1 in 10,000) as the boundary beyond which risk is unacceptable, could be a good starting point.
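To illustrate the arithmetic behind such a quantitative threshold, the minimal sketch below compares an estimated annual fatality risk against a 10⁻⁴ per-person-per-year figure. All numbers, names, and the comparison logic are illustrative assumptions for this submission, not a proposed assessment method.

```python
# Illustrative sketch only: comparing an estimated annual per-capita fatality risk
# against a 1-in-10,000 (1e-4) per-person-per-year threshold, in the spirit of the
# UK HSE's "Reducing Risks, Protecting People" framework. All figures are hypothetical.

UNACCEPTABLE_RISK_PER_CAPITA_PER_YEAR = 1e-4  # 1 in 10,000 per person per year


def exceeds_threshold(expected_annual_fatalities: float, exposed_population: int) -> bool:
    """Return True if the estimated per-capita annual risk exceeds the threshold."""
    per_capita_risk = expected_annual_fatalities / exposed_population
    return per_capita_risk > UNACCEPTABLE_RISK_PER_CAPITA_PER_YEAR


# Hypothetical example: 500 expected fatalities per year across a population of
# 10 million corresponds to a per-capita risk of 5e-5, below the 1e-4 threshold.
print(exceeds_threshold(expected_annual_fatalities=500, exposed_population=10_000_000))  # False
```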

Governments should implement AI licensing regimes to measure real-world systems against these risk thresholds. Developers should be required to obtain a licence from the government of the country in which a training run takes place before executing AI training runs above a certain training compute threshold. Governments should set up independent government agencies that define AI risk thresholds, grant AI training licences, and monitor training runs to verify that licence holders adhere to safety standards and fulfil the licence conditions.

5. What requirements should be imposed for systems that exceed any given threshold?

If an advanced AI system exceeds any given risk threshold, the system's developer should be required to switch it off immediately. Such incidents should also be reported to the developer's board of directors and to the AI regulatory body in the developer's country of residence or operation. The developer should then be required to implement additional risk mitigation measures and to submit a safety case demonstrating that the system no longer exceeds the relevant risk thresholds before being granted permission to reactivate it.

6. What else should the OECD and collaborating organisations keep in mind with regard to designing and/or implementing AI risk thresholds?

The measures described above should be internationally standardised because of the transnational, even global, nature of the negative externalities a single ‘bad’ AI system can cause. To initiate this process, the OECD.AI Policy Observatory, possibly in conjunction with the Global Partnership on AI (GPAI) or the AI Action Summit, could commission an international expert working group to conduct public consultations on acceptable risk levels and choose concrete risk thresholds that prioritise public safety. Initially, such risk thresholds may inevitably rely on compute thresholds and on AAI capabilities deemed dangerous, such as self-replication and self-improvement, as proxies. But risk thresholds should be re-examined and updated regularly as the sector develops and changes rapidly, as our understanding of AAI systems improves, and as more refined parameters for assessing AAI systems' risk levels become available.

The OECD is particularly well-suited to taking the lead in developing internationally standardised, concrete AAI risk thresholds. It was the first international organisation to publish a structured approach to AI risk management with its AI Principles in 2019, which were updated in 2024. Since 2019, the OECD has also been building a network of AI experts and an active knowledge exchange hub through its AI Policy Observatory. With its AI expertise and its extensive work on governing transformative technologies and on using foresight practices to develop policies for an uncertain future, the OECD is well-equipped to develop fitting governance frameworks for advanced AI, a rapidly evolving, novel technology. However, in driving such a process, the OECD should collaborate closely with the GPAI countries, the UN, or international fora such as the international AI Summits, which represent a broader, geographically more diverse group of states, to ensure the process is inclusive of all world regions.

Lastly, the OECD.AI Policy Observatory correctly points out that societies should aim to reap the benefits of AI while mitigating its potentially significant downsides. In the pursuit of this dual goal, government agencies tasked with ensuring AI safety should be wholly separate from government agencies furthering the innovation and adoption of AI technologies. Examples from the Norwegian government’s petroleum resource management and the US government’s history of governing civilian nuclear technology illustrate that making safety-focused agencies independent from promotion-focused entities is key to avoiding conflicts of interest on either side.
