Safeguarded AI: TA1.1 Theory

Key Features

ARIA are looking to fund Technical Area 1.1 with up to £3.5M in total for the first year.

Programme: ARIA

Award: Share of up to £3.5 million

Opens: 11th Apr 2024

Closes: 28th May 2024

Overview

As AI becomes more capable, it has the potential to power scientific breakthroughs, enhance global prosperity, and safeguard us from disasters. But only if it’s deployed wisely.

Current techniques for mitigating the risks of advanced AI systems have serious limitations and cannot be relied upon empirically to ensure safety. To date, very little R&D effort has gone into approaches that provide quantitative safety guarantees for AI systems, because such guarantees are widely considered impossible or impractical.

By combining scientific world models and mathematical proofs, we will aim to construct a ‘gatekeeper’: an AI system tasked with understanding and reducing the risks of other AI agents.

In doing so, we’ll develop quantitative safety guarantees for AI of the kind we have come to expect for nuclear power and passenger aviation.

Scope

This programme is split into three technical areas (TAs), each with its own distinct solicitations. The first solicitation within TA1 – TA1.1 Theory – is now open for applications.

Whilst this solicitation focuses on TA1.1, the wider Safeguarded AI programme, as laid out in the programme thesis, is divided as follows:

TA1 Scaffolding

  • TA1.1 Theory: this solicitation
  • TA1.2 Backend: to develop a professional-grade computational implementation of the Theory, yielding a distributed version control system for all of the above, as well as computationally efficient (possibly GPU-based) type-checking and proof-checking APIs (a hypothetical sketch of such an API follows this list).
  • TA1.3 Human-computer interface: to create a very efficient user experience for eliciting and composing components of world-models, goals, and constraints; for interactively collaborating with AI-powered “assistants” (from TA2); and for run-time monitoring and interventions.
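
To give a flavour of what the TA1.2 deliverable might expose, below is a minimal, hypothetical Python sketch of a proof-checking API over content-addressed theory objects. Every name in it (Claim, Proof, Store, check_proof) is an illustrative assumption, not something specified by this call.

    import hashlib
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Claim:
        statement: str          # a formal statement in the (hypothetical) TA1.1 language

    @dataclass(frozen=True)
    class Proof:
        claim: Claim
        steps: tuple[str, ...]  # serialised derivation steps

    class Store:
        """Toy content-addressed store, standing in for the version control layer."""
        def __init__(self) -> None:
            self._objects: dict[str, bytes] = {}

        def put(self, payload: bytes) -> str:
            digest = hashlib.sha256(payload).hexdigest()
            self._objects[digest] = payload
            return digest       # objects are addressed by their hash, as in git

        def get(self, digest: str) -> bytes:
            return self._objects[digest]

    def check_proof(proof: Proof) -> bool:
        """Placeholder checker: a production version would verify each
        derivation step against the inference rules of the TA1.1 theory."""
        return bool(proof.steps) and bool(proof.claim.statement)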

TA2 Machine learning

  • TA2.1 World-modelling ML: to develop fine-tuned AI systems that are fluent in the TA1.1 language of world-models and that interact with users as assistants.
  • TA2.2 Proof-search ML: to develop fine-tuned AI systems as search heuristics for automated proving techniques that interact in the TA1.1 language of proofs with the TA1.2 proof-checker.
  • TA2.3 Training for certifiable ML: to develop an automated “training loop” for autonomous systems that can be certified as meeting their specifications. The most promising approach is training “backup controllers” that can take over and certifiably ensure safety anywhere in a local neighbourhood of state space containing the reachable set over a short time horizon under an advanced AI system’s policy; see also the Black-Box Simplex Architecture (a sketch of this switching pattern follows this list).
  • TA2.4 Sociotechnical: to leverage social-choice theory to develop collective deliberation and decision-making processes about AI specifications and about AI deployment/release decisions.
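
The runtime-assurance pattern behind TA2.3 can be made concrete with a minimal, hypothetical Python sketch: an untrusted controller proposes an action, a monitor checks that the short-horizon trajectory stays within the safe region, and a certified backup controller takes over whenever that check fails. The one-dimensional dynamics and all names below are illustrative assumptions, not part of this call.

    from dataclasses import dataclass

    @dataclass
    class State:
        x: float                # toy one-dimensional state

    SAFE_LIMIT = 10.0           # safety specification: |x| must stay below this
    HORIZON = 5                 # short look-ahead horizon, in steps

    def step(state: State, action: float) -> State:
        return State(state.x + action)      # toy dynamics: x' = x + u

    def stays_safe(state: State, action: float) -> bool:
        """Conservative monitor: hold the proposed action over the short
        horizon and check the trajectory never leaves the safe region."""
        s = state
        for _ in range(HORIZON):
            s = step(s, action)
            if abs(s.x) >= SAFE_LIMIT:
                return False
        return True

    def backup_controller(state: State) -> float:
        """Certified fallback: drives x toward 0, so it keeps this toy
        system safe from any state with |x| < SAFE_LIMIT."""
        return -0.5 * state.x

    def advanced_controller(state: State) -> float:
        return 2.0              # stand-in for an untrusted, black-box AI policy

    def simplex_step(state: State) -> State:
        proposed = advanced_controller(state)
        if stays_safe(state, proposed):
            return step(state, proposed)               # advanced action passes the monitor
        return step(state, backup_controller(state))   # certified backup takes over

In a real deployment, simplex_step would run every control cycle; crucially, the safety certificate attaches to the monitor and the backup controller, not to the advanced policy itself.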

TA3 Applications: to elicit functional and non-functional requirements from customers in a particular sector, design simplified test problems on a spectrum of complexity, and ultimately to demonstrate deployable solutions leveraging TA1 and TA2 tools.

For further information on this funding call, please see the full call documentation published by ARIA.

Eligibility

ARIA welcome applications from across the R&D ecosystem, including individuals, universities, research institutions, small, medium and large companies, charities and public sector research organisations.

ARIA’s primary focus is on funding those who are based in the UK. However, funding will be awarded to organisations outside the UK if we believe it can boost the net impact of a programme in the UK. If you are a non-UK applicant, you must therefore outline any proposed plans or commitments that will contribute to the programme in the UK within the project’s duration.

Exclusions

  • ARIA are not looking for complete proposals that purport to address all of the objectives above, but rather for proposals that identify key subproblems plausibly critical to doing so.
  • In this solicitation we are not yet looking for proposals to build software, except as part of “experimental mathematics”, as development of theory using a proof assistant, or as early proof-of-concept research code demonstrating a new idea.
  • ARIA are unlikely to be interested in proposals for approaches that have substantial conceptual friction with a probabilistic approach or with a causal approach to modelling.

Funding Costs

ARIA are looking to fund Technical Area 1.1 with up to £3.5M in total for the first year, and expect to make 10 to 16 awards in this technical area.