Monitor and accelerate progress in the science of AI safety

Scientific software and KPIs for faster, better AI safety research

The science of understanding and controlling AI isn't moving fast enough

We're fixing this by:

Developing indicators of research quality & performance

Monitoring and reporting these indicators

Building products that help accelerate the rate of progress

Problems we're solving

Evaluations: Open-source benchmarks are (too often) broken, contaminated, saturated, difficult to run, and/or poorly documented
Data: Experimental data (e.g. eval logs) are rarely shared or reused
Experiments: Generalisation and predictive accuracy aren't monitored because experiments aren't updated or extended with new models or eval settings

Current projects

Inspect Evals

We are improving Inspect Evals to align with established best practices from other areas of science, such as the FAIR data principles and reproducible data analysis standards (e.g. CERN's REANA).

Evaluation audits

We're developing and validating KPIs for AI evaluations, and will report them publicly for new and existing benchmarks.

Things you might be wondering

Will you publish research? Yes. Our research will focus on clarifying quality and performance indicators for measurement and prediction, often by extending existing frameworks and applying them in practice. For example, significant work is required to map standards from the science of measurement onto AI evaluation.
Who are your main collaborators? We have collaborated closely with scientists and engineers at the UK AI Security Institute, Epoch AI, and Meridian Labs, and will continue to do so.
Where is Generality Labs located? We're a remote-first organisation based in London. We're incubated and sponsored by Arcadia Impact, a UK-based charity.

Join us

We're a group of engineers and scientists spinning off from Arcadia Impact. If you're a scientist, engineer, or generalist who thinks our mission is important and has skills to bring, we invite you to introduce yourself.

Introduce yourself

2-minute form
