Inspect: A framework for large language model evaluations
Department for Science, Innovation and Technology (DSIT)
Ministerial department
https://www.gov.uk/government/organisations/department-for-science-innovation-and-technology
Total FTE: 2,275 · Digital & data FTE: 130
Sub-organisations: Government Digital Service (GDS), Intellectual Property Office (IPO), Ordnance Survey (OS), UK Space Agency, UK Research and Innovation (UKRI), Met Office
Stars of active repositories: 3,072
Active repositories: 41
Live repositories: 150
Unavailable repositories: 46
Languages of active repositories
1. Python (39%)
2. Java (22%)
3. C# (12%)
Active: currently on GitHub, not archived, and pushed to within 180 days. Live: currently on GitHub. Unavailable: previously on GitHub but not currently found.
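The three status definitions above can be sketched as a small classification rule. This is only an illustration of how the labels relate, not the code used to produce the counts on this page: the `RepoRecord` shape, its field names, and the choice to treat exactly 180 days as still "active" are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Set

# "Pushed to within 180 days" (boundary treated as inclusive: an assumption).
ACTIVE_WINDOW = timedelta(days=180)


@dataclass
class RepoRecord:
    """Hypothetical record for a repository that was seen on GitHub at some point."""
    on_github: bool                       # currently found on GitHub
    archived: bool = False                # archived flag, if still on GitHub
    pushed_at: Optional[datetime] = None  # time of the most recent push, if known


def classify(repo: RepoRecord, now: datetime) -> Set[str]:
    """Return every status label that applies to the repository."""
    if not repo.on_github:
        # Previously on GitHub but not currently found.
        return {"unavailable"}
    labels = {"live"}
    recently_pushed = (
        repo.pushed_at is not None and now - repo.pushed_at <= ACTIVE_WINDOW
    )
    if not repo.archived and recently_pushed:
        # Active repositories are a subset of live ones.
        labels.add("active")
    return labels


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
fresh = RepoRecord(on_github=True, pushed_at=now - timedelta(days=10))
stale = RepoRecord(on_github=True, pushed_at=now - timedelta(days=400))
print(sorted(classify(fresh, now)))  # ['active', 'live']
print(sorted(classify(stale, now)))  # ['live']
```

Under this reading every active repository is also live, which is consistent with the active count being smaller than the live count.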
GitHub accounts
UKGovernmentBEIS, CDEIUK (inactive)
Repositories
Showing all 41 active repositories, sorted by stars
Collection of evals for Inspect AI
ControlArena is a collection of settings, model organisms and protocols for running control experiments.
Extract residual-stream activations and apply steering vectors (including activation oracles) to any vLLM model during inference.
A Kubernetes sandbox environment for use with inspect_ai
An Inspect extension for agentic cyber evaluations
Reproducing "Natural Emergent Misalignment from Reward Hacking" (MacDiarmid et al., Anthropic 2025) with open-source models. Includes reward-hackable RL environments, misalignment evaluations, training configs, and evaluation scripts. Models trained on OLMo (7B, 32B) and GPT-OSS (20B, 120B).
Accompanying code for Async Control: Stress-testing Asynchronous Control Measures for LLM Agents paper
Report Official Development Assistance (RODA)
Energy Label Generation Service
A Proxmox sandbox environment for use with inspect_ai
An EC2 sandbox environment for use with inspect_ai
This repository was created for the public search portal
Granting Authority Schema Service
Source code report for METS (Manage your UK Emissions Trading Scheme reporting service)
Application code for the UK Emissions Trading Scheme Registry
Bulk upload of subsidy awards into the database using an Excel file
ORB - Online Corporate Reporting and Risk Management
Climate Change Agreements service application code
A proof-of-concept eval testing whether models continue with (or mention) pre-filled malign behaviours.
Kosmos: An AI Scientist for Autonomous Discovery - An implementation and adaptation to be driven by Claude Code or API - Based on the Kosmos AI Paper - https://arxiv.org/abs/2511.02824
AI identity disclosure evaluations for text and speech interactions
Code relating to the Data Management System Webportal implementation