Inspect: A framework for large language model evaluations
Department for Science, Innovation and Technology (DSIT)
Ministerial department
https://www.gov.uk/government/organisations/department-for-science-innovation-and-technology
Total FTE: 2,275 · Digital & data FTE: 130
Sub-organisations: Government Digital Service (GDS), Intellectual Property Office (IPO), Ordnance Survey (OS), UK Space Agency, UK Research and Innovation (UKRI), Met Office
Stars of active repositories: 3,302
Active repositories: 45
Live repositories: 152
Unavailable repositories: 48
Languages of active repositories
1. Python (44%)
2. Java (20%)
3. TypeScript (11%)
Active: currently on GitHub, not archived, and pushed to within 180 days. Live: currently on GitHub. Unavailable: previously on GitHub but not currently found.
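The three definitions above can be read as a small classification rule. The following is a minimal sketch of that rule, not the code this page actually uses: the `Repo` record, its field names, and the `statuses` function are hypothetical, and treating exactly 180 days as still "within" the window is an assumption. It also assumes every tracked repository was seen on GitHub at some point, so one that is no longer found counts as unavailable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Set

# "Pushed to within 180 days" from the definition above.
ACTIVE_WINDOW = timedelta(days=180)


@dataclass
class Repo:
    """Hypothetical record for a tracked repository (field names are assumptions)."""
    name: str
    on_github: bool                      # currently found on GitHub
    archived: bool = False
    pushed_at: Optional[datetime] = None  # time of the most recent push, if known


def statuses(repo: Repo, now: datetime) -> Set[str]:
    """Return the labels that apply to a repository at time `now`."""
    if not repo.on_github:
        # Previously tracked but not currently found.
        return {"unavailable"}
    labels = {"live"}
    recently_pushed = (
        repo.pushed_at is not None and now - repo.pushed_at <= ACTIVE_WINDOW
    )
    if not repo.archived and recently_pushed:
        # Every active repository is also live.
        labels.add("active")
    return labels


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    example = Repo("example-repo", on_github=True, pushed_at=now - timedelta(days=30))
    print(sorted(statuses(example, now)))
```

Under this reading, "active" is a subset of "live", which is consistent with the counts above (45 active out of 152 live).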
GitHub accounts
UKGovernmentBEIS, CDEIUK (inactive)
Repositories
Showing all 45 active repositories, sorted by stars
Collection of evals for Inspect AI
ControlArena is a collection of settings, model organisms and protocols for running control experiments.
Extract residual-stream activations and apply steering vectors (including activation oracles) to any vLLM model during inference.
A Kubernetes sandbox environment for use with inspect_ai
An Inspect extension for agentic cyber evaluations
Reproducing "Natural Emergent Misalignment from Reward Hacking" (MacDiarmid et al., Anthropic 2025) with open-source models. Includes reward-hackable RL environments, misalignment evaluations, training configs, and evaluation scripts. Models trained on OLMo (7B, 32B) and GPT-OSS (20B, 120B).
Accompanying code for the paper "Async Control: Stress-testing Asynchronous Control Measures for LLM Agents"
Report Official Development Assistance (RODA)
An EC2 sandbox environment for use with inspect_ai
Energy Label Generation Service
A Proxmox sandbox environment for use with inspect_ai
This repository was created for the public search portal
Granting Authority Schema Service
Application code for the UK Emissions Trading Scheme Registry
Source code report for METS (Manage your UK Emissions Trading Scheme reporting service)
A proof-of-concept eval testing whether models continue with (or mention) pre-filled malign behaviours.
Bulk upload of subsidy awards into the database using an Excel file
ORB - Online Corporate Reporting and Risk Management
Climate Change Agreements service application code
CHPQA Application Repository
Lie detectors for the "Did you lie?" paper
Kosmos: An AI Scientist for Autonomous Discovery. An implementation and adaptation to be driven by Claude Code or an API, based on the Kosmos AI paper: https://arxiv.org/abs/2511.02824
AI identity disclosure evaluations for text and speech interactions