AI Security Wire

NIST Publishes AI RMF 2.0 with New Guidance on Adversarial Machine Learning

NIST has published version 2.0 of the AI Risk Management Framework (AI RMF), introducing substantial new content on adversarial machine learning threats, model supply chain security, and AI-specific incident response procedures. The update represents the most significant revision since the framework’s initial release in January 2023 and reflects a materially more threat-aware posture across all four core functions.

What Changed

The 2.0 update expands the AI RMF from a primarily governance-focused document into one that addresses technical security threats with considerably more specificity. Three areas see the most significant new content.

1. Adversarial ML Threat Taxonomy

The updated framework introduces a formal taxonomy of adversarial ML threats, drawing heavily on NIST’s earlier report on adversarial machine learning (NIST AI 100-2). The threat categories now explicitly addressed are:

Threat Category      Description
Evasion attacks      Test-time perturbations causing misclassification
Poisoning attacks    Training data or model manipulation
Extraction attacks   Model stealing via query access
Inference attacks    Membership inference and data reconstruction
Backdoor/trojan      Trigger-activated malicious behaviour
Prompt injection     Instruction override in LLM deployments

Each category includes recommended mitigations mapped to the framework’s GOVERN, MAP, MEASURE, and MANAGE functions.

2. Model Supply Chain Security

The 2.0 framework introduces a dedicated section on AI supply chain risk, acknowledging that most deployed AI systems incorporate third-party components — pre-trained models, fine-tuned adapters, training datasets, and ML libraries — each of which represents a potential attack surface.

Key new recommendations in the supply chain section:

  • Provenance documentation — organisations deploying AI systems should maintain records of the origin and lineage of all model components, including pre-trained base models, fine-tuning datasets, and significant framework dependencies.
  • Integrity verification — model artifacts should be verified against a cryptographic hash or signature prior to deployment. NIST recommends treating model weights as code artifacts subject to the same integrity controls.
  • Third-party risk assessment — AI vendors and model providers should be assessed against a set of minimum security criteria, including secure development practices, incident disclosure procedures, and adversarial robustness testing.
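
The integrity verification step can be illustrated with a short sketch. This is a minimal example, not something taken from the framework text: it assumes the expected SHA-256 digest was recorded when the artifact was released and is obtained through a trusted channel, and the function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file matches the digest recorded at release time.

    expected_sha256 is assumed to come from a trusted source (for example a
    signed release manifest); this sketch does not verify signatures.
    """
    actual = sha256_of_file(path)
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A deployment pipeline would typically call verify_artifact before loading the weights and refuse to proceed if it returns False. A hash check only detects modification after the digest was recorded; it says nothing about whether the original artifact was trustworthy.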

3. AI Incident Response

The most operationally significant new section covers AI-specific incident response. NIST identifies several categories of AI security incident that require distinct response procedures:

Model compromise incidents — incidents where the integrity of a deployed model is suspected. Recommended immediate actions include:

  1. Take the model offline pending investigation
  2. Preserve model weights and configuration (do not overwrite with a new version until provenance is established)
  3. Retrieve and preserve inference logs for the relevant period
  4. Initiate behavioural comparison against a known-clean baseline
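
Step 4 above, behavioural comparison against a known-clean baseline, could be approached along the following lines. This is a rough sketch under stated assumptions: both models are exposed as plain callables from input text to output label, the probe set is chosen by the investigating team, and the 1% threshold is an illustrative value rather than one drawn from the framework.

```python
from typing import Callable, Sequence


def disagreement_rate(
    suspect: Callable[[str], str],
    baseline: Callable[[str], str],
    probes: Sequence[str],
) -> float:
    """Fraction of probe inputs on which the two models return different outputs."""
    if not probes:
        raise ValueError("probe set must not be empty")
    mismatches = sum(1 for probe in probes if suspect(probe) != baseline(probe))
    return mismatches / len(probes)


def flag_divergence(rate: float, threshold: float = 0.01) -> bool:
    """Report whether the disagreement rate exceeds the (illustrative) threshold."""
    return rate > threshold
```

Exact-match comparison suits classifiers with discrete outputs. For generative models the equality check would need to be replaced with a similarity measure, and a backdoor that fires only on a rare trigger may not appear in any probe set that lacks the trigger.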

Inference attack incidents — evidence of systematic model querying consistent with extraction or inversion attacks. Recommended actions include rate limiting and query anomaly detection while investigating the scope of extraction.
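
As an illustration of the rate-limiting control mentioned above, the following sketch keeps a sliding window of query timestamps per client. The class name, the per-client keying, and the limits are all assumptions made for the example; the framework does not prescribe a mechanism.

```python
from collections import defaultdict, deque


class QueryRateMonitor:
    """Sliding-window query counter per client (hypothetical helper)."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # Maps client ID to the timestamps of its accepted queries.
        self._history = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Record a query at time `now` and report whether it is within the limit."""
        history = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False
        history.append(now)
        return True
```

In use, a caller would pass time.monotonic() as now and reject or throttle the request when allow returns False. A per-client limit alone will not catch extraction spread across many accounts, which is where broader query anomaly detection comes in.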

Data poisoning incidents — suspected manipulation of training data. Requires investigation of the full data pipeline from ingestion to training, with particular attention to any third-party data sources or annotation vendors.
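
One way to support that kind of pipeline investigation is to fingerprint records at ingestion and again just before training, then compare the two manifests. The sketch below assumes records are JSON-serialisable dictionaries keyed by a stable record ID; the function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Mapping


def fingerprint(records: Mapping[str, dict]) -> Dict[str, str]:
    """Map each record ID to a SHA-256 digest of its canonical JSON form."""
    return {
        record_id: hashlib.sha256(
            json.dumps(record, sort_keys=True).encode("utf-8")
        ).hexdigest()
        for record_id, record in records.items()
    }


def diff_manifests(
    before: Mapping[str, str], after: Mapping[str, str]
) -> Dict[str, List[str]]:
    """List record IDs added, removed, or changed between two pipeline stages."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(rid for rid in shared if before[rid] != after[rid]),
    }
```

A diff of this kind narrows down where records changed between stages. It cannot detect data that was already poisoned when it arrived from a third-party source.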

What the Framework Does Not Address

Despite the improvements, several gaps remain:

Generative AI specifics — the framework’s treatment of generative AI and LLM-specific risks (hallucination with security implications, prompt injection, agentic system risks) remains relatively thin compared to its coverage of classical ML threats.

Enforcement and compliance — like its predecessor, AI RMF 2.0 is a voluntary framework. It does not carry the force of regulation and provides no audit or certification mechanism. Organisations seeking regulatory compliance (for example, under the EU AI Act) will need to map the framework to specific regulatory requirements, which NIST has committed to supporting with additional mapping documentation.

Quantitative risk measurement — the MEASURE function remains largely qualitative. Security teams looking for specific metrics for adversarial robustness, model confidence calibration, or privacy leakage will need to supplement the framework with more technically specific guidance.
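
To give a sense of what such supplementary metrics look like, here is a sketch of expected calibration error (ECE), a commonly used measure of confidence calibration. It is not part of the framework; the equal-width binning and the default of ten bins are conventional choices assumed for the example.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, hit in members if hit) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece
```

A perfectly calibrated model scores 0; a model that reports 90% confidence but is right only 75% of the time on those predictions contributes a gap of 0.15 for that bin.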

Practical Implications for Security Teams

For organisations already using AI RMF 1.0 as a baseline:

  1. Map existing controls to the new adversarial ML taxonomy — identify which threat categories you have mitigations for and which have gaps.
  2. Prioritise model supply chain review — assess the provenance and integrity controls in place for all third-party model components in production.
  3. Develop AI-specific incident response runbooks — the framework now provides sufficient structure to build IR procedures around; use it.
  4. Engage AI vendors on their AI RMF alignment — use the supply chain section as the basis for vendor security questionnaires.
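
The first step in the list above is essentially a gap analysis, which can be sketched in a few lines. The category names are those from the taxonomy table earlier in this article; the control names in the usage note are hypothetical placeholders.

```python
from typing import Dict, List

# Threat categories as listed in the taxonomy table in this article.
THREAT_CATEGORIES = [
    "Evasion attacks",
    "Poisoning attacks",
    "Extraction attacks",
    "Inference attacks",
    "Backdoor/trojan",
    "Prompt injection",
]


def coverage_gaps(controls: Dict[str, List[str]]) -> List[str]:
    """Return the categories with no mapped control, in taxonomy order."""
    return [category for category in THREAT_CATEGORIES if not controls.get(category)]
```

For example, calling coverage_gaps with {"Evasion attacks": ["adversarial training"], "Extraction attacks": ["API rate limiting"]} returns the other four categories, which become the starting list for remediation planning.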

The full AI RMF 2.0 document, accompanying playbooks, and crosswalk to the EU AI Act are available on the NIST AI RMF website.