AI Software Bill of Materials: Tracking Model Components
The software bill of materials (SBOM), a machine-readable inventory of software components and their dependencies, is now a well-established security practice for traditional software. The AI equivalent, the AI-SBOM, applies the same principle to the more complex component graph of deployed AI systems. For high-risk systems, EU AI Act Article 11 requires technical documentation whose Annex IV content includes training data provenance and any use of third-party pre-trained components; the NIST AI RMF 2.0 supply chain section treats model component tracking as a baseline control. This guide covers what an AI-SBOM should contain and how to implement it.
Why AI Systems Need an Extended SBOM
Traditional SBOMs cover code and library dependencies. An AI system has additional components that standard SBOM tools don’t capture:
| Component | Why It Matters |
|---|---|
| Base model (weights + architecture) | Backdoors, biases, and capabilities are properties of the weights |
| Fine-tuning dataset | Dataset provenance affects copyright, PII, and poisoning risk |
| RLHF / alignment data | Determines safety behaviour; manipulation here affects all downstream uses |
| LoRA / adapter weights | Can override base model behaviour; need independent provenance |
| Prompt templates / system prompts | Define application behaviour; versioning and integrity matter |
| Inference framework | Serialisation vulnerabilities, hardware-specific behaviour |
| Embedding model (for RAG) | Affects retrieval; poisoning here affects all downstream queries |
A security incident or compliance audit may require answering questions about any of these components. Without an AI-SBOM, organisations cannot answer them quickly or with confidence.
Minimum AI-SBOM Fields
Drawing on CycloneDX ML extensions and emerging regulatory guidance, a minimum AI-SBOM for a deployed model should capture:
Base Model Record
{
  "type": "machine-learning-model",
  "name": "llama-3-70b-instruct",
  "version": "3.0",
  "purl": "pkg:huggingface/meta-llama/Meta-Llama-3-70B-Instruct@3.0",
  "hashes": [
    { "alg": "SHA-256", "content": "a3f2...b91c" }
  ],
  "supplier": { "name": "Meta AI", "url": "https://ai.meta.com" },
  "licenses": [{ "id": "LLAMA-3-COMMUNITY" }],
  "properties": [
    { "name": "training-compute-flops", "value": "1.8e24" },
    { "name": "training-data-cutoff", "value": "2023-12" },
    { "name": "parameters", "value": "70000000000" }
  ]
}
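The hashes entry in the record above has to be computed from the weight file itself. The following is a minimal sketch of one way to do that, assuming the weights are a single file on local disk; the function names sha256_of_file and hash_entry are illustrative and not part of any SBOM library.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte weight file
        # never has to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_entry(path: Path) -> dict:
    """Build a hashes entry in the shape used by the records in this guide."""
    return {"alg": "SHA-256", "content": sha256_of_file(path)}
```

For models shipped as several shards, one option is to record a hash entry per shard rather than a single digest, so that a mismatch points at the specific file that changed.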
Fine-Tune / Adapter Record
{
  "type": "machine-learning-model",
  "name": "llama-3-70b-customer-service-lora",
  "version": "1.4.2",
  "hashes": [{ "alg": "SHA-256", "content": "7c4a...e230" }],
  "dependencies": ["pkg:huggingface/meta-llama/Meta-Llama-3-70B-Instruct@3.0"],
  "properties": [
    { "name": "adapter-type", "value": "LoRA" },
    { "name": "training-dataset-id", "value": "ds-customer-service-v3" },
    { "name": "training-date", "value": "2026-03-15" },
    { "name": "trainer", "value": "ml-team@example.com" }
  ]
}
Training Dataset Record
{
  "type": "data",
  "name": "customer-service-training-v3",
  "version": "3.0",
  "hashes": [{ "alg": "SHA-256", "content": "1b9f...4d72" }],
  "properties": [
    { "name": "record-count", "value": "142000" },
    { "name": "pii-assessed", "value": "true" },
    { "name": "pii-assessment-date", "value": "2026-02-28" },
    { "name": "data-sources", "value": "internal-crm,synthetic-generation" },
    { "name": "collection-date-range", "value": "2024-01/2026-01" },
    { "name": "data-controller", "value": "example-corp" }
  ]
}
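The three records only become an AI-SBOM once they are wrapped in a single document that also states how they relate. The sketch below shows one possible envelope, using the CycloneDX JSON top-level fields (bomFormat, specVersion, serialNumber, version, metadata, components, dependencies). The build_ai_sbom helper, the bom-ref values, and the choice to model the adapter as depending on both the base model and the dataset are assumptions made for illustration; the component records are abbreviated.

```python
import json
import uuid
from datetime import datetime, timezone


def build_ai_sbom(components: list[dict], depends_on: dict[str, list[str]]) -> dict:
    """Wrap component records in a CycloneDX-style BOM envelope.

    `depends_on` maps a component's bom-ref to the bom-refs it depends on.
    """
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,
        "metadata": {"timestamp": datetime.now(timezone.utc).isoformat()},
        "components": components,
        # CycloneDX keeps the dependency graph at the top level,
        # keyed by bom-ref, rather than inside each component.
        "dependencies": [
            {"ref": ref, "dependsOn": sorted(targets)}
            for ref, targets in depends_on.items()
        ],
    }


# Abbreviated versions of the three records above; bom-ref values are hypothetical.
base = {"bom-ref": "base-model", "type": "machine-learning-model",
        "name": "llama-3-70b-instruct", "version": "3.0"}
adapter = {"bom-ref": "adapter", "type": "machine-learning-model",
           "name": "llama-3-70b-customer-service-lora", "version": "1.4.2"}
dataset = {"bom-ref": "dataset", "type": "data",
           "name": "customer-service-training-v3", "version": "3.0"}

sbom = build_ai_sbom([base, adapter, dataset], {"adapter": ["base-model", "dataset"]})
print(json.dumps(sbom, indent=2))
```

Listing the base model first keeps the document compatible with code that reads the base model name from the first entry in components.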
Tooling
CycloneDX ML
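The CycloneDX specification includes machine learning extensions (cdx:ml) that extend the standard BOM format; since version 1.5 it defines the machine-learning-model and data component types used in the records above. The cyclonedx-python-lib library can generate AI-SBOMs programmatically: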
from cyclonedx.model import HashAlgorithm, HashType
from cyclonedx.model.bom import Bom
from cyclonedx.model.component import Component, ComponentType
from packageurl import PackageURL

bom = Bom()

# Describe the base model as a machine-learning-model component.
# Recent cyclonedx-python-lib releases name this keyword `type`
# (earlier ones used `component_type`); check your installed version.
model_component = Component(
    type=ComponentType.MACHINE_LEARNING_MODEL,
    name='llama-3-70b-instruct',
    version='3.0',
    purl=PackageURL(
        type='huggingface',
        namespace='meta-llama',
        name='Meta-Llama-3-70B-Instruct',
        version='3.0'
    ),
    hashes=[HashType(
        alg=HashAlgorithm.SHA_256,
        content='a3f2...b91c'  # placeholder; supply the full hex digest
    )]
)

bom.components.add(model_component)
Model Registry Integration
AI-SBOMs should be generated at model registration time and stored alongside the model artefact. An example MLflow integration:
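import mlflow


def register_model_with_sbom(model_path: str, sbom: dict, model_name: str):
    """Log the model artefact and its AI-SBOM in one run, then register the model."""
    with mlflow.start_run() as run:
        mlflow.log_artifact(model_path, "model")
        mlflow.log_dict(sbom, "ai-sbom.json")
        # Tags make the SBOM searchable without downloading the artefact.
        mlflow.set_tags({
            "sbom.version": sbom["version"],
            # Assumes the base model is the first entry in components.
            "sbom.base-model": sbom["components"][0]["name"],
            # metadata.timestamp is when the BOM was generated, not when training ran.
            "sbom.timestamp": sbom["metadata"]["timestamp"],
        })
        return mlflow.register_model(
            f"runs:/{run.info.run_id}/model",
            model_name
        )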
Using the AI-SBOM
Incident Response
When a vulnerability is disclosed in a base model or dependency, query the AI-SBOM registry to identify all deployed systems using that component:
def find_deployments_using_model(base_model_purl: str, registry) -> list:
    return [
        deployment for deployment in registry.all_deployments()
        if base_model_purl in deployment.sbom.dependency_graph()
    ]
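The query above depends on a registry object and a dependency_graph() method that are left undefined. As a rough, self-contained sketch of the same idea, the example below uses a hypothetical in-memory Deployment class that stores direct dependencies and resolves them transitively, so an adapter built on a vulnerable base model is still found. The class, its fields, and the deployment names are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Hypothetical deployment record: a name plus component -> direct dependencies."""
    name: str
    depends_on: dict[str, list[str]] = field(default_factory=dict)

    def dependency_graph(self) -> set[str]:
        """Return every component identifier reachable in this deployment's SBOM."""
        seen: set[str] = set()
        stack = list(self.depends_on)
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # already visited; also guards against cycles
            seen.add(node)
            stack.extend(self.depends_on.get(node, []))
        return seen


def find_deployments_using(purl: str, deployments: list[Deployment]) -> list[str]:
    """Names of deployments whose SBOM references the component, directly or transitively."""
    return [d.name for d in deployments if purl in d.dependency_graph()]


BASE = "pkg:huggingface/meta-llama/Meta-Llama-3-70B-Instruct@3.0"
deployments = [
    Deployment("support-bot", {"llama-3-70b-customer-service-lora@1.4.2": [BASE]}),
    Deployment("search-api", {"hypothetical-embedding-model@1.0": []}),
]
affected = find_deployments_using(BASE, deployments)
```

In this example only support-bot is returned, because its adapter depends on the affected base model while search-api does not reference it at all.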
Regulatory Compliance
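The EU AI Act requires documentation of training data and model provenance for high-risk systems. An AI-SBOM that captures the dataset records above, including PII assessment status and data controller identity, supplies much of the provenance evidence that Article 11 technical documentation calls for. It does not cover every item in Annex IV on its own, so treat it as one input to the technical file, not a substitute for it.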
Supply Chain Auditing
Before deploying a third-party model or adapter, require a signed AI-SBOM from the supplier. Verify that:
- The base model hash matches the published release
- The training dataset provenance is documented
- No known-vulnerable model versions are referenced
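The three checks above can be sketched as a single audit function. This is an illustrative outline, not a complete verifier: it assumes the SBOM signature has already been validated by other means, that the published digests and the known-vulnerable list are supplied by the caller, and that dataset provenance is recorded in a data-sources property as in the dataset record earlier in this guide. The function name and parameters are hypothetical.

```python
def audit_supplier_sbom(
    sbom: dict,
    published_hashes: dict[str, str],
    vulnerable_purls: set[str],
) -> list[str]:
    """Return human-readable findings; an empty list means all three checks passed."""
    findings = []
    for component in sbom.get("components", []):
        name = component.get("name", "<unnamed>")
        purl = component.get("purl")
        digests = {h["alg"]: h["content"] for h in component.get("hashes", [])}
        props = {p["name"]: p["value"] for p in component.get("properties", [])}

        # Check 1: the model hash matches the digest the publisher released.
        if component.get("type") == "machine-learning-model" and purl in published_hashes:
            if digests.get("SHA-256") != published_hashes[purl]:
                findings.append(f"{name}: SHA-256 does not match published release")

        # Check 2: dataset records document where the data came from.
        if component.get("type") == "data" and not props.get("data-sources"):
            findings.append(f"{name}: training dataset provenance is not documented")

        # Check 3: no component is on the known-vulnerable list.
        if purl in vulnerable_purls:
            findings.append(f"{name}: references known-vulnerable version {purl}")
    return findings


supplier_sbom = {
    "components": [
        {
            "type": "machine-learning-model",
            "name": "base",
            "purl": "pkg:huggingface/example/base@1",
            "hashes": [{"alg": "SHA-256", "content": "aaa"}],
        },
        {"type": "data", "name": "train-set", "properties": []},
    ]
}
findings = audit_supplier_sbom(
    supplier_sbom,
    published_hashes={"pkg:huggingface/example/base@1": "bbb"},
    vulnerable_purls=set(),
)
```

Note that a model whose purl is missing from published_hashes is skipped silently in this sketch; a stricter policy would report that case as a finding too.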