Most organisations chasing AI agents in 2025 are running into the same wall.
The models are impressive. The agent frameworks look promising. But once the agents start answering real business questions, they give conflicting answers, hallucinate on stale data, or confidently report metrics that don't match what the CFO sees in their dashboard.
The root cause is almost never the AI. It's the data layer underneath.
I see the same pattern across teams. People invest heavily in AI tooling while their metrics still live in spreadsheets, conflicting dbt models, and tribal knowledge. The result is AI agents that can't be trusted — and self-service analytics that creates more confusion than clarity.
The missing foundation is data contracts plus a semantic layer. This piece is about why that combination has become essential in the AI-agent era, and how to put it in place properly.
Why self-service and AI agents are failing
The problem is simple but painful: no single source of truth for what a metric actually means.
I've worked with companies where "active customer" had seven different definitions across teams. Revenue was calculated three different ways depending on who you asked. Churn rate changed depending on which dashboard you opened.
This is not an unusual position. In dbt Labs's 2024 State of Analytics Engineering survey of 456 data practitioners, 57% cited data quality as one of their top obstacles — the single most common pain point in the report. Behind that headline number is the same root cause: definitions and trust drift faster than teams can hand-fix them.
When an AI agent tries to use this kind of data, it has no way to know which definition is correct. It picks one, or averages them, and gives an answer that sounds authoritative but is fundamentally wrong.
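To make the failure mode concrete, here is a toy sketch with hypothetical data: two teams count "active customer" over the same order log, using two equally reasonable lookback windows, and get different numbers. Neither is wrong; they just never agreed on the definition.

```python
from datetime import date, timedelta

# Hypothetical order log: (customer_id, order_date). Illustrative only.
today = date(2025, 6, 1)
orders = [
    ("c1", today - timedelta(days=5)),
    ("c2", today - timedelta(days=40)),
    ("c3", today - timedelta(days=100)),
]

# Team A: "active" means ordered in the last 30 days.
active_a = {c for c, d in orders if (today - d).days <= 30}

# Team B: "active" means ordered in the last 90 days.
active_b = {c for c, d in orders if (today - d).days <= 90}

print(len(active_a), len(active_b))  # same data, two different answers
```

An agent asked "how many active customers do we have?" has no principled way to choose between these. That choice belongs in governed infrastructure, not in the agent.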
Traditional approaches — dbt tests, documentation wikis, data catalogs — help, but they don't solve the core issue. They document the problem. They don't prevent it.
Data contracts and semantic layers change this. They turn governance from a manual process into enforceable infrastructure.
What a proper data contract actually looks like
A data contract is a machine-readable agreement between data producers and consumers. It defines schema, quality expectations, SLAs, ownership, and semantics — all in version-controlled YAML.
A real production-shape example, in the spirit of the Open Data Contract Standard (now hosted by The Linux Foundation, with PayPal's original template as one of its starting points):
contracts/orders.yaml

```yaml
apiVersion: v1
kind: DataContract
metadata:
  name: orders
  version: 2.3.0
  owner: payments-team@company.com
  description: "Order transactions from all sales channels"
  tags: [commerce, transactions, pii]
schema:
  type: object
  properties:
    order_id:
      type: string
      format: uuid
      description: "Unique order identifier"
      pii: false
    customer_id:
      type: string
      description: "Customer who placed the order"
    order_amount:
      type: number
      minimum: 0
      description: "Total order value in reporting currency"
    order_status:
      type: string
      enum: [pending, paid, shipped, cancelled, refunded]
    created_at:
      type: string
      format: date-time
quality:
  - type: freshness
    threshold: 1h
    severity: error
  - type: row_count
    minimum: 1000
    window: 24h
  - type: null_rate
    column: customer_id
    maximum: 0.01
sla:
  freshness: 15m
  availability: 99.5%
governance:
  producer: payments-platform-team
  consumers:
    - analytics-team
    - finance-team
    - customer-success-ai-agent
  change_policy: breaking-changes-require-approval
```

The teams that succeed treat contracts as code. They lint them, test them in CI, and fail the pipeline if a contract is violated. The teams that fail treat contracts as documentation — they live in Confluence and are ignored in production.
Putting contracts into practice
Contracts only work if they're enforced. The pattern that holds up:
- Store contracts in Git alongside the code that produces the data.
- Validate in CI using tools like Soda, Great Expectations, or the Data Contract CLI.
- Enforce at runtime with scheduled checks that alert when SLAs are breached.
- Version the contracts so consumers know when something changes.
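On the runtime side, a freshness check reduces to "how old is the newest row versus the SLA". A minimal sketch, assuming the 15-minute `sla.freshness` from the contract above and made-up timestamps:

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(minutes=15)  # sla.freshness: 15m from the contract

def is_fresh(latest_row_at: datetime, now: datetime) -> bool:
    """True if the newest row landed within the freshness SLA."""
    return (now - latest_row_at) <= FRESHNESS_SLA

now = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
assert is_fresh(now - timedelta(minutes=5), now)       # within SLA
assert not is_fresh(now - timedelta(minutes=45), now)  # breach: alert the producer
```

A scheduled job running this check is what turns the SLA from a promise in YAML into a pager alert.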
Public examples worth borrowing from:
- Chad Sanderson's The Rise of Data Contracts (Aug 2022) is the foundational essay that re-introduced the idea to the modern data community, drawing on his work at Convoy. Anything written about data contracts since 2022 is in conversation with that piece.
- GoCardless has documented their data-contracts programme in detail — Andrew Jones's Implementing Data Contracts at GoCardless and his six-months-on update walk through what worked and what didn't.
- PayPal's data-contract template is open-sourced and now folded into the Open Data Contract Standard.
- The Open Data Contract Standard itself is becoming the lingua franca; if you're starting from scratch in 2025, it's the format to adopt.
The trap to avoid is trying to contract everything on day one. Pick two or three critical datasets, write contracts, enforce them, and expand from there.
The semantic layer — turning contracts into reliable metrics
Data contracts define what the data should look like. The semantic layer defines how business metrics are calculated from that data — consistently, everywhere. That's what makes contracts valuable to both humans and AI agents.
The main options in 2025:
| Approach | Best for | Trade-offs | AI-agent fit |
|---|---|---|---|
| dbt Semantic Layer (MetricFlow) | dbt-native teams, internal BI | Good governance, more limited caching | Strong |
| Cube | High-concurrency, embedded analytics, external consumers | More infrastructure, excellent caching | Very strong |
| Warehouse-native (Snowflake / Databricks) | Teams deep in one platform | Less portable, very fast | Good |
I've seen teams succeed with both dbt and Cube. The choice usually comes down to who the primary consumers are — internal BI teams (dbt wins) or external customers and AI agents (Cube wins, mostly because of its API surface and caching).
A real shape of a dbt MetricFlow definition (note the top-level keys: semantic_models is a list, metrics is its own top-level config — this trips people up):
semantic_models/orders.yml

```yaml
semantic_models:
  - name: orders
    description: "Order transactions from all sales channels"
    model: ref('fct_orders')
    entities:
      - name: order
        type: primary
        expr: order_id
      - name: customer
        type: foreign
        expr: customer_id
    dimensions:
      - name: order_status
        type: categorical
      - name: created_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_count
        agg: count
        expr: 1
      - name: total_revenue
        agg: sum
        expr: order_amount

metrics:
  - name: total_revenue
    description: "Sum of revenue across all orders"
    type: simple
    type_params:
      measure: total_revenue
  - name: monthly_recurring_revenue
    description: "MRR derived from total revenue divided by 12"
    type: derived
    type_params:
      expr: total_revenue / 12
      metrics:
        - name: total_revenue
```

Once this exists, BI tools, AI agents, and custom applications all query through the same governed logic. The metric is defined once, in one place, and everyone gets the same answer.
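The value of the derived metric is that the arithmetic lives in exactly one place. A sketch with made-up order amounts, mirroring the logic above (`total_revenue` is the sum of `order_amount`; MRR is that total divided by 12):

```python
# Hypothetical order amounts; the logic mirrors the semantic model above.
order_amounts = [120.0, 240.0, 840.0]

total_revenue = sum(order_amounts)              # measure: agg sum over order_amount
monthly_recurring_revenue = total_revenue / 12  # derived metric: total_revenue / 12

print(total_revenue, monthly_recurring_revenue)
```

If finance later decides MRR should be computed differently, the `expr` changes in one file and every dashboard, app, and agent picks it up at once.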
How this directly enables reliable AI agents
AI agents need three things to be trustworthy:
- Consistent definitions — the same metric always means the same thing.
- Freshness guarantees — agents know when data is stale.
- Lineage and auditability — when an agent gives an answer, you can trace where it came from.
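One way to picture the third requirement: the agent's answer is not a bare number but a small record that carries its own governance trail. A hypothetical sketch — `AgentAnswer` is illustrative, not any framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class AgentAnswer:
    """Hypothetical: a metric value plus the governance trail behind it."""
    value: float
    metric: str            # metric name from the semantic layer
    contract: str          # contract and version the data was served under
    lineage: list = field(default_factory=list)

answer = AgentAnswer(
    value=100.0,
    metric="monthly_recurring_revenue",
    contract="orders v2.3.0",
    lineage=["fct_orders", "semantic_layer", "agent"],
)
# When someone asks "where did this number come from?", the agent can answer.
print(answer.metric, answer.contract)
```

Whether or not you model it this explicitly, the contract version and metric definition are what make an agent's answer auditable after the fact.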
Data contracts plus a semantic layer give you all three. Connect an agent to your semantic layer and it stops hallucinating on conflicting definitions; it queries governed metrics and can explain its reasoning by showing the contract and calculation logic behind each number.
In practice, an agent does this through a real API. The dbt Cloud Semantic Layer, for example, exposes a GraphQL endpoint that any client — agent, app, BI tool — can call. The metric name and grain come straight from the governed semantic model, so there is no chance of a definition drifting between agent and dashboard:
agent_query.py

```python
import os

import requests

DBT_SL = "https://semantic-layer.cloud.getdbt.com/api/graphql"

# An agent (or any consumer) asks the semantic layer for MRR by month.
# Note what isn't here: hand-written SQL, business logic, or any chance
# of using a definition that drifts from what the CFO sees in the BI tool.
response = requests.post(
    DBT_SL,
    headers={"Authorization": f"Bearer {os.environ['DBT_TOKEN']}"},
    json={
        "query": """
        mutation {
          createQuery(
            environmentId: 12345,
            metrics: [{name: "monthly_recurring_revenue"}],
            groupBy: [{name: "metric_time", grain: MONTH}]
          ) { queryId }
        }
        """
    },
)
query_id = response.json()["data"]["createQuery"]["queryId"]
# ... poll the same endpoint for the result, then return it to the agent.
```

AtScale's What Actually Changed in 2025 and Why It Redefined the Semantic Layer makes the same case from a different angle: 2025 was the year AI exposed the semantic inconsistencies organisations had been quietly tolerating for a decade.
This is the difference between an agent that sounds confident and one that is actually reliable.
Common mistakes
- Writing contracts too late, after the data chaos is already in production. Start early, even if the contracts are imperfect.
- Over-engineering the semantic layer. You don't need 200 metrics on day one. Start with the ten to fifteen that matter most to the business.
- Treating the semantic layer as just another BI tool. It's governance infrastructure. The BI benefit is a side effect.
- No change-management process. Breaking changes to contracts or metrics destroy trust faster than anything else. Have a clear approval and communication path.
A 90-day rollout
If you're starting from scratch, the shape of the work is roughly this:
- Month one: pick two or three critical data products, write contracts for them, and enforce them in CI. Start with schema, freshness, and ownership; the deeper quality rules can come later.
- Month two: build a semantic layer on top of those contracted datasets, define the ten to fifteen metrics that actually matter to the business, and connect one BI tool plus one internal AI use case.
- Month three: expand the contract surface, add quality rules and SLAs, train the teams who depend on the data, and start measuring what changes: fewer metric disputes, fewer agent hallucinations, faster onboarding for new use cases.
After ninety days, you'll have the foundation most organisations are still missing.
The point
This is the layer most AI initiatives are missing. Get it right and everything downstream — self-service analytics, AI agents, governance — becomes dramatically easier. Get it wrong and no amount of model improvement will fix it.
If you're struggling with metric drift, broken AI agents, or self-service that creates more questions than answers, this is the foundation to build first.