
Data Contracts and Semantic Layers: The Missing Foundation for Reliable AI Agents

Metric drift and broken AI agents are symptoms of the same problem. Why data contracts and a semantic layer are the foundation that fixes it at the root.

Mohammad Rahman

Mohadata

15 June 2025
8 min read
Dark dashboard showing time-series metrics — the kind of view a semantic layer is meant to make consistent across every consumer

Most organisations chasing AI agents in 2025 are running into the same wall.

The models are impressive. The agent frameworks look promising. But once the agents start answering real business questions, they give conflicting answers, hallucinate on stale data, or confidently report metrics that don't match what the CFO sees in their dashboard.

The root cause is almost never the AI. It's the data layer underneath.

I see the same pattern across teams. People invest heavily in AI tooling while their metrics still live in spreadsheets, conflicting dbt models, and tribal knowledge. The result is AI agents that can't be trusted — and self-service analytics that creates more confusion than clarity.

The missing foundation is data contracts plus a semantic layer. This piece is about why that combination has become essential in the AI-agent era, and how to put it in place properly.

Why self-service and AI agents are failing

The problem is simple but painful: no single source of truth for what a metric actually means.

I've worked with companies where "active customer" had seven different definitions across teams. Revenue was calculated three different ways depending on who you asked. Churn rate changed depending on which dashboard you opened.
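A toy illustration of how this happens (the customer records and the 30-day windows here are invented for the example): two teams each write a perfectly reasonable definition of "active customer", and both are "correct" in isolation.

```python
from datetime import date, timedelta

# Hypothetical customer records for illustration only.
customers = [
    {"id": 1, "last_order": date(2025, 5, 20), "last_login": date(2025, 6, 10)},
    {"id": 2, "last_order": date(2025, 2, 1),  "last_login": date(2025, 6, 12)},
    {"id": 3, "last_order": date(2025, 4, 1),  "last_login": date(2025, 3, 3)},
]
today = date(2025, 6, 15)
window = timedelta(days=30)

# Finance's definition: "active" means ordered in the last 30 days.
finance_active = [c for c in customers if today - c["last_order"] <= window]

# Product's definition: "active" means logged in during the last 30 days.
product_active = [c for c in customers if today - c["last_login"] <= window]

print(len(finance_active), len(product_active))  # 1 vs 2: same word, different metric
```

Neither team is wrong; the organisation simply never decided which definition is canonical, and every dashboard, report, and AI agent inherits that ambiguity.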

This is not an unusual position. In dbt Labs's 2024 State of Analytics Engineering survey of 456 data practitioners, 57% cited data quality as one of their top obstacles — the single most common pain point in the report. Behind that headline number is the same root cause: definitions and trust drift faster than teams can hand-fix them.

When an AI agent tries to use this kind of data, it has no way to know which definition is correct. It picks one, or averages them, and gives an answer that sounds authoritative but is fundamentally wrong.

Traditional approaches — dbt tests, documentation wikis, data catalogs — help, but they don't solve the core issue. They document the problem. They don't prevent it.

Data contracts and semantic layers change this. They turn governance from a manual process into enforceable infrastructure.

What a proper data contract actually looks like

A data contract is a machine-readable agreement between data producers and consumers. It defines schema, quality expectations, SLAs, ownership, and semantics — all in version-controlled YAML.

A real production-shape example, in the spirit of the Open Data Contract Standard (now hosted by The Linux Foundation, with PayPal's original template as one of its starting points):

contracts/orders.yaml

apiVersion: v1
kind: DataContract
metadata:
  name: orders
  version: 2.3.0
  owner: payments-team@company.com
  description: "Order transactions from all sales channels"
  tags: [commerce, transactions, pii]
 
schema:
  type: object
  properties:
    order_id:
      type: string
      format: uuid
      description: "Unique order identifier"
      pii: false
    customer_id:
      type: string
      description: "Customer who placed the order"
    order_amount:
      type: number
      minimum: 0
      description: "Total order value in reporting currency"
    order_status:
      type: string
      enum: [pending, paid, shipped, cancelled, refunded]
    created_at:
      type: string
      format: date-time
 
quality:
  - type: freshness
    threshold: 1h
    severity: error
  - type: row_count
    minimum: 1000
    window: 24h
  - type: null_rate
    column: customer_id
    maximum: 0.01
 
sla:
  freshness: 15m
  availability: 99.5%
 
governance:
  producer: payments-platform-team
  consumers:
    - analytics-team
    - finance-team
    - customer-success-ai-agent
  change_policy: breaking-changes-require-approval

The teams that succeed treat contracts as code. They lint them, test them in CI, and fail the pipeline if a contract is violated. The teams that fail treat contracts as documentation — they live in Confluence and are ignored in production.

Putting contracts into practice

Contracts only work if they're enforced. The pattern that holds up:

  1. Store contracts in Git alongside the code that produces the data.
  2. Validate in CI using tools like Soda, Great Expectations, or the Data Contract CLI.
  3. Enforce at runtime with scheduled checks that alert when SLAs are breached.
  4. Version the contracts so consumers know when something changes.
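To make step 2 concrete, here is a minimal, hand-rolled sketch of what a CI-time contract check does under the hood. In practice you would delegate this to the Data Contract CLI, Soda, or a JSON Schema validator rather than write it yourself; the `CONTRACT_SCHEMA` dict below is a simplified, hypothetical rendering of the orders contract above.

```python
# Simplified rendering of the orders contract's schema block, for illustration.
CONTRACT_SCHEMA = {
    "order_id": {"type": str},
    "customer_id": {"type": str},
    "order_amount": {"type": (int, float), "minimum": 0},
    "order_status": {"type": str,
                     "enum": {"pending", "paid", "shipped", "cancelled", "refunded"}},
    "created_at": {"type": str},
}

def violations(record: dict) -> list[str]:
    """Return human-readable contract violations for one record."""
    errs = []
    for field, rule in CONTRACT_SCHEMA.items():
        if field not in record:
            errs.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            errs.append(f"{field}: wrong type {type(value).__name__}")
            continue
        if "minimum" in rule and value < rule["minimum"]:
            errs.append(f"{field}: below minimum {rule['minimum']}")
        if "enum" in rule and value not in rule["enum"]:
            errs.append(f"{field}: value not in allowed set")
    return errs

good = {"order_id": "a1", "customer_id": "c9", "order_amount": 42.5,
        "order_status": "paid", "created_at": "2025-06-01T10:00:00Z"}
bad = {"order_id": "a2", "customer_id": "c9", "order_amount": -5,
       "order_status": "lost", "created_at": "2025-06-01T10:00:00Z"}

assert violations(good) == []
print(violations(bad))  # negative amount and unknown status both flagged
```

In CI, a non-empty violations list for sampled records fails the build; at runtime, the same check feeds the alerting in step 3.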

Public examples worth borrowing from:

  • Chad Sanderson's The Rise of Data Contracts (Aug 2022) is the foundational essay that re-introduced the idea to the modern data community, drawing on his work at Convoy. Anything written about data contracts since 2022 is in conversation with that piece.
  • GoCardless has documented their data-contracts programme in detail — Andrew Jones's Implementing Data Contracts at GoCardless and his six-months-on update walk through what worked and what didn't.
  • PayPal's data-contract template is open-sourced and now folded into the Open Data Contract Standard.
  • The Open Data Contract Standard itself is becoming the lingua franca; if you're starting from scratch in 2025, it's the format to adopt.

The trap to avoid is trying to contract everything on day one. Pick two or three critical datasets, write contracts, enforce them, and expand from there.

The semantic layer — turning contracts into reliable metrics

Data contracts define what the data should look like. The semantic layer defines how business metrics are calculated from that data — consistently, everywhere. That's what makes contracts valuable to both humans and AI agents.

The main options in 2025:

  • dbt Semantic Layer (MetricFlow). Best for dbt-native teams and internal BI. Trade-offs: good governance, more limited caching. AI-agent fit: strong.
  • Cube. Best for high-concurrency, embedded analytics, and external consumers. Trade-offs: more infrastructure, excellent caching. AI-agent fit: very strong.
  • Warehouse-native (Snowflake / Databricks). Best for teams deep in one platform. Trade-offs: less portable, very fast. AI-agent fit: good.

I've seen teams succeed with both dbt and Cube. The choice usually comes down to who the primary consumers are — internal BI teams (dbt wins) or external customers and AI agents (Cube wins, mostly because of its API surface and caching).

A real shape of a dbt MetricFlow definition (note the top-level keys: semantic_models is a list, metrics is its own top-level config — this trips people up):

semantic_models/orders.yml

semantic_models:
  - name: orders
    description: "Order transactions from all sales channels"
    model: ref('fct_orders')
 
    entities:
      - name: order
        type: primary
        expr: order_id
      - name: customer
        type: foreign
        expr: customer_id
 
    dimensions:
      - name: order_status
        type: categorical
      - name: created_at
        type: time
        type_params:
          time_granularity: day
 
    measures:
      - name: order_count
        agg: count
        expr: 1
      - name: total_revenue
        agg: sum
        expr: order_amount
 
metrics:
  - name: total_revenue
    description: "Sum of revenue across all orders"
    type: simple
    type_params:
      measure: total_revenue
 
  - name: monthly_recurring_revenue
    description: "MRR derived from total revenue divided by 12"
    type: derived
    type_params:
      expr: total_revenue / 12
      metrics:
        - name: total_revenue

Once this exists, BI tools, AI agents, and custom applications all query through the same governed logic. The metric is defined once, in one place, and everyone gets the same answer.
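The define-once principle can be sketched in a few lines. This is a toy illustration, not how MetricFlow or Cube are implemented: every consumer resolves a metric name through one shared registry, so the generated SQL cannot diverge between a dashboard and an agent.

```python
# Toy metric registry: the single place a metric's logic lives.
METRICS = {
    "total_revenue": "SUM(order_amount)",
    "order_count": "COUNT(*)",
}

def compile_query(metric: str, grain: str = "month") -> str:
    """Compile a governed metric into SQL. An unknown metric name fails
    loudly instead of silently falling back to ad-hoc logic."""
    expr = METRICS[metric]  # KeyError means: not a governed metric
    return (f"SELECT DATE_TRUNC('{grain}', created_at) AS period, "
            f"{expr} AS {metric} FROM fct_orders GROUP BY 1")

# A BI dashboard and an AI agent asking for the same metric get identical SQL.
assert compile_query("total_revenue") == compile_query("total_revenue")
print(compile_query("total_revenue"))
```

Real semantic layers add joins, dimensions, access control, and caching on top, but the core guarantee is exactly this: one name, one definition, one answer.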

How this directly enables reliable AI agents

AI agents need three things to be trustworthy:

  1. Consistent definitions — the same metric always means the same thing.
  2. Freshness guarantees — agents know when data is stale.
  3. Lineage and auditability — when an agent gives an answer, you can trace where it came from.

Data contracts plus a semantic layer give you all three. Connect an agent to your semantic layer and it stops hallucinating on conflicting definitions; it queries governed metrics and can explain its reasoning by showing the contract and calculation logic behind each number.

In practice, an agent does this through a real API. The dbt Cloud Semantic Layer, for example, exposes a GraphQL endpoint that any client — agent, app, BI tool — can call. The metric name and grain come straight from the governed semantic model, so there is no chance of a definition drifting between agent and dashboard:

agent_query.py

import os
import requests
 
DBT_SL = "https://semantic-layer.cloud.getdbt.com/api/graphql"
 
# An agent (or any consumer) asks the semantic layer for MRR by month.
# Note what isn't here: hand-written SQL, business logic, or any chance
# of using a definition that drifts from what the CFO sees in the BI tool.
response = requests.post(
    DBT_SL,
    headers={"Authorization": f"Bearer {os.environ['DBT_TOKEN']}"},
    json={
        "query": """
            mutation {
              createQuery(
                environmentId: 12345,
                metrics: [{name: "monthly_recurring_revenue"}],
                groupBy: [{name: "metric_time", grain: MONTH}]
              ) { queryId }
            }
        """
    },
)
query_id = response.json()["data"]["createQuery"]["queryId"]
# ... poll the same endpoint for the result, then return it to the agent.
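The elided polling step looks roughly like this. The field names (`status`, `error`, `jsonResult`) and the status values follow the dbt Semantic Layer GraphQL API as documented at the time of writing; treat them as assumptions to verify against the current docs rather than a definitive client.

```python
import base64
import json
import time
import urllib.request

TERMINAL_STATUSES = {"SUCCESSFUL", "FAILED"}

def is_done(status: str) -> bool:
    # Assumed status set: ACCEPTED / COMPILED / RUNNING are in flight;
    # SUCCESSFUL / FAILED are terminal.
    return status.upper() in TERMINAL_STATUSES

def poll_result(url: str, token: str, environment_id: int, query_id: str):
    """Poll the createQuery job until it finishes, then decode the rows.
    Assumes jsonResult arrives as base64-encoded JSON (the API default)."""
    gql = """
        query ($environmentId: BigInt!, $queryId: String!) {
          query(environmentId: $environmentId, queryId: $queryId) {
            status
            error
            jsonResult
          }
        }
    """
    payload = {"query": gql,
               "variables": {"environmentId": environment_id,
                             "queryId": query_id}}
    while True:
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode(),
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            result = json.load(resp)["data"]["query"]
        if is_done(result["status"]):
            break
        time.sleep(1)  # simple fixed backoff; tune for production
    if result["status"] == "FAILED":
        raise RuntimeError(result["error"])
    return json.loads(base64.b64decode(result["jsonResult"]))
```

The agent then formats those rows into its answer, and every number it reports is traceable back to the governed `monthly_recurring_revenue` definition.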

AtScale's What Actually Changed in 2025 and Why It Redefined the Semantic Layer makes the same case from a different angle: 2025 was the year AI exposed the semantic inconsistencies organisations had been quietly tolerating for a decade.

This is the difference between an agent that sounds confident and one that is actually reliable.

Common mistakes

  • Writing contracts too late, after the data chaos is already in production. Start early, even if the contracts are imperfect.
  • Over-engineering the semantic layer. You don't need 200 metrics on day one. Start with the ten to fifteen that matter most to the business.
  • Treating the semantic layer as just another BI tool. It's governance infrastructure. The BI benefit is a side effect.
  • No change-management process. Breaking changes to contracts or metrics destroy trust faster than anything else. Have a clear approval and communication path.

A 90-day rollout

If you're starting from scratch, the shape of the work is roughly this:

  1. Month one: pick two or three critical data products, write contracts for them, and enforce them in CI. Start with schema, freshness, and ownership; the deeper quality rules can come later.
  2. Month two: build a semantic layer on top of those contracted datasets, define the ten to fifteen metrics that actually matter to the business, and connect one BI tool plus one internal AI use case.
  3. Month three: expand the contract surface, add quality rules and SLAs, train the teams who depend on the data, and start measuring what changes: fewer metric disputes, fewer agent hallucinations, faster onboarding for new use cases.

After ninety days, you'll have the foundation most organisations are still missing.

The point

This is the layer most AI initiatives are missing. Get it right and everything downstream — self-service analytics, AI agents, governance — becomes dramatically easier. Get it wrong and no amount of model improvement will fix it.

If you're struggling with metric drift, broken AI agents, or self-service that creates more questions than answers, this is the foundation to build first.

Talk to us

Move from AI demos to production.

If you're working through these foundation problems, we'd like to help. We'll send back a short, honest assessment of where you are and what we'd do first.