Modeling the semantic layer

The semantic model is the layer between your warehouse and the Streya agent. It defines, once, what your data means — which tables matter, how they join, what “revenue” or “active customer” actually is — so every answer is computed from the same approved definitions.

Today, semantic models are built together with the Streya team during onboarding. This page explains what the model is made of and what to prepare so that the process goes quickly.

The two building blocks

Cubes model the data. Each cube maps to a table (or query) in your warehouse and defines:

Dimensions — fields to group and filter by: dates, categories, regions, statuses.
Measures — the metrics: revenue, order count, average basket, distinct customers.
Joins — how this cube relates to others (orders → customers, orders → products).
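To make the three parts concrete, here is a minimal Python sketch of a cube as plain data, plus a toy evaluator that groups one measure by one dimension over in-memory rows. It is not the Streya file format (the Cubes reference covers that). The class names, the sql_table and aggregation attributes, the aggregation labels, and the analytics.orders table are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, simplified model of a cube. Class and attribute names
# (sql_table, aggregation, relationship) are illustrative assumptions,
# not the Streya YAML schema.


@dataclass
class Dimension:
    name: str
    column: str  # warehouse column to group and filter by


@dataclass
class Measure:
    name: str
    aggregation: str  # one of the keys in AGGREGATIONS below
    column: str  # warehouse column being aggregated


@dataclass
class Join:
    target_cube: str
    on: tuple[str, str]  # (column in this cube, column in the target cube)
    relationship: str  # e.g. "many_to_one" for orders -> customers


@dataclass
class Cube:
    name: str
    sql_table: str
    dimensions: list[Dimension] = field(default_factory=list)
    measures: list[Measure] = field(default_factory=list)
    joins: list[Join] = field(default_factory=list)


AGGREGATIONS = {
    "sum": sum,
    "count": len,
    "count_distinct": lambda values: len(set(values)),
    "avg": lambda values: sum(values) / len(values),
}


def evaluate(cube: Cube, measure_name: str, dimension_name: str, rows: list[dict]) -> dict:
    """Compute one measure grouped by one dimension over in-memory rows."""
    measure = next(m for m in cube.measures if m.name == measure_name)
    dimension = next(d for d in cube.dimensions if d.name == dimension_name)
    groups: dict = {}
    for row in rows:
        # Bucket the measure column's values by the dimension's value.
        groups.setdefault(row[dimension.column], []).append(row[measure.column])
    aggregate = AGGREGATIONS[measure.aggregation]
    return {key: aggregate(values) for key, values in groups.items()}


orders = Cube(
    name="orders",
    sql_table="analytics.orders",  # hypothetical table name
    dimensions=[Dimension("region", "region")],
    measures=[
        Measure("revenue", "sum", "amount"),
        Measure("order_count", "count", "order_id"),
        Measure("distinct_customers", "count_distinct", "customer_id"),
    ],
    joins=[Join("customers", ("customer_id", "id"), "many_to_one")],
)

rows = [
    {"order_id": 1, "customer_id": "c1", "region": "east", "amount": 100.0},
    {"order_id": 2, "customer_id": "c1", "region": "east", "amount": 50.0},
    {"order_id": 3, "customer_id": "c2", "region": "west", "amount": 80.0},
]

print(evaluate(orders, "revenue", "region", rows))  # {'east': 150.0, 'west': 80.0}
```

The join is carried as data only; this toy evaluator does not follow it. In practice a semantic layer typically turns definitions like these into queries that run in the warehouse; the sketch only shows how dimensions and measures relate.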

Views package cubes for end users. A view combines one or more cubes into a single, curated dataset — only the fields that matter, with clear names. Views are what the agent sees and analyzes; cubes are the plumbing underneath.

warehouse tables  →  cubes (modeling)  →  views (curated datasets)  →  agent
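A view can be sketched the same way: a named, described list of cube members, each exposed under a clear alias. The hypothetical Python below (the class names, the seg_cd member, and the field list are assumptions, not the Streya view schema) shows the two properties described above: the agent sees only the aliases, and each alias resolves back to exactly one cube member.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch of a view: a curated list of cube members exposed
# under clear names. Class and attribute names are illustrative assumptions.


@dataclass(frozen=True)
class ViewField:
    cube: str    # cube the member comes from
    member: str  # dimension or measure name inside that cube
    alias: str   # the clear, user-facing name the view exposes


@dataclass
class View:
    name: str
    description: str
    fields: list[ViewField]

    def exposed_names(self) -> list[str]:
        """The only names the agent sees; other cube members stay hidden."""
        return [f.alias for f in self.fields]

    def resolve(self, alias: str) -> tuple[str, str]:
        """Map a view-level name back to its (cube, member) pair."""
        for f in self.fields:
            if f.alias == alias:
                return (f.cube, f.member)
        raise KeyError(f"{alias!r} is not exposed by view {self.name!r}")


sales_analytics = View(
    name="sales_analytics",
    description="Revenue by region and customer segment. Use for revenue questions.",
    fields=[
        ViewField("orders", "revenue", "revenue"),
        ViewField("orders", "region", "region"),
        ViewField("customers", "seg_cd", "customer_segment"),  # cryptic column, clear alias
    ],
)

print(sales_analytics.exposed_names())  # ['revenue', 'region', 'customer_segment']
```

Asking the view for the raw member name (seg_cd) raises KeyError, which mirrors the idea that cubes are plumbing and views are what the agent works with.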

A typical workspace has a handful of views, each designed around a job: sales_analytics for revenue questions, inventory_health for stock questions, and so on.

One metric, one definition

This is the most important thing a semantic model does, and it’s worth dwelling on: there should be exactly one path to each metric. Every KPI your business cares about — net revenue, active customers, margin — should have a single, agreed definition, and that definition should live in the model and nowhere else.

The reason is simple. When a metric can be calculated two ways, the agent has to guess which one you meant — and so does every person reading the answer. Two definitions of “net revenue” don’t give you flexibility; they give you two numbers that disagree, and no way to tell which is right.

Take a concrete edge case: do unfulfilled orders count toward net revenue? There’s a reasonable argument either way. But the model’s job isn’t to capture both arguments — it’s to settle the question. Decide upfront, yes or no, encode that one answer in the measure, and stick with it. Don’t define a net_revenue and a net_revenue_incl_unfulfilled and leave the agent to choose; pick the one your business means when it says “net revenue” and make that the definition.
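The cost of keeping both definitions is easy to show with numbers. In the hypothetical Python sketch below, the Order fields, the amount-minus-refunds formula, and the decision to exclude unfulfilled orders are all assumptions chosen for illustration. The registry that allows one definition per business concept is likewise an illustrative pattern, not a Streya feature.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical order record; the fields and formulas below are
# illustrative assumptions, not a real Streya schema or metric.


@dataclass(frozen=True)
class Order:
    amount: float
    refunded: float
    fulfilled: bool


def net_revenue(orders: Iterable[Order]) -> float:
    """Agreed definition (assumed for this example): fulfilled orders only."""
    return sum(o.amount - o.refunded for o in orders if o.fulfilled)


def net_revenue_incl_unfulfilled(orders: Iterable[Order]) -> float:
    """The competing variant the text warns against keeping."""
    return sum(o.amount - o.refunded for o in orders)


class MetricRegistry:
    """Allows exactly one definition per business concept."""

    def __init__(self) -> None:
        self._by_concept: dict[str, tuple[str, Callable]] = {}

    def define(self, concept: str, name: str, fn: Callable) -> None:
        # Keyed by concept, not by measure name, so a differently named
        # variant of the same metric is still rejected.
        if concept in self._by_concept:
            existing, _ = self._by_concept[concept]
            raise ValueError(f"{concept!r} is already defined by {existing!r}")
        self._by_concept[concept] = (name, fn)

    def compute(self, concept: str, orders: Iterable[Order]) -> float:
        _, fn = self._by_concept[concept]
        return fn(orders)


orders = [Order(100.0, 10.0, True), Order(50.0, 0.0, False)]
print(net_revenue(orders), net_revenue_incl_unfulfilled(orders))  # 90.0 140.0

registry = MetricRegistry()
registry.define("net revenue", "net_revenue", net_revenue)
try:
    registry.define("net revenue", "net_revenue_incl_unfulfilled", net_revenue_incl_unfulfilled)
except ValueError as err:
    print(err)  # 'net revenue' is already defined by 'net_revenue'
```

With one fulfilled and one unfulfilled order, the two functions return 90.0 and 140.0 for the same data, which is exactly the kind of disagreement described above. Keying the registry by business concept rather than by measure name is what stops a second, differently named variant from slipping in.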

Doing this well is genuinely hard in a real business: KPIs are often fuzzy, owned by different teams, or calculated slightly differently in different spreadsheets. That difficulty is exactly why it's worth the time. Pinning down one definition for each metric is the highest-leverage thing you can do during modeling, because every answer the agent gives downstream inherits that clarity.

Descriptions: how the agent learns your business

Every cube field and every view can carry a description, and the agent reads these descriptions when it analyzes your data. They're where your business knowledge lives:

On a view: what the dataset is, its grain, and when to use it. “One row per (month × customer × product). Use for revenue trends and customer-mix questions.”
On a field: what it means and how to use it. “Returns are stored as negative quantities.” “Currency is CAD.” “Use net_sales for external reporting, not gross_sales.”
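A grain statement like "one row per (month × customer × product)" is also a claim the data can be checked against. This Python sketch assumes rows arrive as dictionaries and uses hypothetical month, customer, and product columns. It lists every key combination that appears more than once; an empty result means the rows match the stated grain.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping, Sequence


def grain_violations(rows: Iterable[Mapping], grain: Sequence[str]) -> list[tuple]:
    """Return every key tuple that occurs more than once for the declared grain.

    An empty list means the data matches the grain the description claims.
    """
    counts = Counter(tuple(row[key] for key in grain) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical rows: the first and third share the same grain key.
rows = [
    {"month": "2024-01", "customer": "acme", "product": "widget", "revenue": 100},
    {"month": "2024-01", "customer": "acme", "product": "gadget", "revenue": 40},
    {"month": "2024-01", "customer": "acme", "product": "widget", "revenue": 25},
]

print(grain_violations(rows, ("month", "customer", "product")))
# [('2024-01', 'acme', 'widget')]
```

A non-empty result means either the description's grain is wrong or duplicate rows have crept in, and either one is worth fixing before the agent relies on that description.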

This is the difference between a generic AI and one that answers like someone who knows your business. The more of this you surface during modeling, the better the answers.

What makes a good field description

A field name like gmv or acct_status tells the agent almost nothing. A good description fills in what a new analyst would have to ask a colleague. The most useful things to include:

A plain-language business definition. What the field actually represents, in the terms your business uses. “Gross merchandise value — total value of goods sold before discounts, returns, or fees.”
Alternate terminology and acronyms. The other names people use for this field, so the agent connects a question to the right column. “Also called GMV, top-line, or ‘gross sales’ by the finance team.”
The value list, when the set is finite. For status/category/enum fields, spell out every value and what it means. “active, trial, churned, paused. paused is a temporary hold and still counts as a customer; churned does not.”
Conventions and gotchas. Sign conventions, units, currency, known data-quality quirks. “Returns are negative quantities. Currency is CAD.”
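One way to make sure a description covers all four parts is to draft it in pieces and then join them into text. In the hypothetical Python sketch below, the FieldDescription class, its attribute names, and the account definition are illustrative. The meanings for active and trial are placeholders; only the paused and churned meanings come from the example above. The undocumented_values helper flags values that appear in the data but are missing from the documented list.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical structure for drafting a field description. The four parts
# mirror the list above; the class and its names are illustrative only.


@dataclass
class FieldDescription:
    definition: str                                       # plain-language business definition
    synonyms: list[str] = field(default_factory=list)     # alternate terms and acronyms
    values: dict[str, str] = field(default_factory=dict)  # enum value -> meaning
    conventions: list[str] = field(default_factory=list)  # units, signs, gotchas

    def render(self) -> str:
        """Join the parts into one free-text description."""
        parts = [self.definition]
        if self.synonyms:
            parts.append("Also called " + ", ".join(self.synonyms) + ".")
        parts.extend(f"{value}: {meaning}." for value, meaning in self.values.items())
        parts.extend(self.conventions)
        return " ".join(parts)


def undocumented_values(description: FieldDescription, observed: list[str]) -> list[str]:
    """Values present in the data but missing from the documented value list."""
    return sorted(set(observed) - set(description.values))


acct_status = FieldDescription(
    definition="Current status of the customer account.",  # hypothetical wording
    values={
        "active": "placeholder, agree on the meaning with your team",
        "trial": "placeholder, agree on the meaning with your team",
        "churned": "no longer counts as a customer",
        "paused": "temporary hold; still counts as a customer",
    },
)

print(undocumented_values(acct_status, ["active", "paused", "cancelled", "active"]))
# ['cancelled']
```

Here a status value that nobody documented (cancelled) surfaces immediately, which is the kind of gap that otherwise shows up later as a wrong answer.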

What to prepare for a modeling session

You don’t need to write anything technical. Bring:

The questions you want answered. Five to ten real questions your team asks (“which accounts are declining?”, “what’s our margin by channel?”). The model is designed backwards from these.
Where the data lives. Which tables/exports contain sales, customers, products, targets.
Your definitions. How you calculate the key metrics — what counts as revenue, which statuses are excluded, fiscal calendar quirks.
The gotchas. Sign conventions, duplicate item codes across channels, known data-quality issues, the things you’d warn a new analyst about. These become field descriptions.

Iterating on the model

A semantic model is never finished on day one. When the agent gets something wrong — uses the wrong metric, misreads a convention — that’s usually a missing description, and it’s a one-line fix. Tell your Streya contact, or note it directly in a conversation; refining the model is a normal, continuous part of using Streya.

Going deeper

When you’re ready to read or author model files yourself:

Cubes reference — full YAML schema for cubes.
Views reference — full YAML schema for views.