Backing-dataset mapping — wiring columns to attributes
Connect a real dataset to the entity defined in the previous lesson. 1:1 column-to-attribute mapping, Identity Keys mapping, automatic backend sink sync, and the question of whether one dataset can back multiple entities.
The previous lesson defined the shape of an entity. This lesson connects a dataset so that real data flows into that shape. Once mapping is done, the backend sink loads instances into the graph database automatically, and the next lesson's Graph explorer is where the first nodes appear.
Backing dataset in one line
Entities and relations are the semantic layer. The backing dataset is the real tabular data that holds up the semantic layer.
- One entity has one backing dataset.
- The dataset's References tab automatically lists this entity, so from the dataset side you can see at a glance which entities use it as a backing.
- The dataset must live in the same collection to appear in the mapping dropdown. This follows from the "one collection = one ontology scope" rule.
Mapping procedure
The flow of connecting the previous lesson's IOT_Machine entity to the machines dataset in the same collection:
- On the builder canvas, click the IOT_Machine node to open the inspector.
- Choose the Data Source (or Mapping) tab in the inspector.
- From the Select dataset dropdown, pick the machines dataset in the same collection. Datasets in other collections are intentionally hidden.
- In the Field mapping area, each entity attribute row gets a dataset-column dropdown next to it. Match them 1:1.
- Identity Keys mapping: next to the machine_id you checked in the previous lesson, specify which dataset column plays the PK role. Without this mapping, per-instance fetch isn't possible.
The IOT_Machine mapping is straightforward because the column names match exactly:
| Entity attribute | Dataset column | Note |
|---|---|---|
| machine_id (Identity Key, Display) | machine_id | Plays the PK role |
| sensor_count | sensor_count | |
| reading_count | reading_count | |
| latest_recorded_at | latest_recorded_at | Timestamp (ms) |
| anomaly_count | anomaly_count | |
| latest_health_score | latest_health_score | Float (0–1) |
As an exercise, sketch the same mapping for IOT_Sensor: a sensor master mirror in the same collection becomes its backing dataset, with sensor_id as the Identity Key. Doing so helps the next graph node from lesson 02 settle into place.
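The mapping rules above can be written down as plain data plus a small check. This is only a sketch for reasoning about the rules, not the product's storage format or API: the dictionary layout, the `validate_mapping` helper, and the reading of "1:1" as "no dataset column is used by two attributes" are all assumptions made for illustration.

```python
# Hypothetical representation of a field mapping: entity attribute -> dataset column.
# These names only illustrate the lesson's rules; they are not a product API.
IOT_MACHINE_MAPPING = {
    "machine_id": "machine_id",
    "sensor_count": "sensor_count",
    "reading_count": "reading_count",
    "latest_recorded_at": "latest_recorded_at",
    "anomaly_count": "anomaly_count",
    "latest_health_score": "latest_health_score",
}
IDENTITY_KEYS = ["machine_id"]


def validate_mapping(mapping, identity_keys, dataset_columns):
    """Return a list of problems; an empty list means the mapping looks complete."""
    problems = []
    # Every mapped column must actually exist in the backing dataset.
    for attribute, column in mapping.items():
        if column not in dataset_columns:
            problems.append(f"{attribute}: column {column!r} not in dataset")
    # Assumed meaning of 1:1: no dataset column feeds two attributes.
    used = list(mapping.values())
    for column in sorted(set(used)):
        if used.count(column) > 1:
            problems.append(f"column {column!r} is mapped more than once")
    # Without an Identity Key mapping, per-instance fetch is not possible.
    if not identity_keys:
        problems.append("no Identity Key is mapped")
    for key in identity_keys:
        if key not in mapping:
            problems.append(f"Identity Key {key!r} has no column mapping")
    return problems


machines_columns = set(IOT_MACHINE_MAPPING.values())
print(validate_mapping(IOT_MACHINE_MAPPING, IDENTITY_KEYS, machines_columns))
```

For the IOT_Machine mapping in the table the problem list is empty. Passing an empty Identity Keys list reports the missing key.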
When column names don't match
It's common for an operational dataset's column names to differ from your domain vocabulary (e.g. mch_id, ts_ms). This surface does not rename columns — the mapping dropdown only decides which dataset column fills which attribute. To change the column names of the dataset itself, take one of two paths:
- Engineer Path lesson 03's column-select / rename node: clean up upstream in a pipeline, build a mart dataset, and map that mart as the backing.
- Engineer Path lesson 04's code node: apply more complex transformations (type casts, unit normalization) and land the result in a mart.
Keep this surface as a clean mapping screen and delegate column cleanup to the engineer surfaces.
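To make the "mapping does not rename" point concrete, here is a minimal sketch. The `to_instance` helper and the sample row are hypothetical. The sketch only shows that a mapping selects which dataset column fills which attribute, while the dataset keeps its own column names.

```python
# Hypothetical mapping where the dataset's column names differ from the
# entity's attribute names: entity attribute -> dataset column.
mapping = {"machine_id": "mch_id", "latest_recorded_at": "ts_ms"}


def to_instance(row, mapping):
    """Fill each entity attribute from its mapped dataset column."""
    return {attribute: row[column] for attribute, column in mapping.items()}


row = {"mch_id": "M-001", "ts_ms": 1700000000000}  # made-up sample row
instance = to_instance(row, mapping)

print(instance)     # attribute names come from the entity
print(sorted(row))  # the dataset row still has its original column names
```

If the dataset itself should carry the cleaner names, that change belongs in the upstream pipeline, as described above.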
Sink — automatic once mapping is done
At the moment mapping completes, the following happens:
- The backend sink infrastructure automatically syncs rows of the backing dataset to nodes in the graph database.
- The previous version's EntitySink / RelationSink UI has been removed. Sink is an infrastructure step the user doesn't trigger directly.
- Right after mapping, the node count next to the corresponding label (e.g. IOT_Machine) in the Graph explorer starts filling in.
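The sink itself is internal infrastructure and the lesson does not describe its implementation. As a mental model only, the sketch below treats sync as an upsert keyed by label and Identity Key values, so syncing the same rows again does not create duplicate nodes. The upsert behavior, the `sync_rows_to_nodes` function, and the in-memory `graph` dictionary are all assumptions, not the real sink.

```python
def sync_rows_to_nodes(rows, label, identity_keys, graph):
    """Upsert one node per row, keyed by (label, identity key values)."""
    for row in rows:
        key = (label, tuple(row[k] for k in identity_keys))
        # Writing to the same key overwrites the node, so a repeated sync
        # of identical rows leaves the node count unchanged.
        graph[key] = dict(row)
    return graph


graph = {}  # stand-in for the graph database
rows = [
    {"machine_id": "M-001", "sensor_count": 4},
    {"machine_id": "M-002", "sensor_count": 7},
]
sync_rows_to_nodes(rows, "IOT_Machine", ["machine_id"], graph)
sync_rows_to_nodes(rows, "IOT_Machine", ["machine_id"], graph)  # second sync

node_count = sum(1 for (label, _) in graph if label == "IOT_Machine")
print(node_count)  # 2
```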
The instance Data tab — row-level confirmation
After mapping, open the Data tab in the inspector. The backing dataset's rows are paginated and the following automatically apply:
- Identity-keys-based fetch — Row-level query by PK columns. Clicking a row opens the instance detail panel.
- Display Column-first sort — The display name is the primary column, system name secondary.
- Column filter and sort — Same UX as any other dataset's Data tab.
- Direct instance editing is not supported — Edits happen via the backing dataset or a pipeline. The surface enforces the boundary between the semantic layer and the data source.
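The first behavior in the list, together with the pagination mentioned before it, can be pictured with a short sketch. The helpers `paginate` and `fetch_instance` and the sample rows are hypothetical, and the real Data tab presumably queries the backing dataset rather than scanning a Python list.

```python
def paginate(rows, page, page_size):
    """Return the rows on a 1-based page number."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]


def fetch_instance(rows, identity_keys, key_values):
    """Return the first row whose identity key columns equal key_values, else None."""
    wanted = tuple(key_values)
    for row in rows:
        if tuple(row[k] for k in identity_keys) == wanted:
            return row
    return None


# Made-up backing rows: M-001 through M-005.
rows = [{"machine_id": f"M-{i:03d}", "sensor_count": i} for i in range(1, 6)]

print(paginate(rows, page=2, page_size=2))
print(fetch_instance(rows, ["machine_id"], ["M-004"]))
```

The lookup needs to know which columns form the key, which is why an empty Identity Keys mapping blocks per-instance fetch.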
Can one dataset back multiple entities?
Yes. Two common patterns are worth knowing up front, because they save time when you model your own domain.
- Parent / child split: one transaction dataset becomes the backing of both the Order entity and the Order line entity. Just keep the Identity Keys different (order_id vs (order_id, line_no)).
- Same table, two viewpoints: one store master backs both the Branch entity and the Site entity at the same time. Map a different attribute subset for each.
But using the same dataset as the backing of a relation requires more care — the next lesson covers the relation backing pattern separately.
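The parent / child split can be sketched with one table and two Identity Key choices. The sample rows, the attribute names other than order_id and line_no, and the "first row seen wins" rule for collapsing duplicate keys are assumptions for illustration. The lesson does not say how the product resolves several rows that share a key.

```python
# Made-up transaction dataset: one row per order line.
transactions = [
    {"order_id": "O-1", "line_no": 1, "customer": "acme", "sku": "A", "qty": 2},
    {"order_id": "O-1", "line_no": 2, "customer": "acme", "sku": "B", "qty": 1},
    {"order_id": "O-2", "line_no": 1, "customer": "globex", "sku": "A", "qty": 5},
]


def instances(rows, identity_keys, attributes):
    """Build one instance per distinct identity key (assumed: first row wins)."""
    result = {}
    for row in rows:
        key = tuple(row[k] for k in identity_keys)
        result.setdefault(key, {a: row[a] for a in attributes})
    return result


# Same dataset, different Identity Keys and attribute subsets.
orders = instances(transactions, ["order_id"], ["order_id", "customer"])
order_lines = instances(
    transactions, ["order_id", "line_no"], ["order_id", "line_no", "sku", "qty"]
)

print(len(orders), len(order_lines))  # 2 3
```

Three rows yield two Order instances and three Order line instances, which is why the two entities need different Identity Keys.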
Self-check
- The entity inspector's Data tab paginates rows of the backing dataset.
- In the Graph explorer's left metadata panel, the node count next to the corresponding label is no longer 0 (wait a few seconds to a minute).
- The backing dataset's own page References tab lists this entity.
- The Identity Keys mapping isn't empty. If it is, go back to the Identity Keys mapping step of the mapping procedure and fill in one row.
Next lesson
The next lesson connects two entities with a relation by dragging on the builder canvas. We'll cover cardinality (1:1, 1:N, N:N), naming conventions (verb_phrase), and the fact that relations can also have a backing dataset.