Data Engineer Path

Your first pipeline — three nodes and one run

Wire source → transform → sink in the Workflow editor and run them once, turning the source dataset from the previous lesson into a mart dataset that analysts can consume.

8 min

The smallest unit an engineer should take through the portal end-to-end is source → transform → sink, plus one run. This lesson takes the source dataset from the previous lesson all the way to a mart dataset that analysts can consume.

Prerequisites

Reuse the dataset produced by the connector you registered in the previous lesson (e.g. src_postgres_orders) as the source. If you skipped that lesson, a CSV-uploaded dataset or the one you built in lesson 03 of the Analyst Path is fine. The output doesn't need to be clean — this lesson is about learning the flow.

Open the Workflow editor

  1. Click Pipelines in the left sidebar, then choose Create pipeline inside the collection.
  2. Pick a name (e.g. mart_orders_daily). Naming the pipeline after the meaning of its output dataset is what saves you during debugging six lessons later.
  3. The Workflow editor opens with an empty canvas.

The node library is on the left of the canvas, the work area is in the center, and the per-node configuration panel is on the right.

Place three nodes

  1. Drag Node library → Source → Read Dataset onto the canvas. In the configuration panel on the right, set collection and dataset to the source dataset you produced in the previous lesson.
  2. Drag Node library → Transform → Select / Rename Columns onto the canvas. Keep only the columns you want to expose to analysts, and rename any raw operational names (e.g. ord_dt_str) into human-readable aliases (e.g. order_date).
  3. Drag Node library → Sink → Write Dataset onto the canvas. In the sink node's configuration panel, enter a new target dataset name (e.g. mart_orders_daily). It will be created automatically in the same collection — or in a separate analyst-facing collection if you choose one.

At this point all three nodes sit on the canvas but aren't connected yet.
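As a rough mental model of what the Select / Rename Columns node in step 2 does, here is a minimal Python sketch that works on in-memory rows. It is not the portal's implementation. The function name select_and_rename, the columns other than ord_dt_str / order_date, and the sample rows are all made up for illustration.

```python
# Hypothetical in-memory stand-in for the Select / Rename Columns node.
# Keys are raw source column names; values are the aliases analysts will see.
COLUMN_MAP = {
    "ord_id": "order_id",        # hypothetical column
    "ord_dt_str": "order_date",  # the rename example from the lesson
    "amt": "amount",             # hypothetical column
}


def select_and_rename(rows, column_map):
    """Keep only the mapped columns and rename them; the row count is unchanged."""
    # Check the first row only, as a cheap guard against a typo in the mapping.
    missing = [raw for raw in column_map if rows and raw not in rows[0]]
    if missing:
        raise KeyError(f"columns not found in source: {missing}")
    return [{alias: row[raw] for raw, alias in column_map.items()} for row in rows]


# Made-up sample rows; internal_flag is a column we do not want to expose.
source_rows = [
    {"ord_id": 1, "ord_dt_str": "2024-01-05", "amt": 120.0, "internal_flag": "x"},
    {"ord_id": 2, "ord_dt_str": "2024-01-06", "amt": 75.5, "internal_flag": "y"},
]

mart_rows = select_and_rename(source_rows, COLUMN_MAP)
```

Every column that is not listed in the mapping is dropped, and every input row produces exactly one output row, which is why the row count is expected to stay the same for this kind of transform.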

Connect the three nodes

Drag from the source node's right port to the transform node's left port, then from the transform node's right port to the sink node's left port. Two arrows form. Incompatible ports refuse the connection; if that happens, check that the port colors, which indicate the data type, match on both ends.

Once connected, the Validate indicator in the upper-right of the canvas should turn green. If it's red, reopen each node's configuration panel in turn and fill in what's missing (most often an unselected dataset or unmapped columns).

Run it once

Click Run in the upper-right and the pipeline runs once, with execution logs streaming live in the bottom panel. When it finishes, confirm three things:

  • The new dataset (mart_orders_daily) appears in the target collection tree.
  • The row count matches the input dataset. A transform that only selects and renames columns should leave the row count unchanged.
  • The column names match what you intended.
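The last two checks above can also be written down as a small script, which is handy once you repeat them after every run. This is a sketch only: it assumes you have somehow exported the input and output rows as lists of dictionaries, and check_run is a hypothetical helper, not a portal function. The first check (the dataset appearing in the collection tree) still has to be done in the UI.

```python
def check_run(input_rows, output_rows, expected_columns):
    """Return a list of problems found; an empty list means both checks pass."""
    problems = []
    # Check 1: a select/rename-only transform must not add or drop rows.
    if len(output_rows) != len(input_rows):
        problems.append(
            f"row count changed: {len(input_rows)} in, {len(output_rows)} out"
        )
    # Check 2: every output row must have exactly the intended columns, in order.
    for i, row in enumerate(output_rows):
        if list(row.keys()) != list(expected_columns):
            problems.append(f"row {i} has columns {list(row.keys())}")
            break  # one report is enough to show the mapping is off
    return problems


# Made-up sample data for illustration.
input_rows = [
    {"ord_dt_str": "2024-01-05", "amt": 120.0},
    {"ord_dt_str": "2024-01-06", "amt": 75.5},
]
output_rows = [
    {"order_date": "2024-01-05", "amount": 120.0},
    {"order_date": "2024-01-06", "amount": 75.5},
]

problems = check_run(input_rows, output_rows, ["order_date", "amount"])
```

With the sample data above, problems is an empty list. Passing a different expected column list, or an output with fewer rows, yields one message per failed check.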

What changes if you run it again

The sink node's configuration panel has a write mode option.

  • overwrite — Replaces the target dataset entirely on every run. The most common default for mart tables.
  • append — Adds new rows after the existing ones. Natural for event-style data.
  • upsert — Updates existing rows by key, inserts new ones. Suitable for mirroring a production master table.

For this lesson, leave the write mode on the default, overwrite, and run the pipeline two more times. If the row count is identical across runs, overwrite is doing what you expect. The operational implications of each mode are covered further in lesson 05 (scheduling and run modes).
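To see why repeated runs behave differently per write mode, the three modes can be modeled on plain Python lists. This is a simplified sketch of the semantics described above, not how the sink node is implemented. The helper names write_dataset and run_n_times, the key column order_id, and the sample batch are assumptions made for illustration.

```python
def write_dataset(target, new_rows, mode="overwrite", key=None):
    """Return the target dataset's rows after one run in the given write mode."""
    if mode == "overwrite":
        # Replace everything: the previous contents are discarded.
        return list(new_rows)
    if mode == "append":
        # Keep the existing rows and add the new ones after them.
        return list(target) + list(new_rows)
    if mode == "upsert":
        if key is None:
            raise ValueError("upsert needs a key column")
        merged = {row[key]: row for row in target}
        for row in new_rows:
            merged[row[key]] = row  # update an existing key or insert a new one
        return list(merged.values())
    raise ValueError(f"unknown write mode: {mode}")


def run_n_times(new_rows, mode, runs, key=None):
    """Simulate running the same pipeline several times against an empty target."""
    target = []
    for _ in range(runs):
        target = write_dataset(target, new_rows, mode, key)
    return target


# Made-up batch of two rows that every run produces.
batch = [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 75.5}]

overwrite_count = len(run_n_times(batch, "overwrite", 3))
append_count = len(run_n_times(batch, "append", 3))
upsert_count = len(run_n_times(batch, "upsert", 3, key="order_id"))
print(overwrite_count, append_count, upsert_count)  # prints: 2 6 2
```

Three identical runs leave overwrite and upsert at two rows, while append grows to six. That is the behavior the row-count comparison in this section is meant to surface.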

Self-check

  • Does the new mart dataset show up in the collection tree?
  • Briefly step into the Analyst Path's view. Does the same mart dataset show up identically in the analyst's collection? Seeing this first-hand confirms that the engineer's output and the analyst's input are the same dataset inside one portal.
  • If you run the same pipeline twice, does the output dataset's row count stay consistent (overwrite default)?
  • Did you name the three nodes consistently in this lesson? (Future-you, debugging this in two weeks, will thank you.)

Next lesson

The next lesson covers what to do when the standard nodes (select, filter, aggregate, join) aren't enough for a transformation — by dropping a short Python or SQL snippet into a code node.