Scheduling and run modes — pipelines that run on their own

Trigger the same pipeline on a cron or event basis, set overwrite · append · CDC as the run mode, and route failure notifications to the right channel.

So far you've had to click Run by hand to move data through the pipeline. To put it into production, two things need to be set: the pipeline runs on its own at a fixed interval, and someone is told when it fails. This lesson covers both at once.

The schedule builder — three triggers

Open Settings → Schedule at the top of the pipeline. You can choose one of three triggers:

  1. Time-based (cron) — Every day at 03:00, every Monday at 06:00, every 15 minutes, and so on.
  2. Event-based — Waits on the success or failure of another pipeline. A mart-stage pipeline commonly waits on a source-stage pipeline's success.
  3. Manual only — Leave it here while the pipeline is still an experiment.

For most mart pipelines, layering (1) "once a day overnight" with (2) "triggered by the prior stage's success" gives you a safety net: if the overnight run is skipped for some reason, the dependent pipeline still notices the missing success event.

Cron expressions through the builder

The schedule builder lets you pick minute · hour · day · month · weekday through dropdowns without writing a cron expression by hand. To enter an expression directly, switch the same panel into Enter expression manually. The default time zone is the instance operating time zone (Asia/Seoul); use an explicit zone only when you mean a different one.
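As a rough illustration, the three time-based examples from the trigger list can be written as standard five-field cron expressions. The small matcher below is only a sketch for checking an expression against a wall-clock time; it is not how the schedule builder evaluates schedules, and the names SCHEDULES, field_matches, and cron_matches are made up for this example.

```python
from datetime import datetime

# The three examples from the trigger list as 5-field cron expressions:
# minute hour day-of-month month day-of-week.
SCHEDULES = {
    "every day at 03:00": "0 3 * * *",
    "every Monday at 06:00": "0 6 * * 1",
    "every 15 minutes": "*/15 * * * *",
}


def field_matches(field: str, value: int) -> bool:
    """Match one cron field. Only '*', '*/n', and a plain number are handled.

    The '*/n' step is only accurate here for minute and hour, which start at 0.
    """
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the wall-clock time `when` satisfies `expr`.

    `when` is read as local time in the schedule's zone. Ranges, lists, and
    month or weekday names are not supported in this sketch.
    """
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = when.isoweekday() % 7  # cron counts 0 = Sunday, 1 = Monday
    return all([
        field_matches(minute, when.minute),
        field_matches(hour, when.hour),
        field_matches(day, when.day),
        field_matches(month, when.month),
        field_matches(weekday, cron_weekday),
    ])
```

For example, cron_matches(SCHEDULES["every Monday at 06:00"], datetime(2024, 1, 1, 6, 0)) is true because 1 January 2024 was a Monday, while the same time one day later does not match.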

Run modes and idempotency

You saw the sink node's write mode briefly in the previous lesson. With scheduling on top, the meaning sharpens.

Write mode | Result on re-run | Common use
overwrite | Regenerates the whole dataset | Mart tables where a daily full scan is fast enough
append | Appends new rows after the existing ones | Event logs, time-series measurements
upsert | Updates by key, inserts the missing | Mirroring a production master table
CDC (change data capture) | Pulls only rows changed since the last run | Hourly sync of a large production table

Pipelines that go into production are easier to debug when they're safe to re-run, that is, idempotent. overwrite and upsert are naturally idempotent. append is not: you need to guard against the same key arriving twice on the input side.
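The difference shows up when the same batch is written twice. The sketch below models a dataset as a plain list of dicts keyed by a hypothetical id column; the function names are illustrative and do not correspond to the sink node's actual implementation.

```python
def overwrite(target, batch):
    """Replace the whole dataset with the batch."""
    return list(batch)


def append(target, batch):
    """Add the batch after the existing rows, with no key check."""
    return target + list(batch)


def upsert(target, batch, key="id"):
    """Update rows whose key already exists, insert the rest."""
    merged = {row[key]: row for row in target}
    for row in batch:
        merged[row[key]] = row
    return list(merged.values())


def append_deduplicated(target, batch, key="id"):
    """Append only rows whose key has not been seen yet (input-side guard)."""
    seen = {row[key] for row in target}
    fresh = []
    for row in batch:
        if row[key] not in seen:
            seen.add(row[key])
            fresh.append(row)
    return target + fresh


batch = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]

# Simulate a run followed by a re-run with the same input.
twice_overwritten = overwrite(overwrite([], batch), batch)
twice_appended = append(append([], batch), batch)
twice_upserted = upsert(upsert([], batch), batch)
twice_guarded = append_deduplicated(append_deduplicated([], batch), batch)
```

After the re-run, the overwrite, upsert, and guarded append results still hold two rows, while the plain append result holds four.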

CDC is only available when the external system exposes a change timestamp column (updated_at) or a change log (e.g. PostgreSQL logical replication). If you specified an incremental column during the connector mapping step in lesson 02, portal activates CDC mode automatically.
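A timestamp-based incremental pull can be sketched as a watermark comparison. This assumes the source rows carry an updated_at column as described above; pull_changes and the watermark handling are hypothetical and only show the idea, not the connector's internals.

```python
from datetime import datetime


def pull_changes(source_rows, last_watermark):
    """Return rows changed since the last run, plus the new watermark.

    Assumes every row has an `updated_at` datetime. The strict `>` comparison
    skips rows stamped exactly at the watermark, so a real implementation
    would need a tie-breaking rule for rows sharing that timestamp.
    """
    changed = [row for row in source_rows if row["updated_at"] > last_watermark]
    new_watermark = max(
        (row["updated_at"] for row in changed), default=last_watermark
    )
    return changed, new_watermark


rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 10, 30)},
    {"id": 3, "updated_at": datetime(2024, 1, 1, 11, 15)},
]

# First run picks up everything after 10:00; the second run finds nothing new.
first_batch, watermark = pull_changes(rows, datetime(2024, 1, 1, 10, 0))
second_batch, watermark_after = pull_changes(rows, watermark)
```

Each run stores the returned watermark and passes it to the next run, so an unchanged source produces an empty batch.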

Failure notifications — channels and ladders

In the same Settings panel, the Notifications tab handles failure alerts.

  • Channels — Email, Slack, and Webhook by default. If your ops team runs a separate pager system, forward through the Webhook.
  • Trigger conditions — Run failure, time-out exceeded, and data-quality check failure can each be toggled independently.
  • Ladder — Common shape: the first alert goes to you; if there's no response for 30 minutes, the team channel is added; after an hour, the oncall rotation is paged. Portal expresses this ladder with a single escalation policy setting.
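The ladder described in the last item can be pictured as a list of time thresholds. The sketch below uses the 0, 30, and 60 minute steps from the text; the target names and the recipients function are assumptions for illustration and are not the format of the escalation policy setting.

```python
# (minutes without a response, who is added at that point)
LADDER = [
    (0, "owner"),
    (30, "team-channel"),
    (60, "oncall-rotation"),
]


def recipients(minutes_unacknowledged: int) -> list[str]:
    """Return everyone who should have been alerted by this point."""
    return [
        target
        for threshold, target in LADDER
        if minutes_unacknowledged >= threshold
    ]
```

Under these assumptions, recipients(45) returns the owner and the team channel, and the oncall rotation is only added once the alert has gone unanswered for a full hour.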

If only one person receives the alert, the pipeline is silent whenever that person is away. The safest default is two channels at once — yourself plus one team channel.

Trip it on purpose, once

After turning on scheduling and notifications, confirm the alert actually fires by tripping it deliberately.

  1. Add raise ValueError("test alert") to the body of one code node and save.
  2. Click Run by hand once.
  3. Confirm the alert reaches each channel (email, Slack) and that the message states which pipeline and which node failed, and at what time.
  4. Remove the line and save. The next scheduled run passes again.
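The steps above can be sketched outside the product to show what a useful alert carries. The node signature, the run_node wrapper, and the pipeline and node names below are all hypothetical; only the raise ValueError("test alert") line comes from step 1.

```python
from datetime import datetime, timezone


def failing_node(rows):
    """A code node body with the temporary line from step 1 added."""
    raise ValueError("test alert")  # remove again in step 4
    return rows


def run_node(pipeline, node_name, node, rows):
    """Run one node and, on failure, build the fields an alert should carry."""
    try:
        return node(rows), None
    except Exception as exc:
        alert = {
            "pipeline": pipeline,    # which pipeline
            "node": node_name,       # which node
            "failed_at": datetime.now(timezone.utc).isoformat(),  # at what time
            "error": f"{type(exc).__name__}: {exc}",
        }
        return None, alert


result, alert = run_node("daily_mart", "clean_orders", failing_node, [])
```

Here alert["error"] reads "ValueError: test alert", and the other three fields are the ones to look for in the message that arrives by email or Slack.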

That one deliberate failure will save you many times over in production. Discovering at 04:00, only after the first real failure, that the alert was never wired up is too late.

Self-check

  • Does your pipeline appear in the schedule builder with both an interval and a next run time?
  • Does the sink node's write mode match what you intended (overwrite, append, upsert, CDC)?
  • Are notification channels registered in at least two places at once?
  • Did the deliberate failure actually deliver an alert to both channels?
  • Did you stagger the start minute away from 03:00 sharp?

Next lesson

The final lesson covers what to do when a real failure happens — narrowing it down through logs and node states — and how to hand off the datasets you produced to an analyst cleanly. Both in one pass.