Data Engineer Path
From connectors to pipelines, code nodes, scheduling, and debugging — follow one full cycle of ingesting and automating data in portal across six lessons.
About this Path
This Path teaches D.Hub portal from a data engineer's point of view. If the Analyst Path is about building result screens on top of data that already exists, the Engineer Path is about how that data got there: pulled in from external systems, transformed, scheduled to run on its own every day, and caught when something fails. Six lessons walk you through the full cycle in about 50 minutes total.
Each lesson takes 5–10 minutes. Do them all in one sitting, or spread them out at one or two a day.
Prerequisites
- Portal access at the Editor level or higher. You'll need permission to register connectors and run pipelines.
- Completion of the Analyst Path, or familiarity with the basics of collections and datasets.
- One of the following practice data sources:
  - A read-only account on an internal production database, or
  - A single CSV of around 10,000 rows (you can then skip some steps in the connector lesson)
Prior experience with dbt, Airflow, or Snowflake helps you map concepts faster but isn't required. Lesson 01 lays out the correspondence between those tools and portal surfaces in a single table.
What you'll be able to do
- Describe the four portal surfaces where engineers spend most of their time (Connectors, Pipelines, Codes, Datasets), along with the inputs, outputs, and boundaries of each.
- Register one connector against an external system and land production data as a portal dataset.
- Build your first pipeline in the Workflow editor, taking data through one full source → transform → sink cycle.
- Write Python or SQL in a code node for transformations the standard nodes can't cover.
- Make the same pipeline run automatically on a fixed schedule and route its failure alerts to the right channel.
- Narrow down a failed run using logs and node states, and decide between a full re-run and a partial re-run.
- Hand the datasets you produced over to an analyst cleanly by tidying up permissions.
What comes after this Path
If you want to go deeper inside the engineer flow, these are the natural next steps.
- Workshop: Retail Inventory Intelligence — Every surface in this Path (connectors, pipelines, code nodes, scheduling) is wired together inside a real domain scenario. You walk through one full cycle with an analyst alongside you. About 90 minutes.
- Tutorial: Quick scenario import — Load one scenario from dhub2-examples into your environment with a single command. Touches the same tools as step 1 of the workshop.
- Analyst Path — When you want to see how your datasets show up in the analyst's collection tree. It also works as a refresher on the handoff step right before lesson 06.
The checkboxes next to each lesson record progress automatically. Pick a lesson and start.
Lessons
- 01 Engineer workflow overview (7 min): The four surfaces engineers use in portal (Connectors, Pipelines, Codes, Datasets), mapped against dbt and Airflow in a single table.
- 02 Pulling external data with connectors (9 min): Register one connector against an external database, S3 bucket, or REST API, and land production data as a portal dataset. The lesson is organized around three axes: authentication, schema, and permissions.
- 03 Your first pipeline: three nodes and one run (8 min): Wire source → transform → sink in the Workflow editor and run them once, turning the source dataset from the previous lesson into a mart dataset that analysts can consume.
- 04 Code nodes: writing transforms in Python or SQL (9 min): When standard nodes (filter, aggregate, join) can't finish a transform, drop a short Python or SQL snippet into a code node and keep the work inside the same pipeline.
- 05 Scheduling and run modes: pipelines that run on their own (8 min): Trigger the same pipeline on a cron or event basis, choose overwrite, append, or CDC as the run mode, and route failure notifications to the right channel.
- 06 Debugging, monitoring, and handing off to the analyst (9 min): Narrow down a failed run through three views (logs, node status, data preview), then close the cycle by tidying up dataset permissions and handing off to an analyst.