manufacturing

90 min

IoT Smart Factory — from anomaly detection to daily rollup and ontology

Connect 30-second telemetry from 12 CNC machines to 3-sigma anomaly detection, a daily health score, and ontology materialization in one scenario.

Workshop goal

By the time you finish this Workshop, you will have walked through one full cycle of the following flow inside D.Hub.

Load a single scenario into portal so that two sub-collections (raw and processed), three datasets, four code nodes, three pipelines, an ontology with three entities and two relations, and a dashboard are all registered together.
Inspect the shape of machine_sensors raw rows arriving every 30 seconds and the distribution of quality_flag status signals.
Run the anomaly_detection pipeline once and see how HIGH / MEDIUM / LOW severities fall into anomaly_detections.
Watch how equipment_health_rollup produces a 0–100 daily health score with weighted penalties (HIGH × 10, MEDIUM × 4, LOW × 1), and configure it to run automatically every day at 1am.
Run ontology_materialization once and expand the IOT_Machine ─ IOT_reads_from ─ IOT_Sensor and IOT_Machine ─ IOT_triggers ─ IOT_MaintenanceEvent paths in the Graph explorer.
(Optional) Adjust the 3-sigma threshold in anomaly_detection.py directly, re-run, and observe qualitatively how the severity distribution shifts.

This is an integrated exercise that bundles the code node, pipeline option, and scheduling lessons of the Engineer Path on top of a single manufacturing IIoT scenario, walked through end to end as one cycle. The recommended duration is 90 minutes, stretching to 100 if you include the optional section.

Prerequisites

An engineer account with access to D.Hub portal (Editor or higher, with pipeline execution and schedule registration rights).
~50 KB of download room for one scenario zip.

No terminal, Python, or dhub2-examples clone is needed. Finishing the entry-level tutorial "Import a scenario zip in one shot" first makes step 1 go smoothly. The tutorial page is reachable from the prerequisite card at the top of this Workshop.

1. Load the scenario (10 min)

Download one zip and feed it to portal's Import dialog — every asset bundled in the scenario gets registered in a single pass.

Download iot.zip

With the zip in hand, go to Collections in the left sidebar. Next to the Explorer header (the area that shows the page title Collections), the more (⋯) menu has an Import (가져오기) item. Open it, pick the zip you just downloaded, and watch the dialog show upload progress. When it finishes, two collections appear automatically.

Following manifest.json, the import creates in order:

Two collections — raw (alias: IoT raw ingestion), processed (alias: processed IoT data)
Three datasets — machine_sensors (raw), anomaly_detections (processed), equipment_health (processed)
Four codes (Python) — sensor_normalization, anomaly_detection, equipment_health_rollup, build_iot_ontology
Three pipelines — anomaly_detection, equipment_health_rollup, ontology_materialization
Ontology — three entities (IOT_Machine, IOT_Sensor, IOT_MaintenanceEvent), two relations (IOT_reads_from, IOT_triggers)
One knowledge — maintenance_manual (maintenance SOP)
One dashboard — equipment_health (alias: equipment health overview)

Step 1 is complete once the portal's left collection tree shows raw and processed side by side. Hold on to the pattern: the scenario deliberately splits a raw landing zone from processed and derived data — only machine_sensors lives in raw, while every pipeline output and ontology materialization table lands in processed.

2. Browse the data (10 min)

Open the raw collection and start with the machine_sensors dataset. In the Preview tab you'll see six columns.

reading_id — Unique identifier for the measurement record
machine_id — One of the 12 CNC machines (format M-001 ~ M-012)
sensor_type — One of vibration / temperature / pressure
value — Unit depends on sensor_type. Vibration in m/s², temperature in °C, pressure in bar
recorded_at — UTC timestamp. Roughly every 30 seconds
quality_flag — OK for normal. HIGH_VIBRATION / HIGH_TEMP / HIGH_PRESSURE are rows where the sensor already raised a status signal

In the Schema tab, note that value and quality_flag are nullable. When a sensor fails, value can arrive as null. The normalization script works on the assumption that short gaps are safe to bridge, so it forward-fills value for up to three consecutive rows (step 3 shows the script).

A quick look at the distribution of quality_flag makes the next step's anomaly result easier to interpret. Most rows are OK, but HIGH_* status signals are already embedded in some rows, and those rows are promoted straight to HIGH severity regardless of the 3-sigma z-score.
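The distribution check can be sketched in a few lines, assuming the preview rows have been copied into plain Python dicts. The sample rows and the helper name `flag_distribution` are illustrative and are not part of the scenario.

```python
from collections import Counter

# Hypothetical sample rows shaped like machine_sensors (all values are made up).
sample_rows = [
    {"machine_id": "M-001", "sensor_type": "vibration", "value": 2.1, "quality_flag": "OK"},
    {"machine_id": "M-001", "sensor_type": "temperature", "value": 71.5, "quality_flag": "OK"},
    {"machine_id": "M-002", "sensor_type": "vibration", "value": 9.8, "quality_flag": "HIGH_VIBRATION"},
    {"machine_id": "M-003", "sensor_type": "pressure", "value": None, "quality_flag": None},
]


def flag_distribution(rows):
    """Count rows per quality_flag; a null flag is counted under the key None."""
    return Counter(row["quality_flag"] for row in rows)


dist = flag_distribution(sample_rows)
# Rows whose flag starts with HIGH_ are the ones promoted straight to HIGH severity later.
pre_flagged = sum(n for flag, n in dist.items() if flag and flag.startswith("HIGH_"))
print(dist)
print(pre_flagged)
```

Keeping the HIGH_* count in mind gives a lower bound on how many HIGH rows to expect in anomaly_detections after the next step.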

Next, open the processed collection and confirm that anomaly_detections and equipment_health are present but zero-row. They start empty right after load; the next-step pipelines fill the first rows.

3. First pipeline — `anomaly_detection` (15 min)

In the processed collection's Pipelines section, click anomaly_detection to open the Workflow editor. Two nodes appear.

normalize_sensors — Script sensor_normalization. Input machine_sensors → output machine_sensors (in-place normalization).
detect_anomalies — Script anomaly_detection. Input machine_sensors → output anomaly_detections. Depends on normalize_sensors.

Click the normalize_sensors node and expand Script preview in the right panel. The core one-liner: keep only rows whose quality_flag is one of ["OK", "HIGH_VIBRATION", "HIGH_PRESSURE", "HIGH_TEMP"], sort by machine and sensor, and forward-fill value for up to three consecutive rows. The key point is that HIGH_* status signal rows are not dropped during normalization. They are intentionally passed through so the next node can promote them straight to HIGH severity.
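The normalization described above can be approximated in plain Python. This is a minimal sketch, not the body of sensor_normalization: the function name `normalize`, the use of recorded_at as a tie-breaker when sorting, and the sample rows are all assumptions.

```python
ALLOWED_FLAGS = {"OK", "HIGH_VIBRATION", "HIGH_PRESSURE", "HIGH_TEMP"}
FFILL_LIMIT = 3  # forward-fill at most three consecutive null values


def normalize(rows):
    """Filter by quality_flag, sort, and forward-fill value per machine/sensor group."""
    kept = [dict(r) for r in rows if r["quality_flag"] in ALLOWED_FLAGS]
    # Assumption: recorded_at breaks ties so the fill follows time order.
    kept.sort(key=lambda r: (r["machine_id"], r["sensor_type"], r["recorded_at"]))

    last_key, last_value, filled_run = None, None, 0
    for row in kept:
        key = (row["machine_id"], row["sensor_type"])
        if key != last_key:
            # New machine/sensor group: never carry a value across groups.
            last_key, last_value, filled_run = key, None, 0
        if row["value"] is not None:
            last_value, filled_run = row["value"], 0
        elif last_value is not None and filled_run < FFILL_LIMIT:
            row["value"] = last_value
            filled_run += 1
    return kept


# Made-up readings for one sensor: four nulls in a row, so the fourth stays null.
values = [1.0, None, None, None, None, 2.0]
rows = [
    {"machine_id": "M-001", "sensor_type": "vibration",
     "recorded_at": f"2024-01-01T00:0{i // 2}:{(i % 2) * 30:02d}Z",
     "value": v, "quality_flag": "OK"}
    for i, v in enumerate(values)
]
# A row with a null quality_flag is dropped by the filter.
rows.append({"machine_id": "M-001", "sensor_type": "vibration",
             "recorded_at": "2024-01-01T00:03:00Z", "value": 5.0, "quality_flag": None})

result = normalize(rows)
print([r["value"] for r in result])
```

The fill limit matters: a long outage leaves nulls in place instead of stretching one stale reading across it.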

Now open the detect_anomalies node. Its script body has three threshold constants.

DEFAULT_SIGMA_MULTIPLIER = 3.0 — If |value − mean| / std exceeds this multiplier on per-sensor_type mean and std, the row is flagged as an anomaly.
DEFAULT_HIGH_THRESHOLD = 5.0 — z-score ≥ 5.0, or any HIGH_* quality_flag, becomes HIGH.
DEFAULT_MEDIUM_THRESHOLD = 4.0 — z-score ≥ 4.0 becomes MEDIUM. The rest (3.0 to 4.0) become LOW.
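Read together, the three constants amount to the decision rule below. This is a sketch under assumptions: the function names `zscore` and `classify` are hypothetical, and how the actual script computes mean and std (for example population versus sample std) is not specified here, so both are passed in as arguments.

```python
DEFAULT_SIGMA_MULTIPLIER = 3.0
DEFAULT_HIGH_THRESHOLD = 5.0
DEFAULT_MEDIUM_THRESHOLD = 4.0


def zscore(value, mean, std):
    """Absolute z-score of a reading against its sensor_type mean and std."""
    return abs(value - mean) / std


def classify(z, quality_flag, sigma=DEFAULT_SIGMA_MULTIPLIER):
    """Return HIGH / MEDIUM / LOW, or None when the row is not an anomaly."""
    # A HIGH_* status signal wins regardless of the z-score.
    if quality_flag is not None and quality_flag.startswith("HIGH_"):
        return "HIGH"
    if z >= DEFAULT_HIGH_THRESHOLD:
        return "HIGH"
    if z >= DEFAULT_MEDIUM_THRESHOLD:
        return "MEDIUM"
    if z > sigma:
        return "LOW"
    return None


# Made-up temperature reading: mean 70.0, std 2.0, value 80.0 gives z = 5.0.
z = zscore(80.0, 70.0, 2.0)
print(z, classify(z, "OK"))
print(classify(3.5, "OK"), classify(1.0, "HIGH_TEMP"), classify(2.0, "OK"))
```

The same rule is what you apply by hand when spot-checking zscore against severity in the preview.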

Press the top Run button to execute once. When it finishes, go back to the left tree and open the Preview tab on anomaly_detections. The severity column should hold one of LOW / MEDIUM / HIGH. Pick a row or two and manually verify the zscore and severity follow the threshold rules above.

4. Daily rollup + scheduling (15 min)

The next step compresses the anomaly log into a form the maintenance team can read at a glance to set priority, without going through every row. In the processed collection's Pipelines section, click equipment_health_rollup. There is a single node — the rollup_health script reads anomaly_detections and produces the equipment_health dataset.

In the rollup_health script body, the core formula is one line:

health_score = clip( 100 − (high_severity × 10 + medium_severity × 4 + low_severity × 1), 0, 100 )

Read the meaning of the weights (10, 4, 1) once: a single HIGH anomaly drops that machine's daily score by 10 points, a MEDIUM by 4, a LOW by 1. Pile up ten or more HIGH anomalies and the score clips to 0, marking the machine as an urgent inspection target.
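As a worked example of the formula, a machine with 2 HIGH, 3 MEDIUM, and 5 LOW anomalies in one day takes a penalty of 20 + 12 + 5 = 37 and scores 63. The sketch below reproduces that arithmetic; the function name `health_score` is illustrative and is not taken from the rollup script.

```python
HIGH_WEIGHT, MEDIUM_WEIGHT, LOW_WEIGHT = 10, 4, 1


def health_score(high_severity, medium_severity, low_severity):
    """Daily health score: 100 minus the weighted penalty, clipped to the 0-100 range."""
    penalty = (high_severity * HIGH_WEIGHT
               + medium_severity * MEDIUM_WEIGHT
               + low_severity * LOW_WEIGHT)
    return max(0, min(100, 100 - penalty))


print(health_score(2, 3, 5))    # 63
print(health_score(10, 0, 0))   # 0: ten HIGH anomalies use up the whole budget
print(health_score(12, 3, 0))   # 0: the lower clip keeps the score from going negative
```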

Press Run once. When it finishes, open the equipment_health dataset preview and confirm that for each machine × day combination the columns health_score (0–100), anomaly_count, and high_severity are populated.

Now register this pipeline to run automatically every day at 1am. In the editor's top-right Pipeline settings or Schedule area, configure:

Repeat unit: daily
Start time: 01:00 UTC
Next run: confirm that the next upcoming 1am shows up

Save, and the pipeline card header should display the next automatic run time. The Pipeline scheduling manual covers the background of this configuration screen.
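To check the Next run value by hand, the next 01:00 UTC can be computed as in the sketch below. The helper `next_daily_run` is hypothetical and only mirrors the daily schedule described above; it says nothing about how the portal's scheduler is implemented. In standard five-field cron syntax the same schedule would be written `0 1 * * *`, though whether the portal accepts that form is not confirmed here.

```python
from datetime import datetime, time, timedelta, timezone


def next_daily_run(now, run_at=time(1, 0)):
    """Return the next occurrence of run_at (UTC) strictly after now.

    now must be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    candidate = datetime.combine(now.date(), run_at, tzinfo=timezone.utc)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


before = datetime(2024, 5, 1, 0, 30, tzinfo=timezone.utc)
after = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)
print(next_daily_run(before))  # same day, 01:00 UTC
print(next_daily_run(after))   # next day, 01:00 UTC
```

If your browser shows local time, convert before comparing: 01:00 UTC is 10:00 in a UTC+9 time zone, for example.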

5. Ontology + graph exploration (15 min)

The final step is the graph viewpoint. In the processed collection's Pipelines section, click ontology_materialization. One node — the build_ontology_tables script reads machine_sensors + anomaly_detections + equipment_health in batch mode and lands the following five artifacts in upsert mode.

Artifact	Kind	Meaning
`IOT_Machine`	Entity	The 12 CNC machines. Key: `machine_id`. Attributes: `sensor_count`, `reading_count`, `latest_recorded_at`, `anomaly_count`, `latest_health_score`
`IOT_Sensor`	Entity	A physical sensor attached to a machine. Key: `sensor_id` (`machine_id__sensor_type` format). Attributes: `unit`, `latest_quality_flag`
`IOT_MaintenanceEvent`	Entity	One anomaly ↔ one maintenance event. Key: `event_id` (`ME-<anomaly_id>` format). Attributes: `severity`, `detected_at`, `completed`
`IOT_reads_from`	Relation	`IOT_Sensor` → `IOT_Machine`. 30-second sampling interval (`interval_seconds = 30`)
`IOT_triggers`	Relation	`IOT_Machine` → `IOT_MaintenanceEvent`. One anomaly triggers one maintenance event
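The key formats in the table can be sketched as small helpers. Only the formats themselves (`machine_id__sensor_type`, `ME-<anomaly_id>`) and `interval_seconds = 30` come from the table; the helper names, the `source` and `target` field names, and the anomaly_id value are assumptions made for illustration.

```python
SENSOR_TYPES = ("vibration", "temperature", "pressure")
INTERVAL_SECONDS = 30


def sensor_id(machine_id, sensor_type):
    """Key format from the table: machine_id__sensor_type."""
    return f"{machine_id}__{sensor_type}"


def event_id(anomaly_id):
    """Key format from the table: ME-<anomaly_id>."""
    return f"ME-{anomaly_id}"


def reads_from_rows(machine_id):
    """One IOT_reads_from row per sensor type, pointing IOT_Sensor -> IOT_Machine."""
    # The field names source/target are hypothetical.
    return [
        {"source": sensor_id(machine_id, sensor_type),
         "target": machine_id,
         "interval_seconds": INTERVAL_SECONDS}
        for sensor_type in SENSOR_TYPES
    ]


relation_rows = reads_from_rows("M-001")
print([row["source"] for row in relation_rows])
print(event_id("A-0001"))  # "A-0001" is a made-up anomaly_id
```

Deterministic keys like these are what make upsert mode safe to re-run: the same machine and sensor always map to the same row.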

Press Run once. When it finishes, go to the left sidebar's Ontology area and confirm the three entities are loaded. Click the IOT_Machine card and in the Builder tab confirm that Backing dataset is set to a machine_sensors-derived aggregate and the identifier key is machine_id. The Ontology Builder manual covers the mapping screen.

Now open the Graph explorer. Expand the following two paths by hand.

Path one — which sensors are attached to a machine
IOT_Machine (M-001) ── IOT_reads_from ── IOT_Sensor (M-001__vibration, M-001__temperature, M-001__pressure)

Path two — which maintenance events were triggered for one machine
IOT_Machine (M-001) ── IOT_triggers ── IOT_MaintenanceEvent (ME-<anomaly_id>)

Double-clicking one IOT_Machine node expands its neighbors and spreads three IOT_Sensor nodes (one per sensor_type) at once. If the same machine has had at least one HIGH severity anomaly, IOT_MaintenanceEvent nodes appear alongside. Which of the 12 machines triggers the most maintenance events is visible in a single graph view.
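What the neighbor expansion does can be pictured with a small edge list. This is a toy model of the graph, not the Graph explorer's implementation: the edge tuples, the anomaly id inside the event key, and the helper `expand` are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical edges as (source, relation, target), following the relation directions.
edges = [
    ("M-001__vibration", "IOT_reads_from", "M-001"),
    ("M-001__temperature", "IOT_reads_from", "M-001"),
    ("M-001__pressure", "IOT_reads_from", "M-001"),
    ("M-001", "IOT_triggers", "ME-A-0001"),
    ("M-002__vibration", "IOT_reads_from", "M-002"),
]


def expand(node, edges):
    """Group a node's neighbors by relation, ignoring edge direction."""
    neighbors = defaultdict(list)
    for source, relation, target in edges:
        if source == node:
            neighbors[relation].append(target)
        elif target == node:
            neighbors[relation].append(source)
    return dict(neighbors)


m001 = expand("M-001", edges)
print(m001)

# Maintenance events per machine: the ranking the graph view shows at a glance.
events_per_machine = Counter(src for src, rel, _ in edges if rel == "IOT_triggers")
print(events_per_machine.most_common(1))
```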

6. (Optional) Deep dive into a code node (10 min)

Up to here is the Workshop's core path. If you have time, open anomaly_detection.py from §3 again, nudge a threshold slightly, and watch how results move.

In the detect_anomalies node's Script preview, change DEFAULT_SIGMA_MULTIPLIER = 3.0 to 3.15 (about a 5% increase), save, and Run again. The threshold is a touch tighter, so rows with z-scores between 3.0 and 3.15 (sitting on the edge) drop out of anomalies.

Three things to confirm in the anomaly_detections preview:

Total anomaly count drops. HIGH_* status signal rows are preserved regardless of z-score, and every row with a z-score between 3.0 and 3.15 sat in the LOW band (3.0 to 4.0), so the drop falls entirely on LOW rows that were on the edge.
HIGH count stays the same. The 5.0 threshold is untouched by this change, so the HIGH share of the total rises only slightly as the total shrinks.
No LOW becomes MEDIUM or vice versa. The MEDIUM and HIGH cutoffs (4.0, 5.0) are absolute z-score values, and we didn't touch them.

Exact numbers vary with the volume of rows you loaded. The point of this step is not how far the count drops but seeing first-hand which dimension of the result the threshold change moves.
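The effect of the nudge can be previewed on a handful of made-up z-scores before running the pipeline. The sketch assumes the severity rule from §3 (HIGH_* flag or z ≥ 5.0 is HIGH, z ≥ 4.0 is MEDIUM, z above the sigma multiplier is LOW); the sample pairs and function names are illustrative.

```python
from collections import Counter

HIGH_THRESHOLD = 5.0
MEDIUM_THRESHOLD = 4.0


def severity(z, quality_flag, sigma):
    """Severity label for one row, or None when the row is not an anomaly."""
    if quality_flag.startswith("HIGH_") or z >= HIGH_THRESHOLD:
        return "HIGH"
    if z >= MEDIUM_THRESHOLD:
        return "MEDIUM"
    if z > sigma:
        return "LOW"
    return None


def severity_counts(rows, sigma):
    """Count anomalies per severity label for a given sigma multiplier."""
    labels = (severity(z, flag, sigma) for z, flag in rows)
    return Counter(label for label in labels if label is not None)


# Made-up (zscore, quality_flag) pairs; two of them sit between 3.0 and 3.15.
rows = [(2.5, "OK"), (3.05, "OK"), (3.1, "OK"), (3.4, "OK"),
        (4.2, "OK"), (5.5, "OK"), (1.0, "HIGH_TEMP")]

baseline = severity_counts(rows, 3.0)
tightened = severity_counts(rows, 3.15)
print(baseline)
print(tightened)
```

In this sample only the LOW count changes (from 3 to 1), because rows between 3.0 and 3.15 sit below the MEDIUM cutoff of 4.0, while MEDIUM and HIGH stay at 1 and 2.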

Once you're done, revert the threshold back to 3.0 and run once more so the result distribution returns to the baseline of §3. This is one turn of the read · modify · re-run cycle from the Engineer Path's code node lesson.

7. Next steps and a retrospective (5 min)

By this point you've walked through five flows on a single scenario:

Scenario load → raw vs processed sub-collection split
Code node spot-check → 3-sigma threshold + HIGH_* status signal preservation
Pipeline run → severity distribution check
Daily rollup + every day at 1am automated schedule
Ontology materialization → visualize machine, sensor, and maintenance-event paths in the Graph explorer

Pick the most appealing direction for one step further.

Go to the Analyst Path First dashboard lesson and look at the equipment_health dashboard this Workshop just registered from the analyst's seat. Compare how the same dataset is expressed via a widget's SQL query mode and via the graph viewpoint.
Go to the Engineer Path Pipeline scheduling lesson and dig into how the every day at 1am registration from §4 generalizes (cron expression, owner, after-dependency).
Add your own domain scenario to dhub2-examples/scenarios/. This Workshop's flow — raw telemetry collection → normalize and detect pipeline → daily rollup → ontology materialization — works almost identically as a template for other manufacturing lines or infrastructure telemetry.

Having gone through one cycle from the engineer's seat, you have seen how a scenario in dhub2-examples unfolds into live resources inside portal, here in the IIoT domain. The analyst Workshop (retail-inventory-intelligence) walks the same template through the analyst's lens, with widgets and a graph. Keeping both in mind fixes the structure in your head: one template, two viewpoints.

Validation checklist

You can self-check whether this Workshop is complete via these five points.

Both raw and processed collections appear in the collection tree, with machine_sensors in raw and anomaly_detections · equipment_health in processed.
After one run of the anomaly_detection pipeline, anomaly_detections preview shows the severity column distributed across the three labels LOW / MEDIUM / HIGH.
The equipment_health_rollup pipeline card header shows the next run at the next 1am UTC (i.e. the schedule registration was actually saved).
In the Graph explorer, starting from a single IOT_Machine node and expanding neighbors brings up three IOT_Sensor nodes, and if at least one HIGH anomaly happened, IOT_MaintenanceEvent nodes appear alongside.
(If you did optional §6) After reverting the threshold to 3.0 and re-running, the result distribution matches the §3 baseline.