
Inline pass/fail decisions stay resident on the edge: deterministic, PLC-integrated, and cloud-independent. The cloud handles model lifecycle, rollout orchestration, and governance asynchronously, never touching the control path. The article compares edge-only and edge-plus-cloud hybrid patterns across their operational consequences, covering latency budgets, PLC integration, CI/CD with hardware-in-the-loop validation, canary rollout, typed rollback triggers, and resilient OTA updates.
Deploying a model to a production line is not the finish line; it is the start of a continuous operational cycle of monitoring, validation, and controlled evolution. In automotive quality control, this cycle carries real stakes: a model that silently degrades allows defects to escape into the supply chain; a model update that introduces over-rejection generates scrap, throughput loss, and operator distrust.
Part 5 closes the series with the full operational layer.
We cover a two-layer model registry that separates source models from compiled device-class artifacts; shadow and canary deployment patterns that let you test new models on live production traffic with zero risk to line decisions; five specific, quantitative rollback triggers; and a three-layer monitoring framework that spans device health, inference-path performance, and model drift detection.
This is the operational architecture that makes edge AI sustainable at fleet scale, not just on day one.
In automotive, a flawed model can either leak defects (safety/brand risk) or cause over-rejection (throughput and cost risk), so you must combine strong versioning with risk-free deployment patterns.
Use a central registry (custom or via ML platform) that stores a two-layer artifact structure — separating the source model from its compiled, device-specific deployment artifacts. Conflating these two layers is one of the most common causes of deployment failures and traceability gaps in production edge AI programmes: a source model and its compiled edge artifact are different objects with different versioning, compatibility, and lifecycle concerns, and the registry must treat them as such.
The source model record captures everything about the model as produced by the training pipeline, independent of any deployment target:
For each source model, the registry maintains one or more compiled edge artifact records, one per supported target device class.
These are distinct registry entries, not attachments to the source model record, because they have independent versioning, compatibility constraints, and deployment lifecycles:
registry/artifacts/defect-detector-v18/jetson-orin-nx-8gb/compatibility.yaml
source_model: defect-detector-v18_opset17.onnx
source_model_sha256: a3f1c8...
compiled_artifact: defect-detector-v18_orin-nx_trt8.6_cuda12.0.engine
artifact_sha256: b7e2d4...
accelerator_class: jetson-orin-nx-8gb
gpu_architecture: sm_87
runtime:
  tensorrt: "8.6.1"
  cuda: "12.0"
  cudnn: "8.9"
  ort: "1.17.3"          # if using TRT EP via ORT
  jetpack: "6.0"
  driver_min: "535.86"   # minimum compatible NVIDIA driver
quantization:
  precision: int8
  calibration_ref: cal-dataset-bih-weld-v3
  calibration_sha256: c9a1f2...
validation:
  hil_rig: lab-jetson-orin-nx-01
  hil_run_id: hil-2026-03-27-0412
  p99_latency_ms: 38.4
  gpu_mem_peak_mb: 2841
  passed: true
Deployment status per site/line: independent of the source model's deployment status. A source model may be in production on Line A while its compiled artifact for a new device class is still staged on Line B. The two statuses must be tracked separately.
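In practice, the compatibility record above can be enforced by a deployment gate that refuses to push an artifact whose pinned runtime stack does not match the target node. The following is an illustrative sketch, not a prescribed implementation: the dict layout mirrors the compatibility.yaml record, and the node-inventory format is an assumption.

```python
def compatible(artifact: dict, node: dict) -> tuple[bool, list[str]]:
    """Check a compiled-artifact compatibility record against a node's
    reported runtime inventory. Returns (ok, mismatch reasons)."""
    reasons = []
    if artifact["accelerator_class"] != node["accelerator_class"]:
        reasons.append("accelerator_class mismatch")
    if artifact["gpu_architecture"] != node["gpu_architecture"]:
        reasons.append("gpu_architecture mismatch")
    for component, pinned in artifact["runtime"].items():
        if component == "driver_min":
            # Driver is a minimum version, not an exact pin
            def ver(v: str) -> tuple:
                return tuple(int(p) for p in v.split("."))
            if ver(node["driver"]) < ver(pinned):
                reasons.append(f"driver {node['driver']} below minimum {pinned}")
        elif node["runtime"].get(component) != pinned:
            reasons.append(f"{component}: node has {node['runtime'].get(component)!r}, "
                           f"artifact pinned to {pinned!r}")
    return (not reasons, reasons)
```

Because compiled artifacts are exact pins (TensorRT engines are not portable across runtime versions), every component except the driver is compared for equality.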
Why this two-layer structure matters in practice:

Shadow deployment lets you test a new model on real production traffic with zero effect on line decisions. The active model continues to drive all PLC actuation; the candidate model runs in parallel on the same input frames and its predictions are logged for offline analysis.
No candidate prediction ever reaches the IO adapter or the PLC during shadow mode — this is a non-negotiable invariant.
The three operational steps are:

This tiered approach ensures that the frames most valuable for shadow analysis — disagreements and boundary cases — are always retained, while the high-volume, high-agreement frames that carry little analytical value are discarded after their score record is written.
Configuration — complete and self-contained
Rather than presenting a partial YAML snippet that assumes undefined state, the shadow configuration is best expressed as a complete, validated config file that the inference service loads at startup and the deployment orchestrator updates during mode transitions:
# /etc/auto-qc/model-config.yaml
# Managed by deployment orchestrator — do not edit manually.
# Validated against schema at service startup; invalid config prevents startup.

active:
  model_name: "defect-detector-v17"
  artifact_path: "/opt/models/defect-detector-v17_orin-nx_trt8.6.engine"
  threshold: 0.52
  device_class: "jetson-orin-nx-8gb"

candidate:
  model_name: "defect-detector-v18"
  artifact_path: "/opt/models/defect-detector-v18_orin-nx_trt8.6.engine"
  threshold: 0.52
  device_class: "jetson-orin-nx-8gb"
  # Candidate is only loaded and run if mode is 'shadow' or 'candidate_active'.
  # If artifact_path does not exist at startup, service starts in active_only mode
  # and logs a warning — it does not fail to start.

mode: "shadow"
# active_only      — candidate session not loaded; no shadow logging.
# shadow           — both sessions loaded; active drives PLC; candidate logged only.
# candidate_active — candidate drives PLC; active session retained for rollback.
# Transition to candidate_active requires explicit orchestrator command;
# inference service cannot self-promote.

shadow_logging:
  score_log: true           # always on; negligible cost
  divergence_frames: true   # retain compressed frame on label disagreement
  boundary_band_low: 0.40   # retain frame if either score falls in [0.40, 0.60]
  boundary_band_high: 0.60
  full_frame_capture: false # disabled by default; enable only for burst capture
  retention_days: 30
  max_disk_mb: 4096         # shadow log evicts oldest entries if limit reached
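The startup validation described in the file's header comments can be sketched as follows, assuming the YAML has already been parsed into a dict (e.g., with PyYAML). The rule set shown is a minimal subset of a real schema, and the error messages are illustrative:

```python
VALID_MODES = {"active_only", "shadow", "candidate_active"}
MODEL_KEYS = {"model_name", "artifact_path", "threshold", "device_class"}

def validate_config(cfg: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the config
    is valid. A non-empty list should prevent service startup."""
    errors = []
    mode = cfg.get("mode")
    if mode not in VALID_MODES:
        errors.append(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    for section in ("active", "candidate"):
        block = cfg.get(section)
        if block is None:
            # candidate may be absent only when running active_only
            if section == "active" or mode != "active_only":
                errors.append(f"missing '{section}' section")
            continue
        missing = MODEL_KEYS - block.keys()
        if missing:
            errors.append(f"{section}: missing keys {sorted(missing)}")
        elif not (0.0 < float(block["threshold"]) < 1.0):
            errors.append(f"{section}: threshold must be in (0, 1)")
    return errors
```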
Implementation — complete and self-consistent
The following implementation is complete and handles all mode and session state combinations explicitly; there are no assumed globals or undefined fallback paths.
# shadow_inference.py
import time
from dataclasses import dataclass
from typing import Optional

import numpy as np
import onnxruntime as ort

from frame_context import FrameContext
from shadow_logger import ShadowLogger


@dataclass
class ModelHandle:
    model_name: str
    session: ort.InferenceSession
    input_name: str
    output_name: str
    threshold: float


@dataclass
class ShadowConfig:
    mode: str                         # active_only | shadow | candidate_active
    active: ModelHandle
    candidate: Optional[ModelHandle]  # None when mode is active_only
    score_log: bool = True
    divergence_frames: bool = True
    boundary_band: tuple = (0.40, 0.60)
    full_frame_capture: bool = False


@dataclass
class InferenceResult:
    label: str
    score: float
    latency_ms: float
    model_name: str


def _run_session(handle: ModelHandle, frame: np.ndarray) -> tuple[float, float]:
    """Run a single session; return (score, latency_ms)."""
    start = time.perf_counter()
    outputs = handle.session.run([handle.output_name], {handle.input_name: frame})
    latency_ms = (time.perf_counter() - start) * 1000.0
    # session.run returns a list of outputs; the first output is assumed to
    # hold the scalar defect score for this frame.
    return float(outputs[0].ravel()[0]), latency_ms


def infer(
    frame: np.ndarray,
    ctx: FrameContext,
    config: ShadowConfig,
    logger: ShadowLogger,
) -> InferenceResult:
    """
    Run inference according to current shadow config.
    Only the active model drives the return value in shadow mode.
    Candidate results are logged but never returned to the caller.
    """
    active_score, active_latency = _run_session(config.active, frame)
    active_label = "defective" if active_score >= config.active.threshold else "ok"

    # --- Shadow / candidate path ---
    if config.mode in ("shadow", "candidate_active") and config.candidate is not None:
        cand_score, cand_latency = _run_session(config.candidate, frame)
        cand_label = "defective" if cand_score >= config.candidate.threshold else "ok"

        # Determine whether to retain the frame image alongside the score log
        retain_frame = False
        if config.divergence_frames and active_label != cand_label:
            retain_frame = True
        band_low, band_high = config.boundary_band
        if (band_low <= active_score <= band_high
                or band_low <= cand_score <= band_high):
            retain_frame = True
        if config.full_frame_capture:
            retain_frame = True

        logger.log_shadow_pair(
            frame_ctx=ctx,
            active_score=active_score,
            active_label=active_label,
            active_latency_ms=active_latency,
            active_model=config.active.model_name,
            cand_score=cand_score,
            cand_label=cand_label,
            cand_latency_ms=cand_latency,
            cand_model=config.candidate.model_name,
            retain_frame=retain_frame,
            frame=frame if retain_frame else None,
        )

        # --- Candidate drives the return value only in candidate_active mode ---
        # NOTE: mode transition to candidate_active is set by the orchestrator via
        # config reload (/reload-config endpoint) — the inference service cannot
        # self-promote. This prevents accidental promotion during a shadow run.
        if config.mode == "candidate_active":
            return InferenceResult(
                label=cand_label,
                score=cand_score,
                latency_ms=cand_latency,
                model_name=config.candidate.model_name,
            )

    return InferenceResult(
        label=active_label,
        score=active_score,
        latency_ms=active_latency,
        model_name=config.active.model_name,
    )
When shadow-mode analysis confirms that the candidate model meets the promotion criteria (acceptable divergence rate, no regression on critical defect classes, latency within budget on HIL), promote it to canary.
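The promotion decision itself can be scripted against a summary of the shadow log. A sketch with illustrative thresholds (the 2% divergence ceiling is an assumed example, not a normative figure, and the summary fields are hypothetical names):

```python
from dataclasses import dataclass

@dataclass
class ShadowSummary:
    total_frames: int
    label_disagreements: int
    critical_class_misses: int   # candidate passed a part the active model flagged critical
    hil_p99_latency_ms: float    # from the HIL validation record

def promotion_decision(
    s: ShadowSummary,
    actuation_budget_ms: float,
    max_divergence_rate: float = 0.02,   # illustrative ceiling
) -> tuple[bool, list[str]]:
    """Evaluate the three promotion criteria; return (promote, blockers)."""
    blockers = []
    divergence = s.label_disagreements / max(s.total_frames, 1)
    if divergence > max_divergence_rate:
        blockers.append(f"divergence rate {divergence:.3f} exceeds {max_divergence_rate}")
    if s.critical_class_misses > 0:
        blockers.append(f"{s.critical_class_misses} regressions on critical defect classes")
    if s.hil_p99_latency_ms > actuation_budget_ms:
        blockers.append(f"HIL p99 {s.hil_p99_latency_ms} ms over budget {actuation_budget_ms} ms")
    return (not blockers, blockers)
```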
Canary promotion routes a bounded subset of the fleet (e.g., one station on one line, or one shift's worth of production) to the candidate model while the remainder continues on the active model.
The canary window is a live production trial under controlled exposure: the candidate drives real PLC actuation on real parts, and its behaviour is measured against five distinct rollback trigger categories, each of which has different detection mechanisms, response times, and risk profiles.
Canary scope and duration
Scope the canary to the smallest unit of the fleet that gives statistically meaningful volume: typically one station running one product family for a minimum of one full shift (8 hours) or a defined part count (e.g., 10,000 inspected parts), whichever comes first.
Extend the canary window if production volume is low or if the product mix during the canary period does not represent the full variant range the model will encounter in production.
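The scope and duration rules above reduce to a small completion check. A sketch (parameter names and defaults are illustrative):

```python
def canary_window_complete(
    elapsed_hours: float,
    parts_inspected: int,
    variants_seen: set,
    required_variants: set,
    min_hours: float = 8.0,
    min_parts: int = 10_000,
) -> bool:
    """Canary completes when the shift-hours OR part-count target is met,
    but only if the product mix covered every required variant; otherwise
    the window is extended."""
    volume_met = elapsed_hours >= min_hours or parts_inspected >= min_parts
    mix_met = required_variants <= variants_seen   # set containment
    return volume_met and mix_met
```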
The five rollback trigger categories
Generic "SLO breach" language is not precise enough to act on in an automotive inspection context. Each failure mode has a different detection signal, a different urgency, and a different appropriate response. Treat these as five separate monitoring channels, each with its own alert threshold and rollback policy:
Trigger 1: Latency budget breach
What it is: The candidate model's p99 end-to-end inference latency — measured from frame acquisition to PLC write — exceeds the station's documented actuation budget.
Why it is distinct: A model that is more accurate but slower may cause missed rejections not because of wrong predictions but because the decision arrives after the part has passed the actuator. Latency degradation can also be gradual — the model performs within budget on a cold device but drifts over budget as the GPU thermals rise during a full shift.
Detection: Prometheus histogram on p99_latency_ms per station, measured continuously during the canary window. Compare against the HIL-validated p99 baseline recorded at promotion time.
Rollback threshold and policy:
latency_breach:
  trigger: p99_latency_ms > actuation_budget_ms   # station-specific, from design doc
  sustained_window: 60s   # breach must persist for 60s to exclude transient spikes
  immediate_trigger: p99_latency_ms > actuation_budget_ms * 1.5   # hard ceiling — instant rollback
  action: rollback_to_active
  notify: ops_team, quality_team
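The two-tier policy (sustained breach versus hard ceiling) can be evaluated over a rolling series of p99 samples. A sketch, assuming a fixed sample cadence; the cadence and list-based window are simplifications:

```python
def latency_rollback(
    p99_samples_ms: list[float],
    budget_ms: float,
    sample_period_s: float = 5.0,
    sustained_s: float = 60.0,
) -> str:
    """Return 'rollback' or 'ok' per the latency_breach policy.
    p99_samples_ms holds rolling p99 readings, most recent last."""
    if p99_samples_ms and p99_samples_ms[-1] > budget_ms * 1.5:
        return "rollback"   # hard ceiling: instant rollback
    window = int(sustained_s / sample_period_s)
    recent = p99_samples_ms[-window:]
    if len(recent) == window and all(s > budget_ms for s in recent):
        return "rollback"   # breach sustained for the full window
    return "ok"
```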
Trigger 2: Confidence-distribution anomaly
What it is: The candidate model's output score distribution shifts significantly from its shadow-mode baseline: scores cluster near 0 or 1 when they previously spread across the distribution, or the mean score drifts upward or downward without a corresponding change in ground-truth defect rate.
Why it is distinct: Confidence-distribution shifts often precede visible accuracy degradation; they are an early warning that the model is encountering inputs outside its training distribution (e.g., a lighting change, a fixture adjustment, or a new part variant introduced without retraining). Acting on distribution shifts before they manifest as missed defects or over-rejection is the difference between a proactive canary and a reactive incident.
Detection: Track the rolling score distribution histogram (p10, p25, p50, p75, p90) per defect class during the canary window. Compare against the shadow-mode baseline distribution using a statistical distance metric (e.g., KL divergence or Population Stability Index). A PSI > 0.2 on any defect class is conventionally treated as a significant distribution shift requiring investigation.
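PSI itself is straightforward to compute from binned proportions. A sketch using NumPy; the bin count and the empty-bin epsilon guard are implementation choices, and bin edges are taken from the baseline distribution:

```python
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray,
        bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index:
    sum over bins of (cur% - base%) * ln(cur% / base%)."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / max(len(baseline), 1)
    cur_pct = np.histogram(current, bins=edges)[0] / max(len(current), 1)
    base_pct = np.clip(base_pct, eps, None)   # guard against empty bins
    cur_pct = np.clip(cur_pct, eps, None)
    return float(np.sum((cur_pct - base_pct) * np.log(cur_pct / base_pct)))
```

An unchanged distribution yields a PSI near zero; a strongly shifted one (e.g., all scores halved) lands well past the conventional 0.2 threshold.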
Rollback threshold and policy:
confidence_distribution_anomaly:
  metric: population_stability_index   # computed per defect class, rolling 30-min window
  warning_threshold: 0.10    # flag for investigation — do not yet rollback
  rollback_threshold: 0.20   # significant shift — suspend canary, revert to active
  action: suspend_canary_and_investigate
  notify: ml_team, quality_team
  # NOTE: PSI breach triggers investigation, not blind rollback — the shift may indicate
  # a genuine process change (e.g., new part batch) rather than model failure.
  # Quality team must adjudicate before full rollback or canary continuation.
Trigger 3: Over-rejection
What it is: The candidate model rejects a significantly higher proportion of parts than the active model on the same product family, without a corresponding confirmed increase in actual defect rate.
Why it is distinct: Over-rejection has direct, measurable commercial and operational impact: scrap cost, rework cost, line throughput reduction, and operator confidence erosion. In automotive programmes, a sudden increase in reject rate is immediately visible to production supervisors and will generate pressure to override or disable the inspection system — making over-rejection a threat not just to quality but to the long-term viability of the AI inspection programme.
Detection: Track reject_rate_pct per station per product family in a rolling window during the canary window. Compare against the active model's reject rate on the same product family over the preceding 5 shifts (the rolling baseline). A reject rate increase beyond the threshold that cannot be explained by a confirmed upstream process change triggers rollback.
Rollback threshold and policy:
over_rejection:
  metric: reject_rate_pct
  baseline_window: 5_shifts            # rolling baseline from active model
  rollback_threshold_relative: +15%    # candidate reject rate > baseline + 15%
  rollback_threshold_absolute: +3pct   # or absolute increase > 3 percentage points
  sustained_window: 30min              # must persist for 30 minutes to exclude shift-start variation
  action: rollback_to_active
  notify: ops_team, quality_team, production_supervisor
  # NOTE: Before rollback executes, system checks whether a confirmed upstream
  # process change (e.g., new material batch, tooling change) was logged in the
  # MES during the canary window. If yes, alert is escalated for human adjudication
  # rather than automatic rollback.
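The double threshold and the MES adjudication step combine into a small decision function. A sketch; the function and field names are hypothetical, and the sustained-window check is assumed to happen upstream:

```python
def over_rejection_action(
    candidate_rate_pct: float,
    baseline_rate_pct: float,
    sustained: bool,
    mes_process_change_logged: bool,
) -> str:
    """Return 'ok', 'escalate_for_adjudication', or 'rollback_to_active'."""
    relative_breach = candidate_rate_pct > baseline_rate_pct * 1.15   # baseline + 15%
    absolute_breach = candidate_rate_pct - baseline_rate_pct > 3.0    # +3 percentage points
    if not ((relative_breach or absolute_breach) and sustained):
        return "ok"
    # A confirmed upstream process change routes to human adjudication,
    # not automatic rollback.
    return "escalate_for_adjudication" if mes_process_change_logged else "rollback_to_active"
```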
Trigger 4: Defect leakage
What it is: The candidate model misses defects that the active model would have caught — confirmed by downstream quality events: re-inspection station escapes, end-of-line measurement failures, or customer-reported field escapes traceable to parts inspected during the canary window.
Why it is distinct: Defect leakage is the highest-severity rollback trigger in an automotive context: it represents parts with confirmed defects that passed inspection and entered the supply chain or reached end customers. Unlike over-rejection, which is a cost and throughput problem, defect leakage is a safety, warranty, and regulatory compliance problem. It must trigger an immediate rollback with no sustained-window grace period, and the incident must be escalated to quality engineering regardless of the decision on the model.
Detection: Requires a feedback loop from downstream quality gates back to the inspection system, typically via the MES or a dedicated quality event bus. Parts are tracked by carrier ID or part serial number; a downstream escape event is joined to the canary window inspection log via the frame correlation ID established at acquisition.
Rollback threshold and policy:
defect_leakage:
  trigger: confirmed_escape_count >= 1   # zero tolerance — any confirmed escape triggers immediate rollback
  sustained_window: none                 # immediate — no grace period
  action: immediate_rollback_to_active
  notify: ml_team, quality_team, production_supervisor, quality_manager
  post_rollback: mandatory_incident_review
  # NOTE: Confirmed escape = downstream quality event joined to a frame inspected
  # by the candidate model during the canary window. Unconfirmed escapes (suspect
  # but not yet verified) trigger a canary suspension pending investigation.
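The join that confirms an escape can be sketched as a lookup from downstream quality events into the canary inspection log by correlation ID. The record field names below are assumptions about the log schema:

```python
def confirmed_escapes(escape_events: list[dict], canary_log: dict) -> list[dict]:
    """Join downstream escape events to canary-window inspections.
    canary_log maps frame correlation ID -> inspection record.
    An escape is confirmed only if the part was passed ('ok') by the
    candidate model during the canary window."""
    confirmed = []
    for event in escape_events:
        record = canary_log.get(event["correlation_id"])
        if record and record["label"] == "ok" and record["model"] == "candidate":
            confirmed.append({**event, "inspection": record})
    return confirmed
```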
Trigger 5: System-health failure
What it is: The edge node running the candidate model exhibits infrastructure-level degradation (GPU memory exhaustion, thermal throttling, process crashes, watchdog timeouts, or disk saturation) that is not present on nodes running the active model.
Why it is distinct: System-health failures indicate that the candidate model or its serving configuration is incompatible with the production hardware environment in a way that was not caught by HIL validation: for example, a memory leak in the candidate's session management, higher sustained GPU memory consumption that causes OOM under thermal load, or a larger model footprint that causes disk pressure on nodes with smaller SSDs.
Detection: Prometheus gauges on GPU utilisation, GPU memory, CPU utilisation, thermal zone temperature, process restart count, and disk utilisation. Compare canary nodes against active-model nodes of the same device class during the same time window.
system_health_failure:
  triggers:
    - metric: gpu_memory_used_mb
      threshold: 90%_of_total   # or absolute: 7372 MB on 8GB device
      sustained_window: 5min
    - metric: thermal_throttle_active
      threshold: true
      sustained_window: 2min
    - metric: inference_process_restarts
      threshold: 1              # any restart during canary is a signal
      sustained_window: none    # immediate
    - metric: disk_used_pct
      threshold: 85%
      sustained_window: 10min
  action: rollback_to_active
  notify: ops_team, ml_team
AWS MLOps guidance explicitly identifies canary, shadow, and blue-green deployment strategies alongside three rollback options: revert to prior model, fallback to heuristics, and roll forward to a patched version.
Applied to automotive edge inspection, these map to concrete responses:

The key principle the AWS guidance establishes, and the one that applies directly to edge inspection, is that rollback strategies must be defined, tested, and rehearsed before the canary begins, not designed during an incident. On an automotive line, a rollback decision made under production pressure without a pre-defined playbook will be made incorrectly.
The five trigger categories above, with their specific thresholds and actions, are the pre-defined playbook.
Effective monitoring for edge AI inspection requires three distinct, non-overlapping layers, each answering a different operational question, owned by a different team, and acting on a different time horizon.
Conflating them into a single "monitoring" bucket makes it harder to identify which layer is signalling a problem and who should respond.

Device health monitoring covers the physical and OS-level state of each edge node. For automotive factory deployments this is especially important for fanless industrial PCs and Jetson modules operating in environments with welding heat, vibration, and dust ingress — conditions that cause thermal throttling and disk saturation long before they cause outright hardware failure.
# monitoring.py — Layer 1: device health metrics
from prometheus_client import Counter, Gauge

GPU_UTIL = Gauge(
    "autoqc_gpu_utilization_pct",
    "GPU utilization percentage (0–100)",
)
GPU_MEM_USED = Gauge(
    "autoqc_gpu_memory_used_mb",
    "GPU memory currently in use (MB)",
)
CPU_UTIL = Gauge(
    "autoqc_cpu_utilization_pct",
    "CPU utilization percentage (0–100)",
)
THERMAL_ZONE = Gauge(
    "autoqc_thermal_zone_celsius",
    "Thermal zone temperature in degrees Celsius",
    ["zone"],   # e.g. "gpu", "cpu", "board"
)
DISK_USED_PCT = Gauge(
    "autoqc_disk_used_pct",
    "Disk utilization percentage (0–100)",
    ["mount"],  # e.g. "/", "/opt/models"
)
PROC_RESTARTS = Counter(
    "autoqc_process_restarts_total",
    "Total inference process restarts since node boot",
)


def record_device_health(
    gpu_util_pct: float,
    gpu_mem_used_mb: float,
    cpu_util_pct: float,
    thermal_readings: dict[str, float],  # e.g. {"gpu": 72.5, "board": 61.0}
    disk_readings: dict[str, float],     # e.g. {"/": 42.3, "/opt/models": 61.7}
) -> None:
    """Update all device health gauges. Call on a regular polling interval."""
    GPU_UTIL.set(gpu_util_pct)
    GPU_MEM_USED.set(gpu_mem_used_mb)
    CPU_UTIL.set(cpu_util_pct)
    for zone, temp in thermal_readings.items():
        THERMAL_ZONE.labels(zone=zone).set(temp)
    for mount, pct in disk_readings.items():
        DISK_USED_PCT.labels(mount=mount).set(pct)
Export these metrics on a local Prometheus scrape port. Display on a local cell-level dashboard so line supervisors can see device health without cloud connectivity. Forward to a central fleet dashboard for cross-site visibility when connectivity is available.

Inference-path monitoring covers the end-to-end timing and throughput of the inspection pipeline — from frame acquisition through to PLC write.
This is where you validate in production that your p99 latency budget is being met on every cycle, and where you detect per-stage bottlenecks before they compound into actuation failures. Instrument at each stage boundary using the FrameContext timestamps assigned at acquisition (as defined in Step 1 of the edge inference stack):
# monitoring.py — Layer 2: inference-path performance metrics
from prometheus_client import Counter, Histogram, start_http_server

from frame_context import FrameContext

# Buckets aligned to automotive actuation budgets (1–200 ms range).
# Fine resolution below 50 ms where budget breaches are most consequential.
STAGE_LATENCY = Histogram(
    "autoqc_stage_latency_ms",
    "Per-stage pipeline latency in milliseconds",
    ["stage"],  # see VALID_STAGES below
    buckets=[1, 2, 5, 10, 20, 30, 50, 75, 100, 150, 200, 500],
)

VALID_STAGES = frozenset({
    "acquisition_to_preprocess",
    "preprocess",
    "inference",
    "decision_to_plc_write",
    "end_to_end",
})

PREDICTIONS = Counter(
    "autoqc_predictions_total",
    "Total predictions by outcome label and model version",
    ["label", "model_version"],  # label values: "ok", "defective", "error", "timeout"
)

STALE_DROPS = Counter(
    "autoqc_stale_decision_drops_total",
    "Decisions discarded by the IO adapter due to staleness threshold breach",
    ["station_id"],
)


def init_metrics(port: int = 9100) -> None:
    """Start the Prometheus HTTP scrape server on the given port."""
    start_http_server(port)


def record_inference(
    ctx: FrameContext,
    label: str,
    model_version: str,
) -> None:
    """
    Record per-stage latencies from FrameContext timestamps and
    increment the prediction counter.

    All FrameContext timestamps are in nanoseconds (int).
    Latency values are converted to milliseconds before observation.
    """
    ns_to_ms = 1_000_000.0
    STAGE_LATENCY.labels(stage="acquisition_to_preprocess").observe(
        (ctx.preprocess_start_ns - ctx.hw_timestamp_ns) / ns_to_ms
    )
    STAGE_LATENCY.labels(stage="preprocess").observe(
        (ctx.preprocess_end_ns - ctx.preprocess_start_ns) / ns_to_ms
    )
    STAGE_LATENCY.labels(stage="inference").observe(
        (ctx.inference_end_ns - ctx.inference_start_ns) / ns_to_ms
    )
    STAGE_LATENCY.labels(stage="decision_to_plc_write").observe(
        (ctx.plc_write_ts_ns - ctx.decision_ts_ns) / ns_to_ms
    )
    STAGE_LATENCY.labels(stage="end_to_end").observe(
        (ctx.plc_write_ts_ns - ctx.hw_timestamp_ns) / ns_to_ms
    )
    PREDICTIONS.labels(label=label, model_version=model_version).inc()


def record_stale_drop(station_id: str) -> None:
    """Increment stale-decision drop counter for the given station."""
    STALE_DROPS.labels(station_id=station_id).inc()
Display per-stage latency histograms (p50/p95/p99) on both the local cell dashboard and the central fleet dashboard. Alert on p99 end-to-end latency approaching the actuation budget threshold — this is a leading indicator of missed rejections, not a lagging one.
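That leading-indicator alert can be expressed directly as a Prometheus alerting rule against the histogram above. A sketch, using an illustrative 40 ms actuation budget and an 80% warning margin; in practice the budget is station-specific and would be templated per station:

```yaml
groups:
  - name: autoqc-latency
    rules:
      - alert: EndToEndP99NearBudget
        # p99 of the end-to-end stage over 5 minutes, vs 80% of an
        # illustrative 40 ms actuation budget
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(autoqc_stage_latency_ms_bucket{stage="end_to_end"}[5m]))
          ) > 0.8 * 40
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 end-to-end latency is within 20% of the actuation budget"
```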

Model and data behaviour monitoring covers whether the model's predictions remain accurate and well-calibrated as real-world production inputs evolve. This layer cannot be fully automated: it requires ground-truth feedback from downstream quality events, human review of distribution anomalies, and a defined escalation path to retraining when drift is confirmed.
Ownership is shared between ML engineering and quality engineering.
Model behaviour: track per defect class, per product family, per shift:
Label feedback and in-field accuracy:
Where the line has a re-inspection station, end-of-line measurement system, or CMM, tie inspection decisions back to downstream ground-truth outcomes using the frame correlation ID established at acquisition:
Data drift and retraining triggers: three concrete quantitative signals:
In hybrid architectures this layer is operationally easier because sampled frames, score logs, and feature statistics stream to the central data lake, and drift detection runs centrally without additional edge tooling. In edge-only architectures, drift monitoring requires periodic manual export of score logs and representative frame samples for offline analysis.
When feature distributions (e.g., brightness, contrast, defect morphology) drift significantly from the training dataset, schedule retraining or at least a data review.
Automotive plants are noisy RF and electrical environments, with welding, large motors, and maintenance activities causing frequent micro-outages and transient network issues. Your architecture should assume:
A simple sync agent pattern:
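A sketch of such an agent, using a disk spool and an injected upload callable; the transport, file naming, and retry cadence are assumptions, but the invariant is the one that matters: a bundle is deleted from the spool only after the cloud acknowledges it.

```python
import json
import time
import uuid
from pathlib import Path


class SyncAgent:
    """Store-and-forward sync: spool bundles to local disk, upload oldest-first,
    delete only after the cloud acknowledges. `uploader` is an injected callable
    returning True on acknowledged ingestion."""

    def __init__(self, spool_dir: str, uploader):
        self.spool = Path(spool_dir)
        self.spool.mkdir(parents=True, exist_ok=True)
        self.uploader = uploader

    def enqueue(self, bundle: dict) -> Path:
        # Timestamped filename keeps oldest-first ordering on disk
        path = self.spool / f"{time.time_ns()}-{uuid.uuid4().hex}.json"
        path.write_text(json.dumps(bundle))
        return path

    def sync_once(self) -> int:
        """Attempt to upload every spooled bundle; stop at the first failure
        so ordering is preserved. Returns the number of bundles uploaded."""
        uploaded = 0
        for path in sorted(self.spool.glob("*.json")):
            if not self.uploader(path.read_bytes()):
                break           # network down: leave bundle spooled, retry later
            path.unlink()       # delete only after acknowledged upload
            uploaded += 1
        return uploaded
```

Because a bundle survives on disk until acknowledged, a crash or outage at any point results in a retry, never a loss; the duplicate side of that trade-off is handled by the idempotent ingestion pattern described next.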
Idempotency and duplicate-safe ingestion on the cloud side
In factory OT environments, network interruptions frequently occur mid-upload: the bundle is transmitted, the cloud ingests it, but the acknowledgement never reaches the edge node.
The sync agent correctly treats the missing acknowledgement as a failure and retries the upload on the next cycle. Without idempotency on the cloud side, this retry delivers a duplicate bundle that is ingested a second time, creating duplicate metric entries, double-counted defect events, and inflated inference volumes that corrupt fleet-wide analytics and make incident reconstruction unreliable.
The cloud ingestion endpoint must therefore be designed to be idempotent by bundle ID: every bundle is assigned a unique, deterministic ID at creation time on the edge node (e.g., a SHA-256 of the bundle contents, or a structured ID combining station ID, timestamp, and sequence number).
The cloud gateway checks this ID against a deduplication store before processing:
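A minimal sketch of that dedup gate. The store here is an in-memory dict for illustration; production would use a persistent, TTL-expired table, and the class and method names are hypothetical:

```python
import hashlib
import json
import time


class IdempotentIngestor:
    """Duplicate-dropping ingestion gate keyed by deterministic bundle ID."""

    def __init__(self):
        self._seen: dict[str, float] = {}   # bundle_id -> ingestion timestamp
        self.processed = []

    @staticmethod
    def bundle_id(payload: bytes) -> str:
        # Deterministic ID: SHA-256 of the bundle contents
        return hashlib.sha256(payload).hexdigest()

    def ingest(self, payload: bytes) -> str:
        """Return 'ingested' on first sight, 'duplicate' on any retry.
        Both outcomes are acknowledged to the edge node, so retries stop."""
        bid = self.bundle_id(payload)
        if bid in self._seen:
            return "duplicate"   # acknowledged but not re-processed
        self._seen[bid] = time.time()
        self.processed.append(json.loads(payload))
        return "ingested"
```

Note that a duplicate still returns a success-style response: the goal is to stop the edge node's retry loop, not to report an error.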
This pattern ensures that any number of retries produces exactly one ingested record per bundle, regardless of how many times the upload is attempted. The deduplication store requires only the bundle ID and ingestion timestamp: a lightweight entry that can be retained for a rolling window (e.g., 7 days) covering the maximum realistic retry period before being expired.
Without this guarantee, connectivity instability, which is normal rather than exceptional on a factory OT network, becomes a source of data quality corruption that compounds silently over time and is expensive to reconcile after the fact.
Putting it all together, a reference blueprint for an automotive quality control system looks like this:
When you design or review an automotive AI quality‑control system, validate these items explicitly.
If these aspects are explicitly addressed in your design documents and implementation, you can move from fragile cloud‑centric PoCs to robust, low‑latency edge deployments that remain reliable under real automotive factory‑floor conditions.
Across this five-part series, we have traced the full lifecycle of edge AI for automotive quality control — from the physical constraints that make cloud AI structurally incompatible with inline inspection, through architecture selection, inference stack design, deployment engineering, and fleet-scale operations.
The consistent thread across all five parts is this: edge AI reliability is not a property of any single component — it is an emergent property of the entire system, designed with explicit latency budgets, defined failure semantics, enforced architectural boundaries, and continuous operational discipline.
A correct model with a brittle deployment pipeline will fail. A robust pipeline with poor observability will degrade silently. The programmes that succeed are those that treat every layer — hardware, software, MLOps, and operations — as a first-class engineering concern from day one.
If you are starting a new edge AI inspection programme, begin with Part 1 and let the physical constraints drive your architecture. If you are operating an existing programme and hitting operational challenges, the monitoring framework in Part 5 and the deployment engineering patterns in Part 4 are the right entry points.