The Identity Inversion: Why Predictive Maintenance Fails Without a Golden Asset Record Layer Across Segmented OT/IT Landscapes - Foundations

Why predictive maintenance needs unified asset intelligence

Predictive maintenance (PdM) has moved from proof of concept to a core capability in both discrete manufacturing and construction equipment management, delivering measurable gains in uptime, maintenance cost, and safety when done correctly. When predictive maintenance engines fail, the cause is rarely algorithmic inaccuracy; they fail because of Anomalous State Discontinuity: a structural decoupling between the physical asset state captured by industrial telemetry and the logical entity managed by the Enterprise Asset Management (EAM) system. A vibration sensor on a rotating compressor may surface a statistically valid outlier score at 03:14 AM, but if that signal cannot be resolved back to a canonically identified, maintenance-ready asset record in the EAM, the insight dies at the boundary layer. The ML model has done its job correctly, and yet its output becomes operational noise, not action.

This is not a data quality problem in the conventional sense. It is a structural bottleneck embedded in the architecture itself: two systems, one tracking physical states across time and the other managing logical asset lifecycles, operating without a shared identity layer. Until the gap between what the sensor knows and what the EAM owns is closed by design, high-value anomaly signals will continue to be absorbed by organizational entropy rather than translated into maintenance interventions.

However, most of these gains only materialize when you can reliably answer one deceptively simple question: “Which physical asset does this sensor or telemetry stream belong to, right now?” Without that mapping, even the best ML models degenerate into dashboard theater: accurate anomalies that no one can act on in the CMMS, EAM, or project controls systems.

This post series is about that gap: building Unified Asset Intelligence as a data engineering discipline. In Part 1 we focus on the conceptual and architectural foundations: Golden Records, asset/event models, and lifecycle views. In Part 2 we’ll dive into concrete implementation patterns: identity resolution pipelines, schema designs, mapping tables, and governance practices that keep the Golden Record accurate over time.

The problem: OT/IT fragmentation by design

The fragmentation between OT and IT layers is not an accident of poor planning; it is a byproduct of fundamentally incompatible system design constraints. The OT layer operates on stream-based, low-latency execution loops: PLC tags fire at millisecond intervals, SCADA historians record absolute position states, and telematics units emit continuous telemetry keyed to device-native identifiers that exist entirely outside the enterprise naming convention. Time is the primary axis. Identity is implicit, embedded in the tag hierarchy at the point of physical installation and rarely revisited.

The IT layer inhabits an entirely different ontological space. CMMS and EAM systems rely on relational, batch-processed, human-entered identifiers: asset numbers assigned during procurement, functional location codes structured around maintenance hierarchies, and work-order records updated asynchronously by technicians on the floor. Time here is transactional, not continuous. The asset exists as a logical record, not a live state.

This Temporal and Semantic Disconnect is where predictive maintenance architectures silently collapse. A historian tag reading COMP-L3-VIB-AX and an EAM record carrying asset number EQ-00471-COMP may describe the same physical compressor, but without a master entity index that explicitly resolves that equivalence, no system can structurally confirm it. Attempting to bridge this gap by ingesting both streams into a flat data lake table is an architectural anti-pattern: it creates the appearance of integration while preserving the underlying identity ambiguity. The query will return rows. The rows will not mean the same thing. Anomaly scores will be computed against entities that do not correspond to any actionable, maintainable record on the other side of the boundary, and the predictive signal, however accurate, will find no structural address to resolve itself against.

From asset lists to Golden Records

Master Data Management has long defined the Golden Record as a single, trusted representation of an entity synthesized from multiple, inconsistent source systems. For asset-intensive industries, that entity is the physical asset: the specific pump, compressor, excavator, or tower crane operating in the field. The Golden Asset Record is not a cleansed copy of any one system's data. It is a conflict-resolved, survivorship-governed master record that persists across the entire asset lifecycle: from commissioning through relocation, overhaul, and decommissioning, independent of which source systems are replaced, migrated, or reconfigured beneath it.

What makes this non-trivial is the survivorship problem. When the same physical asset carries contradictory attribute values across four source systems (different commissioning dates, conflicting component hour readings, mismatched maintenance hierarchies), the Golden Asset Record cannot simply apply last-write-wins or defer to the most recently updated source. It must apply Source Authority Weighting: a structured, domain-partitioned assignment of record-of-truth responsibility, where each system governs only the attribute domain it is architecturally positioned to own with authority.

In practice, this partitioning resolves as follows:

ERP remains the system of record for financial capitalization state, asset acquisition value, depreciation schedule, fixed-asset register ID, and disposal classification. No other system overrides these attributes.

CMMS/EAM holds authoritative context for operational hierarchy, functional location, maintenance strategy assignment, work-order lineage, and spare-parts linkage. Its asset number and location path are the canonical identifiers for maintenance planning.

The OEM telematics portal is the trusted source for raw component-level engine hours, odometer readings, diagnostic trouble codes, and firmware version state: values that only the manufacturer's embedded telemetry chain can certify without inference or interpolation.

PLC/historian tag namespaces govern signal-level identity: absolute position tags, loop references, and historian paths that map to physical measurement points and cannot be meaningfully overridden by any IT-layer identifier.

Without this authority partitioning, survivorship rules collapse into heuristic guesswork. A PdM model that ingests engine hours from the CMMS, where they are manually updated by technicians, rather than from the OEM telematics portal will accumulate drift silently, producing anomaly thresholds calibrated against inaccurate baselines. The Golden Asset Record layer enforces the discipline that each attribute value carries a provenance claim, and that claim is only as strong as the source authority assigned to the system that produced it.
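
The authority partitioning above can be sketched as a small survivorship function. This is a minimal illustration, not a reference implementation: the ATTRIBUTE_AUTHORITY map, the Claim type, and the resolve_golden function are hypothetical names, and the tie-break (most recent claim from the owning source) is an assumption layered on top of the rules described here.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical attribute -> authoritative source map (names are illustrative).
ATTRIBUTE_AUTHORITY = {
    "acquisition_value": "ERP",
    "functional_location": "CMMS",
    "engine_hours": "OEM_TELEMATICS",
    "historian_path": "HISTORIAN",
}


@dataclass(frozen=True)
class Claim:
    attribute: str
    value: object
    source: str
    observed_at: datetime


def resolve_golden(claims):
    """Pick one surviving claim per attribute.

    A claim survives only if it comes from the source that owns the
    attribute; among those, the most recently observed claim wins.
    """
    golden = {}
    for claim in claims:
        if ATTRIBUTE_AUTHORITY.get(claim.attribute) != claim.source:
            continue  # non-authoritative source: never overrides
        current = golden.get(claim.attribute)
        if current is None or claim.observed_at > current.observed_at:
            golden[claim.attribute] = claim
    return golden


claims = [
    # Technician-entered hours in the CMMS: newer, but not authoritative.
    Claim("engine_hours", 12480.0, "CMMS", datetime(2026, 3, 14)),
    Claim("engine_hours", 12911.5, "OEM_TELEMATICS", datetime(2026, 3, 10)),
    Claim("functional_location", "PLANT1-L3-COMP", "CMMS", datetime(2026, 1, 5)),
]
golden = resolve_golden(claims)
print(golden["engine_hours"].value)  # 12911.5
```

The CMMS value is newer but loses, because recency only breaks ties within the owning source. Each surviving Claim keeps its source and timestamp, which is the provenance the Golden Asset Record is expected to carry.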

Asset identity as a first-class data model

The structural model underpinning the Golden Asset Record is not a three-layer table hierarchy — that framing, while conceptually accessible, is too brittle for practical implementation across heterogeneous OT/IT landscapes. The correct architectural primitive is a Multi-Relational Graph Entity Model, where the physical asset is represented as an immutable node and all contextual roles, system identifiers, and operational affiliations are modeled as transient, time-bound edges emanating from it.


The immutable node carries only the attributes that are intrinsic to the physical thing and cannot be invalidated by any operational event: the OEM model designation, manufacturer-assigned serial number, rated capacity, power classification, and commissioning date. These attributes are write-once at the point of asset induction into the graph. The node's asset_uid is a system-agnostic, globally stable identifier — it is never recycled, never reassigned, and never derived from any source system's native key.


Everything else is an edge. Contextual role edges encode how the asset participates in a plant or project at a specific point in time (production line assignment, functional area, construction sub-location, or project phase), each carrying an explicit valid_from / valid_to interval. When a gearbox migrates from Line 1 to Line 3, the old role edge is closed with a terminal timestamp and a new edge is opened. The immutable node is untouched. Identity is preserved. History is retained without mutation.

Technical endpoint edges operate on the same temporal principle. A PLC tag, historian point ID, SCADA signal path, telematics device ID, CMMS equipment number, or ERP fixed-asset reference is modeled as an edge, not as a column on a master record. Each endpoint edge carries its own validity interval, its source authority weight, and its namespace context. When a SCADA system is replaced and tag naming conventions change, the old signal edges are retired and new ones are opened, without any structural modification to the asset node itself or to the edges that remain valid.

This graph topology directly resolves the identity drift problem: a crane reassigned from Project A to Project B does not produce a duplicate record, a broken join, or an orphaned sensor stream. It produces a new contextual role edge on the same immutable node, with all prior telemetry, work-order linkages, and cost history still resolvable through the graph's temporal query layer. The pivot is not a table; it is the node. And the node never changes.
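
A minimal in-memory sketch of the node-and-edge model may make the mechanics concrete. The class names (AssetNode, Edge, AssetGraph), the reassign and as_of methods, and the sample model and device identifiers are all hypothetical. The sketch also assumes at most one open edge per edge kind, which a real model would relax for assets carrying several identifiers in the same namespace.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)  # frozen: intrinsic attributes are write-once
class AssetNode:
    asset_uid: UUID
    oem_model: str
    oem_serial_number: str


@dataclass
class Edge:
    kind: str                       # e.g. "role", "telematics_device"
    target: str
    valid_from: datetime
    valid_to: Optional[datetime] = None   # None = edge is still open

    def active_at(self, ts: datetime) -> bool:
        return self.valid_from <= ts and (self.valid_to is None or ts < self.valid_to)


@dataclass
class AssetGraph:
    node: AssetNode
    edges: List[Edge] = field(default_factory=list)

    def reassign(self, kind: str, target: str, at: datetime) -> None:
        """Close the open edge of this kind (if any) and open a new one."""
        for edge in self.edges:
            if edge.kind == kind and edge.valid_to is None:
                edge.valid_to = at
        self.edges.append(Edge(kind, target, valid_from=at))

    def as_of(self, kind: str, ts: datetime) -> List[str]:
        return [e.target for e in self.edges if e.kind == kind and e.active_at(ts)]


crane = AssetGraph(AssetNode(uuid4(), "TC-7030", "SN-10444"))
uid_before = crane.node.asset_uid
crane.reassign("role", "ProjectA/TowerCrane_ZoneC", datetime(2025, 1, 1))
crane.reassign("telematics_device", "TM-5521", datetime(2025, 1, 1))
crane.reassign("role", "ProjectB/TowerCrane_Zone1", datetime(2026, 2, 1))

print(crane.as_of("role", datetime(2025, 6, 1)))  # ['ProjectA/TowerCrane_ZoneC']
print(crane.as_of("role", datetime(2026, 3, 1)))  # ['ProjectB/TowerCrane_Zone1']
```

The move from Project A to Project B closes one edge and opens another; the node and its asset_uid are never rewritten, and the telematics edge opened in 2025 stays valid throughout.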

Example: core asset identity tables

The relational schema underpinning the Golden Asset Record must be bitemporal by design, not as an optional audit enhancement but as a structural prerequisite for industrial predictive analytics. A monotemporal schema that tracks only valid_from / valid_to captures when an identifier was operationally valid, but it cannot answer the question a feature engineering pipeline actually needs: what did we know about this asset, and when did we know it? Bitemporal modeling separates two orthogonal time axes, valid time (when the fact was true in the physical world) and system time (when that fact was asserted into the record system), making exact time-travel queries structurally resolvable rather than inferential.

This is non-negotiable for predictive maintenance feature pipelines. When an anomaly is detected at a specific millisecond, the feature engineering layer must be able to reconstruct the precise operational asset topology that existed at that exact moment: which historian tag was mapped to which asset, under which CMMS equipment ID, within which site and line context. Without bitemporal columns, any retrospective reconstruction is contaminated by late-arriving corrections, retroactive remappings, and system migration events that post-date the anomaly but have already overwritten the mapping record. The result is a feature set that was never actually true at the moment of the anomaly, and a model trained or evaluated against a topology that did not yet exist.

-- Golden Physical Asset (write-once immutable node)  

CREATE TABLE dim_asset ( 
    asset_uid UUID PRIMARY KEY, 
    asset_type VARCHAR(100), -- pump, excavator, crane, CNC 
    oem_model VARCHAR(100), 
    oem_serial_number VARCHAR(100), 
    manufacturer VARCHAR(100), 
    commissioning_date DATE, 
    decommissioning_date DATE, 
    lifecycle_state VARCHAR(50), -- Commissioned, InService, UnderRepair, Retired 

    -- Bitemporal: system assertion tracking 
    system_asserted_at TIMESTAMP NOT NULL, -- when this record version was written into the system 
    system_retracted_at TIMESTAMP, -- NULL = currently asserted; set on correction/retraction 
    created_at TIMESTAMP, 
    updated_at TIMESTAMP 
); 

-- System-Specific Identifiers with Bitemporal Tracking  

CREATE TABLE dim_asset_identifier ( 
    asset_uid UUID REFERENCES dim_asset(asset_uid), 
    system_type VARCHAR(50), -- PLC, HISTORIAN, CMMS, TELEMATIC, ERP 
    system_name VARCHAR(100), -- e.g. Ignition_SCADA, SAP_PM, OEM_Telematics 
    external_id VARCHAR(255), -- tag name, equipment ID, telematics device ID 

    -- Valid time axis: when the identifier was operationally true in the physical world 
    valid_from TIMESTAMP NOT NULL, 
    valid_to TIMESTAMP, -- NULL = currently valid 

    -- System time axis: when this mapping was asserted into the Golden Record layer 
    system_asserted_at TIMESTAMP NOT NULL, 
    system_retracted_at TIMESTAMP, -- NULL = currently asserted; set on correction 

    source_authority_weight VARCHAR(50), -- PRIMARY, SECONDARY, DERIVED 
    is_primary BOOLEAN, 
    PRIMARY KEY (asset_uid, system_type, system_name, external_id, valid_from, system_asserted_at) 
);


With this bitemporal structure, a feature engineering pipeline can execute an as-of query against any point in time across both axes simultaneously. For example: "Reconstruct the full identifier topology for asset_uid = X as it was operationally valid at 2026-03-14 03:14:22 UTC and as it was known to the system at that same moment." The query isolates only the mapping rows where valid_from ≤ target_ts AND (valid_to IS NULL OR target_ts < valid_to) AND system_asserted_at ≤ target_ts AND (system_retracted_at IS NULL OR system_retracted_at > target_ts). The explicit NULL check on valid_to matters: currently valid mappings carry a NULL valid_to, and a bare target_ts < valid_to comparison would silently drop them. This query pattern ensures that anomaly feature windows are reconstructed against the topology that was both physically true and system-known at the moment of the event, not a topology that was retroactively corrected or remapped after the fact.

The system_retracted_at column is particularly critical during asset migrations and CMMS upgrades: when a CMMS equipment ID is reassigned or a historian tag is renamed, the old mapping row is retracted rather than deleted or overwritten, preserving the full evidentiary chain that a compliance audit or root-cause investigation may require months later.
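
The as-of pattern can be exercised end to end with a small sketch. It uses Python's built-in sqlite3 module purely for portability; the table is a trimmed copy of dim_asset_identifier, timestamps are stored as ISO-8601 strings (an assumption that makes string comparison order-preserving), and the renamed tag COMP-L3-VIB-AX2 and the correction dates are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_asset_identifier (
        asset_uid TEXT NOT NULL,
        system_type TEXT NOT NULL,
        external_id TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to TEXT,                 -- NULL = currently valid
        system_asserted_at TEXT NOT NULL,
        system_retracted_at TEXT       -- NULL = currently asserted
    )
""")
conn.executemany(
    "INSERT INTO dim_asset_identifier VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Original mapping, retracted on 2026-03-20 when a correction arrived.
        ("X", "HISTORIAN", "COMP-L3-VIB-AX", "2025-01-01T00:00:00", None,
         "2025-01-01T00:00:00", "2026-03-20T00:00:00"),
        # Correction: the old tag actually stopped being valid on 2026-03-01.
        ("X", "HISTORIAN", "COMP-L3-VIB-AX", "2025-01-01T00:00:00",
         "2026-03-01T00:00:00", "2026-03-20T00:00:00", None),
        # Correction: the renamed tag has been valid since 2026-03-01.
        ("X", "HISTORIAN", "COMP-L3-VIB-AX2", "2026-03-01T00:00:00", None,
         "2026-03-20T00:00:00", None),
    ],
)

AS_OF_SQL = """
    SELECT external_id
    FROM dim_asset_identifier
    WHERE asset_uid = :uid
      AND valid_from <= :valid_ts
      AND (valid_to IS NULL OR :valid_ts < valid_to)
      AND system_asserted_at <= :known_ts
      AND (system_retracted_at IS NULL OR :known_ts < system_retracted_at)
    ORDER BY external_id
"""


def identifiers_as_of(asset_uid, valid_ts, known_ts):
    """Identifiers valid at valid_ts, as the system knew them at known_ts."""
    params = {"uid": asset_uid, "valid_ts": valid_ts, "known_ts": known_ts}
    return [row[0] for row in conn.execute(AS_OF_SQL, params)]


event_ts = "2026-03-14T03:14:22"
print(identifiers_as_of("X", event_ts, event_ts))               # ['COMP-L3-VIB-AX']
print(identifiers_as_of("X", event_ts, "2026-04-01T00:00:00"))  # ['COMP-L3-VIB-AX2']
```

The two calls ask about the same physical instant but different knowledge horizons. The first reproduces what a live pipeline would have seen at the moment of the anomaly; the second shows the corrected view after the late-arriving remap.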

Event models: unifying sensor data and maintenance history

PdM requires combining high-frequency sensor data with low-frequency business events like inspections and repairs. To do that consistently, you need an event model that:

Normalizes telemetry and sensor streams into a canonical format.

Normalizes maintenance and operational events (work orders, inspections, failures, operating regime changes).

Uses the Golden asset_uid as the primary foreign key.

A simple event model separation:

Time series / measurements

Vibration, temperature, pressure, current, GPS, engine speed, fuel rate, etc.

Potentially millions of points per asset per day, optimized for columnar storage and time-range scans.

Discrete events

Work orders opened/closed, findings, parts replaced.

Operating context changes (mode change, operator change, site relocation).

Fault codes and alarms.

Example canonical schemas:


Storing industrial telemetry in a traditional row-oriented relational fact table is an architectural anti-pattern at operational scale. A single rotating asset instrumented with vibration, temperature, pressure, and current signals can produce millions of measurement rows per day, and a fleet of hundreds of assets across multiple sites can quickly saturate row-store I/O budgets, producing query latencies that make interactive feature engineering and real-time anomaly scoring impractical. The measurement layer must be redesigned for a Medallion Architecture on a columnar lakehouse or time-series engine, where physical layout, partitioning strategy, and compression are treated as first-class engineering constraints, not storage afterthoughts.

The core routing principle is narrow, compressed Parquet files organized by asset_uid and hour-truncated time buckets. This layout ensures that a feature pipeline requesting 30 days of bearing temperature data for a single compressor performs a pruned scan across a bounded, predictable file set rather than a full-table scan across a monolithic fact store shared with thousands of unrelated assets. On Delta Lake, liquid clustering on (asset_uid, date_hour) replaces static Hive-style partitioning, allowing the storage engine to co-locate frequently co-queried signals without requiring manual partition management as the asset fleet grows.

-- Bronze layer: raw ingestion, append-only, schema-on-read.
-- Delta Lake table with liquid clustering:

CREATE TABLE bronze.raw_measurement ( 
    asset_uid STRING NOT NULL, -- resolved at ingestion via dim_asset_identifier 
    signal_uid STRING, -- FK to dim_signal (nullable at bronze) 
    source_system STRING, 
    external_tag_id STRING, -- original OT tag name, pre-resolution 
    ts TIMESTAMP NOT NULL, 
    raw_value DOUBLE, 
    unit STRING, 
    quality_code STRING, -- GOOD, BAD, ESTIMATED, UNCERTAIN 

    -- Bitemporal ingestion tracking 
    ingested_at TIMESTAMP NOT NULL, -- system time: when the row landed in the lakehouse 
    source_event_ts TIMESTAMP NOT NULL, -- valid time: when the measurement occurred at the source 

    -- Partitioning metadata (derived columns for pruning) 
    date_hour TIMESTAMP GENERATED ALWAYS AS (date_trunc('hour', ts)) 
) USING DELTA CLUSTER BY (asset_uid, date_hour); 

Liquid clustering avoids the overhead of static partitions while preserving efficient pruning. 

-- Silver layer, measurements: cleaned, signal-resolved, quality-filtered.

CREATE TABLE silver.fact_measurement (
    -- The clustering keys are placed among the first four columns because
    -- dataSkippingNumIndexedCols = 4 limits file statistics to those columns.
    asset_uid STRING NOT NULL,
    signal_uid STRING NOT NULL, -- resolved FK to dim_signal
    ts TIMESTAMP NOT NULL,
    date_hour TIMESTAMP GENERATED ALWAYS AS (date_trunc('hour', ts)),
    signal_name STRING,
    value DOUBLE NOT NULL,
    unit STRING,
    quality_code STRING,
    source_system STRING,

    -- Bitemporal columns inherited from bronze
    ingested_at TIMESTAMP NOT NULL,
    source_event_ts TIMESTAMP NOT NULL
) USING DELTA CLUSTER BY (asset_uid, date_hour)
TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact' = 'true',
    'delta.dataSkippingNumIndexedCols' = '4'
);

-- Silver layer, maintenance and operational events: normalized, low-volume discrete events where row-store semantics remain acceptable.

CREATE TABLE silver.fact_maintenance_event ( 
    maintenance_event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, 
    asset_uid STRING NOT NULL, 
    event_ts TIMESTAMP NOT NULL, 
    event_type STRING, -- WorkOrderCreated, WorkOrderClosed, Inspection, Failure 
    cmms_work_order_id STRING, 
    cmms_system_name STRING, 
    severity STRING, 
    cost_amount DECIMAL(18,2), 
    currency STRING, 
    source_system STRING, 

    -- Bitemporal tracking 
    system_asserted_at TIMESTAMP NOT NULL, 
    system_retracted_at TIMESTAMP -- NULL = currently asserted 
) USING DELTA CLUSTER BY (asset_uid, event_ts); 

-- Gold layer: pre-aggregated feature store for PdM model consumption.

CREATE TABLE gold.asset_signal_hourly_features ( 
    asset_uid STRING NOT NULL, 
    signal_uid STRING NOT NULL, 
    date_hour TIMESTAMP NOT NULL, 
    value_mean DOUBLE, 
    value_stddev DOUBLE, 
    value_min DOUBLE, 
    value_max DOUBLE, 
    value_p95 DOUBLE, 
    good_sample_count BIGINT, 
    total_sample_count BIGINT, 
    quality_ratio DOUBLE -- good_sample_count / total_sample_count 
) USING DELTA CLUSTER BY (asset_uid, date_hour);


The Bronze → Silver → Gold medallion routing enforces a clean separation of concerns across the measurement pipeline. Bronze absorbs raw OT streams append-only, preserving the original pre-resolution external_tag_id and the ingested_at timestamp alongside the asset_uid resolved at ingestion; this is critical for retroactive remapping audits. Silver carries fully resolved signal_uid linkages, quality-filtered values, and inherited bitemporal columns, making it the canonical source for feature engineering pipelines. Gold materializes pre-aggregated hourly statistics per (asset_uid, signal_uid) bucket, reducing the scan surface for model training and anomaly threshold computation from billions of raw rows to a bounded, pre-computed feature store.
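
To illustrate what the Gold layer holds, the sketch below computes the columns of gold.asset_signal_hourly_features for one (asset_uid, signal_uid) pair from a handful of Silver-style samples. It is plain Python rather than a lakehouse job, and several choices are assumptions rather than anything the schema dictates: only GOOD-quality samples feed the statistics, the standard deviation is the population form, and the 95th percentile uses the nearest-rank method. The function name hourly_features and the sample values are invented.

```python
import math
import statistics
from collections import defaultdict
from datetime import datetime


def hourly_features(samples):
    """samples: iterable of (ts, value, quality_code) for one asset and signal."""
    buckets = defaultdict(list)
    for ts, value, quality in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # date_trunc('hour', ts)
        buckets[hour].append((value, quality))

    features = {}
    for hour, rows in sorted(buckets.items()):
        good = sorted(v for v, q in rows if q == "GOOD")
        row = {
            "good_sample_count": len(good),
            "total_sample_count": len(rows),
            "quality_ratio": len(good) / len(rows),
        }
        if good:
            rank = math.ceil(0.95 * len(good))  # nearest-rank percentile, 1-based
            row.update(
                value_mean=statistics.fmean(good),
                value_stddev=statistics.pstdev(good),  # population std dev (assumption)
                value_min=good[0],
                value_max=good[-1],
                value_p95=good[rank - 1],
            )
        features[hour] = row
    return features


samples = [
    (datetime(2026, 3, 14, 3, 5), 1.0, "GOOD"),
    (datetime(2026, 3, 14, 3, 14), 2.0, "GOOD"),
    (datetime(2026, 3, 14, 3, 40), 3.0, "GOOD"),
    (datetime(2026, 3, 14, 3, 41), 99.0, "BAD"),   # excluded from statistics
    (datetime(2026, 3, 14, 4, 2), 5.0, "GOOD"),
]
features = hourly_features(samples)
print(features[datetime(2026, 3, 14, 3)]["value_mean"])     # 2.0
print(features[datetime(2026, 3, 14, 3)]["quality_ratio"])  # 0.75
```

Keeping quality_ratio next to the statistics lets a downstream model discount hours in which most samples were flagged BAD, ESTIMATED, or UNCERTAIN.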


For environments where sub-second query latency on raw telemetry is a hard requirement (real-time dashboards, streaming anomaly detection), TimescaleDB hypertables created with chunk_time_interval => INTERVAL '1 hour' and compressed with timescaledb.compress_segmentby = 'asset_uid, signal_name' provide comparable pruning behavior within a PostgreSQL-compatible time-series engine, without requiring a full lakehouse deployment. The architectural principle is identical: route narrow, compressed, time-bucketed chunks by asset identity, so that any time-range query against a specific asset touches the minimum possible data volume regardless of total fleet size.

Lifecycle views across projects and plants

In construction, heavy equipment is redeployed across projects frequently; in manufacturing, critical assets are often reconfigured, repurposed, or upgraded during plant turnarounds. Both scenarios make naive asset IDs unreliable for longitudinal analysis.


To support lifecycle analytics and predictive models that span years and sites, you need an explicit lifecycle view:


CREATE TABLE dim_asset_lifecycle (
    asset_uid       UUID REFERENCES dim_asset(asset_uid),
    site_id         VARCHAR(100),  -- plant, project, yard
    area_id         VARCHAR(100),  -- line, work-front, sub-area
    role            VARCHAR(100),  -- e.g. "BatchMixer_Line1", "TowerCrane_ZoneC"
    valid_from      TIMESTAMP,
    valid_to        TIMESTAMP,
    PRIMARY KEY (asset_uid, site_id, area_id, role, valid_from)
);

This enables queries like:

“Show all failures of this pump across all lines it has served.”

“Compute MTBF for this excavator across all projects it has worked on.”

“Train a remaining useful life (RUL) model using all historical data for this gearbox, even though it has moved across lines.”

PdM platforms built on top of such lifecycle-aware views can deliver fleet-level insights and transfer learning across contexts, instead of being locked into specific lines or projects.
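
As a rough sketch of the MTBF query above, the snippet below joins failure events to lifecycle intervals by asset_uid and time. The asset ID EXC-1, the project names, and the dates are invented sample data, and the calculation uses deployed calendar time as a stand-in for true operating hours, which a real implementation would take from telematics.

```python
from collections import Counter
from datetime import datetime

# (asset_uid, site_id, valid_from, valid_to); valid_to None = current assignment
lifecycle = [
    ("EXC-1", "ProjectA", datetime(2025, 1, 1), datetime(2025, 7, 1)),
    ("EXC-1", "ProjectB", datetime(2025, 7, 1), datetime(2026, 1, 1)),
]
# (asset_uid, failure timestamp), keyed by the golden asset_uid only
failures = [
    ("EXC-1", datetime(2025, 2, 10)),
    ("EXC-1", datetime(2025, 5, 3)),
    ("EXC-1", datetime(2025, 8, 15)),
    ("EXC-1", datetime(2025, 10, 1)),
    ("EXC-1", datetime(2025, 12, 20)),
]


def site_at(asset_uid, ts):
    """Temporal join: which site was the asset assigned to at ts?"""
    for uid, site, start, end in lifecycle:
        if uid == asset_uid and start <= ts and (end is None or ts < end):
            return site
    return None


def failures_by_site(asset_uid):
    return Counter(site_at(uid, ts) for uid, ts in failures if uid == asset_uid)


def mtbf_hours(asset_uid, as_of):
    """Deployed hours across all sites divided by total failure count."""
    deployed_seconds = sum(
        ((end or as_of) - start).total_seconds()
        for uid, _, start, end in lifecycle
        if uid == asset_uid
    )
    count = sum(1 for uid, _ in failures if uid == asset_uid)
    return deployed_seconds / 3600 / count if count else None


print(dict(failures_by_site("EXC-1")))              # {'ProjectA': 2, 'ProjectB': 3}
print(mtbf_hours("EXC-1", datetime(2026, 1, 1)))    # 1752.0
```

Because both tables are keyed by asset_uid, the move from ProjectA to ProjectB changes how failures are attributed to sites but not the asset-level MTBF.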

The ID reconciliation problem: mapping sensors to assets

All of the above hinges on solving the ID reconciliation problem: mapping OT and IT identifiers to asset_uid reliably. Industry guidance for CMMS–IoT integration emphasizes field mapping and asset ID alignment across PLCs, historians, and CMMS before you attempt real-time PdM workflows.


In practice, you need an entity resolution pipeline that:

Ingests all candidate identifiers and attributes from OT and IT systems.

Applies deterministic matching rules where possible (e.g., exact serial number matches).

Applies probabilistic or rules-based matching where deterministic keys do not exist (e.g., fuzzy matching on description, location, model).

Generates or updates Golden Records and the dim_asset_identifier mappings.

This is where most PdM projects quietly fail: they underestimate the complexity and ongoing nature of ID reconciliation, treating it as one-time “data cleaning” instead of a core data product with its own SLAs. In Part 2, we will go deep into concrete matching patterns and pipelines to keep this mapping robust.
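
The matching steps listed above can be sketched as a two-stage resolver: an exact serial-number match first, then a fuzzy description match with a review fallback. This is only an illustration ahead of Part 2. The golden list, the resolve and normalize functions, the 0.8 threshold, and the use of difflib string similarity are all assumptions, and production pipelines typically combine several attributes rather than a single description field.

```python
from difflib import SequenceMatcher

# Hypothetical Golden Records (asset_uid values shortened for readability).
golden = [
    {"asset_uid": "a-001", "oem_serial_number": "SN-88213",
     "description": "Axial compressor line 3"},
    {"asset_uid": "a-002", "oem_serial_number": "SN-10444",
     "description": "Tower crane zone C"},
]


def normalize(text):
    """Lowercase and collapse separators so OT and IT spellings line up."""
    return " ".join(text.lower().replace("-", " ").replace("_", " ").split())


def resolve(candidate, threshold=0.8):
    """Return (asset_uid or None, match method, score)."""
    # Stage 1: deterministic match on a shared hard key.
    serial = candidate.get("oem_serial_number")
    if serial:
        for record in golden:
            if record["oem_serial_number"] == serial:
                return record["asset_uid"], "deterministic", 1.0

    # Stage 2: fuzzy match on the normalized description.
    best, best_score = None, 0.0
    for record in golden:
        score = SequenceMatcher(
            None, normalize(candidate.get("description", "")),
            normalize(record["description"]),
        ).ratio()
        if score > best_score:
            best, best_score = record, score
    if best is not None and best_score >= threshold:
        return best["asset_uid"], "probabilistic", best_score

    # Below threshold: leave unmapped and route to human review.
    return None, "unresolved", best_score


by_serial = resolve({"oem_serial_number": "SN-88213", "description": "unknown"})
by_text = resolve({"description": "AXIAL_COMPRESSOR-LINE-3"})
no_match = resolve({"description": "Diesel generator yard"})
print(by_serial[:2])  # ('a-001', 'deterministic')
print(by_text[:2])    # ('a-001', 'probabilistic')
print(no_match[:2])   # (None, 'unresolved')
```

Returning the match method and score with each result lets the mapping written to dim_asset_identifier record how it was derived, for example by setting source_authority_weight to DERIVED for probabilistic matches.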

Where predictive maintenance platforms fit

Industrial AI and Predictive Asset Intelligence platforms promise unified, real-time asset health views by contextualizing OT and IT data into a “single source of truth.” Many of them ship with:

Asset model libraries and templates for common equipment types.

No-code tools to build asset hierarchies and map signals.

Integrated MLOps studios for anomaly detection and RUL models built on top of unified data.

Even when using such platforms, you still need:

A clear Golden Record strategy for assets, rather than letting each platform invent its own IDs.

An integration pattern to synchronize asset master data and mappings with CMMS/EAM and ERP systems.

Data contracts that guarantee that any new sensor or telematics device is onboarded with a mapping to an existing asset_uid or triggers creation of a new Golden Record.

Treat the platform’s asset model as a projection of your authoritative asset master, not a replacement for it. That ensures PdM insights can be traced back to real financial and operational metrics in your core IT systems.
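
The onboarding data contract described above can be expressed as a gate that every new device passes through before its telemetry is accepted. The sketch is hypothetical throughout: the identifier_index and golden_records stores, the onboard function, and the rule that a serial number is required to create a new Golden Record are illustrative assumptions, not features of any particular platform.

```python
import uuid

# (system_type, external_id) -> asset_uid; stands in for dim_asset_identifier
identifier_index = {("TELEMATICS", "TM-5521"): "a-002"}
# asset_uid -> intrinsic attributes; stands in for dim_asset
golden_records = {"a-002": {"oem_serial_number": "SN-10444"}}


def onboard(system_type, external_id, oem_serial_number=None):
    """Return (asset_uid, outcome); never let a device exist without a mapping."""
    key = (system_type, external_id)
    if key in identifier_index:
        return identifier_index[key], "existing_mapping"

    # Map to an existing Golden Record when the serial number is already known.
    if oem_serial_number:
        for asset_uid, record in golden_records.items():
            if record["oem_serial_number"] == oem_serial_number:
                identifier_index[key] = asset_uid
                return asset_uid, "mapped_to_existing_asset"

    # Contract rule (assumed): no serial number means no new Golden Record.
    if not oem_serial_number:
        raise ValueError(f"contract violation: {key} has no resolvable asset")

    asset_uid = str(uuid.uuid4())
    golden_records[asset_uid] = {"oem_serial_number": oem_serial_number}
    identifier_index[key] = asset_uid
    return asset_uid, "created_golden_record"


known = onboard("TELEMATICS", "TM-5521")
mapped = onboard("HISTORIAN", "CRANE-C-LOAD", oem_serial_number="SN-10444")
created = onboard("TELEMATICS", "TM-9000", oem_serial_number="SN-77777")
print(known)      # ('a-002', 'existing_mapping')
print(mapped)     # ('a-002', 'mapped_to_existing_asset')
print(created[1]) # created_golden_record
```

Rejecting the unmappable device at onboarding, rather than accepting its stream and reconciling later, keeps orphaned telemetry out of the measurement layer.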

Conclusion

In Part 1, we established that predictive maintenance pays off only when asset identity is engineered as a first-class concern, not an afterthought. We defined Golden Records for physical assets, proposed foundational data models for assets, identifiers, events, and lifecycles, and highlighted the central challenge of ID reconciliation across OT and IT systems.


In Part 2, we will shift from concepts to implementation: how to design entity resolution rules, build pipelines for sensor–asset mapping, represent asset identity in a data lakehouse or knowledge graph, and integrate PdM output back into CMMS, EAM, and project controls in both construction and manufacturing environments.

- Authored by Sonal Dwevedi & Tharun Mathew
