
Fragmented machine data is one of the biggest reasons predictive maintenance fails in automotive manufacturing. Disconnected systems, inconsistent asset histories, and poor data governance prevent AI models from delivering reliable insights. Without a unified, high-quality data foundation, predictive maintenance becomes guesswork rather than a dependable strategy to reduce downtime and improve ROI.
Predictive maintenance has been a boardroom conversation in automotive manufacturing for years because the financial stakes are immediate and measurable. Every hour of unplanned downtime can cost tens of thousands of pounds in lost production, rescheduling, and downstream disruption, while failed predictive maintenance pilots quietly drain budget through platform investments that never deliver operational value. For functional leaders, fragmented machine data is not a hidden inefficiency; it is an active line-item loss reflected in extended downtime, poor maintenance decisions, and AI programmes that fail to generate ROI.
The technology narrative remains compelling: connect your machines, apply machine learning, detect failures before they happen, and reduce disruption across the production line. Yet across production environments throughout the UK and Europe, the gap between ambition and operational reality remains stubbornly wide. Programmes stall, pilots produce inconclusive results, and models trained on months of data underperform against simple threshold alarms. The reason is rarely the algorithm itself. More often, it is the engineering failure of poor data governance - historical records that cannot be trusted, disconnected systems that cannot be reconciled, and operational data that was never structured to support predictive intelligence in the first place.
Across those same environments, most predictive maintenance models underperform for a simple reason: they lack a reliable ground truth. Models are trained on incomplete asset histories, inconsistent maintenance records, misaligned timestamps, and sensor data that has changed structure over time without proper governance. Under these conditions, even the most advanced algorithm is forced to make decisions from unreliable inputs. It is not predicting failure with confidence; it is estimating probability from a flawed and fragmented view of asset behaviour.
This is not primarily a failure of predictive maintenance technology. It is a failure of data maturity. Many automotive manufacturers are attempting to run sophisticated AI-driven workloads on top of legacy operational environments that were never designed for analytical intelligence. Historian systems, PLCs, CMMS platforms, MES layers, and ERP records often operate as isolated systems with no governed structure connecting them. Asset identifiers do not align, failure events are poorly labelled, and historical continuity is frequently broken by upgrades, migrations, and undocumented process changes. Without addressing this underlying data infrastructure, predictive maintenance becomes an exercise in modelling uncertainty rather than preventing downtime. The result is not reliable foresight, but false positives, missed failures, and growing distrust in the system itself.
The term "fragmented data" is used loosely enough that it risks losing its meaning. In the context of an automotive manufacturing environment, it refers to something very specific and operationally consequential.
A typical assembly plant will have multiple generations of equipment on the floor simultaneously. A body-in-white welding cell installed in 2009 operates alongside a newer robotic hemming station commissioned in 2019 and a press line that dates back further still. Each of these assets carries its own data infrastructure: PLCs from different vendors, historian systems with different sampling configurations, maintenance records stored in systems that may have been replaced once or twice over the asset's lifetime, and calibration logs that may exist in spreadsheets, paper records, or not at all. At the OEM level, there will typically be an ERP system and potentially a separate MES layer coordinating production schedules. None of these systems was designed to speak to the others in a semantically consistent way.
The systems are not broken in isolation. Each one does what it was built to do. The failure is architectural: there is no governed layer that reconciles these systems into a coherent, unified representation of asset behaviour. The data exists. The problem is that it cannot be reliably assembled, aligned, or trusted in its current form.
The practical consequence is that each asset has a different data biography. The welding cell from 2009 has historian data going back years, but a significant portion of it was collected under a different sampling rate, with different tag names, before a major control system upgrade changed how values were recorded. The newer hemming station has cleaner telemetry but limited historical depth. The press line has extensive maintenance records in the legacy CMMS, but those records use free-text fields for fault descriptions, making programmatic analysis unreliable without substantial preprocessing. When a data scientist sits down to build a predictive failure model, they are not working with a dataset. They are working with a collection of partially overlapping, inconsistently structured, and variably reliable data artefacts that happen to describe the same physical environment.
This is what fragmentation looks like in practice. It is not simply that the data is spread across multiple systems. It is that the data across those systems is structurally incompatible, contextually inconsistent, and often temporally misaligned in ways that make cross-signal analysis unreliable without significant upstream intervention.
Predictive maintenance models, whether based on classical statistical approaches such as ARIMA and Weibull analysis or more contemporary anomaly detection and supervised classification frameworks, share a common dependency: they require a coherent historical record of asset behaviour to establish what normal looks like, and a meaningful record of failure events to learn from. Both of these requirements are directly undermined by the condition of asset histories in most automotive manufacturing environments.
Consider the challenge of establishing a reliable operational baseline. A gradient-boosted classifier trained to predict bearing failures on a conveyor motor needs to understand the sensor signature of a healthy bearing across a range of operating conditions: different load profiles, ambient temperature variations, production speeds, and startup and shutdown cycles. If the historical data contains periods where the sensor was miscalibrated, where the tag was renamed during a system migration and the continuity was never reconciled, or where the asset was running in a reduced-capacity operating mode that was never documented in the historian, then the model's baseline will be contaminated. It will learn a representation of normal that blends genuine healthy operating states with artefacts introduced by data quality failures. The model does not know the difference. It treats all ingested data as equally informative about asset behaviour.
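To make that concrete, here is a minimal sketch of the baseline-hygiene step, using an unsupervised IsolationForest as a stand-in for whatever baseline model is actually in play; the file path, column names, and quality-flag convention are all illustrative assumptions, not a prescribed schema:

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical historian export: one row per sample, with a quality flag
# populated by upstream governance checks (calibration drift, tag-migration
# gaps, undocumented reduced-capacity running).
history = pd.read_parquet("conveyor_motor_history.parquet")

# Fit the healthy-state baseline only on periods that passed governance.
# Without this mask, miscalibrated and migration-corrupted windows are
# treated as equally informative about what "normal" looks like.
clean = history[history["quality_flag"] == "validated"]
features = ["vibration_rms", "bearing_temp_c", "line_speed_mpm"]
baseline = IsolationForest(contamination=0.01, random_state=42).fit(clean[features])

# Score recent telemetry against the governed baseline.
scores = baseline.decision_function(history.tail(1000)[features])
```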
The problem compounds at the failure event level. Predictive models are meant to learn the relationship between machine behaviour and actual failure modes, but in fragmented environments they often end up learning the artefacts of the data system instead of the physics of the machine itself. Inconsistent fault labels, missing maintenance records, duplicated interventions, and timestamp mismatches create patterns that reflect system limitations rather than genuine equipment degradation. A model may associate a spike in vibration data with a bearing failure, when in reality it is learning the effect of how a work order was logged, how a sensor tag changed after a PLC upgrade, or how maintenance teams recorded interventions differently across shifts.
This is one of the main reasons predictive maintenance pilots succeed inside a single controlled production cell and then fail to scale across the wider plant. In a tightly managed pilot environment, data structures are cleaner, failure events are closely observed, and the model can produce credible results. Once deployed across multiple lines, older assets, and disconnected systems, that consistency disappears. The model is no longer learning repeatable failure signatures; it is learning the inconsistencies of the infrastructure around them. Scaling fails not because the algorithm is weak, but because the underlying data architecture cannot support reliable learning beyond isolated conditions.
There is a consistent pattern in how automotive manufacturers approach predictive maintenance investment. The focus falls on model architecture and algorithm selection. Teams spend time evaluating vendors, comparing deep learning frameworks against classical statistical methods, and debating whether to build in-house capability or purchase a platform. These are legitimate questions, but they are secondary to a more fundamental one that rarely receives equivalent attention: is the underlying data fit for modelling?
Data cleansing in an industrial context is not a preprocessing step that can be automated away in an afternoon. It is a sustained, expert-driven engineering programme that requires domain knowledge of the equipment, the processes, and the data infrastructure simultaneously. The work involves identifying and correcting sensor calibration drift, reconciling tag naming inconsistencies across system migrations, detecting and handling missing data in a way that reflects genuine absences rather than artificially imputing values that distort the signal, and aligning timestamps across asynchronous data streams so that multi-sensor correlations are temporally valid.
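As one concrete example of the timestamp-alignment work, the sketch below uses pandas merge_asof to join two asynchronous sensor streams with an explicit tolerance, so that samples too far apart are marked missing rather than silently paired with stale values; the file paths and column names are illustrative:

```python
import pandas as pd

# Two asynchronous streams: vibration sampled at ~100 ms, temperature at ~1 s.
vibration = pd.read_parquet("vibration.parquet").sort_values("timestamp")
temperature = pd.read_parquet("temperature.parquet").sort_values("timestamp")

# Pair each vibration sample with the nearest temperature reading, but only
# within two seconds; beyond that the pairing would not be temporally valid.
aligned = pd.merge_asof(
    vibration, temperature,
    on="timestamp", direction="nearest", tolerance=pd.Timedelta("2s"),
)

# Genuine absences remain NaN, to be handled explicitly downstream rather
# than imputed in a way that fabricates signal.
print(f"{aligned['temp_c'].isna().mean():.1%} of samples had no valid temperature pairing")
```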
Most organisations treat this as a one-time data preparation task rather than an ongoing data governance discipline. That distinction matters enormously. Sensor behaviour changes over time. Equipment undergoes modifications. PLCs are reconfigured. Historian systems are upgraded. Each of these events introduces new forms of data quality degradation, and none of them will be automatically detected or corrected without a structured data monitoring and governance framework in place. A predictive model that was trained on clean, validated data in 2023 may be receiving corrupted inputs by 2024 if the pipeline is not actively maintained, and its prediction accuracy will degrade silently without any obvious failure signal to alert the team.
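One way a pipeline might catch that silent degradation is a scheduled two-sample test comparing recent telemetry against the distribution the model was trained on; the sketch below uses SciPy's Kolmogorov-Smirnov test, with the file names, window sizes, and significance threshold all as illustrative assumptions:

```python
import numpy as np
from scipy.stats import ks_2samp

def distribution_has_shifted(reference: np.ndarray, recent: np.ndarray,
                             alpha: float = 0.01) -> bool:
    """Flag a statistically significant shift between the distribution the
    model was trained on and what the pipeline is feeding it now."""
    _, p_value = ks_2samp(reference, recent)
    return p_value < alpha

# reference: sensor values from the validated training window
# recent: the last week of live telemetry for the same tag
reference = np.load("training_window_vibration.npy")
recent = np.load("last_week_vibration.npy")

if distribution_has_shifted(reference, recent):
    # Route to data engineering before model accuracy degrades silently.
    print("Drift detected: re-validate sensor calibration and tag continuity")
```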
This step is consistently underfunded for reasons that are partly structural and partly cultural. Structurally, data cleansing is difficult to sell as a proposition. It does not produce a visible product, a dashboard, or a demonstrable AI output. The business case is expressed as a negative: without it, downstream systems will fail slowly and invisibly rather than immediately and obviously. That framing does not compete well for budget against the more tangible proposition of an AI-powered maintenance platform. Culturally, there is a tendency in manufacturing organisations to assume that data from control systems is inherently reliable, because those systems are safety-critical and subject to engineering governance. That assumption conflates the reliability of control outputs, which must be accurate for the machine to function, with the analytical quality of the recorded data, which is subject to entirely different failure modes that engineering governance does not address.
The practical consequence is that predictive maintenance programmes are routinely built on data foundations that would not pass a basic quality audit. The people building the models often lack the domain expertise needed to identify the problem from the data alone. The model trains, validation metrics look acceptable on a held-out portion of the same flawed dataset, and the programme moves into production. There, it generates enough false positives and missed detections to erode engineer trust within months, and the investment is quietly written off as a failed AI experiment, when in reality it was a failed data preparation exercise in which the AI was never given the conditions to succeed.
Addressing data quality as a discipline rather than a task is not optional infrastructure. It is the foundation on which reliable predictive intelligence is built. Skipping this step is not a shortcut; it is an expensive form of technical debt that compounds over time and eventually bankrupts the programme. Every model trained on poor-quality data increases false positives, missed failures, and engineer distrust, while every new platform layered on top of unresolved data issues adds cost without improving outcomes. What appears to be faster progress at the start becomes slower and more expensive remediation later, as teams are forced to rebuild pipelines, relabel failures, and restore trust in systems that should have been reliable from the beginning. In predictive maintenance, unresolved data quality debt does not stay hidden. It surfaces as failed ROI, abandoned pilots, and operational decisions made on unreliable intelligence.
Even in organisations where sensor data quality is taken seriously, predictive maintenance programmes regularly fail because they treat the sensor stream as the only relevant data source. In reality, a sensor reading is only interpretable in context. The same vibration amplitude on a rotating shaft means something fundamentally different during a high-load production sprint, during a run-in period after a recent bearing replacement, and during a period of reduced-capacity operation following a planned maintenance intervention. A predictive model without access to this operational context cannot distinguish these scenarios. It will generate alerts that are technically correct in a narrow signal-processing sense but operationally meaningless, or actively misleading.
Operational context in automotive manufacturing is distributed across precisely those disconnected systems described above. Production schedules sit in the MES or ERP. Maintenance histories, including recent interventions that temporarily alter asset behaviour, sit in the CMMS. Environmental conditions such as ambient temperature in press shops, which directly affects lubrication viscosity and thus bearing friction signatures, may be captured in building management systems or not at all. Shift patterns and operator assignments, relevant because different operators run equipment differently and introduce consistent behavioural patterns into the sensor data, sit in HR systems or in manual logs.
None of these data sources is typically integrated into the predictive maintenance pipeline. They are treated as peripheral context rather than primary modelling inputs, even though their absence directly undermines the model's ability to interpret what the sensor data means. The predictive model is being asked to reason about equipment behaviour without access to the operational narrative that explains it.
This is fundamentally a data engineering problem, not an algorithm problem. The systems exist. The data exists. The failure is in the absence of a governed integration layer that connects these sources, resolves the inconsistencies between them, and makes the combined dataset accessible to the modelling layer in a form that reflects actual asset behaviour. Without MES and ERP integration inside the maintenance loop, predictive models can identify anomalies in isolation but cannot interpret whether those anomalies matter in operational terms. A vibration spike may be technically valid, but without production context the model cannot distinguish between normal behaviour during a high-throughput production run and an early indicator of mechanical failure.
This is exactly why so many predictive maintenance systems generate alerts that are technically correct but operationally useless. Engineers receive warnings without the surrounding context needed to act on them: no visibility into recent maintenance interventions, no awareness of planned production changes, no connection to asset loading conditions, and no understanding of whether the anomaly reflects genuine risk or expected operational variation. Over time, these alerts become noise. Engineers learn to ignore them, trust in the system erodes, and the predictive maintenance programme fails not because the model was mathematically wrong, but because it was disconnected from how the plant operates.
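For contrast, the sketch below shows the kind of context join these systems are missing: each raw anomaly is enriched with the production state in force at the time and the most recent maintenance intervention before anyone is alerted. The extracts, column names, and the triage rule at the end are illustrative assumptions:

```python
import pandas as pd

anomalies = pd.read_parquet("model_anomalies.parquet")        # asset_id, timestamp, score
production = pd.read_parquet("mes_production_state.parquet")  # asset_id, timestamp, throughput_state
maintenance = pd.read_parquet("cmms_interventions.parquet")   # asset_id, completed_at, intervention_type

# Attach the production state in force when each anomaly occurred.
enriched = pd.merge_asof(
    anomalies.sort_values("timestamp"),
    production.sort_values("timestamp"),
    on="timestamp", by="asset_id", direction="backward",
)

# Attach the most recent intervention, so a run-in period after a bearing
# replacement is not escalated as a novel failure signature.
enriched = pd.merge_asof(
    enriched.sort_values("timestamp"),
    maintenance.rename(columns={"completed_at": "timestamp"}).sort_values("timestamp"),
    on="timestamp", by="asset_id", direction="backward",
)

# Illustrative triage rule: suppress weak anomalies that coincide with
# expected high-load operation rather than genuine degradation.
actionable = enriched[
    ~((enriched["throughput_state"] == "high_load") & (enriched["score"] < 0.8))
]
```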
Unified asset intelligence is not a product category or a vendor claim. It is an architectural outcome achieved through disciplined data engineering, semantic modelling, and governance. When it is done properly, it changes the fundamental capability of a predictive maintenance programme, not by improving the algorithm, but by giving the algorithm something it can actually learn from.
In a production environment that has made this investment, the data pipeline looks fundamentally different from the fragmented architecture described above. At the ingestion layer, telemetry from PLCs and SCADA systems is collected through standardised protocols such as OPC UA, with adapter layers handling legacy vendor-specific formats, including older environments that dominate UK and European plants. Historian data is ingested with full tag mapping, ensuring that tag rename events and sampling rate changes are logged and preserved as metadata rather than silently corrupting the time series. MES and ERP data is connected through governed integration layers that resolve asset identifiers, align timestamps to a common reference frame, and enforce data type consistency before records enter the unified store.
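The tag-continuity piece of that ingestion layer can be sketched as a versioned rename map maintained as governance metadata, so historical tag names resolve to canonical identifiers instead of silently forking the time series; the tag names and map structure here are hypothetical:

```python
import pandas as pd

# Versioned rename map: each row records that a tag was renamed during a
# migration, and when the old name stopped being written.
rename_map = pd.DataFrame({
    "old_tag": ["WELD_CELL3.MOTOR_TEMP", "WC3_MOT_TMP"],
    "canonical_tag": ["weld_cell_03.motor_temp_c"] * 2,
    "effective_until": pd.to_datetime(["2015-06-01", "2021-03-15"]),
})

raw = pd.read_parquet("historian_export.parquet")  # tag, timestamp, value

# Resolve every historical tag name to its canonical identity, keeping the
# rename event as metadata rather than a silent break in the series.
resolved = raw.merge(rename_map[["old_tag", "canonical_tag"]],
                     left_on="tag", right_on="old_tag", how="left")
resolved["tag"] = resolved["canonical_tag"].fillna(resolved["tag"])
resolved = resolved.drop(columns=["old_tag", "canonical_tag"])
```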
At the quality governance layer, automated monitoring profiles run continuously against incoming data streams, detecting anomalies such as sensor values outside calibration bounds, missing data windows exceeding expected thresholds, statistical distribution shifts indicating sensor drift, and timestamp misalignments flagging synchronisation failures. These are not post-hoc audits. They are real-time quality controls integrated into the pipeline, generating alerts to data engineering teams before degraded data reaches the modelling layer. Automated validation and performance benchmarking at this layer ensure that AI-ready pipelines remain reliable not just at commissioning, but throughout their operational life.
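Two of those checks, calibration-bound violations and missing-data windows, might look like the following in simplified form; the tag name, bounds, and gap threshold are illustrative:

```python
import pandas as pd

def out_of_calibration(df: pd.DataFrame, tag: str,
                       low: float, high: float) -> pd.DataFrame:
    """Return samples that fall outside the tag's calibrated range."""
    return df[(df[tag] < low) | (df[tag] > high)]

def missing_windows(df: pd.DataFrame, max_gap: pd.Timedelta) -> pd.Series:
    """Return gaps between consecutive samples that exceed the expected
    sampling interval: missing-data windows to flag, not to impute over."""
    gaps = df["timestamp"].diff()
    return gaps[gaps > max_gap]

stream = pd.read_parquet("press_line_telemetry.parquet").sort_values("timestamp")

bad_samples = out_of_calibration(stream, "hydraulic_pressure_bar", 80.0, 220.0)
gaps = missing_windows(stream, pd.Timedelta("10s"))

if not bad_samples.empty or not gaps.empty:
    # Alert data engineering before degraded data reaches the modelling layer.
    print(f"{len(bad_samples)} out-of-bounds samples, {len(gaps)} missing windows")
```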
At the semantic layer, assets are represented as governed entities with defined relationships to their operational context. The bearing on a conveyor motor is not merely a node in a sensor graph. It is an entity with a known installation date, a maintenance history that includes specific intervention types and their outcomes, a relationship to the production line it drives, and a set of operating parameters that define its expected behaviour under different load and environmental conditions. When the predictive model reasons about an anomaly in that bearing's vibration signature, it has access to all of this context. It knows whether the bearing was recently replaced and is in a run-in period. It knows whether the production line is running at elevated throughput. It knows whether ambient temperature in the press shop has been unusually high. The model's output is not a raw anomaly score. It is a contextually grounded assessment of failure probability given current operational conditions.
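A minimal sketch of what such a governed entity might look like in code, using Python dataclasses; the fields and the run-in heuristic are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import date, datetime

@dataclass
class MaintenanceEvent:
    fault_code: str        # standardised taxonomy code, not free text
    occurred_at: datetime
    severity: str
    intervention: str      # e.g. "bearing_replacement"

@dataclass
class AssetEntity:
    asset_id: str
    installed: date
    production_line: str   # relationship to the line the asset drives
    operating_bounds: dict[str, tuple[float, float]]
    maintenance_history: list[MaintenanceEvent] = field(default_factory=list)

    def in_run_in_period(self, now: datetime, days: int = 14) -> bool:
        """True if a recent replacement means elevated readings are expected."""
        return any(
            e.intervention.endswith("replacement")
            and (now - e.occurred_at).days < days
            for e in self.maintenance_history
        )
```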
At the feedback and governance layer, every maintenance intervention is captured in a structured, taxonomically consistent format and linked back to the sensor record for the period preceding it. When a technician replaces a bearing, that event is recorded with a standardised fault code, a timestamp, a severity classification, and a link to the asset entity in the unified model. Over time, this creates a labelled dataset of genuine operational quality: failure events that are accurately timestamped, correctly attributed, and structurally consistent enough to serve as reliable training targets. The model improves not because more data accumulates in volume, but because the quality and consistency of that data is actively governed throughout its lifecycle.
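A sketch of how that governed record becomes training data: each standardised failure event is linked back to the telemetry window that preceded it, yielding accurately timestamped positive examples. Single-asset telemetry, the 48-hour window, and the column names are all assumptions made for brevity:

```python
import pandas as pd

telemetry = pd.read_parquet("asset_telemetry.parquet").set_index("timestamp").sort_index()
failures = pd.read_parquet("governed_failure_events.parquet")  # fault_code, occurred_at

WINDOW = pd.Timedelta("48h")

def label_windows(failures: pd.DataFrame, telemetry: pd.DataFrame) -> list[pd.DataFrame]:
    """Extract the telemetry window preceding each governed failure event
    as a positively labelled training example."""
    windows = []
    for event in failures.itertuples():
        window = telemetry.loc[event.occurred_at - WINDOW : event.occurred_at].copy()
        window["label"] = event.fault_code  # taxonomy code, not free text
        windows.append(window)
    return windows

positives = label_windows(failures, telemetry)
```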
This kind of unified architecture does not emerge from deploying a single platform. It is built through specific, governed data engineering components that make reliability measurable and enforceable across the entire pipeline. At the ingestion layer, OPC UA adapters standardise communication across modern and legacy equipment, while connectors for older Siemens, Rockwell, Mitsubishi, and Schneider environments preserve continuity across mixed-vendor plants. Historian feeds are not simply ingested; they are mapped, versioned, and reconciled so that tag changes, sampling rate shifts, and control system upgrades do not silently corrupt historical continuity.
At the validation layer, mechanisms such as SHACL-based schema validation and governed semantic models ensure that asset relationships, timestamps, and operational events remain structurally consistent as data moves across systems. MES, ERP, CMMS, and historian records are continuously checked for identifier mismatches, missing fields, invalid state transitions, and broken timestamp alignment before they reach the modelling layer. This turns unified asset intelligence into a real-time quality control system for data itself, where bad records are treated with the same seriousness as defective physical parts on a production line.
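For illustration, here is a hedged sketch of that validation step in Python using the pyshacl library, with a hypothetical shape requiring every maintenance event to carry a timestamp and a taxonomy fault code; the namespace and file names are assumptions:

```python
from rdflib import Graph
from pyshacl import validate

# Hypothetical shape: every maintenance event must carry a timestamp and a
# taxonomy fault code before it is accepted into the unified store.
shapes_ttl = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/plant#> .

ex:MaintenanceEventShape
    a sh:NodeShape ;
    sh:targetClass ex:MaintenanceEvent ;
    sh:property [ sh:path ex:occurredAt ; sh:datatype xsd:dateTime ; sh:minCount 1 ] ;
    sh:property [ sh:path ex:faultCode  ; sh:datatype xsd:string   ; sh:minCount 1 ] .
"""

shapes = Graph().parse(data=shapes_ttl, format="turtle")
records = Graph().parse("todays_maintenance_records.ttl", format="turtle")

# Non-conforming records are quarantined, exactly as a defective part would
# be pulled off the line, rather than passed to the modelling layer.
conforms, _, report_text = validate(records, shacl_graph=shapes)
if not conforms:
    print(report_text)
```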
The objective is not simply centralisation, but engineering rigour. Data is monitored, validated, and governed continuously so that predictive models are learning from trusted operational reality rather than accumulated system noise. Unified intelligence works when manufacturers apply the same discipline to information flows that they already apply to physical production quality: standardisation, inspection, traceability, and controlled correction before defects move downstream.
Automotive assembly is an environment that makes data fragmentation particularly consequential. The production processes are tightly interdependent. A failure on a single welding robot in a body shop does not merely halt that cell; it has the potential to cascade through a just-in-time production sequence, stopping downstream operations that depend on its output. The cost of unplanned downtime in automotive assembly ranges from tens of thousands to well over one hundred thousand pounds per hour when downstream losses, rescheduling costs, and overtime are included. That financial exposure makes the difference between a predictive model that generates reliable early warnings and one that produces alert fatigue operationally material at a business level, not just technically interesting.
The asset landscape introduces specific data challenges that general-purpose predictive maintenance frameworks do not adequately address without domain-specific engineering. High-cycle assets such as stamping presses and spot welders accumulate wear in ways that are tightly correlated with cycle count and applied force, not simply elapsed time. A model that does not have access to production throughput data and per-cycle force parameters cannot accurately model wear trajectories for these assets. Robotic systems introduce the additional complexity of path variation and TCP drift, where gradual deviations from the programmed trajectory place increasing stress on gearboxes and joint actuators. These deviations are detectable in servo current and encoder data, but only if that data is correctly integrated with the robot's program logs and compared against known-good trajectory baselines. Paint shop environments add thermal and chemical exposure variables that significantly influence the failure modes of conveyor and handling equipment. Press shops have structural vibration environments that make signal isolation for individual assets technically demanding.
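To make the first of these concrete, here is a sketch of cycle-correlated wear features for a stamping press, built on cumulative cycles and force exposure rather than elapsed time; the file and column names are illustrative:

```python
import pandas as pd

# Per-cycle press telemetry: one row per stroke, with the applied force.
cycles = pd.read_parquet("stamping_press_cycles.parquet").sort_values("timestamp")

# Wear on high-cycle assets tracks cumulative cycles and force exposure,
# not calendar time, so the features are built on those axes.
cycles["cumulative_cycles"] = range(1, len(cycles) + 1)
cycles["cumulative_force_kn"] = cycles["peak_force_kn"].cumsum()

# Rolling force intensity captures recent heavy-load campaigns that
# accelerate wear beyond what cycle count alone would suggest.
cycles["force_1k_mean"] = cycles["peak_force_kn"].rolling(1000).mean()

features = cycles[["cumulative_cycles", "cumulative_force_kn", "force_1k_mean"]]
```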
Each of these domain-specific factors represents a variable that the data architecture must capture and expose to the modelling layer if predictions are to be operationally meaningful. This is not a standard data integration challenge. It requires teams who understand both the engineering behaviour of the assets and the data infrastructure that records it.
Manufacturers who want predictive maintenance programmes to work in production need to begin with a data liquidity audit, not a platform selection exercise. Before asking which AI platform to deploy, the first question should be whether the organisation can actually move data from the machine to the model in a governed, reliable, and repeatable way. If sensor data cannot be traced, validated, contextualised, and linked to real maintenance outcomes, then no predictive platform will solve the problem. It will only add another layer of cost on top of existing structural failures.
A data liquidity audit assesses whether operational data can flow across the plant without losing meaning or trust. That means examining the completeness and accessibility of historical sensor data across critical asset classes, the consistency of asset identifiers between PLCs, historian systems, MES, ERP, and CMMS platforms, the quality and taxonomy of maintenance records, and the availability of operational context such as production schedules, environmental conditions, and intervention histories. It also means evaluating whether governance mechanisms exist to detect calibration drift, broken timestamp alignment, missing data windows, and schema changes before they degrade the modelling layer. The objective is not to count how much data exists, but to determine whether that data can move across systems as usable operational intelligence.
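One narrow but revealing audit check is sketched below: comparing asset identifiers across CMMS and historian extracts to quantify how much of the estate can even be joined; the file names, column names, and normalisation rule are assumptions:

```python
import pandas as pd

cmms = pd.read_parquet("cmms_assets.parquet")          # asset_id, description
historian = pd.read_parquet("historian_tags.parquet")  # asset_id, tag_name

# Normalise identifiers before comparing; inconsistent casing and padding
# are themselves a common fragmentation symptom.
cmms_ids = set(cmms["asset_id"].str.strip().str.upper())
hist_ids = set(historian["asset_id"].str.strip().str.upper())

joinable = cmms_ids & hist_ids
print(f"{len(joinable) / len(cmms_ids):.0%} of CMMS assets resolvable in the historian")
print(f"{len(hist_ids - cmms_ids)} historian assets with no maintenance history")
```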
In most automotive manufacturing environments, the answer reveals significant gaps - not because engineering teams have failed, but because these systems were designed for operational control, not analytical learning. Data may be available, but not portable. It may be captured, but not governed. It may be technically accessible, but not trustworthy enough to support automated decision-making.
Addressing those gaps requires a structured programme of data engineering work: tag reconciliation and normalisation, entity resolution across systems, OPC UA integration across legacy and modern assets, real-time quality monitoring pipelines, semantic modelling of asset relationships, and governance frameworks that prevent schema evolution from silently degrading data quality over time. This work does not look like innovation from the outside, but it is the only reliable path to predictive maintenance that scales. The goal is not to prepare for the real work later. This is the real work.
At Merit, we focus on the part of predictive maintenance that most AI programmes fail to address: the underlying data engineering problem. Most AI vendors assume the data foundation already exists - that asset histories are complete, maintenance records are structured, timestamps align, and operational systems can be reliably connected. In automotive manufacturing, that assumption is rarely true. The real challenge sits much earlier in the pipeline.
Our strength is the combination of industrial engineering understanding and data architecture expertise. We work in the environments manufacturers actually operate: mixed-vendor PLCs, legacy historian systems, disconnected CMMS records, MES layers, ERP dependencies, and operational context spread across systems that were never designed to work together. Solving predictive maintenance in these environments requires more than model development. It requires knowing how machine behaviour is recorded, where data integrity breaks, and why those failures matter to the physical performance of the asset.
That is where we work. We handle the unglamorous but decisive engineering tasks that determine whether predictive maintenance will succeed at all: tag reconciliation across system migrations, asynchronous timestamp alignment, entity resolution across assets and maintenance records, OPC UA integration across legacy and modern equipment, and real-time quality monitoring that detects sensor drift, missing data windows, and schema failures before they reach the modelling layer. We build governed pipelines where bad data is treated like defective production output - identified early, corrected systematically, and prevented from moving downstream.
We also validate continuously. Automated benchmarking, schema controls, and quality assurance are embedded into the pipeline so predictive systems remain reliable as equipment changes, processes evolve, and production environments scale. This is not about deploying another AI platform. It is about fixing the data problems most AI companies assume are already solved.
The manufacturers who succeed with predictive maintenance are not the ones who start with the most sophisticated algorithms. They are the ones who first make their operational data trustworthy enough for those algorithms to work. That is the engineering problem we are built to solve.