
Part 1 confronts the reality of architectural amnesia in undocumented, polyglot legacy estates and shows how to engineer living dependency graphs that power risk-aware DevOps automation, safer refactoring, and intelligent change management across complex enterprise landscapes.
Enterprise modernization programs rarely fail because Kubernetes is hard or cloud isn’t mature enough. They fail because nobody can state with confidence what the legacy system really does, how it behaves in production, or what will break when you change it.
Over years, institutional knowledge migrates from architecture decks and runbooks into tribal memory and finally into code, leaving business logic embedded in tangled conditionals, fragile batch jobs, and undocumented integration paths.
When this happens at scale, legacy modernization becomes a strategic risk: Gartner estimates most modernization initiatives exceed budgets or fail to meet expectations, largely due to underestimated complexity and hidden dependencies.
Recent failures like the FAA outage and Southwest’s scheduling meltdown underscore how legacy systems with opaque dependencies can cripple core operations once they cross a fragility threshold.
This is architectural amnesia: the system still runs, but the organization has forgotten why it behaves the way it does.
AI coding assistants and refactoring tools dropped blindly onto such estates only amplify risk; without a machine-readable model of the system’s structure and behavior, you are asking a stochastic model to guess inside a minefield.
Architectural amnesia is not merely a documentation gap; it is a symptom of systemic structural volatility, where the accumulated complexity of the estate has outpaced the cognitive capacity of any individual or team to model, reason about, or safely change it. The system continues to function, but its internal state has become epistemically opaque: no single mental model, architecture diagram, or runbook captures how it actually behaves under load, during failure, or at the boundaries between subsystems.
This opacity carries a specific and underappreciated risk when AI enters the picture. Running a stochastic, non-deterministic LLM coding assistant over an undocumented codebase is not a neutral act. Without a machine-readable model of structure, contracts, and runtime behavior, the assistant operates on statistical pattern completion, not system comprehension. The implicit side effects of such interventions are difficult to anticipate and harder to roll back: changes that appear locally coherent can introduce state corruption of unquantifiable extent across downstream persistence layers, message queues, batch pipelines, and shared data stores that the model never had visibility into.
What we need first is a living system intelligence layer: a continuously updated graph of topology, dependencies, data contracts, and behavioral patterns that exposes the hidden logic of the estate to both humans and AI, transforming architectural amnesia from an invisible liability into a mapped, navigable, and auditable structure.
Think of system intelligence not as a stack of disconnected graphs, but as a unified system metagraph: a single multi-relational property graph where each “layer” is a different dimension on the same set of entities, rather than a separate model.
The real engineering work is not just populating these dimensions independently, but cross-layer entity resolution: being able to say, with machine-checked accuracy, that “this specific function token in a code file” corresponds to “this span or block in a runtime trace” and “this logical data contract schema” at the moment it reads or writes persistent state. That alignment has to hold continuously and in near real time as the system evolves, otherwise you are back to disconnected diagrams that drift away from production reality.
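The following minimal sketch illustrates one resolution rule. It assumes two things: runtime spans carry OpenTelemetry-style code.namespace and code.function attributes (attribute names differ across semantic-convention versions), and static analysis yields fully qualified function names. The CodeSymbol and TraceSpan types and the exact-match rule are illustrative assumptions. A production resolver would also use file paths, line numbers, and build versions, and would record unresolved spans instead of guessing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CodeSymbol:
    """A function found by static analysis, keyed by its fully qualified name."""
    qualified_name: str  # e.g. "billing.ChargeService.apply" (hypothetical)
    file_path: str


@dataclass
class TraceSpan:
    """A runtime span carrying OpenTelemetry-style code.* attributes (assumed)."""
    span_id: str
    attributes: Dict[str, str]


def resolve_span(span: TraceSpan, index: Dict[str, CodeSymbol]) -> Optional[CodeSymbol]:
    """Return the code symbol that emitted this span, or None if it cannot be resolved."""
    namespace = span.attributes.get("code.namespace")
    function = span.attributes.get("code.function")
    if not namespace or not function:
        return None  # uninstrumented span: leave it unresolved rather than guess
    return index.get(f"{namespace}.{function}")


def link_layers(symbols: List[CodeSymbol], spans: List[TraceSpan]) -> Dict[str, List[str]]:
    """Build code-symbol -> [span_id, ...] edges for the metagraph."""
    index = {s.qualified_name: s for s in symbols}
    links: Dict[str, List[str]] = {}
    for span in spans:
        symbol = resolve_span(span, index)
        if symbol is not None:
            links.setdefault(symbol.qualified_name, []).append(span.span_id)
    return links


symbols = [CodeSymbol("billing.ChargeService.apply", "src/billing/ChargeService.java")]
spans = [
    TraceSpan("s1", {"code.namespace": "billing.ChargeService", "code.function": "apply"}),
    TraceSpan("s2", {"http.route": "/health"}),
]
print(link_layers(symbols, spans))  # {'billing.ChargeService.apply': ['s1']}
```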
Once this metagraph exists, LLMs, AI coding agents, and rule-based analyzers can operate over a living, multi-dimensional model of the estate instead of raw text blobs. That is the difference between “summarize this file” in isolation and “safely refactor the low-risk part of this billing pipeline, knowing exactly which runtime traces, data contracts, and downstream dependencies are in scope, and generating targeted tests for them.”
Step 1: Reconstructing system topology in undocumented estates
System topology is the macro view: how applications, services, and infrastructure pieces connect. In a cross-industry legacy landscape of mainframes, client–server apps, ERP customizations, ETL chains, and OLTP databases, you cannot rely on a single source of truth. CMDBs, if they exist, are often stale or inaccurate.
A pragmatic extraction strategy uses multiple noisy signals and reconciles them into a consistent topology graph:
Example:
A naive approach is to scan the repository tree for files like Dockerfile, Helm charts, or compose files and treat their parent directories as “service roots.” That can be useful for a quick, one-off inventory, but it is fundamentally tool-level scripting and will drift as soon as the build system, flags, or entry points change.
At scale, you need to anchor the macro skeleton of the estate in the actual compilation and build pipelines, not just the filesystem layout. A more resilient approach is to:
In practice, you end up with a pipeline that looks less like “grep for Dockerfile” and more like “materialize a language-aware build graph from the same artifacts your compilers and CI pipelines already consume.”
import json
from pathlib import Path

# Simplified illustration for a C/C++-style estate using compile_commands.json
def load_compilation_db(path: Path):
    return json.loads(path.read_text(encoding="utf-8"))

def infer_service_roots(compdb):
    """Group translation units into coarse service roots using simple heuristics."""
    roots = {}
    for entry in compdb:
        file = Path(entry["file"])
        parts = file.parts
        if "src" in parts:
            idx = parts.index("src")
            if idx + 1 < len(parts):
                root = Path(*parts[:idx + 2])
                roots.setdefault(str(root), []).append(str(file))
    return roots

if __name__ == "__main__":
    compdb = load_compilation_db(Path("compile_commands.json"))
    service_roots = infer_service_roots(compdb)
    for root, files in service_roots.items():
        print(f"SERVICE_ROOT::{root} ({len(files)} units)")
In a polyglot estate, you repeat this pattern with language-appropriate AST tooling (e.g., JavaParser, Roslyn, TypeScript compiler API, ABAP/COBOL analyzers) and build metadata for each stack. The key idea is that the macro topology is derived from the same compiler flags, build configurations, and structural ingress points that actually produce binaries and deployable artifacts, making the resulting skeleton far more stable than any ad-hoc directory scan.
This skeleton can be enriched by parsing pom.xml, .csproj, package.json, and ABAP package definitions to tag each service root with its language and stack.
2. CI/CD pipelines and job chains
Consider a GitLab CI snippet:
stages:
  - build
  - test
  - deploy

build_billing:
  stage: build
  script:
    - ./gradlew :billing-service:build
  artifacts:
    paths:
      - services/billing/build/libs/billing.jar

test_billing:
  stage: test
  needs: [build_billing]
  script:
    - ./gradlew :billing-service:test

deploy_billing:
  stage: deploy
  needs: [test_billing]
  script:
    - ./deploy/billing-deploy.sh prod
At toy scale, you can get away with treating a CI file as a small YAML blob and turning its needs: clauses into a NetworkX DiGraph. In real estates, however, job orchestration is split across GitLab pipelines, Jenkinsfiles, and enterprise schedulers like Control-M or Autosys, often with hundreds or thousands of jobs per environment. The problem stops being “parse a YAML file” and becomes DAG syntactic inversion: programmatically deconstructing heterogeneous scheduler syntaxes into a canonical execution model.
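For reference, the toy-scale version takes only a few lines. This sketch assumes the GitLab file above has already been loaded into a Python dict (for example with a YAML parser). It uses the standard library's graphlib.TopologicalSorter (Python 3.9+) in place of NetworkX. The set of reserved top-level keys is a partial, illustrative subset of GitLab's global keywords.

```python
from graphlib import TopologicalSorter
from typing import Any, Dict, List

# The GitLab CI file above, assumed already loaded into a dict by a YAML parser.
ci_config: Dict[str, Any] = {
    "stages": ["build", "test", "deploy"],
    "build_billing": {"stage": "build", "script": ["./gradlew :billing-service:build"]},
    "test_billing": {"stage": "test", "needs": ["build_billing"],
                     "script": ["./gradlew :billing-service:test"]},
    "deploy_billing": {"stage": "deploy", "needs": ["test_billing"],
                       "script": ["./deploy/billing-deploy.sh prod"]},
}

# Top-level keys that are not jobs (a partial list, for illustration).
RESERVED_KEYS = {"stages", "variables", "default", "include", "workflow"}


def needs_graph(config: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each job to the jobs listed in its needs: clause (its predecessors)."""
    graph: Dict[str, List[str]] = {}
    for name, body in config.items():
        if name in RESERVED_KEYS or name.startswith(".") or not isinstance(body, dict):
            continue  # skip global keywords and hidden template jobs
        # needs entries may be plain job names or {"job": ...} mappings
        graph[name] = [n["job"] if isinstance(n, dict) else n for n in body.get("needs", [])]
    return graph


graph = needs_graph(ci_config)
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['build_billing', 'test_billing', 'deploy_billing']
```

TopologicalSorter raises CycleError on circular needs, but that is about all the validation this approach provides. It knows nothing about Jenkins, Control-M, or Autosys, which is exactly why it does not scale to real estates.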
The goal is to normalize all of these definitions into a canonical execution matrix:
Once you have that matrix, you can invert it into a DAG that reveals:
Conceptually, the parsing step becomes:
from typing import Any, Dict, List

class CanonicalJob:
    def __init__(self, id: str, triggers: List[Dict[str, Any]]):
        self.id = id
        self.triggers = triggers
        # Example trigger: {"type": "job_success", "job": "build_billing"}

def invert_scheduler_defs(raw_defs) -> List[CanonicalJob]:
    """Collapse GitLab, Jenkins, Control-M, and Autosys definitions into canonical jobs with explicit trigger semantics."""
    jobs: List[CanonicalJob] = []
    # 1) Parse each scheduler format into an intermediate representation.
    # 2) Normalize triggers such as job completion, file arrival, or time-based schedules.
    # 3) Emit CanonicalJob(id, triggers) objects.
    return jobs

def build_execution_dag(jobs: List[CanonicalJob]):
    dag = {}
    for job in jobs:
        dag.setdefault(job.id, set())
        for trigger in job.triggers:
            if trigger["type"] == "job_success":
                dag.setdefault(trigger["job"], set()).add(job.id)
            # Track time-window-only triggers separately as temporal edges.
    return dag
In an enterprise scheduler, DAG syntactic inversion means:
This is where structural risk becomes visible. The canonical DAG plus execution matrix lets you query for “jobs whose only coupling is time,” “bottlenecks with high transitive fan-out,” or “chains where a single calendar misconfiguration can cascade across multiple business processes.” Those are exactly the places where DevOps automation and AI-driven changes need the strongest guardrails.
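The following minimal sketch implements two of those queries over canonical jobs. It reuses the {"type": "job_success", "job": ...} trigger shape from the code above. The "time" trigger type, its cron field, and all job names are hypothetical. Jobs whose triggers are all time-based are candidates for review rather than confirmed risks: a root extract job is legitimately clock-driven, while a reconciliation job scheduled for 03:00 may silently depend on an upstream run finishing first.

```python
from collections import deque
from typing import Any, Dict, List, Set

# Canonical jobs as id -> triggers. "job_success" matches the trigger shape above;
# the "time" type, its cron field, and all job names are hypothetical.
Jobs = Dict[str, List[Dict[str, Any]]]


def successors(jobs: Jobs) -> Dict[str, Set[str]]:
    """Invert job_success triggers into upstream -> downstream edges."""
    dag: Dict[str, Set[str]] = {job_id: set() for job_id in jobs}
    for job_id, triggers in jobs.items():
        for t in triggers:
            if t["type"] == "job_success":
                dag.setdefault(t["job"], set()).add(job_id)
    return dag


def time_only_jobs(jobs: Jobs) -> List[str]:
    """Jobs with no explicit upstream: coupled to the rest of the estate only by the clock."""
    return sorted(
        job_id
        for job_id, triggers in jobs.items()
        if triggers and all(t["type"] == "time" for t in triggers)
    )


def transitive_fan_out(dag: Dict[str, Set[str]], start: str) -> Set[str]:
    """Every job reachable downstream of start (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in dag.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


jobs: Jobs = {
    "extract_ledger": [{"type": "time", "cron": "0 1 * * *"}],
    "load_billing": [{"type": "job_success", "job": "extract_ledger"}],
    "invoice_run": [{"type": "job_success", "job": "load_billing"}],
    # Implicitly assumes invoice_run has finished by 03:00; nothing enforces it.
    "gl_reconcile": [{"type": "time", "cron": "0 3 * * *"}],
}
dag = successors(jobs)
print(time_only_jobs(jobs))                               # ['extract_ledger', 'gl_reconcile']
print(sorted(transitive_fan_out(dag, "extract_ledger")))  # ['invoice_run', 'load_billing']
```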
3. Infrastructure as code and configuration
Example: parsing Terraform resources for service-to-database relationships:
resource "azurerm_postgresql_flexible_server" "billing_db" {
  name = "billing-db"
  # ...
}

resource "azurerm_container_app" "billing_service" {
  name = "billing-service"
  # ...
  env {
    name  = "DB_HOST"
    value = azurerm_postgresql_flexible_server.billing_db.fqdn
  }
}
At first glance, this looks straightforward: a parser walks the HCL, sees DB_HOST wired to billing_db.fqdn, and emits a graph edge Service(billing-service) -> Database(billing-db). In real environments, that is only the most explicit edge. A hardened production VPC or mainframe-adjacent subnet is typically held together by implicit graph edges that never appear as simple resource references:
A topology extractor that only reads static HCL and container env blocks will therefore construct a naive graph: it will show theoretical connectivity rather than the effective connectivity that exists once all environment-specific configurations, secret substitutions, and IAM constraints are applied.
For AI systems, this distinction is critical. If an LLM-based engine reasons over the naive graph, it will systematically miscalculate impact and blast radius:
A resilient topology pipeline must therefore:
Only once these implicit graph edges are modeled does the resulting topology reflect the real operational surface area. That is the baseline an AI engine needs if it is going to propose changes without underestimating the impact surface inside a locked-down production VPC or a legacy mainframe region.
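For contrast, here is roughly what the naive extractor described in this subsection amounts to. Regular expressions stand in for a real HCL parser to keep the sketch dependency-free; this is an assumption made for illustration, not a recommendation. The input is the simplified snippet above. The extractor recovers the explicit DB_HOST edge and nothing else. Connectivity established through environment-specific configuration, secret substitution, or IAM constraints never appears as a resource reference, so it never becomes an edge.

```python
import re
from typing import Set, Tuple

# Block headers such as: resource "azurerm_container_app" "billing_service" {
RESOURCE_HEADER = re.compile(r'^resource\s+"([\w-]+)"\s+"([\w-]+)"\s*\{', re.MULTILINE)
# Attribute references such as: azurerm_postgresql_flexible_server.billing_db.fqdn
REFERENCE = re.compile(r"\b([a-z][a-z0-9_]*)\.([A-Za-z_][\w-]*)\.[\w.]+")


def naive_edges(hcl: str) -> Set[Tuple[str, str]]:
    """Emit (referencing resource, referenced resource) pairs from static HCL text only."""
    headers = list(RESOURCE_HEADER.finditer(hcl))
    declared = {(m.group(1), m.group(2)) for m in headers}
    edges: Set[Tuple[str, str]] = set()
    for i, m in enumerate(headers):
        # Crude body boundary: everything up to the next resource header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(hcl)
        source = f"{m.group(1)}.{m.group(2)}"
        for ref in REFERENCE.finditer(hcl[m.end():end]):
            if (ref.group(1), ref.group(2)) in declared:
                edges.add((source, f"{ref.group(1)}.{ref.group(2)}"))
    return edges


HCL = '''
resource "azurerm_postgresql_flexible_server" "billing_db" {
  name = "billing-db"
}

resource "azurerm_container_app" "billing_service" {
  name = "billing-service"
  env {
    name  = "DB_HOST"
    value = azurerm_postgresql_flexible_server.billing_db.fqdn
  }
}
'''
print(naive_edges(HCL))
# {('azurerm_container_app.billing_service', 'azurerm_postgresql_flexible_server.billing_db')}
```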
4. Runtime process inventory and network topology
Each of these sources is incomplete and sometimes contradictory, but when merged into a graph (e.g., property graph or document graph), they approximate a live system topology significantly better than any static diagram.
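The following minimal sketch shows that merge. It assumes each source has already been reduced to (from, to) edges between logical component names. The source labels (terraform, cmdb, netflow) and component names are hypothetical. Keeping provenance per edge, rather than taking a plain union, lets you separate three cases. Corroborated edges are confirmed by more than one source. Declared-but-unobserved edges may be stale CMDB entries. Observed-but-undeclared edges may be undocumented integrations.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (from_component, to_component)


def merge_sources(sources: Dict[str, Iterable[Edge]]) -> Dict[Edge, Set[str]]:
    """Union edges from every source, remembering which sources asserted each edge."""
    merged: Dict[Edge, Set[str]] = defaultdict(set)
    for source_name, edges in sources.items():
        for edge in edges:
            merged[edge].add(source_name)
    return dict(merged)


def classify(merged: Dict[Edge, Set[str]], observed: str = "netflow") -> Dict[str, List[Edge]]:
    """Split edges by whether runtime observation and static declarations agree."""
    result: Dict[str, List[Edge]] = {"corroborated": [], "declared_only": [], "observed_only": []}
    for edge, seen_by in sorted(merged.items()):
        if observed in seen_by:
            key = "corroborated" if len(seen_by) > 1 else "observed_only"
        else:
            key = "declared_only"
        result[key].append(edge)
    return result


sources = {
    "terraform": [("billing-service", "billing-db")],
    "cmdb": [("billing-service", "billing-db"), ("billing-service", "legacy-ftp")],
    "netflow": [("billing-service", "billing-db"), ("billing-batch", "billing-db")],
}
print(classify(merge_sources(sources)))
```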
At this stage, you have not touched actual function-level code yet; you’ve built a macro skeleton of the estate.
Step 2: Extracting code-level dependency graphs without documentation
The next layer is a code-level dependency graph that spans languages and frameworks. The goal is not a perfect AST for every file, but a cross-language dependency map that can answer questions like:
Modern static analysis and metadata-based approaches can infer much of this, even when tests are missing and documentation is outdated.
A practical pipeline often looks like this:
1. Language-specific analyzers
Example:
At small scale, it is tempting to demonstrate “dependency extraction” with a toy Python import graph. In the estates this article is concerned with (mainframes, ERPs, PL/SQL-heavy databases, and large Java/.NET applications), that example is misleadingly narrow. The real problem is polyglot call‑graph extraction: constructing a single call graph that spans Java, C#, COBOL, ABAP, PL/SQL, shell, and integration glue, and then aligning it with runtime behavior and data contracts.
A practical strategy starts by treating each language and platform as a first‑class analysis domain with its own index:
The crucial step is then to bridge semantic gaps across languages, especially where type information disappears. A classic example is a legacy Java application invoking a stored PL/SQL procedure via an untyped JDBC string:
// Java snippet (simplified)
String sql = "CALL BILLING_APPLY_CHARGES(?, ?, ?)";
CallableStatement stmt = conn.prepareCall(sql);
On the database side, the corresponding PL/SQL definition might look like:
CREATE OR REPLACE PROCEDURE BILLING_APPLY_CHARGES(
    p_account_id   IN NUMBER,
    p_period_start IN DATE,
    p_period_end   IN DATE
) AS
BEGIN
    -- ...
END;
A robust polyglot call‑graph pipeline has to:
Similar patterns apply when:
The end result is not a set of isolated per‑language graphs, but a stitched polyglot call graph where:
This is the level of call‑graph fidelity required for risk‑aware modernization: it lets you ask “If we change this Java method, which PL/SQL procedures, tables, and batch jobs are logically downstream?” instead of only knowing which .java files import which packages.
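The following minimal sketch shows the Java-to-PL/SQL bridge. It assumes the SQL string literal for each Java method has already been extracted (for example by a JavaParser pass) and that the PL/SQL DDL is available as text. Regular expressions stand in for real parsers. Matching on procedure name plus placeholder count is a simplifying assumption; overloaded procedures inside packages would need full signature matching. The Java method name and the INVOICE_LINES table edge are hypothetical.

```python
import re
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# CALL statements inside Java string literals, with or without the JDBC {call ...} escape.
JDBC_CALL = re.compile(r"\bcall\s+([\w.$#]+)\s*\(([^)]*)\)", re.IGNORECASE)
# PL/SQL procedure headers: captures the name and the raw parameter list.
PLSQL_PROC = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?PROCEDURE\s+([\w.$#]+)\s*\((.*?)\)\s*(?:AS|IS)\b",
    re.IGNORECASE | re.DOTALL,
)


def jdbc_call_site(sql: str) -> Optional[Tuple[str, int]]:
    """(procedure name, number of ? placeholders) for a JDBC call string, if any."""
    m = JDBC_CALL.search(sql)
    return (m.group(1).upper(), m.group(2).count("?")) if m else None


def plsql_signatures(ddl: str) -> Dict[str, int]:
    """Procedure name -> declared parameter count."""
    return {
        m.group(1).upper(): len([p for p in m.group(2).split(",") if p.strip()])
        for m in PLSQL_PROC.finditer(ddl)
    }


def stitch(java_calls: Dict[str, str], ddl: str) -> List[Tuple[str, str]]:
    """Emit java_method -> PL/SQL procedure edges only when name AND arity agree."""
    signatures = plsql_signatures(ddl)
    edges = []
    for method, sql in java_calls.items():
        site = jdbc_call_site(sql)
        if site is not None and signatures.get(site[0]) == site[1]:
            edges.append((method, f"PLSQL:{site[0]}"))
    return edges


def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Everything reachable from start in the stitched graph."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


DDL = """CREATE OR REPLACE PROCEDURE BILLING_APPLY_CHARGES(
    p_account_id   IN NUMBER,
    p_period_start IN DATE,
    p_period_end   IN DATE
) AS
BEGIN
    NULL;
END;"""
java_calls = {"BillingDao.applyCharges": "CALL BILLING_APPLY_CHARGES(?, ?, ?)"}  # hypothetical method
edges = stitch(java_calls, DDL)
graph: Dict[str, List[str]] = {src: [dst] for src, dst in edges}
graph["PLSQL:BILLING_APPLY_CHARGES"] = ["TABLE:INVOICE_LINES"]  # hypothetical table edge
print(sorted(downstream(graph, "BillingDao.applyCharges")))
# ['PLSQL:BILLING_APPLY_CHARGES', 'TABLE:INVOICE_LINES']
```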
2. Cross-language linking via integration points
Cross-language linking relies on integration primitives like HTTP calls, message queues, file drops, or SQL.
Example: detect REST calls in Java and map to a logical BillingAPI node:
// Legacy Java example
public class BillingClient {
    private final String baseUrl;

    public BillingClient(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    public Invoice getInvoice(String id) {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/invoices/" + id))
                .GET()
                .build();
        // ...
    }
}
A Java parser can be used (e.g., JavaParser) to extract baseUrl + "/api/invoices" patterns and map them to an API contract node.
Similar patterns apply for JDBC URLs, message topics, or ABAP RFC calls; each pattern becomes a cross-language edge in the graph.
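The following minimal sketch shows the URL-to-contract mapping. It assumes the AST pass has already flattened the concatenation in getInvoice into literal and variable parts. Three elements are illustrative assumptions: the Var type, the choice to drop baseUrl as a host prefix, and the route registry. In practice, server-side routes would more likely come from OpenAPI documents or controller annotations.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Var:
    """A non-literal operand in a string concatenation, e.g. a parameter or field."""
    name: str


Part = Union[str, Var]  # str = string literal, Var = variable reference


def to_template(parts: List[Part], host_vars=frozenset({"baseUrl"})) -> str:
    """[Var('baseUrl'), '/api/invoices/', Var('id')] -> '/api/invoices/{}'."""
    out = []
    for p in parts:
        if isinstance(p, str):
            out.append(p)
        elif p.name not in host_vars:
            out.append("{}")  # any runtime value becomes an anonymous path parameter
    return "".join(out)


def match_contract(template: str, routes: Dict[str, str]) -> Optional[str]:
    """Match a client-side template against server-side routes like '/api/invoices/{invoiceId}'."""
    for route, node_id in routes.items():
        if re.sub(r"\{[^}]*\}", "{}", route) == template:
            return node_id
    return None


parts: List[Part] = [Var("baseUrl"), "/api/invoices/", Var("id")]
routes = {"/api/invoices/{invoiceId}": "BillingAPI.getInvoice"}  # hypothetical contract registry
template = to_template(parts)
print(template, "->", match_contract(template, routes))
# /api/invoices/{} -> BillingAPI.getInvoice
```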
3. Configuration-driven enrichment
4. SBOMs and dependency manifests
To integrate third‑party dependencies, ingest SBOMs or manifest files into the graph.
Example: processing a Maven pom.xml to emit library nodes:
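import xml.etree.ElementTree as ET
import networkx as nx

g = nx.DiGraph()
pom = ET.parse("pom.xml").getroot()
ns = {"mvn": "http://maven.apache.org/POM/4.0.0"}

# Node for the module that declares these dependencies
module_id = pom.findtext("mvn:artifactId", default="unknown-module", namespaces=ns)
g.add_node(module_id, type="module")

# Direct dependencies only; ".//mvn:dependency" would also match
# <dependencyManagement> entries and plugin dependencies.
for dep in pom.findall("mvn:dependencies/mvn:dependency", ns):
    group = dep.findtext("mvn:groupId", namespaces=ns)
    artifact = dep.findtext("mvn:artifactId", namespaces=ns)
    # <version> is often inherited from a parent POM or <dependencyManagement>,
    # so it may be absent here; resolve it later from the effective POM.
    version = dep.findtext("mvn:version", default="unresolved", namespaces=ns)
    lib_id = f"{group}:{artifact}:{version}"
    g.add_node(lib_id, type="library")
    # connect current module -> libs in the global graph
    g.add_edge(module_id, lib_id, type="depends_on")
SBOM-aware graphs let you reason about vulnerabilities and transitive dependencies alongside internal code.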
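The following minimal sketch addresses the vulnerability question. It assumes a CycloneDX JSON SBOM that has already been parsed into a dict. CycloneDX lists components with bom-ref identifiers and records edges in a separate dependencies array of ref/dependsOn entries. The component names here are hypothetical. Inverting those edges answers the question of which modules transitively pull in a given vulnerable component.

```python
from collections import deque
from typing import Dict, List, Set


def dependency_graph(bom: dict) -> Dict[str, List[str]]:
    """bom-ref -> bom-refs it directly depends on, from a parsed CycloneDX JSON document."""
    graph: Dict[str, List[str]] = {
        c["bom-ref"]: [] for c in bom.get("components", []) if "bom-ref" in c
    }
    for dep in bom.get("dependencies", []):
        graph.setdefault(dep["ref"], []).extend(dep.get("dependsOn", []))
    return graph


def affected_by(graph: Dict[str, List[str]], vulnerable_ref: str) -> Set[str]:
    """Every component that directly or transitively depends on vulnerable_ref."""
    reverse: Dict[str, List[str]] = {}
    for ref, deps in graph.items():
        for d in deps:
            reverse.setdefault(d, []).append(ref)
    seen: Set[str] = set()
    queue = deque([vulnerable_ref])
    while queue:
        for parent in reverse.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


bom = {
    "bomFormat": "CycloneDX",
    "components": [
        {"bom-ref": "billing-service", "name": "billing-service"},
        {"bom-ref": "billing-core", "name": "billing-core"},
        {"bom-ref": "text-utils", "name": "text-utils"},
    ],
    "dependencies": [
        {"ref": "billing-service", "dependsOn": ["billing-core"]},
        {"ref": "billing-core", "dependsOn": ["text-utils"]},
    ],
}
print(sorted(affected_by(dependency_graph(bom), "text-utils")))  # ['billing-core', 'billing-service']
```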
5. Graph modeling: Normalize all findings into a unified schema, e.g.:
Attach attributes like language, repo, owner, criticality, last_changed, lines_of_code, and test_coverage where available.
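The following sketch shows one way to pin that schema down in code. The node kinds, edge kinds, and provenance field are assumptions; the node attributes mirror the list above. The upsert rule keeps known values instead of overwriting them with None. This matters when different extractors each know only part of a node's attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    id: str                                # e.g. "java:BillingDao.applyCharges" (hypothetical)
    kind: str                              # e.g. "function", "procedure", "table", "api", "library"
    language: Optional[str] = None
    repo: Optional[str] = None
    owner: Optional[str] = None
    criticality: Optional[str] = None
    last_changed: Optional[str] = None     # ISO-8601 date
    lines_of_code: Optional[int] = None
    test_coverage: Optional[float] = None  # 0.0 to 1.0; None when unknown


@dataclass
class Edge:
    src: str
    dst: str
    kind: str                              # e.g. "calls", "reads", "writes", "depends_on"
    provenance: List[str] = field(default_factory=list)  # extractors that asserted the edge


@dataclass
class Metagraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def upsert_node(self, node: Node) -> None:
        """Merge attributes, keeping known values rather than overwriting them with None."""
        existing = self.nodes.get(node.id)
        if existing is None:
            self.nodes[node.id] = node
            return
        for name, value in vars(node).items():
            if value is not None:
                setattr(existing, name, value)

    def add_edge(self, edge: Edge) -> None:
        """Reject edges whose endpoints were never registered as nodes."""
        if edge.src not in self.nodes or edge.dst not in self.nodes:
            raise KeyError(f"dangling edge {edge.src} -> {edge.dst}")
        self.edges.append(edge)


g = Metagraph()
g.upsert_node(Node("java:BillingDao.applyCharges", "function", language="java", repo="billing"))
g.upsert_node(Node("java:BillingDao.applyCharges", "function", test_coverage=0.42))
node = g.nodes["java:BillingDao.applyCharges"]
print(node.language, node.test_coverage)  # java 0.42
```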
This is where AI can start helping in a controlled way: AI models can assist in classifying ambiguous patterns, identifying likely integration points in loosely structured code, and suggesting missing edges, but always with human review and deterministic checks.
The key is that the primary artifact is the graph, not the summary. Summaries can be generated from the graph later.
Step 3: Differentiating structural vs. behavioral dependencies
A common anti-pattern in modernization is treating all dependencies as equally important. In reality, many edges in your code-level graph are low-risk (debug-only, rarely used, obsolete), while a subset represents behavioral load-bearing paths.
To separate them:
1. Intersect static dependencies with runtime telemetry
Example: join static edges with trace counts:
import csv
import pickle
from collections import Counter

import networkx as nx

# Static call graph (a networkx.DiGraph) serialized earlier with pickle.
# nx.read_gpickle / nx.write_gpickle were removed in NetworkX 3.0.
with open("static_call_graph.gpickle", "rb") as f:
    static_g: nx.DiGraph = pickle.load(f)

# Load runtime call pairs extracted earlier from traces; format: caller,callee
with open("trace_edges.csv", newline="") as f:
    runtime_pairs = [(row[0], row[1]) for row in csv.reader(f) if len(row) == 2]

counts = Counter(runtime_pairs)
for (caller, callee), freq in counts.items():
    if static_g.has_edge(caller, callee):
        static_g[caller][callee]["runtime_freq"] = freq

with open("behavioral_graph.gpickle", "wb") as f:
    pickle.dump(static_g, f)
Edges with consistently high runtime_freq and high error rates become high-behavioral-risk paths. This informs both human decisions and AI guardrails.
2. Identify critical paths and hotspots
3. Flag dead or low-value dependencies
This distinction is crucial once AI enters the loop: AI coding agents can be allowed to perform more aggressive refactors and auto-generated changes in low-behavioral-risk regions, while high-risk nodes require stricter workflows (pairing with SMEs, multi-stage review, extensive testing).
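The following minimal sketch shows how behavioral risk could be mapped to change workflows. It assumes each edge already carries the runtime_freq attribute from the telemetry join above plus an error rate. The thresholds, tier names, and workflow strings are illustrative assumptions. An edge never observed in traces is only a dead-code candidate: quarter-end or year-end paths may simply not have run during the trace window.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EdgeStats:
    runtime_freq: Optional[int]  # None = never observed during the trace window
    error_rate: float = 0.0


# Illustrative thresholds; calibrate against your own SLOs and incident history.
HIGH_FREQ = 10_000
HIGH_ERROR = 0.01

WORKFLOW = {
    "high": "SME pairing + multi-stage review + extensive testing",
    "medium": "standard review + targeted tests",
    "low": "AI agent may refactor; automated checks gate the merge",
    "unobserved": "dead-code candidate: confirm with owners before removal",
}


def risk_tier(stats: EdgeStats) -> str:
    """High = hot AND failing, medium = either one, low = neither."""
    if stats.runtime_freq is None:
        return "unobserved"
    hot = stats.runtime_freq >= HIGH_FREQ
    failing = stats.error_rate >= HIGH_ERROR
    if hot and failing:
        return "high"
    if hot or failing:
        return "medium"
    return "low"


def policy_for(edges: Dict[str, EdgeStats]) -> Dict[str, str]:
    """Map each edge to the change workflow an AI agent must follow there."""
    return {edge: WORKFLOW[risk_tier(stats)] for edge, stats in edges.items()}


edges = {
    "BillingDao.applyCharges->PLSQL:BILLING_APPLY_CHARGES": EdgeStats(250_000, 0.02),
    "InvoiceView.render->DateFmt.short": EdgeStats(1_200, 0.0),
    "ReportUtil.debugDump->LegacyFormatter.format": EdgeStats(None),
}
for edge, workflow in policy_for(edges).items():
    print(f"{edge}: {workflow}")
```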
Step 4: System topology as the scaffolding for DevOps and AI
By now, we have:
This combined view unlocks several DevOps and AI use cases:
In practice, the value of the system metagraph is not that you can run ad‑hoc graph queries from a notebook, but that you can wire those queries directly into your deployment pipelines as autonomous, graph‑gated circuit breakers.
In a graph‑gated CI/CD model, every change set is evaluated against the living metagraph before it is allowed to progress past key stages:
Instead of a static test matrix, the pipeline then dynamically injects targeted, containerized test configurations based on this blast radius:
Conceptually, the pipeline step looks like:
def evaluate_change(graph, changed_entities):
    """
    1) Expand changed entities into an impacted subgraph
    2) Compute Semantic Diff Blast Radius and risk level
    3) Emit a test and rollout strategy for this specific change
    """
    impacted = expand_impacted_subgraph(graph, changed_entities)
    risk = score_risk(graph, impacted)
    strategy = select_test_and_rollout_strategy(risk, impacted)
    return strategy
The result is an autonomous circuit breaker: deployments are no longer gated only by static branch policies or generic “run all tests” stages, but by a metagraph‑driven understanding of what this change actually means in the context of the whole estate. This is how a living dependency graph stops being a documentation artifact and becomes an active control plane for risk‑aware modernization.
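To make the conceptual evaluate_change step executable, here is a minimal sketch of the three helpers it calls. The sketch makes three assumptions. It uses a hypothetical Graph type holding downstream edges and per-node criticality. It approximates the Semantic Diff Blast Radius as bounded reachability from the changed entities. Its scoring thresholds are arbitrary. A real implementation would weight edges by behavioral risk and data-contract impact.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Graph:
    """Hypothetical metagraph slice: downstream edges plus per-node criticality (1 low .. 3 high)."""
    downstream: Dict[str, List[str]]
    criticality: Dict[str, int] = field(default_factory=dict)


def expand_impacted_subgraph(graph: Graph, changed: Iterable[str], max_depth: int = 3) -> Set[str]:
    """Bounded breadth-first expansion from the changed entities."""
    impacted = set(changed)
    frontier = deque((node, 0) for node in impacted)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in graph.downstream.get(node, []):
            if nxt not in impacted:
                impacted.add(nxt)
                frontier.append((nxt, depth + 1))
    return impacted


def score_risk(graph: Graph, impacted: Set[str]) -> str:
    """Blast-radius size weighted by the most critical node touched; thresholds are illustrative."""
    peak = max((graph.criticality.get(n, 1) for n in impacted), default=1)
    score = len(impacted) * peak
    return "high" if score >= 12 else "medium" if score >= 4 else "low"


def select_test_and_rollout_strategy(risk: str, impacted: Set[str]) -> Dict[str, object]:
    """Run tests only for impacted entities; pick a rollout mode by risk level."""
    rollout = {"low": "direct", "medium": "canary", "high": "canary + manual approval"}[risk]
    return {"risk": risk, "test_targets": sorted(impacted), "rollout": rollout}


def evaluate_change(graph: Graph, changed_entities: Iterable[str]) -> Dict[str, object]:
    impacted = expand_impacted_subgraph(graph, changed_entities)
    risk = score_risk(graph, impacted)
    return select_test_and_rollout_strategy(risk, impacted)


graph = Graph(
    downstream={
        "BillingDao.applyCharges": ["PLSQL:BILLING_APPLY_CHARGES"],
        "PLSQL:BILLING_APPLY_CHARGES": ["TABLE:INVOICE_LINES"],
        "TABLE:INVOICE_LINES": ["JOB:invoice_run"],
    },
    criticality={"TABLE:INVOICE_LINES": 3, "JOB:invoice_run": 3},
)
print(evaluate_change(graph, ["BillingDao.applyCharges"]))
```

Here a change to one Java method reaches a critical table and batch job within three hops. The pipeline therefore requires a canary rollout plus manual approval, and runs tests only for the four impacted entities.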
Conclusion
Part 1 showed how to pierce the opacity of architectural amnesia by reconstructing system topology and dependency graphs, even when documentation is missing and SMEs have moved on. Using static analysis, CI/CD metadata, runtime telemetry, and SBOMs, enterprises can build a living graph that exposes structural and behavioral dependencies, ranked by risk and criticality.
In Part 2, we will go deeper into mining data contracts and behavioral patterns from these same legacy systems, turning raw code, logs, and incident history into AI-readable semantics that drive safer refactoring, targeted testing, and risk-aware deployment automation across cross-industry IT landscapes.