Architectural Amnesia: Solving the Context Gap in AI-Assisted Legacy Modernisation

Most legacy modernisation programs fail not because of bad tools, but because AI is thrown at code with no context: thousands of files, zero documentation, tribal knowledge lost, incident history scattered across ticketing systems. To make AI coding assistants reliable in these environments, you have to build technical knowledge layers that encode not just what the code does, but why it exists and how it behaves in production. This article goes deep into how to structure repositories, documentation, runbooks, and incident logs so that AI assistants can reason about legacy systems in cross‑industry IT / DevOps landscapes (banking, manufacturing, telecom, government) instead of just autocompleting syntax.

Why legacy environments break AI coding assistants

Legacy environments don’t just suffer from missing documentation; they suffer from architectural amnesia and decades of semantic drift between what the system was supposed to do and what the code actually does today. Business rules have been patched for new products, regulations, and edge cases, often without updating specs or diagrams, so the implementation gradually diverges from the original intent that lives in old design docs or retired engineers’ heads.

On paper, many of these systems still “work,” but the semantics of fields, workflows, and failure modes have shifted: flags repurposed for new meanings, tables overloaded with legacy and modern data, feature toggles that became permanent, and hotfixes that encode regulatory or commercial constraints no one remembers explicitly. This semantic drift is especially acute in regulated domains like banking, insurance, and telecom, where compliance changes accumulate as ad‑hoc code paths rather than clearly documented domain models.

Traditional AI coding assistants are trained to perform token prediction, not institutional memory reconstruction. Without access to the real, current ground truth (regulatory constraints, contractual obligations, operational invariants), they will happily propose syntactically correct changes that violate subtle but critical business rules encoded in legacy conditionals, batch schedules, or data flows. In multi‑repo monoliths with unknown dependencies and outdated frameworks, this means a “safe‑looking” refactor in one service can silently break settlement flows, reporting obligations, or SLAs three systems away.

That’s why naive “AI over code” approaches routinely fail in modernisation projects: they operate on text tokens stripped of the evolving semantics that give those tokens meaning in the current business, regulatory, and operational context. Without engineered knowledge layers that reattach code to up‑to‑date domain intent, incident history, and compliance constraints, AI assistants are guessing in the dark.

Technical knowledge layers: what they are

A technical knowledge layer is everything you build around the code that lets humans and machines understand system behavior across time, not just within a single file. In legacy environments, you can think of this as a stack of structured artifacts:

  • Code & configuration graph: Repository layout, module boundaries, infra-as-code, environment configs, deployment descriptors.
  • Architecture & domain intent: Context diagrams, domain models, architecture decision records (ADRs), feature maps tied to business capabilities.
  • Operational runbooks: Repeatable procedures for diagnosis, mitigation, and recovery, wired to services, dashboards, and commands.
  • Incident & change history: Structured timelines of failures, impact, root cause, and applied fixes tied back to code and config versions.
  • Tests & observability: Characterization tests, golden‑path flows, SLOs, logs, traces, and metrics that describe “normal” and “degraded” behavior.

AI agents and coding assistants can then be attached through retrieval‑augmented generation (RAG), embedding indexes, and code understanding tools that process these layers as a unified context graph.

Design the knowledge architecture before tooling

Before you wire any AI assistant into your IDE or CI pipeline, you need a knowledge architecture that decides what “context” actually means in your shop. Experience from legacy modernisation programs shows that teams that treat this as an architecture problem, not a documentation afterthought, get faster and safer outcomes.

Two design principles have emerged as critical:

  1. Feature-centric, not repo-centric organization
    Organizing knowledge around business features and domains (e.g., “Policy Issuance”, “Shipment Booking”) rather than repository boundaries makes AI far better at mapping code to business intent. This aligns with domain‑driven design and lets assistants reason about end-to-end workflows across services, UIs, and batch jobs.
  2. Simulated monorepo visibility
    Even when you cannot physically merge repos, building a “simulated monorepo” view for AI by aggregating multiple codebases, configs, and docs into a single context index dramatically improves dependency understanding. AI‑powered tools that map call graphs and data flows across hundreds of services rely on this aggregated view to produce accurate “blast‑radius” analyses for code changes.

From there you define your knowledge entities (service, feature, incident, runbook, config, schema, dashboard) and relationships (service‑uses‑schema, incident‑references‑service, runbook‑resolves‑incident‑type) so they can be indexed and retrieved in a consistent way.
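As a minimal sketch of the entity and relationship model described above, the following stores relationships as (subject, relation, object) triples and answers one-hop lookups in both directions. The class names, the relation names, and the schema and runbook IDs are assumptions made for illustration, and for brevity the runbook is linked to a single incident rather than to an incident type.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "service", "incident", "runbook", "schema"
    id: str    # stable identifier, unique within its kind


class KnowledgeIndex:
    """Stores (subject, relation, object) triples and answers one-hop lookups."""

    def __init__(self) -> None:
        self._outgoing = defaultdict(set)  # (subject, relation) -> objects
        self._incoming = defaultdict(set)  # (object, relation) -> subjects

    def relate(self, subject: Entity, relation: str, obj: Entity) -> None:
        self._outgoing[(subject, relation)].add(obj)
        self._incoming[(obj, relation)].add(subject)

    def objects(self, subject: Entity, relation: str) -> set:
        return set(self._outgoing.get((subject, relation), set()))

    def subjects(self, relation: str, obj: Entity) -> set:
        return set(self._incoming.get((obj, relation), set()))


billing = Entity("service", "billing-service")
schema = Entity("schema", "billing-db")          # hypothetical schema ID
incident = Entity("incident", "INC-2024-0321")
runbook = Entity("runbook", "database-failure")  # hypothetical runbook ID

index = KnowledgeIndex()
index.relate(billing, "uses", schema)
index.relate(incident, "references", billing)
index.relate(runbook, "resolves", incident)

# Which incidents reference the billing service, and which runbooks resolve them?
incidents = index.subjects("references", billing)
runbooks = {rb for inc in incidents for rb in index.subjects("resolves", inc)}
```

A production setup would more likely persist these triples in a graph or relational store. The point of the sketch is only that stable IDs make the joins trivial.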

Structuring repositories: from chaos to context

A practical way to expose knowledge layers to AI is to standardize the filesystem layout across legacy systems and surface cross-links in machine‑readable form. Even if your code lives in multiple repos, each repo (or simulated mono‑view) should follow a predictable structure.

A typical pattern for a legacy modernisation workspace:

/legacy-platform/
├── services/
│   ├── billing-service/
│   │   ├── src/
│   │   ├── tests/
│   │   ├── docs/
│   │   │   ├── feature-maps/
│   │   │   └── adr/
│   │   ├── runbooks/
│   │   ├── config/
│   │   └── incidents/
│   └── inventory-batch/
│       ├── src/
│       ├── tests/
│       ├── docs/
│       └── runbooks/
├── infra/
│   ├── terraform/
│   ├── k8s/
│   └── legacy-hosts/
├── global-docs/
│   ├── domain-model/
│   ├── context-maps/
│   └── glossary/
└── observability/
    ├── dashboards/
    ├── alerts/
    └── slo/

The key is to make relationships explicit and queryable:

  • Each service has a service.yml (or similar) that records owners, dependencies, primary features, critical runbooks, and key dashboards with stable IDs.
  • Blast radius metadata allows AI-assisted tooling to prioritize review depth, test selection, and rollout strategy based on the potential business impact of a change, not just the static dependency graph.
  • Feature maps link business capabilities to services, APIs, UIs, and batch jobs via structured metadata files, not just wiki pages.
  • Incident records and runbooks live next to the services they affect, with IDs that can be joined in a knowledge index or graph database.

# services/billing-service/service.yml
apiVersion: platform.example.com/v1
kind: ServiceMetadata
metadata:
  name: billing-service
  labels:
    domain: payments
    tier: tier-1
    language: java
  annotations:
    repoUrl: https://git.example.com/legacy/billing-service
    runbookUrl: https://wiki.example.com/runbooks/billing-db-failure
    dashboardUrl: https://grafana.example.com/d/abc123/billing
spec:
  # Explicit blast radius metadata for change risk
  blast_radius:
    level: high
    label: high_impact_on_settlement_engine
    rationale: >
      Failures or regressions can block settlement, revenue recognition,
      and regulatory reporting for all card transactions.
  description: >
    Handles invoicing, payment capture, and refunds for all customer orders.
  owner:
    team: payments-squad
    slack_channel: "#team-payments"
    email: payments-squad@example.com
  tier: tier-1
  lifecycle: production
  dependencies:
    - service: customer-service
      type: hard
      description: Reads customer billing addresses
    - service: payment-gateway
      type: hard
      description: Charges credit cards
  slos:
    - name: availability
      target: 99.9
      window: 30d
      indicator: http_availability
  deployments:
    - environment: production
      cluster: prod-cluster-1
      namespace: billing
      replicas: 4


This structure enables code understanding tools and RAG pipelines to retrieve not just nearby code, but the right runbooks, incidents, and architectural context when answering an AI query.
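To illustrate how tooling might act on the blast radius metadata, here is a rough sketch that maps the blast_radius level of a parsed service.yml to a review and rollout policy. The policy table, the "low" and "medium" levels, and the function name are assumptions; only the "high" level appears in the example metadata. The input is a plain dictionary, as a YAML parser would typically produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPolicy:
    min_reviewers: int
    run_full_regression: bool
    rollout: str


# Hypothetical policy table; every value here is an illustrative assumption.
POLICIES = {
    "low": ReviewPolicy(min_reviewers=1, run_full_regression=False, rollout="direct"),
    "medium": ReviewPolicy(min_reviewers=1, run_full_regression=True, rollout="canary"),
    "high": ReviewPolicy(min_reviewers=2, run_full_regression=True,
                         rollout="canary-with-manual-approval"),
}


def review_policy(service_metadata: dict) -> ReviewPolicy:
    """Pick a policy from spec.blast_radius.level of a parsed service.yml."""
    level = service_metadata.get("spec", {}).get("blast_radius", {}).get("level")
    # Fail closed: a missing or unknown level gets the strictest policy.
    return POLICIES.get(level, POLICIES["high"])


# Trimmed-down stand-in for the parsed billing-service metadata.
billing = {
    "metadata": {"name": "billing-service"},
    "spec": {"blast_radius": {"level": "high"}},
}
policy = review_policy(billing)
```

Falling back to the strictest policy when the field is absent is a deliberate choice in this sketch: services without metadata are usually the least understood ones.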

Documenting intent, not just syntax

Most legacy systems suffer from documentation debt: if docs exist, they usually explain “how the function works” at best, not “why this behavior is necessary for the business.” For AI assistants, this is fatal: models are already good at reconstructing local syntax, but they cannot infer tacit business rules without explicit hints.

Effective intent‑oriented documentation for AI has several layers:

1. Functional specifications derived from code: AI‑assisted reverse engineering tools can read sprawling legacy codebases, identify architectures and inter‑module dependencies, and generate functional specifications that turn “black box” code into blueprints. These specs should describe preconditions, postconditions, side effects, and business invariants in domain language, not just parameter lists.

2. Characterization tests as executable intent: For legacy modules, "characterization tests" that capture current behavior are often more important than ideal unit tests. Using AI to propose initial test skeletons and documentation, then validating them against the system's observed behavior, creates an executable contract that both humans and AI can rely on when refactoring. These tests effectively serve as the primary defensive layer when an AI assistant suggests a refactor, ensuring that any proposed change preserves existing, business‑critical behavior.

3. Architecture Decision Records (ADRs): ADRs encode why certain design choices were made (e.g., “Why this batch job runs at 02:00,” “Why currency conversion is cached in memory”). When indexed, they give AI assistants high-value signals about constraints like regulatory requirements, performance tradeoffs, or historical incidents that shaped the current design.

Characterization test example

Characterization tests capture current (often messy) behavior, so refactoring and AI suggestions can be validated against real expectations.

# tests/test_legacy_discount_engine.py
import pytest

from legacy.discount_engine import calculate_discount


@pytest.mark.parametrize(
    "customer_tier, cart_total, expected",
    [
        ("GOLD", 100.00, 0.15),    # 15% for GOLD customers
        ("SILVER", 100.00, 0.10),  # 10% for SILVER
        ("BRONZE", 100.00, 0.05),  # 5% default
        ("GOLD", 49.99, 0.00),     # No discount below 50
    ],
)
def test_calculate_discount_characterization(customer_tier, cart_total, expected):
    """
    Characterization tests: document the current behavior of the legacy
    discount engine before any refactoring.
    """
    result = calculate_discount(customer_tier, cart_total)
    assert result == expected

4. Scenario-driven narratives: Short narratives describing real workflows (“how an insurance claim passes through the system”) anchored with sequence diagrams and links to code, APIs, and data schemas help AI map textual descriptions to implementation artifacts. These are especially useful when paired with replay or video‑to‑code tools that extract flows from real user sessions.

When you feed these layers into your RAG index, prompts can ask the AI not only “What does this function do?” but “Is this change consistent with historical intent and documented invariants for this feature?”

Turning runbooks into AI-friendly operational knowledge

Runbooks are where organizations already write down procedures, but they are often unstructured, out of date, or spread across wikis and PDFs. Modern incident management best practices treat runbooks as living, actionable documents with clear steps, decision points, verification checks, and links to tooling.

For AI, runbooks should:

  • Use consistent templates with sections like “Verification steps”, “Initial response”, “Recovery options”, “Rollback”, and “Post-fix checks.”
  • Include concrete commands, API calls, dashboards, and logs by stable IDs rather than screenshots, enabling machine parsing and linking.
  • Reference related incidents with IDs and short summaries (“Similar to INCIDENT‑2022‑134 – connection pool exhaustion in billing DB”).

An example runbook structure that works well for both humans and AI:

# DATABASE_FAILURE_RUNBOOK

service_id: billing-service
severity: P1
primary_dashboards: [grafana:billing-db-latency, grafana:billing-db-errors]
related_incidents: [INC-2024-0321, INC-2023-1107]

## Verification
- Check error rate on grafana:billing-db-errors
- Run kubectl logs on pods with label app=billing-api
- Validate connectivity from app pods to DB_BILLING via nc command

## Initial Response
- If connection pool exhaustion suspected, verify max_connections vs current usage
- If lock contention, inspect pg_locks for long-running transactions

## Recovery Options
- Temporarily increase pool size (see config path config/db/pool.yml)
- Scale read replicas (run script scripts/scale_billing_replicas.sh)

## Post-fix Checks
- Ensure error rate < 0.1% for 15 minutes
- Validate no data inconsistencies in reconciliation dashboard

 

With structured templates like this, AI agents can automatically suggest candidate runbooks when an incident occurs and, more importantly, map operational patterns back to code and configuration hotspots.
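To make the "machine parsing" point concrete, the sketch below splits a templated runbook into metadata, ordered sections, and related incident IDs using only the standard library. The function name and the returned dictionary shape are assumptions, and the embedded runbook is a shortened variant of the example above, not a complete procedure.

```python
import re

RUNBOOK = """\
# DATABASE_FAILURE_RUNBOOK
service_id: billing-service
severity: P1
related_incidents: [INC-2024-0321, INC-2023-1107]

## Verification
- Check error rate on grafana:billing-db-errors

## Recovery Options
- Temporarily increase pool size (see config path config/db/pool.yml)
- Scale read replicas (run script scripts/scale_billing_replicas.sh)
"""


def parse_runbook(text: str) -> dict:
    """Split a templated runbook into metadata and ordered (section, steps) pairs."""
    runbook = {"title": None, "metadata": {}, "sections": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("##"):
            # Section heading; a list keeps order and tolerates repeated names.
            runbook["sections"].append((line.lstrip("#").strip(), []))
        elif line.startswith("#"):
            runbook["title"] = line.lstrip("#").strip()
        elif runbook["sections"]:
            # Any other line after a heading is a step of the current section.
            runbook["sections"][-1][1].append(line.lstrip("-• ").strip())
        else:
            # Lines before the first section are "key: value" metadata.
            key, sep, value = line.partition(":")
            if sep:
                runbook["metadata"][key.strip()] = value.strip()
    return runbook


parsed = parse_runbook(RUNBOOK)
incident_ids = re.findall(r"INC-\d{4}-\d{4}", parsed["metadata"].get("related_incidents", ""))
```

The extracted service_id and incident IDs are what allow a runbook to be joined to services and past incidents in a knowledge index.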

Structuring incident logs so AI learns from failures

Incident records are a gold mine of implicit business and system behavior—but only if they are recorded in a consistent, machine-readable format. DevOps incident management guidance recommends chronological timelines with standardized fields (timestamp, actor, action, result) and clear severity classifications.

Best practices for AI‑ready incident logging:

  • Chronological timelines capturing alerts, investigation steps, major decision points, mitigation actions, and resolution milestones.
  • Normalized fields like service_id, environment, impact_scope, root_cause_code, and config_version so incidents can be joined to services, deployments, and configs.
  • Linkage to artifacts such as runbooks, dashboards, and code changes (commit SHAs, pull request IDs) to anchor freeform descriptions into the technical graph.

Incident event JSON schema

{
  "$id": "https://example.com/schemas/incident-event.schema.json",
  "type": "object",
  "required": ["id", "incident_id", "timestamp", "source", "type", "description"],
  "properties": {
    "id": { "type": "string" },
    "incident_id": { "type": "string" },
    "timestamp": { "type": "string", "format": "date-time" },
    "source": {
      "type": "string",
      "enum": ["monitoring", "deployment", "chat", "status_page", "runbook", "manual"]
    },
    "type": {
      "type": "string",
      "enum": [
        "alert_fired",
        "alert_acknowledged",
        "alert_resolved",
        "deployment_started",
        "deployment_completed",
        "rollback_initiated",
        "escalation",
        "action_taken",
        "decision_made",
        "root_cause_identified"
      ]
    },
    "actor": {
      "type": "object",
      "required": ["type", "id"],
      "properties": {
        "type": { "type": "string", "enum": ["human", "system", "automation"] },
        "id": { "type": "string" },
        "name": { "type": "string" }
      }
    },
    "description": { "type": "string" },
    "metadata": { "type": "object" },
    "confidence": {
      "type": "string",
      "enum": ["automated", "manual", "inferred"],
      "default": "manual"
    }
  }
}

Timeline event types (TypeScript)

// incident-timeline-collector.ts
export type EventSource =
  | "monitoring"
  | "deployment"
  | "chat"
  | "status_page"
  | "runbook"
  | "manual";

export type EventType =
  | "alert_fired"
  | "alert_acknowledged"
  | "alert_resolved"
  | "deployment_started"
  | "deployment_completed"
  | "rollback_initiated"
  | "escalation"
  | "action_taken"
  | "decision_made"
  | "root_cause_identified";

export interface TimelineEvent {
  id: string;
  incidentId: string;
  timestamp: Date;
  source: EventSource;
  type: EventType;
  description: string;
  actor?: {
    type: "human" | "system" | "automation";
    id: string;
    name: string;
  };
  metadata: Record<string, unknown>;
  confidence: "automated" | "manual" | "inferred";
}

An example timeline entry format that tools and AI can both consume:

[2025-11-09T15:23:00Z] actor=db-team action=identified_connection_pool_exhaustion result=increased_pool_limit_by_50_percent service_id=billing-service incident_id=INC-2025-1109

Over time, AI agents can analyze hundreds of such incidents to detect recurrent failure modes, predict probable root causes for new alerts, and suggest safer defaults or refactoring candidates in the code.
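As a rough illustration of how such entries become analyzable, the sketch below parses the key=value timeline format and counts recurring (service, action) pairs. The parsing function and the second and third log lines are invented for the example; only the first entry comes from the text above.

```python
import re
from collections import Counter
from datetime import datetime

ENTRY = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<fields>.*)$")


def parse_entry(line: str) -> dict:
    """Parse '[timestamp] key=value key=value ...' into a dictionary."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"not a timeline entry: {line!r}")
    fields = dict(pair.split("=", 1) for pair in match["fields"].split())
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it.
    fields["timestamp"] = datetime.fromisoformat(match["ts"].replace("Z", "+00:00"))
    return fields


LOG = [
    "[2025-11-09T15:23:00Z] actor=db-team action=identified_connection_pool_exhaustion "
    "result=increased_pool_limit_by_50_percent service_id=billing-service incident_id=INC-2025-1109",
    # The two entries below are invented for illustration.
    "[2025-12-02T08:10:00Z] actor=db-team action=identified_connection_pool_exhaustion "
    "result=restarted_pool service_id=billing-service incident_id=INC-2025-1202",
    "[2025-12-14T21:45:00Z] actor=sre-oncall action=identified_lock_contention "
    "result=killed_long_transaction service_id=billing-service incident_id=INC-2025-1214",
]

events = [parse_entry(line) for line in LOG]
recurring = Counter((event["service_id"], event["action"]) for event in events)
top_pattern, occurrences = recurring.most_common(1)[0]
```

Simple counts like this are only a starting point, but they show why consistent field names matter: the aggregation needs no free-text interpretation.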

Wiring everything into AI: RAG and context engines

Once code, docs, runbooks, and incidents are structured, you can expose them to AI coding assistants and operator copilots via retrieval-augmented generation (RAG) and context engines. RAG integrates your actual knowledge sources (repositories, wikis, schema docs, incident KBs) so AI uses your truth instead of guessing from generic training data.

A typical architecture for legacy environments:

1. Ingestion pipelines

  • Parse code, configs, commit history, and test files into ASTs and embedding vectors.
  • Ingest Markdown/Confluence docs, ADRs, runbooks, and incident timelines into a document store with semantic search.
  • Index observability data (dashboard definitions, alert rules, SLOs) by service and feature IDs.

2. Context engine / knowledge graph

  • Persist relationships (service → incident → runbook → code paths) in a graph or relational model for fast neighborhood queries.
  • Expose APIs so AI agents can ask “What’s the blast radius of changing this function?” or “What incidents has this config touched?” and get structured results.

3. AI assistant integration

  • IDE plugins that augment code completion with relevant ADRs, incident summaries, and runbook snippets for the active symbol or file.
  • Chat interfaces that let developers ask system-level questions (“Why does this job run at 02:00?”) and receive synthesized answers citing documentation, code, and incidents.
  • CI/CD checks where AI reviews pull requests with contextual awareness of historical incidents and tests likely to be affected.
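The blast-radius question from the context engine step can be sketched as a breadth-first walk over reversed dependency edges. This is a simplified, service-level illustration, not function-level analysis. Only the billing-service dependencies come from the service.yml example; the other services and the function name are hypothetical.

```python
from collections import defaultdict, deque

# service -> services it depends on. Only billing-service's entries come from the
# service.yml example; the remaining services are hypothetical.
DEPENDS_ON = {
    "billing-service": ["customer-service", "payment-gateway"],
    "settlement-engine": ["billing-service"],
    "regulatory-reporting": ["settlement-engine"],
    "customer-service": [],
    "payment-gateway": [],
}


def blast_radius(changed: str, depends_on: dict) -> dict:
    """Return every transitive dependent of `changed` with its hop distance."""
    # Invert the edges: for each service, who depends on it?
    dependents = defaultdict(set)
    for service, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(service)

    distance = {}
    queue = deque([(changed, 0)])
    while queue:
        current, hops = queue.popleft()
        for dependent in sorted(dependents[current]):
            # Skip services already reached, which also guards against cycles.
            if dependent not in distance and dependent != changed:
                distance[dependent] = hops + 1
                queue.append((dependent, hops + 1))
    return distance


impact = blast_radius("payment-gateway", DEPENDS_ON)
```

The hop distance gives a crude ordering for review and testing. A real context engine would likely enrich each hit with incidents and runbooks for the affected services.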

Simple ingestion script example (Python)

# ingest_legacy_service.py 
import glob 
from pathlib import Path 
 
from langchain_community.document_loaders import TextLoader 
from langchain_text_splitters import RecursiveCharacterTextSplitter 
from langchain_community.vectorstores import FAISS 
from langchain_openai import OpenAIEmbeddings 
 
ROOT = Path("legacy-platform/services/billing-service") 
 
def load_documents(): 
    patterns = [ 
        "src/**/*.java", 
        "docs/**/*.md", 
        "runbooks/**/*.md", 
        "incidents/**/*.md", 
        "config/**/*.yml", 
    ] 
    for pattern in patterns: 
        for path in glob.glob(str(ROOT / pattern), recursive=True): 
            yield TextLoader(path).load()[0] 
 
docs = list(load_documents()) 
splitter = RecursiveCharacterTextSplitter( 
    chunk_size=1500, 
    chunk_overlap=200, 
    separators=["\n## ", "\n# ", "\n\n", "\n"] 
) 
chunks = splitter.split_documents(docs) 
 
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings()) 
vector_store.save_local("indexes/billing-service") 

 

Retrieval + answer synthesis example

# answer_with_context.py 
from langchain_openai import ChatOpenAI 
from langchain.chains import RetrievalQA 
from langchain_community.vectorstores import FAISS 
from langchain_openai import OpenAIEmbeddings 
 
vector_store = FAISS.load_local( 
    "indexes/billing-service", 
    OpenAIEmbeddings(), 
    allow_dangerous_deserialization=True, 
) 
retriever = vector_store.as_retriever(search_kwargs={"k": 5}) 
 
llm = ChatOpenAI(model="gpt-4.1") 
 
qa = RetrievalQA.from_chain_type( 
    llm=llm, 
    retriever=retriever, 
    chain_type="stuff", 
    verbose=False, 
) 
 
question = "Is it safe to change the discount rules for GOLD customers?" 
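# Chain.run() is deprecated in recent LangChain releases; invoke() returns a
# dict with the answer under the "result" key.
print(qa.invoke({"query": question})["result"])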

 

Cross-industry IT / DevOps patterns

The same knowledge layer architecture applies across industries—even though the tech stacks and regulatory constraints differ.

  • Banking / Insurance (mainframes, COBOL, core systems): AI tools can reverse‑engineer COBOL or mainframe jobs into functional specs and domain models, while incident histories around batch cut‑offs and reconciliation failures teach AI which flows are business-critical and time‑sensitive. Feature‑centric maps link legacy transactions to newer APIs, allowing safe incremental wrapping and modernisation.
  • Manufacturing / Industrial (MES, SCADA, OT): Knowledge layers combine PLC or SCADA configuration, historian data, and plant incident logs (e.g., line stoppages, quality excursions) with runbooks for on‑site engineers. AI coding assistants use this context to generate safer edge gateway code, digital twin integrations, and predictive maintenance logic without breaking hard real‑time constraints.
  • Telecom / Utilities (BSS/OSS, NMS): Here the knowledge graph connects NMS alarms, topology, customer impact, and ticket histories with scripts and orchestrations that perform remediation. AI copilots can recommend changes to provisioning workflows or fault management logic that respect historical scaling quirks and regulatory SLAs.

In all cases, the pattern is the same: AI understands the system only to the extent that your technical knowledge layers are rich, connected, and machine-readable.

Implementation roadmap: from legacy to AI-ready

Organizations that succeed with AI-assisted legacy development usually follow a phased roadmap.

1. Assessment and architecture mapping

  • Use AI‑driven tools to analyze codebases, configs, and dependencies, producing an architecture map and initial documentation.
  • Identify high-risk modules and code “black boxes” that need functional specs and characterization tests first.

2. Standardize structure and templates

  • Normalize repository layout, service metadata files, runbook templates, and incident log formats across teams.
  • Introduce ADRs and feature maps for new changes, gradually backfilling for critical legacy areas.

3. Build the knowledge layer and RAG index

  • Stand up ingestion pipelines and a context engine that can join code, docs, incidents, and observability configs.
  • Tune retrieval (chunking, ranking, semantic filters) so that AI queries surface the most relevant, high-signal artifacts rather than random wiki pages.

4. Integrate AI assistants incrementally

  • Start with low‑risk use cases like generating tests, summarizing modules, and proposing refactoring behind feature flags.
  • Move toward PR review, incident postmortem drafting, and change impact analysis once trust in the knowledge layer is established.

5. Close the loop with post-incident learning

  • After each major incident, update runbooks, docs, and tests, and reindex them so AI learns from what went wrong.
  • Use quarterly trend analyses of incidents and AI suggestions to refine documentation standards and the knowledge architecture.

Over time, this creates a virtuous cycle where AI improves understanding and modernisation of legacy systems, and modernisation in turn generates cleaner artifacts that AI can leverage even more effectively.
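The retrieval tuning mentioned in step 3 of the roadmap can be approximated with a metadata filter followed by a weighted re-rank. The sketch below assumes each retrieved chunk carries a service ID, a document type, and a similarity score where higher is better. The weights, field names, and sample chunks are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    service_id: str
    doc_type: str      # e.g. "adr", "runbook", "incident", "code", "wiki"
    similarity: float  # score from the vector store; assumed higher is better


# Hypothetical weights that favour decision records and operational history
# over general wiki pages.
DOC_TYPE_WEIGHT = {"adr": 1.3, "incident": 1.2, "runbook": 1.2, "code": 1.0, "wiki": 0.7}


def rerank(chunks, service_id, k=3):
    """Keep chunks for one service, then order them by weighted similarity."""
    in_scope = [chunk for chunk in chunks if chunk.service_id == service_id]
    return sorted(
        in_scope,
        key=lambda chunk: chunk.similarity * DOC_TYPE_WEIGHT.get(chunk.doc_type, 1.0),
        reverse=True,
    )[:k]


candidates = [
    Chunk("Team offsite notes", "billing-service", "wiki", 0.80),
    Chunk("ADR: why the batch job runs at 02:00", "billing-service", "adr", 0.70),
    Chunk("INC-2024-0321 summary", "billing-service", "incident", 0.72),
    Chunk("Inventory batch overview", "inventory-batch", "wiki", 0.95),
]
top = rerank(candidates, "billing-service", k=2)
```

In this example the ADR and the incident summary outrank the wiki page despite its higher raw similarity, and the chunk from another service is filtered out before ranking.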

Conclusion  

A robust AI‑assisted development strategy in legacy environments ultimately depends less on which model you choose and more on how well you’ve engineered your knowledge layers around the system. When code, configs, architecture decisions, runbooks, tests, and incident histories are structured, linked, and machine‑readable, AI assistants can reason about real business intent and operational behavior instead of merely predicting the next token. That shift from syntax completion to context‑aware decision support is what makes modernisation safer, faster, and genuinely sustainable in complex, high-risk estates.

For cross‑industry IT and DevOps teams, the path forward is clear: standardize repository layouts and metadata, treat runbooks and incident logs as first-class technical assets, and expose everything through a RAG‑powered context engine that your AI tooling can reliably query. Start with a thin slice (a critical service or business capability) to prove that AI can accelerate understanding and refactoring without increasing incident rates, then iteratively expand coverage across domains and platforms. Over time, each incident resolved, test added, and ADR written compounds into a richer knowledge graph, steadily improving both developer productivity and AI decision quality.

In practice, building technical knowledge layers is not just a documentation initiative; it is a core modernisation capability that turns brittle legacy stacks into systems your AI agents can safely collaborate with. Organizations that invest in this foundation today will be able to plug in new AI tools, models, and workflows tomorrow without starting from scratch, because their real competitive advantage lives in the structured understanding of their own systems.

- Authored by Sonal Dwevedi & Tharun Mathew