Coming Soon

Agent Harness Framework

Enable AI agents to work effectively across multiple context windows. A two-part solution inspired by how human engineers collaborate across shifts.


The Core Challenge

AI agents must work in discrete sessions, and each new session begins with no memory of what came before. Imagine a software project staffed by engineers working in shifts, where each new engineer arrives with no recollection of what happened on the previous shift.

Complex projects cannot be completed within a single context window

Two-Part Solution

Agent Architecture

A specialised prompt for the first run, and incremental progress thereafter

Initialiser Agent

First run only

Setup

Sets up the environment with all necessary context for future sessions:

Creates init.sh script for environment setup
Writes claude-progress.txt for tracking
Generates feature_list.json with all requirements
Makes initial git commit with scaffolding
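
The setup step above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's implementation: the function names and the `id`, `description`, and `passes` fields are assumptions chosen for the example, the `init.sh` body is a placeholder, and the initial git commit is left out.

```python
import json
from pathlib import Path


def build_feature_list(requirements):
    """Turn plain requirement strings into entries that all start as failing."""
    # Field names (id, description, passes) are assumptions for this sketch.
    return [
        {"id": number, "description": text, "passes": False}
        for number, text in enumerate(requirements, start=1)
    ]


def write_scaffolding(project_dir, requirements):
    """Write the files the initialiser leaves behind for later sessions."""
    root = Path(project_dir)
    root.mkdir(parents=True, exist_ok=True)

    features = build_feature_list(requirements)
    (root / "feature_list.json").write_text(json.dumps(features, indent=2))
    (root / "claude-progress.txt").write_text("Session 0: scaffolding created.\n")
    # Placeholder only: the real start command depends on the project.
    (root / "init.sh").write_text("#!/bin/sh\n# Start the development server here.\n")
    return features
```

Starting every requirement as failing gives later sessions an explicit list of what is still outstanding.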

Coding Agent

Every subsequent run

Execute

Makes incremental progress whilst leaving clean state:

Works on ONE feature at a time
Commits progress with descriptive messages
Updates progress file with summaries
Tests features end-to-end before marking done
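
As a rough sketch of the "one feature at a time" rule, the helpers below pick a single outstanding feature and format a progress note for it. The function names, the entry format, and the `id`, `description`, and `passes` fields are hypothetical, chosen only for illustration.

```python
def next_feature(features):
    """Return the first feature still marked failing, or None when all pass."""
    for feature in features:
        if not feature["passes"]:
            return feature
    return None


def progress_entry(session, feature, summary):
    """Format one line to append to the progress file."""
    return (
        f"Session {session}: feature {feature['id']} "
        f"({feature['description']}): {summary}"
    )
```

Returning exactly one feature keeps each session's scope small enough to finish and commit cleanly.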

Session Workflow

How Each Session Starts

A structured approach to getting up to speed quickly

Run pwd: See working directory
Read git logs: Check recent work
Read features: Choose priority task
Run init.sh: Start dev server
Test basics: Verify functionality
Build feature: Make progress
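
The first four steps are mechanical enough to script. The sketch below is one possible shape, under assumptions: the exact commands (`git log --oneline -20`, `cat feature_list.json`, `sh init.sh`) and the function names are illustrative, and the last two steps (testing basics and building the feature) are left to the agent.

```python
import subprocess

# Orientation commands, in the order of the steps above.
ORIENTATION_COMMANDS = [
    ["pwd"],
    ["git", "log", "--oneline", "-20"],
    ["cat", "feature_list.json"],
    # Assumes init.sh returns promptly, e.g. by starting the server
    # in the background with its output redirected.
    ["sh", "init.sh"],
]


def run_command(argv):
    """Run one command and return its standard output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


def orient(runner=run_command):
    """Run each orientation command in order and collect its output."""
    return {" ".join(argv): runner(argv) for argv in ORIENTATION_COMMANDS}
```

Passing a different `runner` makes it possible to dry-run the sequence without touching a real repository.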

Problem Solving

Failure Modes & Solutions

How the harness addresses common agent behaviours

Problem	Initialiser Agent	Coding Agent
Agent declares victory too early	Set up feature list file with all requirements marked as failing	Read feature list at session start, choose a single feature to work on
Agent leaves environment in broken state	Create initial git repo and progress notes file	Read progress notes and git logs, test dev server before starting work
Agent marks features done prematurely	Set up structured feature list with pass/fail status	Self-verify all features with end-to-end testing before marking complete
Agent spends time figuring out how to run the app	Write an init.sh script that starts the development server	Read init.sh at session start to quickly bootstrap environment
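
Two of the failure modes in the table (declaring victory early and marking features done prematurely) come down to who is allowed to change a feature's status. A minimal sketch of such a guard follows; the function names and the `passes` field are assumptions, and `end_to_end_check` stands in for whatever verification the project actually uses.

```python
def mark_done(feature, end_to_end_check):
    """Set `passes` from the result of the check, never from the agent's say-so."""
    # A failing check also resets a feature that was previously marked passing.
    feature["passes"] = bool(end_to_end_check(feature))
    return feature["passes"]


def remaining(features):
    """List the features that are still failing."""
    return [feature for feature in features if not feature["passes"]]


def is_project_complete(features):
    """The project is complete only when no failing features remain."""
    return not remaining(features)
```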

Example Output

Typical Session Start

What the agent does when beginning a new context window

session_output.log

[Assistant] I'll start by getting my bearings and understanding the current state.
[Tool Use] <bash - pwd>
[Tool Use] <read - claude-progress.txt>
[Tool Use] <read - feature_list.json>
[Assistant] Let me check the git log to see recent work.
[Tool Use] <bash - git log --oneline -20>
[Assistant] Now let me check if there's an init.sh script.
<Starts the development server>
[Assistant] Let me navigate to the application and verify fundamental features.
<Tests basic functionality>
[Assistant] Fundamental functionality is working. Now let me review the tests.json file.
<Starts work on a new feature>

Roadmap

What's Coming

Features we're building to bring Agent Harness to this platform

Configure Your Harness

Define custom initialiser prompts and coding agent behaviours tailored to your workflow.

In Development

Feature List Builder

Visual editor for creating structured feature requirements with pass/fail tracking.

Planned

Automated Testing

Integrate browser automation and unit tests for end-to-end feature verification.

Research

Agentic Intelligence

Beyond Traditional RAG: Agentic Retrieval

Autonomous AI agents that reason, plan, and adapt. Our agentic harness transforms simple retrieval into intelligent, multi-step problem-solving.

Planning & Memory

Tools & Verification

Self-Correction Loop

Multi-Step Reasoning

Autonomous planning and decomposition of complex queries into manageable sub-tasks, enabling sophisticated problem-solving across multiple retrieval cycles.

Tool & Function Calling

Dynamic integration with external tools, APIs, and databases. The agent decides which tools to invoke based on query requirements.

Self-Correction

Continuous reflection and refinement of outputs. When initial results are insufficient, the agent iterates until quality thresholds are met.

Context-Aware Retrieval

Adaptive retrieval strategies that consider conversation history, user intent, and document relevance to deliver precise results.
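
The self-correction idea can be illustrated with a small loop: retrieve, check whether the results are good enough, and refine the query if they are not. This is a sketch under assumptions, not a description of the product: `search`, `is_sufficient`, and `refine` are hypothetical callables supplied by the caller, and `max_rounds` is an illustrative cap.

```python
def agentic_retrieve(query, search, is_sufficient, refine, max_rounds=3):
    """Retrieve, check, and re-query until results pass or rounds run out."""
    results = []
    current = query
    for round_number in range(1, max_rounds + 1):
        results.extend(search(current))
        if is_sufficient(query, results):
            return results, round_number
        # Not good enough yet: derive a new query from what has been found.
        current = refine(query, results)
    return results, max_rounds
```

The cap on rounds matters: without it, a query that can never be satisfied would loop indefinitely.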

What is an Agent Harness?

The agent harness is the complete architectural system surrounding an LLM—everything except the model itself. It manages the entire context lifecycle: planning queries, orchestrating tool calls, maintaining memory, and verifying outputs.

Think of it as the "operating system" for your AI agent. Whilst the LLM provides reasoning capabilities, the harness provides structure, control, and reliability.

Planning & Decomposition

Breaks complex queries into executable steps

Memory & State Management

Maintains context across conversation turns

Verification & Guardrails

Ensures output quality and safety constraints

Tool Integration Layer

Orchestrates connections to external systems

Traditional RAG vs Agentic RAG

See how agentic capabilities transform the retrieval experience from simple question-answering to intelligent problem-solving.

Traditional RAG

• Single retrieval pass: One query, one search, one response
• Static context window: Fixed chunk retrieval without adaptation
• No self-correction: Cannot identify or fix retrieval failures
• Limited to vector similarity: Cannot leverage external tools or APIs


Agentic RAG

✓ Multi-step reasoning: Iterative retrieval with query refinement
✓ Dynamic context management: Adapts retrieval strategy per query
✓ Self-correction & reflection: Detects gaps and re-queries as needed
✓ Tool & function calling: Integrates databases, APIs, calculators

Use Case Example

"Analyse our Q3 performance and recommend improvements"

STEP 1

Plan & Decompose

Agent breaks query into sub-tasks: retrieve Q3 data, compare to Q2, identify patterns

STEP 2

Multi-Source Retrieval

Queries vector DB, calls SQL database, fetches external market data

STEP 3

Verify & Iterate

Checks completeness, identifies gaps in regional data, performs additional retrieval

STEP 4

Synthesise & Respond

Combines insights into actionable recommendations with cited sources
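
The four steps can be sketched as a single function. Everything here is an assumption made for illustration: `sub_tasks` maps each sub-task to an ordered list of sources to try, `sources` maps a source name to a callable that returns a value or `None`, and falling back to the next source stands in for the "identify gaps, retrieve again" step.

```python
def answer(sub_tasks, sources):
    """Resolve each sub-task from its candidate sources and cite what was used."""
    findings = {}
    gaps = []
    for task, candidates in sub_tasks.items():
        for source_name in candidates:
            value = sources[source_name](task)
            if value is not None:
                # Keep the source alongside the value so it can be cited.
                findings[task] = (value, source_name)
                break
        else:
            # No candidate source could answer this sub-task.
            gaps.append(task)
    lines = [
        f"{task}: {value} (source: {source_name})"
        for task, (value, source_name) in findings.items()
    ]
    return lines, gaps
```

Returning the unresolved sub-tasks separately keeps gaps visible instead of letting the final response paper over them.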

Enterprise Applications

Real-World Use Cases

Discover how enterprise organisations leverage the Agent Harness Framework to automate complex, time-intensive workflows that previously required significant manual effort.

Quarterly Financial Report Generation

2-4 Hours

Automate the creation of comprehensive Q1-Q4 financial reports by leveraging your organisation's RAG database containing historical financial data, market analysis, and company performance metrics.

How It Works

Initialiser agent scans RAG database for relevant financial documents
Coding agent generates structured report sections incrementally
Progress file tracks completed sections with verification status
Final output includes executive summary, charts, and recommendations

Key Benefits

Reduces report generation time from 2 weeks to 4 hours
Ensures consistency across quarterly reports
Automatic cross-referencing of data sources
Git-tracked revisions for audit trail compliance

Legacy Codebase Modernisation

8-24 Hours

Transform legacy systems written in COBOL, Java 6, or older frameworks into modern, maintainable codebases. The agent systematically analyses, refactors, and tests thousands of files whilst preserving business logic.

Migration Process

Analyses existing codebase architecture and dependencies
Creates feature list mapping legacy to modern patterns
Incrementally converts modules with automated testing
Maintains backwards compatibility during transition

Transformation Capabilities

COBOL to modern Java/Python conversion
Monolith to microservices decomposition
Framework upgrades (Angular 1.x to 17+)
Database migration (Oracle to PostgreSQL)

Compliance Audit Documentation

4-8 Hours

Generate comprehensive compliance documentation for SOX, GDPR, HIPAA, and ISO 27001 audits. The agent analyses your systems, policies, and controls to produce audit-ready documentation packages.

Documentation Workflow

Scans existing policies and control documentation
Maps controls to regulatory framework requirements
Identifies gaps and generates remediation recommendations
Produces formatted audit evidence packages

Supported Frameworks

SOX, GDPR, HIPAA, ISO 27001, SOC 2, PCI DSS, NIST CSF, FedRAMP

The agent maintains updated knowledge of regulatory requirements and automatically incorporates framework updates into documentation.

Defence Safety Case Creation

6-12 Hours

Generate comprehensive safety cases compliant with Def Stan 00-056 and MOD requirements. The agent creates structured safety arguments using Goal Structuring Notation (GSN), compiles evidence, and produces documentation ready for expert review and Defence Safety Authority approval.

Safety Case Workflow

Identifies hazards and potential accidents across system scope
Generates GSN diagrams with goals, strategies, and evidence links
Compiles Hazard Log with ALARP risk assessments
Produces Safety Case Report for expert review and sign-off

Deliverables

Structured safety argument in Goal Structuring Notation (GSN)
Complete Hazard Log with risk mitigation evidence
Safety Case Maturity Tool (SCMT) assessment
DSA-ready documentation package

Supported Standards & Frameworks

Def Stan 00-056, Def Stan 00-055, GSN Community Standard, ALARP Methodology, MIL-STD-882, IEC 61508, DO-178C, ISO 26262

The agent maintains comprehensive knowledge of MOD safety requirements, Defence Safety Authority guidelines, and international safety standards to ensure compliant documentation for expert review.

85%: Average Time Reduction
99.2%: Task Completion Rate
24/7: Autonomous Operation
100%: Audit Trail Coverage

Ready to Transform Your Agent Workflows?

The Agent Harness Framework represents a significant step forward in enabling AI agents to tackle complex, long-running tasks. Stay tuned for our implementation.


Based on "Effective harnesses for long-running agents" by Anthropic Engineering