Enable AI agents to work effectively across multiple context windows. A two-part solution inspired by how human engineers collaborate across shifts.
AI agents must work in discrete sessions, and each new session begins with no memory of what came before. Imagine a software project staffed by engineers working in shifts, where each new engineer arrives with no recollection of what happened on the previous shift.
A specialised prompt for the first run, and incremental progress thereafter
First run only
Sets up the environment with all necessary context for future sessions:
Every subsequent run
Makes incremental progress whilst leaving clean state:
A structured approach to getting up to speed quickly
How the harness addresses common agent behaviours
| Problem | Initialiser Agent | Coding Agent |
|---|---|---|
| Agent declares victory too early | Set up feature list file with all requirements marked as failing | Read feature list at session start, choose a single feature to work on |
| Agent leaves environment in broken state | Create initial git repo and progress notes file | Read progress notes and git logs, test dev server before starting work |
| Agent marks features done prematurely | Set up structured feature list with pass/fail status | Self-verify all features with end-to-end testing before marking complete |
| Agent spends time figuring out how to run the app | Write an init.sh script that starts the development server | Read init.sh at session start to quickly bootstrap environment |
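As a rough illustration of the feature list rows above, the sketch below shows one way an initialiser could write every requirement as failing and a coding agent could then pick a single feature per session. The file name `feature_list.json` comes from the session transcript on this page; the JSON schema (`id`, `description`, `passes`) and the function names are assumptions made for this example, not the harness's actual format.

```python
import json
from pathlib import Path


def init_feature_list(path, requirements):
    """Initialiser step: record every requirement as failing (assumed schema)."""
    features = [
        {"id": i, "description": text, "passes": False}
        for i, text in enumerate(requirements, start=1)
    ]
    Path(path).write_text(json.dumps(features, indent=2))
    return features


def next_feature(path):
    """Coding-agent step: return the first feature still failing, or None."""
    features = json.loads(Path(path).read_text())
    return next((f for f in features if not f["passes"]), None)


def mark_passing(path, feature_id):
    """Flip one feature to passing; intended to be called only after
    end-to-end verification has succeeded."""
    features = json.loads(Path(path).read_text())
    for feature in features:
        if feature["id"] == feature_id:
            feature["passes"] = True
    Path(path).write_text(json.dumps(features, indent=2))
```

Starting from an all-failing list means the agent can only report the project as finished once `next_feature` returns `None`, which is the behaviour the first and third table rows aim for.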
What the agent does when beginning a new context window
[Assistant] I'll start by getting my bearings and understanding the current state.
[Tool Use] <bash - pwd>
[Tool Use] <read - claude-progress.txt>
[Tool Use] <read - feature_list.json>
[Assistant] Let me check the git log to see recent work.
[Tool Use] <bash - git log --oneline -20>
[Assistant] Now let me check if there's an init.sh script.
<Starts the development server>
[Assistant] Let me navigate to the application and verify fundamental features.
<Tests basic functionality>
[Assistant] Fundamental functionality is working. Now let me review the tests.json file.
<Starts work on a new feature>
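The orientation steps in the transcript above could be gathered by a small helper along these lines. This is a minimal sketch: the file names (`claude-progress.txt`, `feature_list.json`, `init.sh`) and the `git log --oneline -20` command are taken from the transcript, whilst the function names and the shape of the returned dictionary are hypothetical.

```python
import subprocess
from pathlib import Path


def read_if_present(path):
    """Return the file's text, or None when the file does not exist yet."""
    p = Path(path)
    return p.read_text() if p.is_file() else None


def recent_commits(workdir, limit=20):
    """Return `git log --oneline` lines, or [] if git is missing or fails."""
    try:
        result = subprocess.run(
            ["git", "log", "--oneline", f"-{limit}"],
            cwd=workdir, capture_output=True, text=True, check=False,
        )
    except OSError:
        return []
    return result.stdout.splitlines() if result.returncode == 0 else []


def gather_session_context(workdir):
    """Collect what a fresh session needs before it starts work."""
    root = Path(workdir)
    return {
        "cwd": str(root.resolve()),
        "progress_notes": read_if_present(root / "claude-progress.txt"),
        "feature_list": read_if_present(root / "feature_list.json"),
        "recent_commits": recent_commits(root),
        "has_init_script": (root / "init.sh").is_file(),
    }
```

Missing files are reported as `None` rather than raising, so the same helper works on a first run where the initialiser has not yet created them.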
Features we're building to bring Agent Harness to this platform
Define custom initialiser prompts and coding agent behaviours tailored to your workflow.
In Development
Visual editor for creating structured feature requirements with pass/fail tracking.
Planned
Integrate browser automation and unit tests for end-to-end feature verification.
Research
Autonomous AI agents that reason, plan, and adapt. Our agentic harness transforms simple retrieval into intelligent, multi-step problem-solving.
Autonomous planning and decomposition of complex queries into manageable sub-tasks, enabling sophisticated problem-solving across multiple retrieval cycles.
Dynamic integration with external tools, APIs, and databases. The agent decides which tools to invoke based on query requirements.
Continuous reflection and refinement of outputs. When initial results are insufficient, the agent iterates until quality thresholds are met.
Adaptive retrieval strategies that consider conversation history, user intent, and document relevance to deliver precise results.
The agent harness is the complete architectural system surrounding an LLM—everything except the model itself. It manages the entire context lifecycle: planning queries, orchestrating tool calls, maintaining memory, and verifying outputs.
Think of it as the "operating system" for your AI agent. Whilst the LLM provides reasoning capabilities, the harness provides structure, control, and reliability.
Breaks complex queries into executable steps
Maintains context across conversation turns
Ensures output quality and safety constraints
Orchestrates connections to external systems
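To make the four responsibilities above concrete, here is a deliberately small sketch in which planning, memory, verification, and tool orchestration are each a pluggable piece around a stubbed-out model. The `Harness` class, its field names, and the `tool:argument` step format are assumptions made for illustration only; they do not describe the framework's real interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Harness:
    plan: Callable[[str], List[str]]          # breaks a query into steps
    tools: Dict[str, Callable[[str], str]]    # connections to external systems
    verify: Callable[[str], bool]             # output quality/safety check
    memory: List[str] = field(default_factory=list)  # context across turns

    def run(self, query: str) -> str:
        self.memory.append(f"user: {query}")
        outputs = []
        for step in self.plan(query):
            # Each step is assumed to look like "tool_name:argument".
            tool_name, _, arg = step.partition(":")
            tool = self.tools.get(tool_name)
            if tool is None:
                raise KeyError(f"no tool registered for step {step!r}")
            outputs.append(tool(arg))
        answer = " ".join(outputs)
        if not self.verify(answer):
            raise ValueError("answer failed verification")
        self.memory.append(f"agent: {answer}")
        return answer


# Toy wiring: the planner and tools here stand in for an LLM and real APIs.
harness = Harness(
    plan=lambda q: [f"upper:{q}", f"count:{q}"],
    tools={"upper": str.upper, "count": lambda s: f"({len(s)} chars)"},
    verify=lambda text: len(text) > 0,
)
```

Keeping each responsibility behind a plain callable is one way to swap in a real model-backed planner or verifier later without changing the loop itself.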
See how agentic capabilities transform the retrieval experience from simple question-answering to intelligent problem-solving.
One query, one search, one response
Fixed chunk retrieval without adaptation
Cannot identify or fix retrieval failures
Cannot leverage external tools or APIs
Iterative retrieval with query refinement
Adapts retrieval strategy per query
Detects gaps and re-queries as needed
Integrates databases, APIs, calculators
Agent breaks query into sub-tasks: retrieve Q3 data, compare to Q2, identify patterns
Queries vector DB, calls SQL database, fetches external market data
Checks completeness, identifies gaps in regional data, performs additional retrieval
Combines insights into actionable recommendations with cited sources
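The retrieve, check, and re-query portion of the workflow above might look something like the following loop. It is a sketch under stated assumptions: `retrieve` and `refine` are hypothetical callables supplied by the caller (for instance a vector search and an LLM-backed query rewriter), and "empty result" is used as a stand-in for whatever completeness check a real system applies.

```python
def retrieve_with_refinement(sub_tasks, retrieve, refine, max_rounds=3):
    """Gather evidence per sub-task, rewriting the query when nothing is found.

    Returns (evidence, gaps): evidence maps each answered sub-task to its
    hits; gaps lists sub-tasks that stayed empty after max_rounds attempts.
    """
    evidence = {}
    gaps = []
    for task in sub_tasks:
        query = task
        for attempt in range(max_rounds):
            hits = retrieve(query)
            if hits:
                evidence[task] = hits
                break
            # Nothing found: ask for a reworded query and try again.
            query = refine(task, attempt + 1)
        else:
            # The loop ran out of rounds without finding anything.
            gaps.append(task)
    return evidence, gaps
```

Returning the unresolved sub-tasks separately lets the synthesis step state which parts of the answer lack supporting sources instead of silently omitting them.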
Discover how enterprise organisations leverage the Agent Harness Framework to automate complex, time-intensive workflows that previously required significant manual effort.
Automate the creation of comprehensive Q1-Q4 financial reports by leveraging your organisation's RAG database containing historical financial data, market analysis, and company performance metrics.
Transform legacy systems written in COBOL, Java 6, or older frameworks into modern, maintainable codebases. The agent systematically analyses, refactors, and tests thousands of files whilst preserving business logic.
Generate comprehensive compliance documentation for SOX, GDPR, HIPAA, and ISO 27001 audits. The agent analyses your systems, policies, and controls to produce audit-ready documentation packages.
The agent maintains updated knowledge of regulatory requirements and automatically incorporates framework updates into documentation.
Generate comprehensive safety cases compliant with Def Stan 00-056 and MOD requirements. The agent creates structured safety arguments using Goal Structuring Notation (GSN), compiles evidence, and produces documentation ready for expert review and Defence Safety Authority approval.
The agent maintains comprehensive knowledge of MOD safety requirements, Defence Safety Authority guidelines, and international safety standards to ensure compliant documentation for expert review.
Average Time Reduction
Task Completion Rate
Autonomous Operation
Audit Trail Coverage
The Agent Harness Framework represents a significant step forward in enabling AI agents to tackle complex, long-running tasks. Stay tuned for our implementation.
Based on "Effective harnesses for long-running agents" by Anthropic Engineering