AI Agents HQ

Multi-Agent Overview

AI Agents HQ is designed to coordinate three different AI coding tools — each with different strengths — on the same project. Rather than using one tool for everything, the system assigns tasks to the most appropriate tool based on what needs to be done.

The Three Tools

Claude Code (Anthropic)

The Lead Developer. Claude Code is the primary coding agent. It writes code, builds features, fixes bugs, refactors, and runs tests. It has the deepest understanding of the codebase because it can read files, execute commands, and make edits directly. Claude is assigned tasks that require writing or modifying code.

Gemini CLI (Google)

The Research & Intelligence Specialist. Gemini's strength is web search and information synthesis. It is assigned tasks that require researching APIs, finding documentation, checking for security vulnerabilities, auditing dependencies, and gathering information that other agents will act on. Gemini does not write code — it produces structured reports.

Codex CLI (OpenAI)

The QA Lead, Code Reviewer, Security Specialist & Implementation Partner. Codex reviews code written by other agents, performs security audits, and validates that implementations match specifications. With gpt-5.3-codex (latest frontier agentic coding model), Codex is also a capable implementation agent, and because it is a different model from Claude it tends to catch different classes of bugs. It has multiple profiles optimized for different tasks (implementation, review, security audit, refactoring).

How They Collaborate

A typical workflow might look like this:

1. The orchestrator creates three tasks:

   Task 1 (Gemini): "Research the best JWT libraries for Go, compare auth0/jwt-go vs golang-jwt/jwt"
   Task 2 (Claude, blocked by Task 1): "Implement JWT authentication middleware using the recommended library"
   Task 3 (Codex, blocked by Task 2): "Security review the JWT implementation"

2. Gemini picks up Task 1, researches both libraries, and sends its findings to Claude's inbox.
3. Claude is automatically unblocked when Task 1 completes, picks up Task 2, reads Gemini's findings from its inbox, and implements the middleware.
4. Codex is automatically unblocked when Task 2 completes, picks up Task 3, reviews Claude's code, and reports any security issues.

Each agent runs in its own session, does its task, reports completion, and exits. The orchestrator manages the flow.
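The blocking behaviour in this workflow can be sketched in a few lines of Python. This is a minimal illustration, not the orchestrator's actual implementation: the Task class, its fields, and the helper functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record; the real HQ schema may differ."""
    task_id: int
    agent: str
    blocked_by: set[int] = field(default_factory=set)
    done: bool = False


def ready_tasks(tasks: dict[int, Task]) -> list[int]:
    """Return IDs of unfinished tasks whose blockers have all completed."""
    return sorted(
        t.task_id
        for t in tasks.values()
        if not t.done and all(tasks[b].done for b in t.blocked_by)
    )


def complete(tasks: dict[int, Task], task_id: int) -> list[int]:
    """Mark a task done and return the tasks that are ready afterwards."""
    tasks[task_id].done = True
    return ready_tasks(tasks)


def demo_tasks() -> dict[int, Task]:
    """The three-task JWT example from the workflow above."""
    return {
        1: Task(1, "gemini"),
        2: Task(2, "claude", blocked_by={1}),
        3: Task(3, "codex", blocked_by={2}),
    }


if __name__ == "__main__":
    tasks = demo_tasks()
    print(ready_tasks(tasks))   # only Task 1 is ready at the start
    print(complete(tasks, 1))   # completing Task 1 unblocks Task 2
    print(complete(tasks, 2))   # completing Task 2 unblocks Task 3
```

Nothing in the sketch polls or loops: readiness is recomputed only when a task completes, which matches the idea that the orchestrator, not the agents, drives the flow.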

Instruction Files

Each tool automatically reads a specific instruction file when it starts a session in the project:

| File | Tool | Role | Key Rules |
|---|---|---|---|
| CLAUDE.md | Claude Code | Lead Developer | Build commands, artifact map, code style, architecture context |
| GEMINI.md | Gemini CLI | Research Specialist | Always use web search, never write code, structured output |
| AGENTS.md | Codex CLI | QA Lead | Review workflow, security checklist, git rules |

These files are the "personality" of each agent. They tell the tool what it is responsible for, what it should and should not do, and how to interact with the rest of the system.

Claude Agents

Claude Code supports subagents — specialized configurations that run with specific models and tool access. These are defined in .claude/agents/ as markdown files.

researcher.md

Model: haiku (fast, cheap)

Tools: Glob, Grep, Read, WebSearch

Purpose: Quick codebase exploration

The researcher agent is optimized for speed. It uses Claude's cheapest model (Haiku) to quickly search through code, find relevant files, and answer questions about the codebase. It cannot edit files — only read and search. Use it when you need a fast answer about where something is defined or how something works.

When to use: "Where is the authentication middleware defined?", "How many files import the database package?", "What patterns does this codebase use for error handling?"
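As an illustration, a definition like researcher.md might be generated as shown below. The sketch assumes the YAML-frontmatter layout (name, description, tools, model) used by Claude Code subagent files; the description and prompt strings are invented for the example and are not the project's actual file contents.

```python
def render_subagent(name: str, description: str, model: str,
                    tools: list[str], prompt: str) -> str:
    """Render a subagent definition as markdown with YAML frontmatter.

    Assumption: the frontmatter keys follow Claude Code's subagent
    format; check the current documentation before relying on it.
    """
    frontmatter = [
        "---",
        f"name: {name}",
        f"description: {description}",
        f"tools: {', '.join(tools)}",
        f"model: {model}",
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + prompt.strip() + "\n"


# Model and tools come from the researcher description above;
# the description and prompt strings are hypothetical.
RESEARCHER = render_subagent(
    name="researcher",
    description="Quick, read-only codebase exploration",
    model="haiku",
    tools=["Glob", "Grep", "Read", "WebSearch"],
    prompt="Answer questions about the codebase. Never edit files.",
)

if __name__ == "__main__":
    print(RESEARCHER)
```

Leaving edit tools out of the tools list is what would make such an agent read-only.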

reviewer.md

Model: sonnet (balanced)

Tools: All tools + git diff access

Purpose: Code review with context

The reviewer agent reads git diffs, understands the changes in context, and provides structured review feedback covering correctness, security, performance, and maintainability. It uses Sonnet for a good balance between speed and quality.

When to use: After a coding agent makes changes, the reviewer examines the diff and flags issues before the changes are merged.

architect.md

Model: opus (most capable)

Tools: All tools

Purpose: System design and trade-off analysis

The architect agent is the most capable of the four subagents, and the slowest. It analyzes the entire system, considers trade-offs, and produces detailed design documents. It uses Opus (Claude's most capable model) because architectural decisions require deep reasoning.

When to use: "Should we use microservices or a monolith?", "Design the database schema for the new feature", "What is the migration strategy for moving from REST to GraphQL?"

optimizer.md

Model: sonnet (balanced)

Tools: All tools + profiling

Purpose: Performance analysis and optimization

The optimizer agent looks for performance bottlenecks, suggests optimizations, and can implement improvements. It focuses on algorithmic complexity, memory allocation, and I/O patterns.

When to use: "This API endpoint is slow — why?", "Optimize the database queries in the reporting module", "Profile memory usage during bulk imports."

Gemini Agents

Gemini agents are defined in .gemini/agents/ and are configured through .gemini/settings.json.

Configuration

The Gemini settings file enforces a critical safety guardrail — Gemini agents cannot use write tools:

$ cat .gemini/settings.json
{
  "agents": true,
  "excluded_tools": ["write", "edit", "bash_execute"]
}

Note: This settings file does not pin a model, so the CLI uses automatic routing (Gemini 3 Pro for complex queries, Flash for simple lookups). Subagents, by contrast, pin an explicit model version (gemini-2.5-pro) for reproducibility.

By excluding write, edit, and bash_execute tools, Gemini agents can only read code and search the web. They cannot accidentally modify your codebase. This is intentional — Gemini's role is research, not code modification.
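A check along the following lines could confirm that the guardrail is still in place, for example in CI. It is only a sketch: missing_exclusions is a hypothetical helper, not part of Gemini CLI, and the tool names are taken from the settings file shown above.

```python
import json

# Tool names taken from the settings file shown above.
WRITE_TOOLS = {"write", "edit", "bash_execute"}

SETTINGS_JSON = """
{
  "agents": true,
  "excluded_tools": ["write", "edit", "bash_execute"]
}
"""


def missing_exclusions(settings_text: str) -> set[str]:
    """Return write-capable tools that the settings fail to exclude."""
    settings = json.loads(settings_text)
    excluded = set(settings.get("excluded_tools", []))
    return WRITE_TOOLS - excluded


if __name__ == "__main__":
    gaps = missing_exclusions(SETTINGS_JSON)
    print("guardrail intact" if not gaps else f"not excluded: {sorted(gaps)}")
```

In a real project the text would be read from .gemini/settings.json instead of an inline string.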

deep-researcher.md

A thorough research agent that investigates topics using multiple web sources and produces structured JSON output. It follows a methodology:

1. Define the research scope
2. Search multiple sources (official docs, GitHub, Stack Overflow, blog posts)
3. Cross-reference findings
4. Produce a structured report with citations

When to use: "Research the current best practices for Go error handling in 2026", "Compare the top 5 WebSocket libraries for our use case", "What breaking changes are in the next major version of our dependencies?"
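The "structured report with citations" could take a shape like the one below. The field names (scope, findings, claim, citations) are assumptions made for this sketch; the source does not specify the actual JSON schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Citation:
    title: str
    url: str


@dataclass
class Finding:
    claim: str
    citations: list[Citation]


def build_report(scope: str, findings: list[Finding]) -> str:
    """Serialize a report, refusing findings that cite no source."""
    uncited = [f.claim for f in findings if not f.citations]
    if uncited:
        raise ValueError(f"findings without citations: {uncited}")
    return json.dumps(
        {"scope": scope, "findings": [asdict(f) for f in findings]},
        indent=2,
    )


if __name__ == "__main__":
    example = Finding(
        claim="Placeholder claim for illustration",
        citations=[Citation("Placeholder source", "https://example.com")],
    )
    print(build_report("Placeholder scope", [example]))
```

Rejecting uncited findings at serialization time is one way to enforce the citation requirement mechanically instead of relying on the prompt alone.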

api-auditor.md

A specialized agent for dependency auditing. It checks:

Whether your dependencies have known security vulnerabilities (CVEs)
Whether newer versions are available
Whether any dependencies have breaking changes in newer versions
What the migration path looks like for major version upgrades

When to use: "Audit our Go module dependencies for security issues", "Check if any of our npm packages are deprecated", "What would it take to upgrade from v2 to v3 of this library?"

Codex Profiles

Codex uses a different system — profiles defined in .codex/config.toml. Each profile configures the model, reasoning level, and permissions for a specific type of work.

All Five Profiles

| Profile | Model | Reasoning | Sandbox | Best For |
|---|---|---|---|---|
| reviewer | gpt-5.3-codex | xhigh | read-only | Thorough code review |
| security-auditor | gpt-5.3-codex | xhigh | read-only | Security vulnerability scanning |
| git-expert | gpt-5.2-codex | high | workspace-write | Complex git operations (rebasing, cherry-picking) |
| refactor | gpt-5.1-codex-max | high | workspace-write | Large-scale marathon refactoring |
| quick-fix | gpt-5.1-codex-mini | medium | auto-approve | Small, obvious fixes |

reviewer

The most thorough review profile. Uses gpt-5.3-codex (latest frontier agentic coding model, 77.3% Terminal-Bench) with maximum reasoning (xhigh) in a read-only sandbox. It cannot modify files — only analyze and report. This prevents a reviewer from "fixing" issues it finds, which would bypass the normal review-then-fix workflow.

security-auditor

Similar to reviewer but specifically focused on security. Uses gpt-5.3-codex with xhigh reasoning (77.6% Cybersec CTF benchmark). Has web search enabled so it can look up CVE databases, OWASP guidelines, and known vulnerability patterns. Produces detailed reasoning summaries explaining why something is or is not a security concern.

git-expert

For complex git operations that require careful reasoning — interactive rebases, cherry-picks across branches, conflict resolution. Uses gpt-5.2-codex (solid mid-tier agentic model). Has workspace-write permission because it needs to actually perform git operations.

refactor

For large refactoring tasks that touch many files. Uses gpt-5.1-codex-max — a compaction model designed for multi-hour, multi-context-window marathon sessions. This profile trades review safety for sustained execution capability across large changesets.

quick-fix

For small, obvious fixes like typos, missing imports, or simple bug fixes. Uses gpt-5.1-codex-mini (economy tier, codex-optimized) with medium reasoning and auto-approve enabled. This is the "just fix it" mode — fast and low-cost, suitable for trivial issues where review is not needed.
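To make the differences between profiles easy to query, the table can be mirrored as data. This is an illustrative sketch only: the values are copied from the profile table, but the Profile type and helper functions are hypothetical and say nothing about how .codex/config.toml is actually laid out.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    model: str
    reasoning: str
    sandbox: str


# Values copied from the profile table above.
PROFILES = {
    "reviewer": Profile("gpt-5.3-codex", "xhigh", "read-only"),
    "security-auditor": Profile("gpt-5.3-codex", "xhigh", "read-only"),
    "git-expert": Profile("gpt-5.2-codex", "high", "workspace-write"),
    "refactor": Profile("gpt-5.1-codex-max", "high", "workspace-write"),
    "quick-fix": Profile("gpt-5.1-codex-mini", "medium", "auto-approve"),
}


def can_modify_files(profile_name: str) -> bool:
    """True when the profile's sandbox is anything other than read-only."""
    return PROFILES[profile_name].sandbox != "read-only"


def read_only_profiles() -> list[str]:
    """Names of the profiles that can only analyze and report."""
    return sorted(name for name in PROFILES if not can_modify_files(name))


if __name__ == "__main__":
    print(read_only_profiles())
```

The two review-oriented profiles are the only ones that cannot touch the working tree, which is the separation between reviewing and fixing that the profile descriptions rely on.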

Skills

Skills are reusable capability templates that teach agents how to perform specific types of work. Think of them as SOPs (Standard Operating Procedures) for AI agents. Each skill is a markdown file that describes a methodology the agent should follow.

Claude Skills (.claude/skills/)

Research Skill

A 4-phase methodology: (1) Define scope and constraints, (2) Gather information from multiple sources, (3) Analyze and cross-reference, (4) Produce structured output. Prevents agents from jumping to conclusions based on the first result they find.

Refactor Skill

A safe refactoring protocol: (1) Run existing tests to establish baseline, (2) Make changes in small, testable increments, (3) Run tests after each change, (4) If any test fails, revert immediately. Prevents refactoring from introducing regressions.
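The four steps of the protocol can be sketched as a small loop. This is a toy model under stated assumptions: the "codebase" is a dictionary, each refactoring step is a function, and tests_pass stands in for running the real test suite. Stopping at the first failing step is a choice made for this sketch; the skill itself only says to revert.

```python
from typing import Callable

State = dict[str, int]


def safe_refactor(state: State,
                  steps: list[Callable[[State], State]],
                  tests_pass: Callable[[State], bool]) -> tuple[State, int]:
    """Apply steps one at a time, reverting the first step that breaks tests.

    Returns the last good state and the number of steps that were kept.
    """
    # (1) Establish a baseline before changing anything.
    if not tests_pass(state):
        raise RuntimeError("baseline tests fail; fix them before refactoring")
    kept = 0
    for step in steps:
        # (2) One small increment, applied to a copy so revert is trivial.
        candidate = step(dict(state))
        # (3) Run the tests after each change.
        if not tests_pass(candidate):
            break  # (4) Revert: discard the failing candidate.
        state = candidate
        kept += 1
    return state, kept


if __name__ == "__main__":
    invariant = lambda s: s["x"] + s["y"] == 3
    swap = lambda s: {"x": s["y"], "y": s["x"]}
    broken = lambda s: {**s, "x": 10}
    print(safe_refactor({"x": 1, "y": 2}, [swap, broken, swap], invariant))
```

In a real repository the copy and discard would correspond to working on a clean tree and reverting the failing change with version control.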

Test Writer Skill

Follows the AAA (Arrange-Act-Assert) pattern. Prioritizes: (1) Happy path, (2) Edge cases, (3) Error cases, (4) Concurrency cases. Includes coverage targets and guidance on what is worth testing versus what is overtesting.
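A short example of the AAA pattern, covering the first three priorities (happy path, edge case, error case). The function under test, parse_port, is hypothetical and exists only to give the tests something to exercise.

```python
import unittest


def parse_port(value: str) -> int:
    """Hypothetical function under test: parse a TCP port number."""
    port = int(value)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortTest(unittest.TestCase):
    def test_happy_path(self):
        # Arrange
        raw = "8080"
        # Act
        port = parse_port(raw)
        # Assert
        self.assertEqual(port, 8080)

    def test_edge_case_upper_bound(self):
        raw = "65535"                        # Arrange
        port = parse_port(raw)               # Act
        self.assertEqual(port, 65535)        # Assert

    def test_error_case_out_of_range(self):
        raw = "70000"                        # Arrange
        with self.assertRaises(ValueError):  # Assert
            parse_port(raw)                  # Act


def run_tests() -> unittest.TestResult:
    """Run the test case above and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePortTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Keeping the three phases visually separate makes it obvious which line is the behaviour under test and which lines are setup or verification.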

Code Review Skill

A structured checklist covering 5 areas: correctness (does it work?), security (is it safe?), performance (is it fast enough?), maintainability (can someone else understand it?), testing (is it tested?). Ensures consistent, thorough reviews.

Team Protocol Skill

The coordination protocol itself — how to read inboxes, claim tasks, do work, report completion, and send messages. This is the skill that teaches agents how to be part of a team.

Gemini Skills (.gemini/skills/)

Deep Research Skill

A more thorough version of the research skill, designed for Gemini's strengths. Includes citation tracking, source reliability assessment, and structured JSON output format. Gemini researchers always cite their sources.

API Audit Skill

A dependency auditing methodology: check for CVEs, check for deprecation notices, check for breaking changes in newer versions, and produce a migration guide for any upgrades. Uses Gemini's web search to access real-time vulnerability databases.

Team Protocol Skill (Gemini)

The Gemini-specific version of the coordination protocol. It has the same core contract (one task per session, hq commands only) but is adapted for Gemini's research-only role: Gemini reports findings to inboxes rather than making code changes.

Shared Templates (agents/ and skills/)

The agents/ and skills/ directories at the project root contain cross-tool templates — the source of truth that tool-specific versions are derived from. If you want to add a new skill or agent role, you write the template here first, then create tool-specific adaptations.

There are 5 agent templates (researcher, architect, coder, reviewer, optimizer) and 5 skill templates (research, refactor, test-writer, code-review, team-protocol) that document:

What the role/skill is for
What capabilities it requires
How each CLI tool implements it differently

Team Protocol

The team protocol is the set of rules that all agents follow regardless of which tool they are running on. It is the "contract" that makes multi-tool collaboration possible.

Core Rules

The Single-Task Model

Agents do not loop. Each agent session picks up exactly one task, does the work, reports completion, and exits. The orchestrator is responsible for spawning new sessions as needed. This keeps things simple and predictable.

All state mutations go through hq commands. Agents never read or write the JSON state files directly. This ensures that locking, compare-and-swap (CAS) checks, and idempotency are always applied.
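To show why routing every write through one gatekeeper matters, here is a minimal sketch of a locked, versioned store, reading CAS as compare-and-swap. How hq implements this internally is not described in the source; VersionedStore and its methods are hypothetical.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory stand-in for the state that hq protects."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value: dict = {}
        self._version = 0

    def read(self) -> tuple[dict, int]:
        """Return a copy of the value together with its version."""
        with self._lock:
            return dict(self._value), self._version

    def compare_and_swap(self, expected_version: int, new_value: dict) -> bool:
        """Write only if nobody else has written since expected_version."""
        with self._lock:
            if self._version != expected_version:
                return False
            self._value = dict(new_value)
            self._version += 1
            return True


if __name__ == "__main__":
    store = VersionedStore()
    _, version = store.read()
    print(store.compare_and_swap(version, {"task-1": "claimed by claude"}))
    print(store.compare_and_swap(version, {"task-1": "claimed by codex"}))
```

Two agents that read the same version cannot both claim the task: the second write sees a stale version and is refused. An agent editing the file directly would bypass that check.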

Idempotency keys use a deterministic format: {task_id}-{agent_name}-{action}. This means if an agent restarts and re-sends a command, the key will be the same and the duplicate will be detected.
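A minimal sketch of how a deterministic key lets a receiver drop duplicates. The key format comes from the text above; CommandLog is a hypothetical stand-in for whatever hq does on the receiving side.

```python
def idempotency_key(task_id: str, agent_name: str, action: str) -> str:
    """Build the key in the {task_id}-{agent_name}-{action} format."""
    return f"{task_id}-{agent_name}-{action}"


class CommandLog:
    """Hypothetical receiver that applies each keyed command at most once."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.applied: list[str] = []

    def submit(self, key: str, command: str) -> bool:
        """Apply the command unless its key was already seen."""
        if key in self._seen:
            return False  # duplicate: ignore the retry
        self._seen.add(key)
        self.applied.append(command)
        return True


if __name__ == "__main__":
    log = CommandLog()
    key = idempotency_key("42", "claude", "complete")
    print(log.submit(key, "task complete 42"))  # first attempt is applied
    print(log.submit(key, "task complete 42"))  # retry after restart is ignored
```

Because the key is derived only from the task, agent, and action, a restarted agent regenerates the same key without having to remember anything from its previous run.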

Protocol version must match. Every state-changing command includes --protocol-version. If the protocol changes, old agents are rejected with a clear error instead of silently doing the wrong thing.
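The version check could be as simple as the sketch below. The exception name and message wording are assumptions; the value 2 is the one passed as --protocol-version in the commands on this page.

```python
SUPPORTED_PROTOCOL_VERSION = 2  # the value used with --protocol-version on this page


class ProtocolMismatch(Exception):
    """Raised when a command was built for a different protocol version."""


def check_protocol(version: int) -> None:
    """Reject a state-changing command whose protocol version differs."""
    if version != SUPPORTED_PROTOCOL_VERSION:
        raise ProtocolMismatch(
            f"command uses protocol version {version}, "
            f"but version {SUPPORTED_PROTOCOL_VERSION} is required"
        )


if __name__ == "__main__":
    check_protocol(2)  # accepted silently
    try:
        check_protocol(1)
    except ProtocolMismatch as exc:
        print(f"rejected: {exc}")
```

Failing loudly at the boundary is the point: an outdated agent stops at its first state-changing command instead of writing data in an old format.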

One task per session. An agent claims one task, does the work, and reports. It does not loop looking for more tasks. The orchestrator decides what to do next.

The Standard Workflow

Every agent session follows this pattern:

01 Read Inbox
02 Claim Task
03 Do Work
04 Complete/Fail
05 Report to Inbox
06 Exit

In CLI commands:

# 1. Check for messages
$ hq inbox read --team {TEAM} --agent {AGENT}

# 2. Claim a task
$ hq task claim --team {TEAM} --task {ID} --agent {AGENT} --tool {TOOL} --protocol-version 2

# 3. Do the actual work (write code, research, review, etc.)
# ... (agent-specific work happens here) ...

# 4. Report completion
$ hq task complete --team {TEAM} --task {ID} --agent {AGENT} --summary "..." --protocol-version 2 --idempotency-key "{ID}-{AGENT}-complete"

# 5. Notify other agents
$ hq inbox send --team {TEAM} --to {LEAD} --from {AGENT} --type task_completed --protocol-version 2 --idempotency-key "{ID}-{AGENT}-report" --payload '{...}'

This is the same workflow regardless of whether the agent is Claude writing code, Gemini doing research, or Codex reviewing changes. The only difference is what happens in step 3.
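A wrapper script might assemble these commands as argument lists before running them. The sketch below only builds and prints the commands; it uses the flags shown above and nothing else. The example values passed in at the bottom (team name, tool name, payload contents) are hypothetical, since the source leaves the payload as '{...}' and does not list valid --tool values.

```python
import json
import shlex

PROTOCOL_VERSION = "2"


def session_commands(team: str, task_id: str, agent: str, tool: str,
                     lead: str, summary: str, payload: dict) -> list[list[str]]:
    """Build the four hq invocations of one session as argv lists.

    The agent's actual work (step 3) happens between the second and
    third command and is not represented here.
    """
    common = ["--team", team]
    version = ["--protocol-version", PROTOCOL_VERSION]
    return [
        ["hq", "inbox", "read", *common, "--agent", agent],
        ["hq", "task", "claim", *common, "--task", task_id,
         "--agent", agent, "--tool", tool, *version],
        ["hq", "task", "complete", *common, "--task", task_id,
         "--agent", agent, "--summary", summary, *version,
         "--idempotency-key", f"{task_id}-{agent}-complete"],
        ["hq", "inbox", "send", *common, "--to", lead, "--from", agent,
         "--type", "task_completed", *version,
         "--idempotency-key", f"{task_id}-{agent}-report",
         "--payload", json.dumps(payload)],
    ]


if __name__ == "__main__":
    # All argument values here are placeholders for illustration.
    for argv in session_commands("demo", "7", "gemini", "gemini-cli",
                                 "claude", "Compared two JWT libraries",
                                 {"task": "7"}):
        print(shlex.join(argv))
```

Building argv lists and quoting them with shlex.join avoids the shell-escaping problems of pasting a JSON payload into a command string by hand.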