Project Overview
This page dives into the codebase itself — how the Go code is organized, what each package does, and how the layers connect. If you want to understand the code well enough to modify or extend it, this is where to start.
AI Agents HQ is written in Go (also called Golang), a programming language created by Google that is popular for building command-line tools and backend systems. Go was chosen for this project because it has excellent support for concurrency (running multiple things at once), compiles to a single binary with no dependencies, and has a straightforward standard library.
Project Structure
Here is the complete directory layout with explanations of what each file and directory does:
Why internal/?
In Go, the internal/ directory has special meaning. Code inside internal/ can only be imported by code within the same module — external projects cannot use it. This is intentional: the protocol and storage packages are implementation details of the hq CLI. If we ever publish this as a library, we would move stable APIs to a public package.
Why cmd/hq/?
The cmd/ directory is a Go convention for applications. Each subdirectory under cmd/ becomes a separate binary. cmd/hq/ compiles into the hq executable. If we added more tools later (like a monitoring daemon), they would go in cmd/hq-monitor/ or similar.
Protocol Layer
The protocol layer (internal/teams/protocol/) defines what the data looks like — the data structures, constants, and validation rules. It does not do any file I/O. Think of it as the "schema" or "contract" that everything else agrees on.
constants.go — The Frozen Contract
This file contains values that must never change (or change very carefully) because agents depend on them:
Exit Codes:
- ExitSuccess
- ExitCASConflict
- ExitStaleLease
- ExitProtocolMismatch
- ExitValidationError
- ExitPermissionDenied
Task Statuses:
Six possible states: pending, in_progress, completed, failed, blocked, escalated. These are defined as a Go string type called TaskStatus so the compiler can catch typos.
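A minimal sketch of what a typed status can look like. Only the type name TaskStatus and the six string values come from the text above; the constant names and the IsValid helper are hypothetical. Note what the compiler actually catches: a misspelled constant name fails to compile, while a misspelled string literal converted to TaskStatus still needs a runtime check.

```go
package main

import "fmt"

// TaskStatus is a named string type, so statuses are distinct from plain strings.
type TaskStatus string

// Constant names are assumptions; the string values are the six listed states.
const (
	StatusPending    TaskStatus = "pending"
	StatusInProgress TaskStatus = "in_progress"
	StatusCompleted  TaskStatus = "completed"
	StatusFailed     TaskStatus = "failed"
	StatusBlocked    TaskStatus = "blocked"
	StatusEscalated  TaskStatus = "escalated"
)

// IsValid reports whether s is one of the six known statuses.
// Hypothetical helper, used here to validate values parsed from JSON.
func (s TaskStatus) IsValid() bool {
	switch s {
	case StatusPending, StatusInProgress, StatusCompleted,
		StatusFailed, StatusBlocked, StatusEscalated:
		return true
	}
	return false
}

func main() {
	fmt.Println(StatusInProgress.IsValid())          // true
	fmt.Println(TaskStatus("in-progress").IsValid()) // false
}
```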
State Reasons:
When a task is in a non-happy state, the stateReason field explains why: dependency_pending, mcp_unavailable, lease_expired, budget_exceeded, agent_error, human_escalation.
Inbox Event Types:
Five types of inter-agent messages: task_completed, review_approved, research_findings, error_report, ad_hoc_request.
Timing Constants:
- DefaultLeaseTTL: 30 minutes (in milliseconds: 1,800,000)
- HeartbeatInterval: 10 minutes (in milliseconds: 600,000)
- MaxFailureCount: 3 (triggers auto-escalation)
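The millisecond figures can be checked with a few lines of Go. This sketch declares the values as time.Duration, which is an assumption; the real constants may well be stored as plain integers in milliseconds.

```go
package main

import (
	"fmt"
	"time"
)

// Names match the list above; the time.Duration representation is an assumption.
const (
	DefaultLeaseTTL   = 30 * time.Minute
	HeartbeatInterval = 10 * time.Minute
	MaxFailureCount   = 3
)

func main() {
	fmt.Println(DefaultLeaseTTL.Milliseconds())   // 1800000
	fmt.Println(HeartbeatInterval.Milliseconds()) // 600000
	// One lease window spans three heartbeat intervals.
	fmt.Println(int(DefaultLeaseTTL / HeartbeatInterval)) // 3
	fmt.Println(MaxFailureCount)                          // 3
}
```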
task_schema.go — The Task Data Structure
This file defines the Task struct (a typed record of fields, Go's closest equivalent to a class) with 24 fields, including summary and failureDetail, which were added during hardening. Every task JSON file on disk matches this structure exactly.
Key functions:
- NewTask(id, tool, subject): Creates a new task with sensible defaults (status=pending, version=1, empty arrays for blockedBy/blocks, current timestamp)
- MarshalTask(t): Converts a Task struct to pretty-printed JSON bytes
- UnmarshalTask(data): Parses JSON bytes into a Task struct, with validation
- ValidateTask(t): Checks that all required fields are present and valid
- CheckCASVersion(t, expected): Returns an error if the version does not match (used for conflict detection)
- IsLeaseValid(t): Checks if the lease has not expired
- IsLeaseHeldBy(t, agent): Checks if a specific agent holds the lease
- SetLease(t, agent): Sets a 30-minute lease for the given agent
- ClearLease(t): Removes the lease (on complete or fail)
- IncrementVersion(t): Bumps the version number and updates the timestamp
- IsReady(t): Returns true if the task is pending with no blockers
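To make a few of these helpers concrete, here is a rough sketch built on a trimmed-down Task. The function names follow the list above, but the field names, the ErrCASConflict sentinel, and the choice to store the lease expiry as Unix milliseconds are all assumptions rather than the project's actual definitions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Task is a small stand-in for the real 24-field struct; field names are assumed.
type Task struct {
	ID             string
	Status         string
	Version        int
	BlockedBy      []string
	LeaseOwner     string
	LeaseExpiresAt int64 // Unix milliseconds; 0 means no lease (assumption)
}

// ErrCASConflict is a hypothetical sentinel error for version mismatches.
var ErrCASConflict = errors.New("cas conflict")

const leaseTTL = 30 * time.Minute

// CheckCASVersion fails when the stored version differs from the caller's view.
func CheckCASVersion(t *Task, expected int) error {
	if t.Version != expected {
		return fmt.Errorf("%w: have version %d, expected %d", ErrCASConflict, t.Version, expected)
	}
	return nil
}

// SetLease gives agent a lease that expires leaseTTL from now.
func SetLease(t *Task, agent string) {
	t.LeaseOwner = agent
	t.LeaseExpiresAt = time.Now().Add(leaseTTL).UnixMilli()
}

// ClearLease removes any lease from the task.
func ClearLease(t *Task) {
	t.LeaseOwner = ""
	t.LeaseExpiresAt = 0
}

// IsLeaseValid reports whether a lease exists and has not yet expired.
func IsLeaseValid(t *Task) bool {
	return t.LeaseOwner != "" && time.Now().UnixMilli() < t.LeaseExpiresAt
}

// IsLeaseHeldBy reports whether agent holds an unexpired lease.
func IsLeaseHeldBy(t *Task, agent string) bool {
	return IsLeaseValid(t) && t.LeaseOwner == agent
}

// IsReady reports whether the task is pending with no blockers.
func IsReady(t *Task) bool {
	return t.Status == "pending" && len(t.BlockedBy) == 0
}

func main() {
	task := &Task{ID: "1", Status: "pending", Version: 1}
	fmt.Println(IsReady(task)) // true
	SetLease(task, "worker")
	fmt.Println(IsLeaseHeldBy(task, "worker")) // true
	fmt.Println(CheckCASVersion(task, 2))
	ClearLease(task)
	fmt.Println(IsLeaseValid(task)) // false
}
```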
inbox_schema.go — The Inbox Data Structure
This file defines InboxEvent (a single message) and Inbox (a wrapper around an array of events).
Key functions:
- NewInbox(agent): Creates an empty inbox for an agent
- AppendEvent(inbox, ...): Adds a new event, checking for duplicate idempotency keys
- EventsSince(inbox, eventID): Returns only events newer than the given ID
- MarshalInbox(inbox) / UnmarshalInbox(data): JSON serialization
The inbox is append-only — events are never removed or modified. This is a deliberate design choice that creates an audit trail and simplifies concurrency (you only ever add to the end of the array).
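The append-only behaviour with idempotency checks might look roughly like the sketch below. The function names follow the list above, while the field names, the integer event IDs, the explicit AppendEvent parameters, and the ErrDuplicateEvent sentinel are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// InboxEvent is a simplified message; field names and the int ID are assumptions.
type InboxEvent struct {
	EventID        int
	From           string
	Type           string
	IdempotencyKey string
}

// Inbox wraps an append-only slice of events for one agent.
type Inbox struct {
	Agent  string
	Events []InboxEvent
}

// ErrDuplicateEvent is a hypothetical sentinel for a repeated idempotency key.
var ErrDuplicateEvent = errors.New("duplicate idempotency key")

// NewInbox creates an empty inbox for agent.
func NewInbox(agent string) *Inbox {
	return &Inbox{Agent: agent, Events: []InboxEvent{}}
}

// AppendEvent adds an event at the end unless its key has been seen before.
// Existing events are never modified or removed.
func AppendEvent(in *Inbox, from, eventType, key string) (InboxEvent, error) {
	for _, e := range in.Events {
		if e.IdempotencyKey == key {
			return InboxEvent{}, ErrDuplicateEvent
		}
	}
	ev := InboxEvent{EventID: len(in.Events) + 1, From: from, Type: eventType, IdempotencyKey: key}
	in.Events = append(in.Events, ev)
	return ev, nil
}

// EventsSince returns the events whose ID is greater than eventID.
func EventsSince(in *Inbox, eventID int) []InboxEvent {
	var out []InboxEvent
	for _, e := range in.Events {
		if e.EventID > eventID {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	inbox := NewInbox("worker")
	_, err := AppendEvent(inbox, "lead", "task_completed", "key-1")
	fmt.Println(err) // <nil>
	_, err = AppendEvent(inbox, "lead", "task_completed", "key-1")
	fmt.Println(err) // duplicate idempotency key
	_, _ = AppendEvent(inbox, "lead", "error_report", "key-2")
	fmt.Println(len(EventsSince(inbox, 1))) // 1
}
```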
Storage Layer
The storage layer (internal/teams/storage/) handles reading and writing files safely. It builds on the protocol layer (uses its data structures) and adds file locking, atomic writes, and persistent idempotency checking.
lockfile.go — File Locking
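Uses the flock(2) system call to create advisory locks. flock originated in BSD and is available on Linux and macOS, but it is not part of POSIX (POSIX specifies fcntl record locks instead). Advisory means the lock only constrains processes that also call flock. When you lock a file, any other process trying to lock the same file will block (wait) until you unlock it.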
How it works:
- To lock task-1.json, the system creates a lock file at task-1.json.lock
- It opens this lock file and calls flock(fd, LOCK_EX), an exclusive lock
- While the lock is held, any other process calling flock on the same file will wait
- When the operation is done, the system calls flock(fd, LOCK_UN) to release
The WithFileLock(path, fn) helper wraps this pattern: acquire lock, run your function, release lock. If the function panics or returns an error, the lock is still released (using Go's defer mechanism).
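The pattern can be sketched as follows. This is an illustrative version that uses syscall.Flock from the standard library on Linux or macOS; the real helper may differ in signature, error wrapping, and lock-file handling.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// WithFileLock takes an exclusive flock on path+".lock", runs fn, and releases
// the lock on the way out, even if fn returns an error or panics.
func WithFileLock(path string, fn func() error) error {
	f, err := os.OpenFile(path+".lock", os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return fmt.Errorf("open lock file: %w", err)
	}
	defer f.Close()

	// Blocks until no other process holds the lock.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return fmt.Errorf("flock: %w", err)
	}
	// Deferred calls run in reverse order: unlock first, then close.
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

	return fn()
}

func main() {
	dir, err := os.MkdirTemp("", "hq-lock-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "task-1.json")
	err = WithFileLock(target, func() error {
		return os.WriteFile(target, []byte(`{"id":"1"}`), 0o600)
	})
	fmt.Println("error:", err) // error: <nil>
}
```

Keeping the lock in a separate .lock file means the lock survives the target file being replaced by a rename.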
atomic_write.go — Safe File Writing
The AtomicWrite(path, data) function ensures that file writes are all-or-nothing:
- Creates a temporary file in the same directory as the target (e.g., task-1.json.tmp12345)
- Writes all data to the temp file
- Calls fsync, which forces the operating system to flush data from memory buffers to the physical disk
- Renames the temp file to the target path. Within a single filesystem this is an atomic operation (it either fully succeeds or does not happen at all), which is why the temp file lives next to the target
Why this matters: if you write directly to a file and the system crashes mid-write, you end up with a half-written, corrupted file. With the atomic write pattern, you either have the old complete file (if the crash happened before rename) or the new complete file (if the rename completed). Never a corrupted file.
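A minimal sketch of the write, fsync, rename sequence using only the standard library. It follows the steps described above but is not the project's code; in particular it does not fsync the parent directory, which stricter implementations sometimes add.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// AtomicWrite replaces path with data so that readers see either the old
// file or the new one, never a partial write.
func AtomicWrite(path string, data []byte) (err error) {
	// The temp file must share the target's directory so the rename stays
	// on one filesystem. os.CreateTemp creates it with mode 0600.
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp*")
	if err != nil {
		return fmt.Errorf("create temp file: %w", err)
	}
	// Clean up the temp file if any later step fails.
	defer func() {
		if err != nil {
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()

	if _, err = tmp.Write(data); err != nil {
		return fmt.Errorf("write temp file: %w", err)
	}
	if err = tmp.Sync(); err != nil { // fsync
		return fmt.Errorf("sync temp file: %w", err)
	}
	if err = tmp.Close(); err != nil {
		return fmt.Errorf("close temp file: %w", err)
	}
	if err = os.Rename(tmp.Name(), path); err != nil {
		return fmt.Errorf("rename temp file: %w", err)
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "hq-atomic-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "task-1.json")
	if err := AtomicWrite(target, []byte(`{"id":"1","version":1}`)); err != nil {
		panic(err)
	}
	got, err := os.ReadFile(target)
	fmt.Println(string(got), err)
}
```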
Additional functions:
- ReadFile(path): Reads a file and checks that its permissions are 0600 (only the owner can read/write)
- EnsureDir(path): Creates a directory with 0700 permissions if it does not exist
idempotency.go — Duplicate Detection
The IdempotencyStore is a simple JSON file that stores a list of seen idempotency keys:
Two operations:
- CheckAndSet(key): Atomically checks if the key exists and adds it if not. Returns true if it was a duplicate, false if it was new. Uses file locking to ensure two concurrent calls with the same key cannot both return "new."
- HasKey(key): Checks without adding (read-only).
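One way CheckAndSet could be put together is sketched below. The on-disk layout (a bare JSON array of strings), the struct field, and the use of os.WriteFile instead of the atomic write helper are all assumptions made to keep the example short.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"syscall"
)

// IdempotencyStore keeps seen keys in one JSON file (file layout is assumed).
type IdempotencyStore struct{ path string }

// CheckAndSet returns true if key was already recorded, false if it was new.
// The check and the write happen under one exclusive lock.
func (s *IdempotencyStore) CheckAndSet(key string) (bool, error) {
	lock, err := os.OpenFile(s.path+".lock", os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return false, err
	}
	defer lock.Close()
	if err := syscall.Flock(int(lock.Fd()), syscall.LOCK_EX); err != nil {
		return false, err
	}
	defer syscall.Flock(int(lock.Fd()), syscall.LOCK_UN)

	var keys []string
	data, err := os.ReadFile(s.path)
	switch {
	case errors.Is(err, fs.ErrNotExist):
		// First use: start with an empty key list.
	case err != nil:
		return false, err
	default:
		if err := json.Unmarshal(data, &keys); err != nil {
			return false, err
		}
	}
	for _, k := range keys {
		if k == key {
			return true, nil // duplicate
		}
	}
	out, err := json.Marshal(append(keys, key))
	if err != nil {
		return false, err
	}
	// Simplification: a plain write stands in for the atomic write helper.
	return false, os.WriteFile(s.path, out, 0o600)
}

func main() {
	dir, err := os.MkdirTemp("", "hq-idem-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	store := &IdempotencyStore{path: filepath.Join(dir, "idempotency.json")}
	first, err := store.CheckAndSet("evt-42")
	fmt.Println(first, err) // false <nil>
	second, err := store.CheckAndSet("evt-42")
	fmt.Println(second, err) // true <nil>
}
```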
task_store.go — Task CRUD
The TaskStore manages task JSON files on disk:
- Save(task): Writes a task to {baseDir}/tasks/{team}/{id}.json using atomic write
- Load(id): Reads and parses a task file
- List(): Scans the team's task directory, reads all task files, and returns them sorted by ID
- NextID(): Scans existing tasks and returns max ID + 1
- UpdateWithLock(id, expectedVersion, fn): The most important function. It:
  - Acquires a file lock on the task
  - Reads the current task from disk
  - Checks that the version matches expectedVersion (CAS check)
  - Calls your function fn to modify the task
  - Increments the version
  - Writes the updated task back to disk
  - Releases the lock
This read-modify-write pattern with CAS is the foundation of all task mutations.
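The sequence can be illustrated with a compact sketch. It takes a file path instead of a task ID, uses a three-field Task, and writes with os.WriteFile rather than the atomic write helper; those simplifications, plus the saveTask helper and the ErrCASConflict sentinel, are assumptions and not the project's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// Task is a reduced stand-in for the real struct.
type Task struct {
	ID      string `json:"id"`
	Status  string `json:"status"`
	Version int    `json:"version"`
}

// ErrCASConflict is a hypothetical sentinel for a version mismatch.
var ErrCASConflict = errors.New("cas conflict")

// saveTask is a hypothetical helper; a plain write stands in for AtomicWrite.
func saveTask(path string, t *Task) error {
	out, err := json.MarshalIndent(t, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, out, 0o600)
}

// UpdateWithLock performs lock, read, CAS check, modify, version bump, write.
func UpdateWithLock(path string, expectedVersion int, fn func(*Task) error) (*Task, error) {
	lock, err := os.OpenFile(path+".lock", os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	defer lock.Close()
	if err := syscall.Flock(int(lock.Fd()), syscall.LOCK_EX); err != nil {
		return nil, err
	}
	defer syscall.Flock(int(lock.Fd()), syscall.LOCK_UN)

	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var task Task
	if err := json.Unmarshal(data, &task); err != nil {
		return nil, err
	}
	if task.Version != expectedVersion {
		return nil, fmt.Errorf("%w: have %d, expected %d", ErrCASConflict, task.Version, expectedVersion)
	}
	if err := fn(&task); err != nil {
		return nil, err // nothing is written if the callback fails
	}
	task.Version++
	if err := saveTask(path, &task); err != nil {
		return nil, err
	}
	return &task, nil
}

func main() {
	dir, err := os.MkdirTemp("", "hq-cas-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "1.json")
	if err := saveTask(path, &Task{ID: "1", Status: "pending", Version: 1}); err != nil {
		panic(err)
	}
	claim := func(t *Task) error { t.Status = "in_progress"; return nil }

	updated, err := UpdateWithLock(path, 1, claim)
	fmt.Println(updated.Version, err) // 2 <nil>

	// A second writer still holding version 1 loses the race.
	_, err = UpdateWithLock(path, 1, claim)
	fmt.Println(errors.Is(err, ErrCASConflict)) // true
}
```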
inbox_store.go — Inbox Operations
The InboxStore manages inbox JSON files:
- Load(agent): Reads an agent's inbox, or returns an empty inbox if the file/directory does not exist
- Send(to, from, eventType, protocolVersion, idempotencyKey, payload): Appends a new event to the recipient's inbox with deduplication
- Read(agent, sinceEventID): Returns all events (or only those after sinceEventID)
CLI Layer
The CLI layer (cmd/hq/) ties everything together. It parses command-line arguments, calls the storage and protocol layers, and outputs JSON results.
main.go — Entry Point
The entry point parses the first two arguments to determine which command to run:
The hqBaseDir() function determines where all data is stored. It resolves to ~/.hq by default; the integration tests override it through the HQ_BASE_DIR environment variable.
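A plausible shape for this lookup, assuming the HQ_BASE_DIR environment variable (used by the integration tests) takes precedence over the ~/.hq default. The resolveBaseDir helper and the error return are hypothetical; the real function may simply return a string.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveBaseDir is a hypothetical pure helper: an explicit override wins,
// otherwise the result is the .hq directory under the user's home.
func resolveBaseDir(envValue, home string) string {
	if envValue != "" {
		return envValue
	}
	return filepath.Join(home, ".hq")
}

// hqBaseDir reads the environment and the home directory, then resolves.
func hqBaseDir() (string, error) {
	home, err := os.UserHomeDir()
	if err != nil && os.Getenv("HQ_BASE_DIR") == "" {
		return "", fmt.Errorf("resolve home directory: %w", err)
	}
	return resolveBaseDir(os.Getenv("HQ_BASE_DIR"), home), nil
}

func main() {
	dir, err := hqBaseDir()
	fmt.Println(dir, err)
}
```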
task_cmd.go — Task Commands
Each task subcommand (create, list, claim, complete, fail, heartbeat) is a separate function that:
- Parses its flags using Go's standard flag package
- Validates required arguments
- Creates a TaskStore pointing at ~/.hq
- Performs the operation (using UpdateWithLock for mutations)
- Prints JSON output to stdout
- Returns the appropriate exit code
The autoUnblock function (called from runTaskComplete) is notable — it scans every task in the team and removes the completed task's ID from their blockedBy arrays, transitioning blocked tasks to pending when all blockers are resolved. Auto-unblock is best-effort: if it fails (e.g., CAS conflict on a dependent task), a warning is logged but the completion still succeeds.
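The core of the unblocking logic can be sketched over in-memory tasks. This version deliberately leaves out persistence and warning logs, which the description above says the real function handles; the Task fields and the returned list of unblocked IDs are assumptions.

```go
package main

import "fmt"

// Task holds only the fields this sketch needs; names are assumed.
type Task struct {
	ID        string
	Status    string
	BlockedBy []string
}

// autoUnblock removes completedID from every BlockedBy list and moves blocked
// tasks with no remaining blockers to pending. It returns the IDs it unblocked.
func autoUnblock(tasks []*Task, completedID string) []string {
	var unblocked []string
	for _, task := range tasks {
		remaining := make([]string, 0, len(task.BlockedBy))
		removed := false
		for _, id := range task.BlockedBy {
			if id == completedID {
				removed = true
				continue
			}
			remaining = append(remaining, id)
		}
		if !removed {
			continue // this task did not depend on the completed one
		}
		task.BlockedBy = remaining
		if task.Status == "blocked" && len(remaining) == 0 {
			task.Status = "pending"
			unblocked = append(unblocked, task.ID)
		}
	}
	return unblocked
}

func main() {
	tasks := []*Task{
		{ID: "2", Status: "blocked", BlockedBy: []string{"1"}},
		{ID: "3", Status: "blocked", BlockedBy: []string{"1", "2"}},
	}
	fmt.Println(autoUnblock(tasks, "1"))             // [2]
	fmt.Println(tasks[1].Status, tasks[1].BlockedBy) // blocked [2]
}
```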
command_helpers.go — Shared Utilities
This file contains shared CLI utilities used across all commands:
- newFlagSet(name) / parseFlags(fs, args, command): Standardized flag parsing with consistent error wrapping
- validateTeam(team) / validateAgent(agent): Identifier validation using the safeIdentifierPattern regex ([A-Za-z0-9._-]+)
- commandExitCode(err): Maps sentinel errors to deterministic exit codes (0, 10-14)
- exitCommandError(prefix, err): Prints error to stderr and exits with the correct code
- mustMarshalString(s): Safe JSON string encoding for payloads
- splitCSV(value): Splits comma-separated flag values
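As an illustration of two of these helpers, the sketch below maps sentinel errors to exit codes with errors.Is and validates identifiers with an anchored regular expression. The pairing of each name with a specific number in the 10-14 range, the sentinel error names, the fallback code 1 for unknown errors, and the ^...$ anchors are all assumptions; only the range 0, 10-14 and the character class come from the text above.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Hypothetical sentinel errors, one per non-zero exit code.
var (
	ErrCASConflict      = errors.New("cas conflict")
	ErrStaleLease       = errors.New("stale lease")
	ErrProtocolMismatch = errors.New("protocol mismatch")
	ErrValidation       = errors.New("validation error")
	ErrPermissionDenied = errors.New("permission denied")
)

// The numeric assignments are an assumed ordering within the 10-14 range.
const (
	ExitSuccess          = 0
	ExitCASConflict      = 10
	ExitStaleLease       = 11
	ExitProtocolMismatch = 12
	ExitValidationError  = 13
	ExitPermissionDenied = 14
)

// commandExitCode maps an error, including wrapped ones, to an exit code.
func commandExitCode(err error) int {
	switch {
	case err == nil:
		return ExitSuccess
	case errors.Is(err, ErrCASConflict):
		return ExitCASConflict
	case errors.Is(err, ErrStaleLease):
		return ExitStaleLease
	case errors.Is(err, ErrProtocolMismatch):
		return ExitProtocolMismatch
	case errors.Is(err, ErrValidation):
		return ExitValidationError
	case errors.Is(err, ErrPermissionDenied):
		return ExitPermissionDenied
	}
	return 1 // assumed fallback for unrecognized errors
}

// Anchoring the pattern so the whole identifier must match is an assumption.
var safeIdentifierPattern = regexp.MustCompile(`^[A-Za-z0-9._-]+$`)

// validateTeam rejects empty names and names with characters such as "/".
func validateTeam(team string) error {
	if !safeIdentifierPattern.MatchString(team) {
		return fmt.Errorf("%w: invalid team %q", ErrValidation, team)
	}
	return nil
}

func main() {
	fmt.Println(commandExitCode(nil))                             // 0
	fmt.Println(commandExitCode(fmt.Errorf("claim: %w", ErrStaleLease))) // 11
	fmt.Println(validateTeam("demo"))                             // <nil>
	fmt.Println(commandExitCode(validateTeam("../etc")))          // 13
}
```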
integration_test.go — End-to-End Tests
The integration tests compile the hq binary, then run it as a real subprocess (like a user typing commands in a terminal). There are 27 integration tests covering task lifecycle, concurrency, error handling, and edge cases. Each test:
- Creates a temporary directory for ~/.hq
- Sets the HQ_BASE_DIR environment variable to the temp directory
- Runs hq commands and checks the output and exit codes
- Verifies the final state matches expectations
This catches issues that unit tests miss — like argument parsing bugs, output formatting problems, and the interaction between multiple commands in sequence.
How the Layers Connect
When an agent runs hq task claim --team demo --task 1 --agent worker --tool claude --protocol-version 2, here is the full call chain:
- CLI layer parses arguments, validates protocol version
- Storage layer (UpdateWithLock) acquires file lock on task 1
- Protocol layer (CheckCASVersion) verifies the version
- CLI layer checks task status, tool match, blockers
- Protocol layer (SetLease) sets the lease
- Protocol layer (IncrementVersion) bumps version + timestamp
- Storage layer (AtomicWrite) writes the updated task to disk
- Storage layer releases the file lock
- CLI layer prints the updated task as JSON
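Each layer has a single responsibility, and dependencies point only downward: the CLI layer calls the storage and protocol layers, the storage layer builds on the protocol layer, and the protocol layer depends on neither. The protocol layer knows nothing about files. The storage layer knows nothing about CLI arguments. This separation makes the code easier to test, understand, and extend.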