Developer Setup
This page is for people who want to work on AI Agents HQ itself — modifying the Go code, adding features, fixing bugs, or contributing. It covers the development environment, how to run tests, what the CI pipeline checks, and a glossary of all the terminology used throughout this documentation.
Prerequisites
You need two tools installed on your machine:
go version
golangci-lint --version
Installing Go on Linux / WSL
Installing golangci-lint
Clone and Build
The go build command compiles all the Go source files and produces a single binary called hq. The binary has no external runtime dependencies, so you can copy it to any Linux machine with the same CPU architecture and run it there.
Running Tests
The test suite is the main way to verify that the code is correct. There are 73 tests organized into three packages.
Run Everything
go test ./... -v -race
Breaking this down:
- go test: Go's built-in test runner
- ./...: run tests in every package (the ... means "recursively")
- -v: verbose output (show each test name and result)
- -race: enable the race detector (catches concurrency bugs)
You should see output like:
Every line that says PASS is a test that succeeded. If any test says FAIL, something is broken.
Run Specific Packages
If you are working on just one package, you can run only its tests:
Run a Single Test
If one specific test is failing and you want to focus on it:
The -run flag takes a regular expression. It runs only tests whose name matches.
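To get a feel for how a -run pattern selects tests, the sketch below applies an unanchored regular expression to a list of test names. TestCheckCASVersion is named on this page; the other names and the matchingTests helper are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchingTests returns the names selected by an unanchored regular
// expression, which mirrors how -run filters top-level test names.
func matchingTests(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	// Only the first name comes from this page; the rest are made up.
	names := []string{"TestCheckCASVersion", "TestCheckCASVersionMismatch", "TestLeaseExpired"}

	loose, _ := matchingTests("TestCheckCAS", names)
	fmt.Println(loose) // both names containing the pattern

	exact, _ := matchingTests("^TestCheckCASVersion$", names)
	fmt.Println(exact) // anchors narrow it to one test
}
```

Because the pattern is not anchored by default, a short pattern can select more tests than intended; adding ^ and $ narrows it to a single name.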
Understanding the Test Types
Unit Tests (protocol package)
These test individual functions in isolation. For example, TestCheckCASVersion creates a task with version 3, calls CheckCASVersion(task, 3) (should succeed), and then CheckCASVersion(task, 5) (should fail). No files are created, no external dependencies.
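The check described above can be sketched roughly as follows. The Task type, the ErrCASConflict sentinel, and the CheckCASVersion signature are assumptions made for illustration; the real protocol package may define them differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical stand-in for the real task type; only Version matters here.
type Task struct {
	ID      string
	Version int
}

// ErrCASConflict is an assumed sentinel error for a version mismatch.
var ErrCASConflict = errors.New("cas conflict: task version changed")

// CheckCASVersion succeeds only when the caller's expected version
// matches the version currently stored on the task.
func CheckCASVersion(t *Task, expected int) error {
	if t.Version != expected {
		return fmt.Errorf("%w: have %d, expected %d", ErrCASConflict, t.Version, expected)
	}
	return nil
}

func main() {
	task := &Task{ID: "1", Version: 3}
	fmt.Println(CheckCASVersion(task, 3)) // matching version: no error
	fmt.Println(CheckCASVersion(task, 5)) // mismatched version: conflict error
}
```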
There are 26 protocol tests covering:
- Task creation with default values
- JSON serialization round-trips (marshal then unmarshal, check nothing was lost)
- Validation of required fields (missing subject, invalid status, etc.)
- CAS version conflict detection
- Lease valid/expired/held-by checks
- Inbox event append with duplicate detection
- Events-since filtering
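As an illustration of the round-trip tests in the list above, here is a minimal sketch that marshals a task, unmarshals it, and compares the result with the original. The field names and JSON tags are hypothetical; the real task schema may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Task uses hypothetical field names; the real schema may differ.
type Task struct {
	ID      string `json:"id"`
	Subject string `json:"subject"`
	Status  string `json:"status"`
	Version int    `json:"version"`
}

// roundTrip marshals a task to JSON and decodes it back into a new value.
func roundTrip(in Task) (Task, error) {
	data, err := json.Marshal(in)
	if err != nil {
		return Task{}, err
	}
	var out Task
	if err := json.Unmarshal(data, &out); err != nil {
		return Task{}, err
	}
	return out, nil
}

func main() {
	in := Task{ID: "1", Subject: "write docs", Status: "pending", Version: 1}
	out, err := roundTrip(in)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	// The struct has only comparable fields, so == checks every field.
	fmt.Println("lossless:", in == out)
}
```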
Storage Tests (storage package)
These test file system operations using temporary directories. Each test creates a temp directory, performs operations, and checks the results. The temp directories are automatically cleaned up after the test.
There are 20 storage tests covering:
- File lock acquire and release
- Atomic write creates file with correct permissions (0600)
- Atomic write creates parent directories if needed
- Idempotency store check-and-set (first call: new, second call: duplicate)
- Task store save, load, list, next ID
- Task store update with CAS conflict
- Concurrent claims (10 goroutines) — the most important test. It creates one task, launches 10 goroutines that all try to claim it simultaneously, and verifies exactly 1 wins while the other 9 get CAS conflicts.
- Inbox store send, read, duplicate key detection, empty inbox
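The concurrent-claims test in the list above follows a pattern that can be sketched in memory. This is not the project's test: the store type and its claim method are hypothetical, and a mutex stands in for the file lock that the real storage layer uses.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errCASConflict = errors.New("cas conflict")

// store is an in-memory stand-in for the task store; the real one works on files.
type store struct {
	mu      sync.Mutex
	version int
	owner   string
}

// claim applies the update only if the caller saw the current version.
func (s *store) claim(agent string, seenVersion int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != seenVersion {
		return errCASConflict
	}
	s.owner = agent
	s.version++
	return nil
}

// raceToClaim starts n goroutines that all claim with the same observed
// version and reports how many succeeded and how many hit a conflict.
func raceToClaim(n int) (wins, conflicts int) {
	s := &store{version: 1}
	results := make(chan error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			results <- s.claim(fmt.Sprintf("agent-%d", id), 1)
		}(i)
	}
	wg.Wait()
	close(results)
	for err := range results {
		if err == nil {
			wins++
		} else if errors.Is(err, errCASConflict) {
			conflicts++
		}
	}
	return wins, conflicts
}

func main() {
	wins, conflicts := raceToClaim(10)
	fmt.Printf("wins=%d conflicts=%d\n", wins, conflicts) // prints "wins=1 conflicts=9"
}
```

Every goroutine passes the same observed version, so whichever one takes the lock first bumps the version and all later callers see a mismatch.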
Integration Tests (cmd/hq package)
These compile the actual hq binary and run it as a subprocess. They test the entire system end-to-end, including argument parsing, JSON output formatting, exit codes, and the interaction between multiple commands.
There are 27 integration tests covering:
- Task create + list
- Task list filter by tool
- Task claim + complete (full lifecycle)
- CAS conflict (two agents claiming the same task)
- Protocol version mismatch (exit code 12)
- Duplicate idempotency key (silent no-op)
- Auto-unblock (complete blocker, dependent becomes pending)
- Task fail
- Task fail auto-escalate (3 failures)
- Task heartbeat
- Inbox send + read
- Inbox duplicate key
- Inbox protocol mismatch
- Concurrent claims (5 processes) — launches 5 separate OS processes all trying to claim the same task. Exactly 1 wins.
- Description preserved after complete (summary stored in dedicated field)
- Description preserved after fail (failureDetail stored in dedicated field)
- Stale lease recovery (expired lease allows reclaim by different agent)
- Complete with --notify (inbox event sent to target agent)
- Concurrent inbox sends (8 parallel sends produce strictly monotonic event IDs)
- Flag parsing edge cases (table-driven test with 8 cases across all commands)
- Unauthorized complete does not consume idempotency key
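For readers new to the table-driven style mentioned in the list above, here is a small sketch. It assumes the standard library flag package and a hypothetical parseTeam helper; only the --team flag name comes from this documentation, and hq's real parser may work differently.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseTeam parses a hypothetical subset of flags and returns the --team value.
func parseTeam(args []string) (string, error) {
	fs := flag.NewFlagSet("hq", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the output
	team := fs.String("team", "", "team name")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *team, nil
}

func main() {
	// Each row is one case: inputs plus the expected outcome.
	cases := []struct {
		name    string
		args    []string
		want    string
		wantErr bool
	}{
		{"separate value", []string{"--team", "demo"}, "demo", false},
		{"equals form", []string{"--team=demo"}, "demo", false},
		{"missing value", []string{"--team"}, "", true},
		{"unknown flag", []string{"--bogus"}, "", true},
	}
	for _, tc := range cases {
		got, err := parseTeam(tc.args)
		ok := got == tc.want && (err != nil) == tc.wantErr
		fmt.Printf("%-15s ok=%v\n", tc.name, ok)
	}
}
```

Adding an edge case means adding one row to the table rather than writing a new test function.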
The Race Detector
The -race flag is extremely important for this project because AI Agents HQ is designed for concurrent use. The race detector instruments the binary at compile time to detect data races: situations where two goroutines access the same memory at the same time without synchronization and at least one of the accesses is a write. It only observes goroutines inside a single process; conflicts between separate hq processes are exercised by the multi-process integration tests instead.
If a race is detected, the detector prints a report showing the conflicting accesses, with the stack trace of each goroutine involved, and the test run fails. This is how we caught concurrency bugs during development.
Always run tests with -race enabled. It makes tests noticeably slower and more memory-hungry (the Go documentation cites roughly 2-20x execution time and 5-10x memory for a typical program), but it catches bugs that would be nearly impossible to find otherwise.
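As a small illustration of what the detector looks for, the sketch below increments a shared counter from many goroutines. The countTo function is hypothetical and not part of the project. With the mutex in place the program is race-free; without the Lock/Unlock pair, the increments would be the kind of unsynchronized write the detector is designed to report.

```go
package main

import (
	"fmt"
	"sync"
)

// countTo increments a shared counter from n goroutines.
// The mutex is what keeps this race-free; removing the Lock/Unlock pair
// would leave unsynchronized writes to total.
func countTo(n int) int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		total int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			total++
			mu.Unlock()
		}()
	}
	wg.Wait() // all increments are finished before total is read
	return total
}

func main() {
	fmt.Println(countTo(100)) // prints 100
}
```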
CI Pipeline
Every time you push code to GitHub, a Continuous Integration (CI) pipeline runs automatically. It checks three things, and all three must pass for the commit to be considered "green."
The pipeline is defined in .github/workflows/ci.yml.
Gate 1: Check Formatting
Go has an opinionated code formatter called gofmt that enforces one consistent style across all Go code everywhere. There is no configuration — everyone uses the same formatting. This eliminates style debates.
gofmt -l . lists all files that do not match the formatter's output. If any files are listed, the CI gate fails. Fix it by running:
gofmt -w .
The -w flag writes the formatted output back to the file (in place).
Common issues: Tab vs. space alignment in struct field tags, inconsistent line breaks in long function signatures. Just run gofmt -w . and commit the result.
Gate 2: Lint
The linter runs dozens of checks beyond what the compiler catches. The most common issue we have encountered is errcheck — Go functions often return errors, and the linter requires that you either handle the error or explicitly discard it.
Bad (linter complains):
Good (error handled):
Also good (explicitly discarded):
The _ = syntax tells both the compiler and the linter "I know this returns something, I am intentionally ignoring it." This is different from just not checking — it is a deliberate acknowledgment.
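The three cases above can be sketched in one small program. The writeNote function and the scratch file name are hypothetical; the point is where errors are checked, where one is deliberately discarded with _ =, and where an unchecked call would trigger errcheck.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeNote shows the handled form: every returned error is checked.
func writeNote(path, text string) error {
	f, err := os.Create(path)
	if err != nil {
		return fmt.Errorf("create %s: %w", path, err)
	}
	if _, err := f.WriteString(text); err != nil {
		_ = f.Close() // already failing; the close error is deliberately discarded
		return fmt.Errorf("write %s: %w", path, err)
	}
	// A bare f.Close() statement here, with its result unused,
	// is the kind of call errcheck flags.
	return f.Close()
}

func main() {
	path := filepath.Join(os.TempDir(), "hq-errcheck-sketch.txt") // hypothetical scratch file
	if err := writeNote(path, "hello\n"); err != nil {
		fmt.Println("error:", err)
		return
	}
	_ = os.Remove(path) // best-effort cleanup, explicitly discarded
	fmt.Println("ok")
}
```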
Gate 3: Tests
All 73 tests must pass with the race detector enabled. If any test fails, the gate fails.
Running CI Locally
Before pushing, run the same three checks locally to catch issues early:
If all three produce clean output, your push will pass CI.
Glossary
A complete reference for every technical term used in this documentation, explained in plain language.
Atomic Write
Writing a file so that it either fully succeeds or does not happen at all. The system writes to a temporary file first, then renames it over the target. If the process crashes mid-write, the original file is still intact. The word "atomic" comes from the Greek for "indivisible": an atomic write cannot be "half done."
CAS (Compare-And-Swap)
A technique for preventing race conditions. Before modifying a task, you check that its version number matches what you expect. If someone else changed it in the meantime (bumping the version), your modification is rejected. You must re-read the latest version and try again. Used in databases, distributed systems, and CPU instructions worldwide.
CI (Continuous Integration)
A system that automatically runs checks (formatting, linting, tests) every time you push code to GitHub. If any check fails, the commit is marked with a red X. If all pass, green checkmark. The idea is to "continuously" verify that new code does not break existing code, rather than finding out days later.
CLI (Command-Line Interface)
A program you interact with by typing commands in a terminal, as opposed to clicking buttons in a graphical interface. The hq tool is a CLI — you type hq task list --team demo to interact with it.
Concurrency
Multiple things happening at the same time. In this project, multiple AI agents might try to claim the same task at the exact same instant. The system must handle this safely so that exactly one agent wins and the others get a clean error, not corrupted data.
Exit Code
A number that a program returns when it finishes. Zero means success, anything else means something went wrong. Different non-zero codes indicate different problems. You can check the exit code in bash with echo $? after running a command.
File Lock (flock)
An operating system feature that lets one program "lock" a file so other programs have to wait before accessing it. Like a bathroom door lock — if someone is inside, you wait until they come out. Used to prevent two agents from modifying the same task file simultaneously.
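A minimal sketch of taking an exclusive lock in Go, assuming a Linux or other Unix system where syscall.Flock is available. The withFileLock helper and the lock file name are hypothetical; the project's own locking code may be structured differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// withFileLock runs fn while holding an exclusive flock on path.
func withFileLock(path string, fn func() error) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer func() { _ = f.Close() }()

	// LOCK_EX blocks until no other holder has the lock.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	// Deferred calls run in reverse order, so the unlock happens before the close.
	defer func() { _ = syscall.Flock(int(f.Fd()), syscall.LOCK_UN) }()

	return fn()
}

func main() {
	lockPath := filepath.Join(os.TempDir(), "hq-sketch.lock") // hypothetical lock file
	err := withFileLock(lockPath, func() error {
		fmt.Println("holding the lock")
		return nil
	})
	fmt.Println("released, err =", err)
}
```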
fsync
A system call that forces the operating system to write data from its memory buffers to the physical disk. Without fsync, data might sit in memory for seconds or minutes before being written, and a power failure could lose it. In atomic writes, fsync ensures the temp file is fully on disk before the rename.
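The temp-file, fsync, rename sequence described under Atomic Write and fsync can be sketched as below. The atomicWrite name, the temp-file pattern, and the 0700 mode for parent directories are assumptions; the 0600 file mode and the creation of parent directories come from the storage test descriptions on this page.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite replaces path with data using temp file, fsync, then rename.
func atomicWrite(path string, data []byte) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o700); err != nil { // parent mode is an assumption
		return err
	}
	// The temp file lives in the target directory so the rename stays on one file system.
	tmp, err := os.CreateTemp(dir, ".tmp-*")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer func() { _ = os.Remove(tmpName) }() // fails harmlessly after a successful rename

	if _, err := tmp.Write(data); err != nil {
		_ = tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // fsync: flush to disk before the rename
		_ = tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmpName, 0o600); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

func main() {
	dir, err := os.MkdirTemp("", "hq-atomic-")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer func() { _ = os.RemoveAll(dir) }()

	target := filepath.Join(dir, "tasks", "1.json")
	if err := atomicWrite(target, []byte(`{"status":"pending"}`)); err != nil {
		fmt.Println("error:", err)
		return
	}
	got, err := os.ReadFile(target)
	fmt.Println(string(got), err)
}
```

A reader of the target path sees either the old contents or the new contents, never a partially written file, because the rename swaps the directory entry in one step.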
Go (Golang)
A programming language created at Google and publicly released in 2009. Popular for command-line tools, web servers, and systems programming. Known for simplicity, fast compilation, excellent concurrency support (goroutines), and producing single-binary executables with no dependencies.
Goroutine
Go's lightweight version of a thread — a way to run code concurrently. You can have thousands of goroutines running at once with minimal memory overhead. In our tests, we launch 10 goroutines all racing to claim the same task to verify our concurrency safety mechanisms work.
Heartbeat
A periodic "I'm still alive" signal. When an agent is working on a long task, it sends a heartbeat every 10 minutes to extend its lease. Without heartbeats, the system would think the agent crashed after 30 minutes and let another agent take over.
Idempotency / Idempotency Key
The property where doing something twice has the same effect as doing it once. If an agent sends "task complete" and then sends it again (maybe it crashed and retried), the second time is silently ignored. The idempotency key is a unique string that identifies each operation so the system can detect duplicates.
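A check-and-set store is one way to implement this. The sketch below keeps keys in memory and uses hypothetical names (IdempotencyStore, CheckAndSet, and the example key); the real store persists its keys on disk.

```go
package main

import (
	"fmt"
	"sync"
)

// IdempotencyStore is an in-memory sketch; the real store persists keys on disk.
type IdempotencyStore struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

// NewIdempotencyStore returns an empty store.
func NewIdempotencyStore() *IdempotencyStore {
	return &IdempotencyStore{seen: make(map[string]struct{})}
}

// CheckAndSet records key and reports whether it was new.
// The check and the write happen under one lock, so two callers
// using the same key cannot both be told "new".
func (s *IdempotencyStore) CheckAndSet(key string) (isNew bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, dup := s.seen[key]; dup {
		return false
	}
	s.seen[key] = struct{}{}
	return true
}

func main() {
	store := NewIdempotencyStore()
	fmt.Println(store.CheckAndSet("complete-task-1")) // first call: true (new)
	fmt.Println(store.CheckAndSet("complete-task-1")) // retry: false (duplicate)
}
```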
JSON (JavaScript Object Notation)
A text format for storing structured data. Looks like: {"name": "task1", "status": "pending"}. Despite the name, it is used in virtually every programming language, not just JavaScript. All task and inbox files in AI Agents HQ are JSON.
Lease / Lease TTL
A time-limited reservation on a task. When an agent claims a task, it gets a lease that expires in 30 minutes (the TTL — Time To Live). If the agent finishes before the lease expires, great. If it crashes, the lease expires and another agent can take over. This prevents tasks from getting stuck forever.
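The lease rules can be sketched with a small type. The field and method names are hypothetical, and the assumption that a heartbeat resets the expiry to a full TTL from the time of the heartbeat is ours; only the 30-minute TTL comes from this glossary.

```go
package main

import (
	"fmt"
	"time"
)

// LeaseTTL matches the 30-minute lease described in this glossary.
const LeaseTTL = 30 * time.Minute

// Lease uses hypothetical field names.
type Lease struct {
	Holder    string
	ExpiresAt time.Time
}

// NewLease grants holder a lease that expires LeaseTTL after now.
func NewLease(holder string, now time.Time) Lease {
	return Lease{Holder: holder, ExpiresAt: now.Add(LeaseTTL)}
}

// Expired reports whether the lease has run out at the given time.
func (l Lease) Expired(now time.Time) bool { return !now.Before(l.ExpiresAt) }

// Heartbeat returns a copy of the lease extended to a full TTL from now (assumed behavior).
func (l Lease) Heartbeat(now time.Time) Lease {
	l.ExpiresAt = now.Add(LeaseTTL)
	return l
}

func main() {
	start := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)
	lease := NewLease("agent-a", start)
	fmt.Println(lease.Expired(start.Add(29 * time.Minute))) // false: still held
	fmt.Println(lease.Expired(start.Add(31 * time.Minute))) // true: another agent may reclaim

	lease = lease.Heartbeat(start.Add(10 * time.Minute))
	fmt.Println(lease.Expired(start.Add(31 * time.Minute))) // false: extended by the heartbeat
}
```

Passing the current time in as a parameter, rather than calling time.Now inside the methods, keeps expiry checks deterministic in tests.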
Linter / Lint
A tool that analyzes your code without running it and flags potential problems — bugs, bad style, unused variables, unchecked errors. Think of it as a really thorough spell checker for code. Go's popular linter is golangci-lint, which runs dozens of different checks.
MCP (Model Context Protocol)
A protocol for giving AI models access to external tools and data sources. For example, an MCP server might provide file system access, database queries, or API calls. Tasks can specify which MCP servers they need via the required_mcps field.
Module (Go Module)
Go's system for organizing and versioning code. The go.mod file declares the project's name (like github.com/gbas/ai-agents-hq) and any external dependencies. Similar to package.json in Node.js or requirements.txt in Python.
Protocol Version
A number (currently 2) that agents include in every state-changing command. If the protocol changes in a future version, old agents are rejected with a clear error instead of silently corrupting data. This is a safety net for upgrades.
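A version gate like this, together with the exit code 12 listed for the protocol-mismatch integration test, might look like the sketch below. The identifiers are hypothetical, and the generic failure code 1 is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	ProtocolVersion      = 2  // current version per this glossary
	ExitProtocolMismatch = 12 // exit code listed for the mismatch integration test
)

// ErrProtocolMismatch is an assumed sentinel error.
var ErrProtocolMismatch = errors.New("protocol version mismatch")

// CheckProtocol rejects commands sent with a different protocol version.
func CheckProtocol(got int) error {
	if got != ProtocolVersion {
		return fmt.Errorf("%w: got %d, want %d", ErrProtocolMismatch, got, ProtocolVersion)
	}
	return nil
}

// ExitCodeFor maps an error to a process exit code; 1 is an assumed generic failure code.
func ExitCodeFor(err error) int {
	switch {
	case err == nil:
		return 0
	case errors.Is(err, ErrProtocolMismatch):
		return ExitProtocolMismatch
	default:
		return 1
	}
}

func main() {
	fmt.Println(ExitCodeFor(CheckProtocol(2))) // prints 0
	fmt.Println(ExitCodeFor(CheckProtocol(1))) // prints 12
}
```

In a real CLI, main would pass the result of ExitCodeFor to os.Exit so that scripts can branch on the code.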
Race Condition / Data Race
A bug that occurs when two things read and write the same data at the same time, and the result depends on which one happens to go first. Race conditions are notoriously hard to find because they may only occur under specific timing conditions. Go's race detector automatically reports data races (unsynchronized access to shared memory) in whatever code paths the tests actually execute; it cannot flag a race in code that never runs during the test.
Race Detector (-race flag)
A Go tool that instruments your code at compile time to detect data races. When two goroutines access the same memory without synchronization and at least one of them writes, it reports both accesses with their stack traces. Always run tests with -race for concurrent code.
Subprocess
A separate program launched by another program. Our integration tests compile hq into a binary and then run it as a subprocess — exactly like you would type the command in a terminal. This tests the real behavior end-to-end.
Unit Test
A test that checks one small piece of code in isolation — a single function or a single behavior. Unit tests are fast (milliseconds), do not touch the file system or network, and catch regressions immediately.
Integration Test
A test that checks multiple pieces working together. Our integration tests run the full hq binary, create real files, and verify the complete workflow from command-line input to file output. Slower than unit tests but catches issues in the connections between components.
gofmt
Go's built-in code formatter. It enforces one consistent style — everyone's Go code looks the same. There are no configuration options and no style debates. Just run gofmt -w . and commit the result.
YAML
A human-readable data format similar to JSON but with less punctuation. Uses indentation instead of braces. The docs.yaml configuration file for Phosphor is written in YAML. Looks like: title: "My Site" instead of {"title": "My Site"}.
TOML
Another configuration file format, used by Codex's .codex/config.toml. Similar to INI files with [sections] and key = "value" pairs. More structured than YAML, less verbose than JSON.