Developer Setup


This page is for people who want to work on AI Agents HQ itself — modifying the Go code, adding features, fixing bugs, or contributing. It covers the development environment, how to run tests, what the CI pipeline checks, and a glossary of all the terminology used throughout this documentation.

Prerequisites

You need two tools installed on your machine:

Tool             Version   Purpose                              How to Check
Go               1.23.6+   The programming language             go version
golangci-lint    1.62.2+   Code linter (finds potential bugs)   golangci-lint --version

Installing Go on Linux / WSL

terminal
$ wget https://go.dev/dl/go1.23.6.linux-amd64.tar.gz -O /tmp/go.tar.gz
$ sudo tar -C /usr/local -xzf /tmp/go.tar.gz
$ rm /tmp/go.tar.gz
$ echo 'export PATH="/usr/local/go/bin:$PATH"' >> ~/.bashrc
$ source ~/.bashrc
$ go version
go version go1.23.6 linux/amd64

Installing golangci-lint

terminal
$ curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sudo sh -s -- -b /usr/local/bin v1.62.2
$ golangci-lint --version
golangci-lint has version 1.62.2

Clone and Build

terminal
$ git clone https://github.com/gbasran/ai-agents-hq.git
$ cd ai-agents-hq
$ go build ./cmd/hq/
$ ./hq version
hq version 0.1.0

The go build command compiles the cmd/hq package and everything it imports, and produces a single binary called hq. This binary has no external dependencies, so you can copy it to any Linux machine with the same CPU architecture and it will work.

Running Tests


The test suite is the main way to verify that the code is correct. There are 73 tests organized into three packages.

Run Everything

terminal
$ go test ./... -v -race

Breaking this down:

  • go test — Go's built-in test runner
  • ./... — Run tests in every package (the ... means "recursively")
  • -v — Verbose output (show each test name and result)
  • -race — Enable the race detector (catches concurrency bugs)

You should see output like:

terminal
=== RUN TestIntegration_TaskCreateAndList
--- PASS: TestIntegration_TaskCreateAndList (0.01s)
=== RUN TestIntegration_TaskClaimAndComplete
--- PASS: TestIntegration_TaskClaimAndComplete (0.02s)
...
ok github.com/gbas/ai-agents-hq/cmd/hq 1.507s
ok github.com/gbas/ai-agents-hq/internal/teams/protocol 1.026s
ok github.com/gbas/ai-agents-hq/internal/teams/storage 1.110s

Every line that says PASS is a test that succeeded. If any test says FAIL, something is broken.

Run Specific Packages

If you are working on just one package, you can run only its tests:

terminal
# Protocol tests only (data structures, validation)
$ go test ./internal/teams/protocol/ -v
# Storage tests only (file locking, atomic writes, task store)
$ go test ./internal/teams/storage/ -v
# Integration tests only (full CLI end-to-end)
$ go test ./cmd/hq/ -v

Run a Single Test

If one specific test is failing and you want to focus on it:

terminal
$ go test ./cmd/hq/ -v -run TestIntegration_ConcurrentClaims

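The -run flag takes a regular expression and runs only tests whose names match. The match is unanchored, so the pattern above would also run any test whose name merely contains TestIntegration_ConcurrentClaims; use -run '^TestIntegration_ConcurrentClaims$' to select exactly one test.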

Understanding the Test Types

Unit Tests (protocol package)

These test individual functions in isolation. For example, TestCheckCASVersion creates a task with version 3, calls CheckCASVersion(task, 3) (should succeed), and then CheckCASVersion(task, 5) (should fail). No files are created, no external dependencies.
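
As a rough illustration of what such a test exercises, here is a minimal sketch. The Task type, its Version field, the ErrCASConflict error, and the CheckCASVersion signature are assumptions made for this example; the real definitions in the protocol package may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical, stripped-down stand-in for the real task type.
type Task struct {
	ID      string
	Version int
}

// ErrCASConflict is an assumed sentinel error for version mismatches.
var ErrCASConflict = errors.New("cas conflict: version mismatch")

// CheckCASVersion returns nil when the caller's expected version matches
// the task's current version, and a wrapped ErrCASConflict otherwise.
func CheckCASVersion(t Task, expected int) error {
	if t.Version != expected {
		return fmt.Errorf("%w: have %d, expected %d", ErrCASConflict, t.Version, expected)
	}
	return nil
}

func main() {
	task := Task{ID: "1", Version: 3}
	fmt.Println(CheckCASVersion(task, 3)) // expected version matches
	fmt.Println(CheckCASVersion(task, 5)) // expected version does not match
}
```

Because the check is a pure function of its inputs, a unit test can cover both outcomes without touching the file system.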

There are 26 protocol tests covering:

  • Task creation with default values
  • JSON serialization round-trips (marshal then unmarshal, check nothing was lost)
  • Validation of required fields (missing subject, invalid status, etc.)
  • CAS version conflict detection
  • Lease valid/expired/held-by checks
  • Inbox event append with duplicate detection
  • Events-since filtering

Storage Tests (storage package)

These test file system operations using temporary directories. Each test creates a temp directory, performs operations, and checks the results. The temp directories are automatically cleaned up after the test.

There are 20 storage tests covering:

  • File lock acquire and release
  • Atomic write creates file with correct permissions (0600)
  • Atomic write creates parent directories if needed
  • Idempotency store check-and-set (first call: new, second call: duplicate)
  • Task store save, load, list, next ID
  • Task store update with CAS conflict
  • Concurrent claims (10 goroutines) — the most important test. It creates one task, launches 10 goroutines that all try to claim it simultaneously, and verifies exactly 1 wins while the other 9 get CAS conflicts.
  • Inbox store send, read, duplicate key detection, empty inbox
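
The concurrent-claims idea from the list above can be sketched in memory. This is not the project's storage code: memStore, claim, and raceToClaim are hypothetical names, and a mutex stands in for the file lock the real store uses on disk.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errConflict = errors.New("cas conflict")

// memStore is a hypothetical in-memory stand-in for the task store.
// The mutex plays the role that the file lock plays on disk.
type memStore struct {
	mu      sync.Mutex
	version int
	owner   string
}

// claim succeeds only if the caller's expected version is still current,
// then bumps the version so every other caller sees a conflict.
func (s *memStore) claim(agent string, expected int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != expected {
		return errConflict
	}
	s.owner = agent
	s.version++
	return nil
}

// raceToClaim starts n goroutines that all claim with the same expected
// version and reports how many won and how many got a conflict.
func raceToClaim(n int) (wins, conflicts int) {
	store := &memStore{version: 1}
	results := make(chan error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			results <- store.claim(fmt.Sprintf("agent-%d", id), 1)
		}(i)
	}
	wg.Wait()
	close(results)
	for err := range results {
		if err == nil {
			wins++
		} else {
			conflicts++
		}
	}
	return wins, conflicts
}

func main() {
	wins, conflicts := raceToClaim(10)
	fmt.Printf("wins=%d conflicts=%d\n", wins, conflicts)
}
```

Whichever goroutine gets the lock first bumps the version, so the program prints wins=1 conflicts=9 regardless of scheduling.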

Integration Tests (cmd/hq package)

These compile the actual hq binary and run it as a subprocess. They test the entire system end-to-end, including argument parsing, JSON output formatting, exit codes, and the interaction between multiple commands.
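
A minimal sketch of the subprocess pattern, assuming nothing about the real test helpers: runCLI is a hypothetical function, and a POSIX shell stands in for the compiled hq binary so the example runs on its own.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// runCLI runs a command and returns its stdout and exit code.
// A non-zero exit is reported through the code, not as an error.
func runCLI(name string, args ...string) (string, int, error) {
	var stdout bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	err := cmd.Run()

	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return stdout.String(), exitErr.ExitCode(), nil
	}
	if err != nil {
		return "", -1, err // the command could not be started at all
	}
	return stdout.String(), 0, nil
}

func main() {
	// "sh -c" is a placeholder for the hq binary. The exit code 12 mirrors
	// the protocol-mismatch code that the integration tests check for.
	out, code, err := runCLI("sh", "-c", "echo placeholder; exit 12")
	fmt.Printf("out=%q code=%d err=%v\n", out, code, err)
}
```

Separating "the process exited non-zero" from "the process never started" lets a test assert on specific exit codes without treating them as harness failures.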

There are 27 integration tests covering:

  • Task create + list
  • Task list filter by tool
  • Task claim + complete (full lifecycle)
  • CAS conflict (two agents claiming the same task)
  • Protocol version mismatch (exit code 12)
  • Duplicate idempotency key (silent no-op)
  • Auto-unblock (complete blocker, dependent becomes pending)
  • Task fail
  • Task fail auto-escalate (3 failures)
  • Task heartbeat
  • Inbox send + read
  • Inbox duplicate key
  • Inbox protocol mismatch
  • Concurrent claims (5 processes) — launches 5 separate OS processes all trying to claim the same task. Exactly 1 wins.
  • Description preserved after complete (summary stored in dedicated field)
  • Description preserved after fail (failureDetail stored in dedicated field)
  • Stale lease recovery (expired lease allows reclaim by different agent)
  • Complete with --notify (inbox event sent to target agent)
  • Concurrent inbox sends (8 parallel sends produce strictly monotonic event IDs)
  • Flag parsing edge cases (table-driven test with 8 cases across all commands)
  • Unauthorized complete does not consume idempotency key

The Race Detector

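The -race flag is extremely important for this project because AI Agents HQ is designed for concurrent use. The race detector instruments the binary at compile time to detect data races: situations where two goroutines access the same memory at the same time, at least one of them writes, and nothing synchronizes the accesses. It only observes goroutines inside the instrumented process; conflicts between separate hq processes are exercised by the multi-process integration tests instead.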

If a race is detected, the detector prints a detailed report showing which two goroutines accessed what memory and from which line of code, and the test is marked as failed. This is how we caught concurrency bugs during development.

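Always run tests with -race enabled. It makes tests slower and hungrier for memory (the Go documentation cites roughly 2-20x execution time and 5-10x memory for a typical program) but catches bugs that would be nearly impossible to find otherwise.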
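
To make the term concrete, here is a small self-contained sketch (not taken from the project) of a shared counter that is safe because of a mutex. The comment marks the spot where removing the lock would produce exactly the kind of unsynchronized access the race detector reports.

```go
package main

import (
	"fmt"
	"sync"
)

// countTo increments one shared counter from n goroutines.
func countTo(n int) int {
	var (
		mu      sync.Mutex
		counter int
		wg      sync.WaitGroup
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Without this lock, counter++ would be an unsynchronized
			// read-modify-write from many goroutines: a data race.
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countTo(100))
}
```

With the lock in place the result is always 100. Without it, the result may still look correct on most runs, which is why the detector is more reliable than inspecting output.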

CI Pipeline


Every time you push code to GitHub, a Continuous Integration (CI) pipeline runs automatically. It checks three things, and all three must pass for the commit to be considered "green."

The pipeline is defined in .github/workflows/ci.yml.

Gate 1: Check Formatting

terminal
$ gofmt -l .

Go has an opinionated code formatter called gofmt that enforces one consistent style across all Go code everywhere. There is no configuration — everyone uses the same formatting. This eliminates style debates.

gofmt -l . lists all files that do not match the formatter's output. If any files are listed, the CI gate fails. Fix it by running:

terminal
$ gofmt -w .

The -w flag writes the formatted output back to the file (in place).

Common issues: Tab vs. space alignment in struct field tags, inconsistent line breaks in long function signatures. Just run gofmt -w . and commit the result.
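
For readers curious what the formatting check amounts to, the standard library exposes the same formatter through the go/format package. The sketch below is illustrative only and is not part of the CI pipeline; isFormatted is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// isFormatted reports whether src already matches the formatter's output,
// which is the condition under which gofmt -l would not list the file.
func isFormatted(src []byte) (bool, error) {
	out, err := format.Source(src)
	if err != nil {
		return false, err // src is not valid Go
	}
	return bytes.Equal(src, out), nil
}

func main() {
	clean := []byte("package main\n\nfunc main() {}\n")
	messy := []byte("package main\nfunc main(){}\n")

	ok, err := isFormatted(clean)
	fmt.Println(ok, err)
	ok, err = isFormatted(messy)
	fmt.Println(ok, err)
}
```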

Gate 2: Lint

terminal
$ golangci-lint run ./...

The linter runs dozens of checks beyond what the compiler catches. The most common issue we have encountered is errcheck — Go functions often return errors, and the linter requires that you either handle the error or explicitly discard it.

Bad (linter complains):

terminal
store.Save(task) // Ignoring the error return value!

Good (error handled):

terminal
if err := store.Save(task); err != nil {
	t.Fatalf("Save: %v", err)
}

Also good (explicitly discarded):

terminal
_ = fs.Parse(args) // We know this won't error in practice

The _ = syntax tells the linter, and anyone reading the code, "I know this returns something, I am intentionally ignoring it." The compiler accepts both forms; the difference is that an explicit discard is a deliberate acknowledgment rather than an oversight.

Gate 3: Tests

terminal
$ go test ./... -v -race

All 73 tests must pass with the race detector enabled. If any test fails, the gate fails.

Running CI Locally

Before pushing, run the same three checks locally to catch issues early:

terminal
# 1. Check formatting (no output = all clean)
$ gofmt -l .
# 2. Run linter (no output = all clean)
$ golangci-lint run ./...
# 3. Run all tests with race detector
$ go test ./... -v -race

If the first two commands print nothing and go test reports ok for every package, your push should pass CI.

Glossary


A complete reference for every technical term used in this documentation, explained in plain language.

Atomic Write

Writing a file in a way that it either fully succeeds or does not happen at all. The system writes to a temporary file first, then renames it to the target. If the process crashes mid-write, the original file is still intact. The word "atomic" comes from physics — something that cannot be split into smaller parts. An atomic write cannot be "half done."
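
The write-then-rename sequence described above can be sketched as follows. This is an illustration under stated assumptions, not the project's storage code: atomicWrite and demoAtomicWrite are hypothetical names, and the 0700 mode for parent directories is a guess (only the 0600 file mode is mentioned in the test list).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// atomicWrite writes data to path via a temp file in the same directory.
func atomicWrite(path string, data []byte) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o700); err != nil { // assumed directory mode
		return err
	}
	tmp, err := os.CreateTemp(dir, ".tmp-*") // CreateTemp uses mode 0600
	if err != nil {
		return err
	}
	// Clean up the temp file on failure; after a successful rename the
	// name no longer exists and the removal is a harmless no-op.
	defer func() { _ = os.Remove(tmp.Name()) }()

	if _, err := tmp.Write(data); err != nil {
		_ = tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // fsync before the rename
		_ = tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	// On POSIX systems, rename within one file system replaces the
	// target in a single step.
	return os.Rename(tmp.Name(), path)
}

// demoAtomicWrite writes a file twice in a scratch directory and
// returns the final contents.
func demoAtomicWrite() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", err
	}
	defer func() { _ = os.RemoveAll(dir) }()

	path := filepath.Join(dir, "tasks", "1.json")
	if err := atomicWrite(path, []byte(`{"version":1}`)); err != nil {
		return "", err
	}
	if err := atomicWrite(path, []byte(`{"version":2}`)); err != nil {
		return "", err
	}
	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	fmt.Println(demoAtomicWrite())
}
```

Creating the temp file in the target's own directory matters: a rename across file systems is not a single-step operation.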

CAS (Compare-And-Swap)

A technique for preventing race conditions. Before modifying a task, you check that its version number matches what you expect. If someone else changed it in the meantime (bumping the version), your modification is rejected. You must re-read the latest version and try again. Used in databases, distributed systems, and CPU instructions worldwide.

CI (Continuous Integration)

A system that automatically runs checks (formatting, linting, tests) every time you push code to GitHub. If any check fails, the commit is marked with a red X. If all pass, green checkmark. The idea is to "continuously" verify that new code does not break existing code, rather than finding out days later.

CLI (Command-Line Interface)

A program you interact with by typing commands in a terminal, as opposed to clicking buttons in a graphical interface. The hq tool is a CLI — you type hq task list --team demo to interact with it.

Concurrency

Multiple things happening at the same time. In this project, multiple AI agents might try to claim the same task at the exact same instant. The system must handle this safely so that exactly one agent wins and the others get a clean error, not corrupted data.

Exit Code

A number that a program returns when it finishes. Zero means success, anything else means something went wrong. Different non-zero codes indicate different problems. You can check the exit code in bash with echo $? after running a command.

File Lock (flock)

An operating system feature that lets one program "lock" a file so other programs have to wait before accessing it. Like a bathroom door lock — if someone is inside, you wait until they come out. Used to prevent two agents from modifying the same task file simultaneously.
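
A minimal sketch of flock from Go on Linux, using syscall.Flock. The helper names (tryLock, unlock, demoLock) are hypothetical, and the non-blocking LOCK_NB mode is used here only so the demo can observe a refusal instead of waiting; the real store may well block until the lock is free.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// tryLock opens path and attempts a non-blocking exclusive flock.
// The returned file holds the lock until unlock is called.
func tryLock(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		_ = f.Close()
		return nil, err
	}
	return f, nil
}

// unlock releases the lock and closes the file.
func unlock(f *os.File) error {
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_UN); err != nil {
		_ = f.Close()
		return err
	}
	return f.Close()
}

// demoLock reports whether a second attempt was refused while the first
// lock was held, and whether locking succeeded again after release.
func demoLock() (bool, bool, error) {
	dir, err := os.MkdirTemp("", "flock-demo-")
	if err != nil {
		return false, false, err
	}
	defer func() { _ = os.RemoveAll(dir) }()
	path := filepath.Join(dir, "task.lock")

	first, err := tryLock(path)
	if err != nil {
		return false, false, err
	}
	// A second open of the same file gets its own lock state, so it
	// conflicts with the first even inside one process.
	second, secondErr := tryLock(path)
	if secondErr == nil {
		_ = unlock(second)
	}
	refused := errors.Is(secondErr, syscall.EWOULDBLOCK)

	if err := unlock(first); err != nil {
		return refused, false, err
	}
	again, err := tryLock(path)
	if err != nil {
		return refused, false, err
	}
	return refused, true, unlock(again)
}

func main() {
	fmt.Println(demoLock())
}
```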

fsync

A system call that forces the operating system to write data from its memory buffers to the physical disk. Without fsync, data might sit in memory for seconds or minutes before being written, and a power failure could lose it. In atomic writes, fsync ensures the temp file is fully on disk before the rename.

Go (Golang)

A programming language designed at Google and publicly released in 2009. Popular for command-line tools, web servers, and systems programming. Known for simplicity, fast compilation, excellent concurrency support (goroutines), and producing single-binary executables with no dependencies.

Goroutine

Go's lightweight version of a thread — a way to run code concurrently. You can have thousands of goroutines running at once with minimal memory overhead. In our tests, we launch 10 goroutines all racing to claim the same task to verify our concurrency safety mechanisms work.

Heartbeat

A periodic "I'm still alive" signal. When an agent is working on a long task, it sends a heartbeat every 10 minutes to extend its lease. Without heartbeats, the system would think the agent crashed after 30 minutes and let another agent take over.

Idempotency / Idempotency Key

The property where doing something twice has the same effect as doing it once. If an agent sends "task complete" and then sends it again (maybe it crashed and retried), the second time is silently ignored. The idempotency key is a unique string that identifies each operation so the system can detect duplicates.
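
The check-and-set behavior can be sketched with an in-memory map. The real idempotency store persists keys on disk; idemStore, checkAndSet, and completeTask are hypothetical names used only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// idemStore is a hypothetical in-memory idempotency store.
type idemStore struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newIdemStore() *idemStore {
	return &idemStore{seen: make(map[string]bool)}
}

// checkAndSet records key and reports whether it was new.
func (s *idemStore) checkAndSet(key string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seen[key] {
		return false // duplicate: this operation was already applied
	}
	s.seen[key] = true
	return true
}

// completeTask applies a completion at most once per key and returns
// whether this call actually did the work.
func completeTask(s *idemStore, key string, completions *int) bool {
	if !s.checkAndSet(key) {
		return false // retry of an earlier call: silent no-op
	}
	*completions++
	return true
}

func main() {
	store := newIdemStore()
	completions := 0
	fmt.Println(completeTask(store, "complete-task-1", &completions)) // first call
	fmt.Println(completeTask(store, "complete-task-1", &completions)) // retry
	fmt.Println(completions)
}
```

The retry returns without touching the counter, so the total stays at 1 however many times the same key is sent.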

JSON (JavaScript Object Notation)

A text format for storing structured data. Looks like: {"name": "task1", "status": "pending"}. Despite the name, it is used in virtually every programming language, not just JavaScript. All task and inbox files in AI Agents HQ are JSON.

Lease / Lease TTL

A time-limited reservation on a task. When an agent claims a task, it gets a lease that expires in 30 minutes (the TTL — Time To Live). If the agent finishes before the lease expires, great. If it crashes, the lease expires and another agent can take over. This prevents tasks from getting stuck forever.
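
A minimal sketch of lease expiry and heartbeat renewal, using the 30-minute TTL from this entry. The Lease type and its methods are hypothetical, and the choice to reset the expiry to "now plus TTL" on each heartbeat is an assumption about how renewal works.

```go
package main

import (
	"fmt"
	"time"
)

const leaseTTL = 30 * time.Minute // TTL stated in the glossary entry

// Lease is a hypothetical lease record.
type Lease struct {
	Holder    string
	ExpiresAt time.Time
}

// newLease grants holder a lease starting at now.
func newLease(holder string, now time.Time) Lease {
	return Lease{Holder: holder, ExpiresAt: now.Add(leaseTTL)}
}

// validAt reports whether the lease has not yet expired at now.
func (l Lease) validAt(now time.Time) bool {
	return now.Before(l.ExpiresAt)
}

// heldBy reports whether agent holds a still-valid lease at now.
func (l Lease) heldBy(agent string, now time.Time) bool {
	return l.Holder == agent && l.validAt(now)
}

// heartbeat renews the lease for a full TTL measured from now (assumed).
func (l *Lease) heartbeat(now time.Time) {
	l.ExpiresAt = now.Add(leaseTTL)
}

// demoLease walks through the lifecycle against a fixed clock.
func demoLease() (validBefore, expiredAfter, validAfterBeat bool) {
	start := time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC) // arbitrary
	l := newLease("agent-a", start)

	validBefore = l.heldBy("agent-a", start.Add(29*time.Minute))
	expiredAfter = !l.validAt(start.Add(31 * time.Minute))

	l.heartbeat(start.Add(10 * time.Minute)) // now expires at start+40m
	validAfterBeat = l.validAt(start.Add(39 * time.Minute))
	return validBefore, expiredAfter, validAfterBeat
}

func main() {
	fmt.Println(demoLease())
}
```

Passing the clock in as a parameter, rather than calling time.Now inside the methods, keeps expiry logic testable without sleeping.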

Linter / Lint

A tool that analyzes your code without running it and flags potential problems — bugs, bad style, unused variables, unchecked errors. Think of it as a really thorough spell checker for code. Go's popular linter is golangci-lint, which runs dozens of different checks.

MCP (Model Context Protocol)

A protocol for giving AI models access to external tools and data sources. For example, an MCP server might provide file system access, database queries, or API calls. Tasks can specify which MCP servers they need via the required_mcps field.

Module (Go Module)

Go's system for organizing and versioning code. The go.mod file declares the project's name (like github.com/gbas/ai-agents-hq) and any external dependencies. Similar to package.json in Node.js or requirements.txt in Python.

Protocol Version

A number (currently 2) that agents include in every state-changing command. If the protocol changes in a future version, old agents are rejected with a clear error instead of silently corrupting data. This is a safety net for upgrades.
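
As a rough sketch of the idea, the check reduces to a comparison that maps a mismatch to a dedicated exit code. The constant and function names are hypothetical; the value 2 comes from this entry and exit code 12 from the integration test list. How an agent actually passes its version to hq is not shown here.

```go
package main

import "fmt"

const (
	currentProtocolVersion = 2  // "currently 2" per this entry
	exitOK                 = 0  // conventional success code
	exitProtocolMismatch   = 12 // code listed for the mismatch test
)

// checkProtocol returns the exit code a state-changing command would
// use for the given client version, plus an explanatory error.
func checkProtocol(clientVersion int) (int, error) {
	if clientVersion != currentProtocolVersion {
		return exitProtocolMismatch, fmt.Errorf(
			"protocol mismatch: client sent %d, hq expects %d",
			clientVersion, currentProtocolVersion)
	}
	return exitOK, nil
}

func main() {
	fmt.Println(checkProtocol(2))
	fmt.Println(checkProtocol(1))
}
```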

Race Condition / Data Race

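A race condition is a bug where the result depends on which of two concurrent operations happens to go first. A data race is one specific kind: two goroutines access the same memory at the same time without synchronization, and at least one of them writes. Race conditions are notoriously hard to find because they may only occur under specific timing conditions. Go's race detector catches data races automatically during testing; other timing bugs still need dedicated tests such as the concurrent-claims tests.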

Race Detector (-race flag)

A Go tool that instruments your code at compile time to detect data races. When two goroutines access the same memory without synchronization, it immediately reports the exact location. Always run tests with -race for concurrent code.

Subprocess

A separate program launched by another program. Our integration tests compile hq into a binary and then run it as a subprocess — exactly like you would type the command in a terminal. This tests the real behavior end-to-end.

Unit Test

A test that checks one small piece of code in isolation — a single function or a single behavior. Unit tests are fast (milliseconds), do not touch the file system or network, and catch regressions immediately.

Integration Test

A test that checks multiple pieces working together. Our integration tests run the full hq binary, create real files, and verify the complete workflow from command-line input to file output. Slower than unit tests but catches issues in the connections between components.

gofmt

Go's built-in code formatter. It enforces one consistent style — everyone's Go code looks the same. There are no configuration options and no style debates. Just run gofmt -w . and commit the result.

YAML

A human-readable data format similar to JSON but with less punctuation. Uses indentation instead of braces. The docs.yaml configuration file for Phosphor is written in YAML. Looks like: title: "My Site" instead of {"title": "My Site"}.

TOML

Another configuration file format, used by Codex's .codex/config.toml. Similar to INI files with [sections] and key = "value" pairs. More structured than YAML, less verbose than JSON.