Verification Guide

Before marking any task complete, you MUST verify your work. Verification separates "I think it's done" from "it's actually done."

The Verification Process

  1. Re-read the task context: What did you originally commit to do?
  2. Check acceptance criteria: Does your implementation satisfy the "Done when" conditions?
  3. Run relevant tests: Execute the test suite and document results
  4. Test manually: Actually try the feature/change yourself
  5. Compare with requirements: Does what you built match what was asked?
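
The same steps can be made concrete in the completion flow itself. A minimal sketch, assuming the `tasks` API used throughout this guide and Node's built-in child_process; `npm test` stands in for whatever test command the project actually uses:

import { execSync } from "node:child_process";

// Run the suite and capture its output as evidence.
// If tests fail, execSync throws and the task is never completed.
const testOutput = execSync("npm test", { encoding: "utf8" });

// Carry the runner's summary line (e.g. "Tests: 42 passed") into the
// result instead of asserting "tests pass" without a count.
const summary =
  testOutput.split("\n").find((line) => line.includes("passed")) ??
  "see full test output";

await tasks.complete(taskId, `Implemented the change.

Verification:
- ${summary}
- Manually tested the happy path and one failure case`);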

Strong vs Weak Verification

Strong Verification Examples

  • "All 60 tests passing, build successful"
  • "All 69 tests passing (4 new tests for middleware edge cases)"
  • "Manually tested with valid/invalid/expired tokens - all cases work"
  • "Ran cargo test - 142 tests passed, 0 failed"

Weak Verification (Avoid)

  • "Should work now" - "should" means not verified
  • "Made the changes" - no evidence it works
  • "Added tests" - did the tests pass? What's the count?
  • "Fixed the bug" - what bug? Did you verify the fix?
  • "Done" - done how? prove it

Verification by Task Type

Task Type        How to Verify
---------        -------------
Code changes     Run full test suite, document passing count
New features     Run tests + manual testing of functionality
Configuration    Test the config works (run commands, check workflows)
Documentation    Verify examples work, links resolve, formatting renders
Refactoring      Confirm tests still pass, no behavior changes
Bug fixes        Reproduce bug first, verify fix, add regression test

Cross-Reference Checklist

Before marking complete, verify all applicable items:

  • Task description requirements met
  • Context "Done when" criteria satisfied
  • Tests passing (document count: "All X tests passing")
  • Build succeeds (if applicable)
  • Manual testing done (describe what you tested)
  • No regressions introduced
  • Edge cases considered (error handling, invalid input)
  • Follow-up work identified (created new tasks if needed)

If you can't check all applicable boxes, the task isn't done yet.
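
One way to make the checklist hard to skip is to template it into the result message. A hypothetical helper (not part of the `tasks` API) that refuses to produce a result until every applicable item has evidence attached:

// Hypothetical: build the Verification section from checklist evidence,
// throwing if any item is left blank.
function verificationSummary(evidence: Record<string, string>): string {
  const missing = Object.entries(evidence)
    .filter(([, proof]) => proof.trim() === "")
    .map(([item]) => item);
  if (missing.length > 0) {
    throw new Error(`Not done yet - no evidence for: ${missing.join(", ")}`);
  }
  return Object.entries(evidence)
    .map(([item, proof]) => `- ${item}: ${proof}`)
    .join("\n");
}

await tasks.complete(taskId, `Implemented feature:

Verification:
${verificationSummary({
  "Tests passing": "All 69 tests passing",
  "Build succeeds": "pnpm build - success",
  "Manual testing": "valid/invalid/expired tokens all handled",
})}`);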

Result Examples with Verification

Code Implementation

await tasks.complete(taskId, `Implemented JWT middleware:

Implementation:
- Created src/middleware/verify-token.ts
- Separated 'expired' vs 'invalid' error codes
- Added user extraction from payload

Verification:
- All 69 tests passing (4 new tests for edge cases)
- Manually tested with valid token: Access granted
- Manually tested with expired token: 401 with 'token_expired'
- Manually tested with invalid signature: 401 with 'invalid_token'`);

Configuration/Infrastructure

await tasks.complete(taskId, `Added GitHub Actions workflow for CI:

Implementation:
- Created .github/workflows/ci.yml
- Jobs: lint, test, build with pnpm cache

Verification:
- Pushed to test branch, opened PR #123
- Workflow triggered automatically
- All jobs passed (lint: 0 errors, test: 69/69, build: success)
- Total run time: 2m 34s`);

Refactoring

await tasks.complete(taskId, `Refactored storage to one file per task:

Implementation:
- Split tasks.json into .overseer/tasks/{id}.json files
- Added auto-migration from old format
- Atomic writes via temp+rename

Verification:
- All 60 tests passing (including 8 storage tests)
- Build successful
- Manually tested migration: old -> new format works
- Confirmed git diff shows only changed tasks`);

Bug Fix

await tasks.complete(taskId, `Fixed login validation accepting usernames with spaces:

Root cause:
- Validation regex didn't account for leading/trailing spaces

Fix:
- Added .trim() before validation in src/auth/validate.ts:42
- Updated regex to reject internal spaces

Verification:
- All 45 tests passing (2 new regression tests)
- Manually tested:
  - " admin" -> rejected (leading space)
  - "admin " -> rejected (trailing space)
  - "ad min" -> rejected (internal space)
  - "admin" -> accepted`);

Documentation

await tasks.complete(taskId, `Updated API documentation for auth endpoints:

Implementation:
- Added docs for POST /auth/login
- Added docs for POST /auth/logout
- Added docs for POST /auth/refresh
- Included example requests/responses

Verification:
- All code examples tested and working
- Links verified (no 404s)
- Rendered in local preview - formatting correct
- Spell-checked content`);

Common Verification Mistakes

Mistake             Better Approach
-------             ---------------
"Tests pass"        "All 42 tests passing" (include count)
"Manually tested"   "Manually tested X, Y, Z scenarios" (be specific)
"Works"             "Works: [evidence]" (show proof)
"Fixed"             "Fixed: [root cause] -> [solution] -> [verification]"

When Verification Fails

If verification reveals issues:

  1. Don't complete the task - it's not done
  2. Document what failed in task context
  3. Fix the issues before completing
  4. Re-verify after fixes

// Update context with failure notes
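// (`task` is assumed to be the current task record, loaded earlier)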
await tasks.update(taskId, {
  context: task.context + `

Verification attempt 1 (failed):
- Tests: 41/42 passing
- Failing: test_token_refresh - timeout issue
- Need to investigate async handling`
});

// After fixing
await tasks.complete(taskId, `Implemented token refresh:

Implementation:
- Added refresh endpoint
- Fixed async timeout (was missing await)

Verification:
- All 42 tests passing (fixed timeout issue)
- Manual testing: refresh works within 30s window`);