5.5 KiB
5.5 KiB
Verification Guide
Before marking any task complete, you MUST verify your work. Verification separates "I think it's done" from "it's actually done."
The Verification Process
- Re-read the task context: What did you originally commit to do?
- Check acceptance criteria: Does your implementation satisfy the "Done when" conditions?
- Run relevant tests: Execute the test suite and document results
- Test manually: Actually try the feature/change yourself
- Compare with requirements: Does what you built match what was asked?
Strong vs Weak Verification
Strong Verification Examples
- "All 60 tests passing, build successful"
- "All 69 tests passing (4 new tests for middleware edge cases)"
- "Manually tested with valid/invalid/expired tokens - all cases work"
- "Ran
cargo test- 142 tests passed, 0 failed"
Weak Verification (Avoid)
- "Should work now" - "should" means not verified
- "Made the changes" - no evidence it works
- "Added tests" - did the tests pass? What's the count?
- "Fixed the bug" - what bug? Did you verify the fix?
- "Done" - done how? prove it
Verification by Task Type
| Task Type | How to Verify |
|---|---|
| Code changes | Run full test suite, document passing count |
| New features | Run tests + manual testing of functionality |
| Configuration | Test the config works (run commands, check workflows) |
| Documentation | Verify examples work, links resolve, formatting renders |
| Refactoring | Confirm tests still pass, no behavior changes |
| Bug fixes | Reproduce bug first, verify fix, add regression test |
Cross-Reference Checklist
Before marking complete, verify all applicable items:
- Task description requirements met
- Context "Done when" criteria satisfied
- Tests passing (document count: "All X tests passing")
- Build succeeds (if applicable)
- Manual testing done (describe what you tested)
- No regressions introduced
- Edge cases considered (error handling, invalid input)
- Follow-up work identified (created new tasks if needed)
If you can't check all applicable boxes, the task isn't done yet.
Result Examples with Verification
Code Implementation
await tasks.complete(taskId, `Implemented JWT middleware:
Implementation:
- Created src/middleware/verify-token.ts
- Separated 'expired' vs 'invalid' error codes
- Added user extraction from payload
Verification:
- All 69 tests passing (4 new tests for edge cases)
- Manually tested with valid token: Access granted
- Manually tested with expired token: 401 with 'token_expired'
- Manually tested with invalid signature: 401 with 'invalid_token'`);
Configuration/Infrastructure
await tasks.complete(taskId, `Added GitHub Actions workflow for CI:
Implementation:
- Created .github/workflows/ci.yml
- Jobs: lint, test, build with pnpm cache
Verification:
- Pushed to test branch, opened PR #123
- Workflow triggered automatically
- All jobs passed (lint: 0 errors, test: 69/69, build: success)
- Total run time: 2m 34s`);
Refactoring
await tasks.complete(taskId, `Refactored storage to one file per task:
Implementation:
- Split tasks.json into .overseer/tasks/{id}.json files
- Added auto-migration from old format
- Atomic writes via temp+rename
Verification:
- All 60 tests passing (including 8 storage tests)
- Build successful
- Manually tested migration: old -> new format works
- Confirmed git diff shows only changed tasks`);
Bug Fix
await tasks.complete(taskId, `Fixed login validation accepting usernames with spaces:
Root cause:
- Validation regex didn't account for leading/trailing spaces
Fix:
- Added .trim() before validation in src/auth/validate.ts:42
- Updated regex to reject internal spaces
Verification:
- All 45 tests passing (2 new regression tests)
- Manually tested:
- " admin" -> rejected (leading space)
- "admin " -> rejected (trailing space)
- "ad min" -> rejected (internal space)
- "admin" -> accepted`);
Documentation
await tasks.complete(taskId, `Updated API documentation for auth endpoints:
Implementation:
- Added docs for POST /auth/login
- Added docs for POST /auth/logout
- Added docs for POST /auth/refresh
- Included example requests/responses
Verification:
- All code examples tested and working
- Links verified (no 404s)
- Rendered in local preview - formatting correct
- Spell-checked content`);
Common Verification Mistakes
| Mistake | Better Approach |
|---|---|
| "Tests pass" | "All 42 tests passing" (include count) |
| "Manually tested" | "Manually tested X, Y, Z scenarios" (be specific) |
| "Works" | "Works: [evidence]" (show proof) |
| "Fixed" | "Fixed: [root cause] -> [solution] -> [verification]" |
When Verification Fails
If verification reveals issues:
- Don't complete the task - it's not done
- Document what failed in task context
- Fix the issues before completing
- Re-verify after fixes
// Update context with failure notes
await tasks.update(taskId, {
context: task.context + `
Verification attempt 1 (failed):
- Tests: 41/42 passing
- Failing: test_token_refresh - timeout issue
- Need to investigate async handling`
});
// After fixing
await tasks.complete(taskId, `Implemented token refresh:
Implementation:
- Added refresh endpoint
- Fixed async timeout (was missing await)
Verification:
- All 42 tests passing (fixed timeout issue)
- Manual testing: refresh works within 30s window`);