🐛 use PID liveness for stale, not file age #1

Merged
mat merged 1 commit from worktree-fix-stale-pid-liveness into feat/agents-room-v1 2026-05-20 16:37:47 +00:00
Owner

Summary

  • Busy Claude agents were misreported as stale because Claude Code only updates ~/.claude/sessions/<pid>.json on state transitions — a long tool call easily goes >60s between writes, tripping the old STALE_AFTER_SECS = 60 threshold. The official claude agents doesn't have this problem because it probes the PID; this PR matches that.
  • classify now uses kill(pid, 0) (via a pid_alive helper injected as a closure so tests stay deterministic). done still wins for completed jobs; a dead PID is stale regardless of status; otherwise the raw status maps to work / idle / new wait.
  • New Status::Waiting surfaces Claude Code's "status":"waiting" value (permission-prompt pause), styled cyan; previously it silently mapped to idle.
  • README updated; the time-threshold sentence is gone.

Test plan

  • cargo test — 20 pass, including new cases (dead_pid_classified_stale_even_if_status_busy, waiting_status_maps_to_waiting, done_overrides_dead_pid).
  • Smoke-tested against my real ~/.claude/sessions/ — the "agent tui dashboard" agent that triggered the bug now classifies as work; the only stale entries are sessions whose PID is actually gone.
  • Visual check in the TUI (the ui.rs tests pass but I didn't drive a real terminal in this session).
## Summary - Busy Claude agents were misreported as `stale` because Claude Code only updates `~/.claude/sessions/<pid>.json` on state transitions — a long tool call easily goes >60s between writes, tripping the old `STALE_AFTER_SECS = 60` threshold. The official `claude agents` doesn't have this problem because it probes the PID; this PR matches that. - `classify` now uses `kill(pid, 0)` (via a `pid_alive` helper injected as a closure so tests stay deterministic). `done` still wins for completed jobs; a dead PID is `stale` regardless of status; otherwise the raw status maps to `work` / `idle` / new `wait`. - New `Status::Waiting` surfaces Claude Code's `"status":"waiting"` value (permission-prompt pause), styled cyan; previously it silently mapped to `idle`. - README updated; the time-threshold sentence is gone. ## Test plan - [x] `cargo test` — 20 pass, including new cases (`dead_pid_classified_stale_even_if_status_busy`, `waiting_status_maps_to_waiting`, `done_overrides_dead_pid`). - [x] Smoke-tested against my real `~/.claude/sessions/` — the "agent tui dashboard" agent that triggered the bug now classifies as `work`; the only `stale` entries are sessions whose PID is actually gone. - [ ] Visual check in the TUI (the `ui.rs` tests pass but I didn't drive a real terminal in this session).
Claude Code writes ~/.claude/sessions/<pid>.json only on state
transitions, so a busy agent on a long tool call easily goes >60s
between writes. The previous 60s `updatedAt` threshold misreported
those agents as stale — they show up correctly in `claude agents`
because the official dashboard probes the PID instead. Match that
behavior: `kill(pid, 0)` decides whether a process is gone; status
text decides between work/idle/wait. Also surface the new `waiting`
status that Claude Code emits when paused on a permission prompt.
mat merged commit d963c7b38f into feat/agents-room-v1 2026-05-20 16:37:47 +00:00
mat deleted branch worktree-fix-stale-pid-liveness 2026-05-20 16:37:48 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
mat/sidequest!1
No description provided.