Task Contracts
Guarantees
Execution Semantics
- At-least-once execution: Every scheduled task will be attempted at least once (unless cancelled)
- Idempotency key respect: Tasks with the same `idempotencyKey` run the handler at most once within `idempotencyRetentionMs` (default 24h); subsequent executions return the stored result. Requires a shared KV store to hold across workers/instances
- Concurrency cap: `maxConcurrency` is enforced via a distributed counter, so at most N instances of a task type run concurrently across the fleet; a worker that can't reserve a slot leaves the task for another
- State machine validity: Tasks only transition through valid states:
  - `scheduled → claimed → running → success|failed|retry`
  - `failed → dlq|retry|scheduled`
  - `retry → scheduled`
  - `dlq → scheduled` (replay)
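
The valid transitions listed above can be expressed as a lookup table. The following is a minimal sketch, not the project's actual implementation; the names `TaskState`, `TRANSITIONS`, `canTransition`, and `transition` are hypothetical, and reclaiming an expired claim is not modeled here.

```typescript
type TaskState =
  | "scheduled" | "claimed" | "running"
  | "success" | "failed" | "retry" | "dlq";

// Allowed next states for each state, copied from the transitions listed above.
const TRANSITIONS: Record<TaskState, readonly TaskState[]> = {
  scheduled: ["claimed"],
  claimed: ["running"],
  running: ["success", "failed", "retry"],
  success: [], // terminal
  failed: ["dlq", "retry", "scheduled"],
  retry: ["scheduled"],
  dlq: ["scheduled"], // replay
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the new state, or throws if the move is not in the table.
function transition(from: TaskState, to: TaskState): TaskState {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid task transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as data makes it straightforward to assert in tests that no code path writes a state the table does not allow.
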
Retry Behavior
- Bounded retries: Tasks retry up to `maxRetries` times before going to DLQ
- Exponential backoff: Retry delays increase exponentially with configurable base and multiplier
- Jitter: Retry times include random jitter to prevent thundering herd
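
As an illustration of the three retry rules above, here is a minimal sketch of a delay calculation. The option names (`baseMs`, `multiplier`, `maxDelayMs`, `jitterRatio`) and both function names are assumptions made for this example; the document only states that the base and multiplier are configurable and that jitter is applied.

```typescript
// Hypothetical option names, chosen for this sketch only.
interface BackoffOptions {
  baseMs: number;      // delay before the first retry
  multiplier: number;  // growth factor applied per retry
  maxDelayMs: number;  // upper bound on the un-jittered delay (assumption)
  jitterRatio: number; // fraction of the delay added as random jitter, 0..1 (assumption)
}

// Delay before retry number `retry` (1-based). `random` is injectable for tests.
function retryDelayMs(
  retry: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const exponential = opts.baseMs * Math.pow(opts.multiplier, retry - 1);
  const capped = Math.min(exponential, opts.maxDelayMs);
  const jitter = random() * capped * opts.jitterRatio;
  return Math.round(capped + jitter);
}

// Once `maxRetries` retries have been used, the task goes to the DLQ instead.
function nextStateAfterFailure(retriesUsed: number, maxRetries: number): "retry" | "dlq" {
  return retriesUsed < maxRetries ? "retry" : "dlq";
}
```

With `baseMs: 1000` and `multiplier: 2`, the un-jittered delays are 1s, 2s, 4s, and so on; the jitter term spreads workers that failed at the same moment so they do not all retry together.
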
Claim/Lease
- Exclusive execution: Only one worker executes a task at a time (via claim mechanism)
- Claim expiration: If a worker dies, its claim expires after the timeout, allowing another worker to reclaim the task
- No dual execution: CAS operations prevent split-brain dual execution
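
The claim rules above can be sketched with a versioned compare-and-set. This is an in-memory stand-in under stated assumptions, not the library's storage layer: `ClaimRecord`, `InMemoryClaimStore`, `compareAndSet`, and `tryClaim` are hypothetical names, and a real deployment would need the store itself to provide the atomic operation.

```typescript
interface ClaimRecord {
  workerId: string;
  expiresAt: number; // epoch ms at which the lease lapses
  version: number;   // bumped on every successful write
}

// In-memory stand-in for a store that offers compare-and-set (assumption).
class InMemoryClaimStore {
  private readonly claims = new Map<string, ClaimRecord>();

  get(taskId: string): ClaimRecord | undefined {
    return this.claims.get(taskId);
  }

  // Writes `next` only if the stored version equals `expectedVersion` (0 = no record yet).
  compareAndSet(taskId: string, expectedVersion: number, next: ClaimRecord): boolean {
    const current = this.claims.get(taskId);
    const currentVersion = current ? current.version : 0;
    if (currentVersion !== expectedVersion) return false;
    this.claims.set(taskId, next);
    return true;
  }
}

// Attempts to claim a task; succeeds only if no live claim exists and the CAS wins.
function tryClaim(
  store: InMemoryClaimStore,
  taskId: string,
  workerId: string,
  now: number,
  leaseMs: number,
): boolean {
  const current = store.get(taskId);
  if (current && current.expiresAt > now) return false; // someone holds a live lease
  const expectedVersion = current ? current.version : 0;
  return store.compareAndSet(taskId, expectedVersion, {
    workerId,
    expiresAt: now + leaseMs,
    version: expectedVersion + 1,
  });
}
```

If two workers both observe an expired claim, both call `compareAndSet` with the same expected version and only the first write succeeds, which is the property the "no dual execution" bullet relies on.
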
Recurring Tasks
- Scheduled execution: Recurring tasks execute at specified intervals
- Timezone honored: Cron schedules are evaluated in the configured `timezone` (default UTC), so DST transitions shift wall-clock fire times correctly
- Configurable catchup: A `catchup` policy controls how missed occurrences (worker downtime / delayed tick) are handled: `"skip"` (default, fire once, no pile-up), `"last"` (coalesce all missed runs into a single execution), or `"all"` (re-run each missed occurrence between `lastRunAt` and now, bounded to 1000 per tick)
- Drift control: Fixed-rate vs fixed-delay semantics are explicit and honored
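
To make the catchup policies concrete, the sketch below computes which occurrences a tick would run for a simple fixed-interval schedule. It is an interpretation, not the library's code: the function name, the choice to stamp a `"skip"` run with the current time and a `"last"` run with the most recent missed occurrence, and the oldest-first ordering for `"all"` are all assumptions. Cron parsing and timezone handling are deliberately left out.

```typescript
type CatchupPolicy = "skip" | "last" | "all";

const MAX_CATCHUP_PER_TICK = 1000; // bound stated in the document

// Returns the occurrence timestamps (epoch ms) to execute on this tick.
function occurrencesToRun(
  lastRunAt: number,
  now: number,
  intervalMs: number,
  policy: CatchupPolicy,
): number[] {
  const missedCount = Math.floor((now - lastRunAt) / intervalMs);
  if (missedCount <= 0) return []; // nothing is due yet

  // "skip": fire once for the current tick, ignoring the backlog (assumed reading).
  if (policy === "skip") return [now];

  // "last": one execution standing in for every missed occurrence (assumed reading).
  const latest = lastRunAt + missedCount * intervalMs;
  if (policy === "last") return [latest];

  // "all": each missed occurrence, oldest first, capped per tick.
  const count = Math.min(missedCount, MAX_CATCHUP_PER_TICK);
  return Array.from({ length: count }, (_, i) => lastRunAt + (i + 1) * intervalMs);
}
```

For example, with `lastRunAt = 0`, `now = 3500`, and a 1000 ms interval, `"all"` yields three occurrences while `"skip"` and `"last"` each yield one.
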
Dead Letter Queue
- Replay lineage: Replaying a DLQ entry preserves `originalTaskId` and increments a `replayCount` that survives across repeated failures/replays; replays may carry an optional `replayedBy` actor and are recorded in a replay audit log
- Alerting hook: An optional `onDlqEnqueue` callback fires whenever a task lands in the DLQ (failures in the hook never affect DLQ persistence)
- Bounded replay: `retryAll` is bounded by a `limit` (default 100) and audited
- Metrics: `metrics()` exposes the current DLQ count and oldest-entry age
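
The sketch below illustrates three of the DLQ rules above: the entry is persisted before the hook runs, a throwing hook is swallowed, and a bounded `retryAll` carries lineage forward. It is an in-memory approximation; the class name `InMemoryDlq`, the record shapes, and `pendingCount` are hypothetical, and only `onDlqEnqueue`, `retryAll`, `limit`, `originalTaskId`, `replayCount`, and `replayedBy` come from the document.

```typescript
interface DlqEntry {
  originalTaskId: string;
  replayCount: number;
  error: string;
}

interface ReplayedTask {
  originalTaskId: string;
  replayCount: number;
  state: "scheduled";
}

interface ReplayAuditRecord {
  originalTaskId: string;
  replayCount: number;
  replayedBy: string | undefined;
}

class InMemoryDlq {
  readonly auditLog: ReplayAuditRecord[] = [];
  private readonly entries: DlqEntry[] = [];
  private readonly onDlqEnqueue: ((entry: DlqEntry) => void) | undefined;

  constructor(onDlqEnqueue?: (entry: DlqEntry) => void) {
    this.onDlqEnqueue = onDlqEnqueue;
  }

  enqueue(entry: DlqEntry): void {
    this.entries.push(entry); // persist first
    try {
      if (this.onDlqEnqueue) this.onDlqEnqueue(entry);
    } catch {
      // A failing alert hook must not affect DLQ persistence.
    }
  }

  // Replays at most `limit` entries, oldest first, and audits each replay.
  retryAll(limit: number = 100, replayedBy?: string): ReplayedTask[] {
    const batch = this.entries.splice(0, limit);
    return batch.map((entry): ReplayedTask => {
      const replayCount = entry.replayCount + 1;
      this.auditLog.push({ originalTaskId: entry.originalTaskId, replayCount, replayedBy });
      return { originalTaskId: entry.originalTaskId, replayCount, state: "scheduled" };
    });
  }

  pendingCount(): number {
    return this.entries.length;
  }
}
```

Because the replayed task keeps its `replayCount`, a task that fails again and re-enters the DLQ can be enqueued with that count, so the lineage survives repeated failure and replay cycles.
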
Progress, Heartbeat, Result TTL
- Progress reporting: Handlers receive `ctx.reportProgress(percent, message?)` which persists progress (clamped 0–100) so a monitor can read it via `storage.getProgress`
- Heartbeat: Running tasks update `lastHeartbeatAt` independently of lock extension, so a stalled task is detectable even while its lock is still valid
- Result TTL: `resultTtlMs` (default 24h) sets `resultExpiresAt` on completed/failed records; expired records are removed lazily on the next read rather than accumulating forever
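
Lazy expiry, as described in the Result TTL bullet above, means no background sweeper is needed: the check happens when a record is read. The following is a minimal in-memory sketch; `ResultStore`, `CompletedRecord`, `save`, `read`, and `storedCount` are hypothetical names, and only `resultTtlMs` and `resultExpiresAt` come from the document.

```typescript
interface CompletedRecord<T> {
  taskId: string;
  result: T;
  resultExpiresAt: number; // epoch ms
}

const DEFAULT_RESULT_TTL_MS = 24 * 60 * 60 * 1000; // 24h default from the document

class ResultStore<T> {
  private readonly records = new Map<string, CompletedRecord<T>>();
  private readonly resultTtlMs: number;

  constructor(resultTtlMs: number = DEFAULT_RESULT_TTL_MS) {
    this.resultTtlMs = resultTtlMs;
  }

  // Stamps the expiry at completion time.
  save(taskId: string, result: T, now: number): void {
    this.records.set(taskId, { taskId, result, resultExpiresAt: now + this.resultTtlMs });
  }

  // Expired records are deleted here, on read, rather than by a timer.
  read(taskId: string, now: number): CompletedRecord<T> | undefined {
    const record = this.records.get(taskId);
    if (!record) return undefined;
    if (record.resultExpiresAt <= now) {
      this.records.delete(taskId);
      return undefined;
    }
    return record;
  }

  storedCount(): number {
    return this.records.size;
  }
}
```

One consequence of this design is that an expired record nobody reads stays in storage until something touches it.
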
Cloudflare Queues Backend
- Push delivery: `createCloudflareQueueProducer(binding)` enqueues tasks onto a Workers Queue and `createQueueConsumer({ kv, registry })` executes them from a `MessageBatch`, acking on success/dead-letter and retrying (with backoff `delaySeconds`) on retryable failure, so no long-lived poller is required
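
The ack/retry decision described above might look roughly like the sketch below. It does not show the internals of `createQueueConsumer`; `settleMessage`, `Outcome`, `QueueMessageLike`, the 5-second base, and the one-hour cap are all assumptions for illustration. `QueueMessageLike` is a locally declared shape covering only the message members this sketch uses (`body`, `attempts`, `ack()`, `retry({ delaySeconds })`).

```typescript
// Minimal local shape of a queue message; only the members used here.
interface QueueMessageLike<T> {
  body: T;
  attempts: number; // delivery attempt number, starting at 1
  ack(): void;
  retry(options?: { delaySeconds?: number }): void;
}

type Outcome = "success" | "dead-letter" | "retryable-failure";

const MAX_DELAY_SECONDS = 3600; // hypothetical cap chosen for this sketch

// Acks on success or dead-letter; otherwise asks the queue to redeliver with backoff.
function settleMessage<T>(
  message: QueueMessageLike<T>,
  outcome: Outcome,
  baseDelaySeconds: number = 5,
): void {
  if (outcome === "retryable-failure") {
    const exponential = baseDelaySeconds * Math.pow(2, message.attempts - 1);
    message.retry({ delaySeconds: Math.min(exponential, MAX_DELAY_SECONDS) });
    return;
  }
  // A dead-lettered task has been recorded in the DLQ, so the message itself is done.
  message.ack();
}
```

Acking dead-lettered messages matters: if they were retried instead, the queue would keep redelivering a task that the DLQ already owns.
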
Non-Guarantees
Timing (What We Don't Promise)
- ❌ Exact execution time: Tasks execute "around" scheduled time, not precisely at it
- ❌ Order preservation: Tasks scheduled for the same time may execute in any order
- ❌ Clock accuracy: System depends on reasonable clock accuracy (±seconds, not milliseconds)
Execution (What We Don't Promise)
- ❌ Exactly-once: Delivery is at-least-once. `idempotencyKey` gives effectively-once execution within the retention window when a shared KV is configured, but without an idempotency key (or with a memory KV that isn't shared) a task may run more than once
- ❌ Concurrency cap without shared KV: `maxConcurrency` enforcement relies on the shared counter; with the per-process memory KV it only bounds a single instance
- ❌ Execution duration limits: Tasks can run indefinitely (unless timeout configured)
- ❌ Permanent result persistence: Completed task records (including results) are retained only until `resultExpiresAt` (`resultTtlMs`, default 24h), then removed on the next read; they are not kept forever
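
The difference between at-least-once delivery and effectively-once execution can be seen in a small sketch. This is a synchronous, single-process illustration with hypothetical names (`IdempotentRunner`, `run`); the `Map` stands in for the shared KV, and a real shared store would be asynchronous and would need an atomic check-and-write, which this sketch does not provide.

```typescript
interface StoredResult<T> {
  value: T;
  expiresAt: number; // epoch ms; end of the retention window
}

class IdempotentRunner<T> {
  // Stand-in for a shared KV. A per-process Map like this one only protects one instance.
  private readonly results = new Map<string, StoredResult<T>>();
  private readonly retentionMs: number;

  constructor(retentionMs: number = 24 * 60 * 60 * 1000) {
    this.retentionMs = retentionMs;
  }

  run(idempotencyKey: string | undefined, handler: () => T, now: number): T {
    // Without a key there is nothing to deduplicate on: the handler runs on every delivery.
    if (idempotencyKey === undefined) return handler();

    const stored = this.results.get(idempotencyKey);
    if (stored && stored.expiresAt > now) return stored.value; // replay the stored result

    const value = handler();
    this.results.set(idempotencyKey, { value, expiresAt: now + this.retentionMs });
    return value;
  }
}
```

A duplicate delivery inside the retention window returns the stored value without invoking the handler; after the window lapses, or with no key at all, the handler runs again.
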
Distributed (What We Don't Promise)
- ❌ Fair distribution: Work distribution across workers is best-effort, not guaranteed fair
- ❌ Affinity: Same task may execute on different workers across retries
Failure Modes
Worker Crash During Execution
- Task remains in `claimed` or `running` state
- Claim expires after timeout
- Another worker reclaims and retries
- `runCount` is incremented for retry tracking
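
The recovery sequence above can be sketched as a sweep over in-flight tasks. The row shape and the function name `reclaimExpired` are hypothetical, and the sketch mutates plain objects for brevity; a real implementation would apply each update with a compare-and-set so two sweeping workers cannot both win.

```typescript
interface TaskRow {
  id: string;
  state: "scheduled" | "claimed" | "running" | "success";
  claimedBy: string | null;
  claimExpiresAt: number | null; // epoch ms
  runCount: number;
}

// Takes over every in-flight task whose claim has lapsed and returns those tasks.
function reclaimExpired(
  tasks: TaskRow[],
  workerId: string,
  now: number,
  leaseMs: number,
): TaskRow[] {
  const reclaimed: TaskRow[] = [];
  for (const task of tasks) {
    const inFlight = task.state === "claimed" || task.state === "running";
    const expired = task.claimExpiresAt !== null && task.claimExpiresAt <= now;
    if (!inFlight || !expired) continue;

    task.state = "claimed";
    task.claimedBy = workerId;
    task.claimExpiresAt = now + leaseMs;
    task.runCount += 1; // records that this is another attempt
    reclaimed.push(task);
  }
  return reclaimed;
}
```
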
Database Unavailable
- Task scheduling fails (reported to caller)
- Running tasks may fail to update status
- On recovery, orphaned claims are reclaimed
Poison Pill (Always-Failing Task)
- Retries up to `maxRetries`
- Moves to DLQ after exhausting retries
- Does NOT block other tasks in queue
- DLQ entries can be replayed with a new idempotency key
Clock Skew Between Workers
- Claim timeouts account for reasonable skew (recommended: timeout > 2× max skew)
- Scheduling uses server time, not worker time
- Backoff calculations use relative time
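
The skew recommendation above is simple enough to check at startup. The following sketch uses hypothetical names (`isClaimTimeoutSafe`, `nextRunAt`) and assumes the operator supplies an estimate of the worst-case skew between workers.

```typescript
// True when the claim timeout satisfies the recommended margin: timeout > 2 x max skew.
function isClaimTimeoutSafe(claimTimeoutMs: number, maxClockSkewMs: number): boolean {
  return claimTimeoutMs > 2 * maxClockSkewMs;
}

// Relative backoff: the next run is an offset from a single clock reading,
// so no timestamps taken on different machines are compared.
function nextRunAt(serverNowMs: number, delayMs: number): number {
  return serverNowMs + delayMs;
}
```

For instance, with workers that may disagree by up to 2 seconds, a 30-second claim timeout passes the check while a 4-second timeout does not.
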
Test Coverage
- `tests/invariants/task-state-machine.test.ts` - State machine invariants
- `tests/invariants/distributed-correctness.test.ts` - Distributed scenarios
- `tests/tasks/worker.test.ts` - Worker behavior
- `tests/tasks/scheduler.test.ts` - Scheduling
- `tests/tasks/retry.test.ts` - Retry logic
- `tests/tasks/dlq.test.ts` - Dead letter queue
- `tests/tasks/recurring.test.ts` - Recurring tasks