Writing

Short notes on retries, tail latency, safe replay, and operability design.

Technical notes

Operability First: Policy, Not Hope

Throughput is technical. Operability is sociotechnical. Treat retries and replay as a control plane with explicit policy, bounded failure, and safe recovery.

Safe DLQ replay checklist

A practical runbook for replaying dead-letter messages without corrupting data or melting dependencies, with SQS/SNS and Kafka appendices.

Why recourse

Policy-driven resilience for Go services: consistent retries, explicit backpressure budgets, hedging, and circuit breaking, all with explainable observability.

Why redress

My retry philosophy: classification-first, bounded unknowns, capped exponential backoff, and observability hooks.

Musings

The Monolith and the Swarm

How Stories Bind Us and What Happens When They Break