Writing
Short notes on retries, tail latency, safe replay, and operability design.
Technical notes
Operability First: Policy, Not Hope
Throughput is technical. Operability is sociotechnical. Treat retries and replay as a control plane with explicit policy, bounded failure, and safe recovery.
Safe DLQ replay checklist
A practical runbook for replaying dead-letter messages without corrupting data or melting dependencies, with SQS/SNS and Kafka appendices.
Why recourse
Policy-driven resilience for Go services: consistent retries, explicit backpressure budgets, hedging, and circuit breaking - with explainable observability.
Why redress
My retry philosophy: classification-first, bounded unknowns, capped exponential backoff, and observability hooks.
Musings
The Monolith and the Swarm
How Stories Bind Us and What Happens When They Break