Disaster recovery
The actual recovery layers - continuous archiving, base backups, nightly dumps - and the honest limits of a single-region product.
The layers
Recovery is not one mechanism. There are three independent layers, each protecting against a different failure:
| Layer | Cadence | Retention | Protects against |
|---|---|---|---|
| Continuous transaction-log archiving + daily physical base backups | Log segments shipped continuously (forced at least every 60 seconds); base backup daily at 04:00 UTC | The 30 most recent base backups (roughly a 30-day point-in-time window) | "We need the database exactly as it was at 14:32" — and total loss of the host |
| Nightly logical dumps of every project database | Daily at 02:15 UTC | 14 days | Per-database corruption or deletion, independent of the archive chain |
| Your own backups — on demand or scheduled | You choose | You choose (1–365 days) | "Snapshot before the risky thing", named and on your terms |
All backup artifacts are written to object storage off the database host, so losing the machine does not lose the backups. Each logical backup is verified by actually restoring it into a throwaway database before it is reported as complete — a backup that does not restore is treated as a failure, not a success.
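The verify-by-restore rule can be illustrated with a minimal sketch. It is not CapyDB's actual verifier. It assumes a custom-format dump, the standard PostgreSQL client tools (`createdb`, `pg_restore`, `dropdb`) on the PATH, and connection settings supplied through the usual `PG*` environment variables. The function name `verify_dump` and the injectable `run` parameter are hypothetical.

```python
import subprocess
import uuid
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    # Raises subprocess.CalledProcessError on a non-zero exit status.
    subprocess.run(list(cmd), check=True)


def verify_dump(dump_path: str, run: Runner = _run) -> bool:
    """Restore a dump into a throwaway database; True only if the restore succeeds."""
    scratch = f"verify_{uuid.uuid4().hex[:12]}"  # hypothetical naming scheme
    run(["createdb", scratch])
    try:
        # --exit-on-error makes pg_restore stop and fail on the first error.
        run(["pg_restore", "--exit-on-error", f"--dbname={scratch}", dump_path])
        return True
    except subprocess.CalledProcessError:
        # A backup that does not restore counts as a failed backup.
        return False
    finally:
        # Always clean up the scratch database, whatever the outcome.
        run(["dropdb", "--if-exists", scratch])
```

Injecting `run` keeps the restore logic testable without a live server. In real use the default runner shells out to the client tools.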
How point-in-time recovery works
PITR is available in regions with continuous-archiving storage (the project's region info shows whether it applies). When you request a restore to a timestamp:
- A temporary recovery instance is materialized from the most recent base backup taken before your timestamp.
- The archived transaction log is replayed up to the requested moment.
- Your single database is extracted from the recovered instance and loaded into the target — a preview database by default, or production with the explicit overwrite confirmation.
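The first two steps can be sketched as a selection problem: find the newest base backup taken before the target, then measure how much log must be replayed from it. This is only an illustration under assumptions. The function `pick_base_backup` is hypothetical, each backup is reduced to a single timestamp, and the dates are invented to match the 04:00 UTC cadence described above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def pick_base_backup(backups: Sequence[datetime], target: datetime) -> Optional[datetime]:
    """Return the most recent base backup taken before `target`, or None."""
    candidates = [b for b in backups if b < target]
    return max(candidates) if candidates else None


# Illustrative data: one base backup per day at 04:00 UTC.
backups = [datetime(2024, 5, day, 4, 0, tzinfo=timezone.utc) for day in range(1, 31)]
target = datetime(2024, 5, 17, 14, 32, tzinfo=timezone.utc)

chosen = pick_base_backup(backups, target)
replay_span = target - chosen  # amount of archived log to replay

# A target older than every retained base backup has no PITR path.
too_old = pick_base_backup(backups, datetime(2024, 4, 1, tzinfo=timezone.utc))
```

With daily base backups the replay span is always under 24 hours. A `None` result corresponds to a timestamp outside the retained window.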
The practical consequences:
- Granularity is per-database. A PITR restore lands one project's database, not the whole host.
- The window is bounded by base-backup retention — about 30 days. Beyond that, your own retained backups are the recovery path.
- Duration scales with data size. Recovery restores a base backup, replays up to a day of transaction log, then dumps and reloads your database. Small databases restore in minutes; large ones take as long as they take. Restore into a preview when you want to measure it before you need it.
Realistic RPO and RTO framing
We are not going to print SLA numbers we have not earned. What the mechanics support:
- Data-loss window (RPO): within the PITR window, restores hit an exact requested timestamp. If the host is lost outright, the recoverable state is everything up to the last archived log segment — archiving is continuous and segments are forced out at least every 60 seconds under normal operation, so the exposure is on the order of the final minute of writes, plus whatever a real incident does to shipping that last segment.
- Recovery time (RTO): no committed number. It is dominated by database size and the failure mode — a single-database restore is a routine job; rebuilding from a lost host is an operator-driven incident. There is no automatic failover.
Host failure
Provisioning supports a warm standby: a second host that streams replication from the primary over a private network, with replication health (recovery state, replay lag) sampled continuously by the control plane. Failover to a standby is a manual operator action: a human promotes the standby, and nothing flips automatically. This is a deliberate trade-off. In a product this size, automatic failover that fires on a false positive does more damage than a human taking minutes to confirm.
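As a rough illustration of what sampling recovery state and replay lag can look like, the sketch below classifies one sample from a standby. The SQL uses two standard PostgreSQL functions, `pg_is_in_recovery()` and `pg_last_xact_replay_timestamp()`. Everything else is an assumption for illustration, not the control plane's actual logic: the `StandbySample` type, the `classify` function, the status labels, and the 30-second threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Query one could run on the standby; both functions are standard PostgreSQL.
STANDBY_SAMPLE_SQL = """
SELECT pg_is_in_recovery() AS in_recovery,
       EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) AS replay_lag_seconds
"""


@dataclass(frozen=True)
class StandbySample:
    in_recovery: bool
    replay_lag_seconds: Optional[float]  # None if nothing has been replayed yet


def classify(sample: StandbySample, max_lag_seconds: float = 30.0) -> str:
    """Turn one sample into a status label; the threshold is a hypothetical default."""
    if not sample.in_recovery:
        return "not-a-standby"  # host is not replaying log, e.g. already promoted
    if sample.replay_lag_seconds is None:
        return "unknown"
    if sample.replay_lag_seconds > max_lag_seconds:
        return "lagging"
    return "healthy"
```

The classifier only reports. In keeping with the manual-failover policy, it informs the operator's decision and does not trigger promotion. Note that timestamp-based lag can look large on an idle primary, since no new transactions arrive to replay.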
Single region, stated plainly
A project lives in one region on dedicated EU infrastructure (see Regions). There is no cross-region replication and no multi-region failover today. If your availability requirements demand a database that survives the loss of an entire region without operator involvement, CapyDB is not that product yet, and we would rather you read that here than discover it during an incident.
What you should do
- Keep scheduled backups on with a retention that matches how far back you ever realistically need to go.
- Pin a restore point before risky work — a named rollback target beats reconstructing a timestamp.
- Rehearse: restore a backup into a preview once, now, while nothing is on fire. You will learn your real restore duration and confirm the backups contain what you think they do.
- For belt-and-braces independence, `pg_dump` over the direct connection works as it does on any Postgres. Your data is never locked in.
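An independent dump can be scripted in a few lines. The sketch below builds a `pg_dump` invocation using only standard flags (`--format`, `--file`, `--dbname`). The function names and the example connection URI are placeholders, not CapyDB values; substitute your project's direct connection string.

```python
import subprocess
from typing import List


def pg_dump_command(connection_uri: str, out_path: str) -> List[str]:
    """Build a pg_dump invocation that writes a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",            # compressed archive, restorable with pg_restore
        f"--file={out_path}",
        f"--dbname={connection_uri}",  # pg_dump accepts a connection URI here
    ]


def dump(connection_uri: str, out_path: str) -> None:
    # Raises subprocess.CalledProcessError if pg_dump exits non-zero.
    subprocess.run(pg_dump_command(connection_uri, out_path), check=True)


# Example call (placeholder URI, not run here):
# dump("postgresql://user:password@host:5432/dbname", "mydb.dump")
```

Storing the resulting file somewhere you control gives you a copy that does not depend on any of the layers above.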