Disaster recovery

The actual recovery layers - continuous archiving, base backups, nightly dumps - and the honest limits of a single-region product.

The layers

Recovery is not one mechanism. There are three independent layers, each protecting against a different failure:

Layer	Cadence	Retention	Protects against
Continuous transaction-log archiving + daily physical base backups	Log segments shipped continuously (forced at least every 60 seconds); base backup daily at 04:00 UTC	The 30 most recent base backups (roughly a 30-day point-in-time window)	"We need the database exactly as it was at 14:32" - and total loss of the node
Nightly logical dumps of every project database	Daily at 02:15 UTC	14 days	Per-database corruption or deletion, independent of the archive chain
Your own backups - on demand or scheduled	You choose	You choose (1–365 days)	"Snapshot before the risky thing", named and on your terms

All backup artifacts are written to object storage off the database node, so losing the machine does not lose the backups. Each logical backup is verified by actually restoring it into a throwaway database before it is reported as complete - a backup that does not restore is treated as a failure, not a success.
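
The verify-by-restore idea can be sketched as follows. This is a minimal illustration, not CapyDB's actual verifier: it assumes a custom-format dump (the kind pg_restore reads), and the names verify_by_restore, run_command, and scratch_db are hypothetical. The command runner is injectable so the logic can be exercised without a Postgres server.

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def verify_by_restore(dump_path: str, scratch_db: str,
                      run: Runner = run_command) -> bool:
    """Return True only if the dump restores cleanly into a throwaway database."""
    if run(["createdb", scratch_db]) != 0:
        return False
    try:
        # --exit-on-error stops at the first failed statement instead of
        # continuing and reporting a count of errors at the end.
        restored = run([
            "pg_restore", "--exit-on-error", "--no-owner",
            f"--dbname={scratch_db}", dump_path,
        ]) == 0
    finally:
        # The scratch database is dropped whether or not the restore worked.
        run(["dropdb", "--if-exists", scratch_db])
    return restored
```

A backup would be reported as complete only when this returns True; any non-zero exit from the restore counts as a failed backup.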

How point-in-time recovery works

PITR is available in regions with continuous-archiving storage (the project's region info shows whether it applies). When you request a restore to a timestamp:

1. A temporary recovery cell is materialized from the most recent base backup taken before your timestamp.
2. The archived transaction log is replayed up to the requested moment.
3. Your single database is extracted from the recovery cell and loaded into the target - a preview database by default, or production with the explicit overwrite confirmation.
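
The first step, choosing which base backup to start from, can be sketched like this. It is an illustration under stated assumptions: the function name pick_base_backup is hypothetical, and each datetime is assumed to mark when a base backup completed. The dates are made up; only the 04:00 UTC schedule comes from the table above.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def pick_base_backup(backups: Sequence[datetime],
                     target: datetime) -> Optional[datetime]:
    """Return the most recent base backup taken before `target`, or None."""
    candidates = [b for b in backups if b < target]
    return max(candidates) if candidates else None


# Three daily base backups at 04:00 UTC (illustrative dates).
backups = [datetime(2024, 5, day, 4, 0, tzinfo=timezone.utc) for day in (1, 2, 3)]
target = datetime(2024, 5, 3, 14, 32, tzinfo=timezone.utc)

chosen = pick_base_backup(backups, target)
# The archived log between the chosen backup and the target is what gets replayed.
replay_span = target - chosen
```

If no base backup precedes the timestamp, the function returns None: the request is outside the point-in-time window and a retained backup is the recovery path instead.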

The practical consequences:

Granularity is per-database. A PITR restore lands one project's database, not the whole node.
The window is bounded by base-backup retention - about 30 days. Beyond that, your own retained backups are the recovery path.
Duration scales with data size. Recovery restores a base backup, replays up to a day of transaction log, then dumps and reloads your database. Small databases restore in minutes; large ones take longer, roughly in proportion to their size. Restore into a preview to measure your own restore time before you need it.

Realistic RPO and RTO framing

We are not going to print SLA numbers we have not earned. What the mechanics support:

Data-loss window (RPO): within the PITR window, a restore lands on the exact timestamp you request. If the node is lost outright, the recoverable state is everything up to the last archived log segment. Archiving is continuous and segments are forced out at least every 60 seconds under normal operation, so the exposure is on the order of the final minute of writes, plus whatever a real incident does to shipping that last segment.
Recovery time (RTO): no committed number. It is dominated by database size and the failure mode - a single-database restore is a routine job; rebuilding from a lost node is an operator-driven incident. There is no automatic failover.
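
The data-loss arithmetic for a lost node is simple enough to write down. The sketch below is illustrative only: the function names and timestamps are hypothetical, and the 60-second constant is the forced-archive interval stated above, not a guarantee.

```python
from datetime import datetime, timedelta, timezone

# From the text: under normal operation a segment is forced out at least every 60 s.
FORCED_ARCHIVE_INTERVAL = timedelta(seconds=60)


def data_loss_window(last_archived_at: datetime, node_lost_at: datetime) -> timedelta:
    """Writes committed in this interval were never archived and go down with the node."""
    if node_lost_at < last_archived_at:
        raise ValueError("node loss cannot precede the last archived segment")
    return node_lost_at - last_archived_at


def within_normal_exposure(window: timedelta) -> bool:
    """True if the window is no larger than the forced-archive interval."""
    return window <= FORCED_ARCHIVE_INTERVAL


# Illustrative incident: last segment archived at 14:31:20, node lost at 14:32:05.
last_segment = datetime(2024, 5, 3, 14, 31, 20, tzinfo=timezone.utc)
node_lost = datetime(2024, 5, 3, 14, 32, 5, tzinfo=timezone.utc)
window = data_loss_window(last_segment, node_lost)  # 45 seconds of writes
```

A window larger than the forced interval means archiving had already stalled before the node was lost, which is the "whatever a real incident does" part of the exposure.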

Node failure

Provisioning supports a warm standby: a second node streaming replication from the primary over a private network, with replication health (recovery state, replay lag) sampled continuously by the control plane. Failover to a standby is a manual operator action - a human promotes the standby; nothing flips automatically. This is a deliberate trade-off: in a product this size, automatic failover that fires on a false positive does more damage than a human taking minutes to confirm.
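
To make "recovery state, replay lag" concrete, here is a sketch of how a sample might be turned into advice for the operator. It is not the control plane's code: StandbySample, promotion_advice, and the 30-second threshold are hypothetical. The two SQL functions are standard Postgres; note that measuring lag this way overstates it when the primary is idle, because the last replayed transaction simply gets older.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# A query a monitor could run on the standby (standard Postgres functions).
STANDBY_HEALTH_SQL = """
SELECT pg_is_in_recovery() AS in_recovery,
       now() - pg_last_xact_replay_timestamp() AS replay_lag
"""


@dataclass(frozen=True)
class StandbySample:
    in_recovery: bool
    replay_lag: Optional[timedelta]  # None if nothing has been replayed yet


def promotion_advice(sample: StandbySample,
                     max_lag: timedelta = timedelta(seconds=30)) -> str:
    """Advise a human operator; this function never promotes anything itself."""
    if not sample.in_recovery:
        return "not a standby: already promoted or misconfigured"
    if sample.replay_lag is None:
        return "hold: no replay position yet"
    if sample.replay_lag > max_lag:
        return f"hold: replay lag {sample.replay_lag} exceeds {max_lag}"
    return "eligible: operator may promote after confirming the primary is down"
```

The output is a string for a person to read, which matches the manual-failover design: the sample informs the decision, and the human makes it.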

Single region, stated plainly

A project lives in one region on dedicated EU infrastructure (see Regions). There is no cross-region replication and no multi-region failover today. If your availability requirements demand a database that survives the loss of an entire region without operator involvement, CapyDB is not that product yet - and we would rather you read that here than discover it during an incident.

What you should do

Keep scheduled backups on with a retention that matches how far back you ever realistically need to go.
Pin a restore point before risky work - a named rollback target beats reconstructing a timestamp.
Rehearse: restore a backup into a preview once, now, while nothing is on fire. You will learn your real restore duration and confirm the backups contain what you think they do.
For belt-and-braces independence, pg_dump over the direct connection works as it does on any Postgres - your data is never locked in.
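
A scheduled job for that independent dump might assemble the pg_dump call like this. This is a sketch: the function name is hypothetical, and the connection URI and output path in the usage comment are placeholders. Use the direct connection string shown for your project. The flags themselves are standard pg_dump options.

```python
import subprocess
from typing import List


def pg_dump_command(conn_uri: str, out_path: str) -> List[str]:
    """Build a pg_dump invocation that writes a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",       # compressed archive, restorable with pg_restore
        "--no-owner",            # omit ownership so it restores under any role
        f"--file={out_path}",
        f"--dbname={conn_uri}",  # pg_dump accepts a postgresql:// URI here
    ]


# Usage (placeholder values; supply the password via PGPASSWORD or ~/.pgpass):
#   cmd = pg_dump_command("postgresql://user@db.example.invalid:5432/app", "app.dump")
#   subprocess.run(cmd, check=True)
```

Storing the resulting file somewhere you control gives you a copy that depends on neither the archive chain nor the platform's own backups.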