---
rfc: 0020
title: CF-Native Gateway Key Custody (v2)
status: RATIFIED
version: 2.0.0
date: 2026-05-07
authors: [agentic-architect]
audience: [cloudflare-native-edge, devex-protocol-sec, edge-kubelet-engineer, cto, ceo]
requires: [RFC-0006-v1.1, RFC-0017]
supersedes: [RFC-0020-hsm-cutover-aws-kms-spike.md]
intersects: [RFC-0015, RFC-0016, RFC-0018]
---

# RFC-0020 v2 — CF-Native Gateway Key Custody

> **Schema is the UX.** This RFC is consumed by Cloud Agents (LLMs) and the
> `cloudflare-native-edge` implementer. Every contract here is closed-enum,
> versioned, and refuses unknown fields.

---

## §1. Status

- **RATIFIED (CEO fast-track 2026-05-07)**.
- **Supersedes** [`docs/rfcs/0020-hsm-cutover-aws-kms.md`](./0020-hsm-cutover-aws-kms.md)
  (the AWS-KMS spike, hereafter "RFC-0020 v1"). v1 is **REVOKED** as of CEO directive
  2026-05-07; this document replaces it in toto.
- Companion design doc: [`RFC-0020-cf-native-custody/signer-do-design.md`](./RFC-0020-cf-native-custody/signer-do-design.md)
  (`cloudflare-native-edge`, LANDED 2026-05-07). That document is the wire-level
  DO surface (storage layout, RPC handlers, audit ring buffer, rate-limit
  internals); this RFC is the authoritative *contract* for what the DO must
  guarantee. Where the two disagree, this RFC wins.

---

## §2. Decision Record

| Item                       | Value                                                                 |
|----------------------------|-----------------------------------------------------------------------|
| S4-D7 (AWS KMS)            | **REVOKED** 2026-05-07                                                |
| S4-D7-revised              | **RATIFIED** 2026-05-07 — Cloudflare-only custody                     |
| Authority                  | CEO directive 2026-05-07 ("Cloudflare-only custody")                  |
| Single-cloud rationale     | Operational simplicity; no cross-cloud auth/secret-sync surface       |

**Accepted residual risk (verbatim, CEO 2026-05-07):**

> "CF account compromise = full key compromise; accepted in exchange for single-cloud
> operational simplicity."

This RFC does not reopen that trade. Mitigations live downstream (RFC-0006 quorum,
SignerDO RPC rate-limit, JWKS overlap). FIPS-140 boundary is **out of scope** (§11).

---

## §3. Architecture Overview

Custody is implemented by the **SignerDO** Durable Object pattern (see
[`signer-do-design.md`](./RFC-0020-cf-native-custody/signer-do-design.md) §1–§6
for the wire-level DO surface; this RFC is the authoritative contract for
*what* the DO must guarantee).

Invariants (NORMATIVE):

1. **One SignerDO instance per `kid`.** Address is `idFromName(kid)`. The DO is the
   single writer of that key's lifecycle.
2. **No plaintext key export.** The DO MUST NOT expose any RPC that returns
   private-key material. The only data egress paths are `sign(payload)`
   (signatures) and `getPublicKey()` / `attestKeyGen()` (public material).
3. **Keys are generated inside the DO.** Genesis ceremony (§5) calls
   `attestKeyGen()`; the DO generates the keypair via WebCrypto / `@noble/*`
   inside the isolate, persists the private half to DO storage, and returns
   the public half + an `AttestationDoc`.
4. **Quorum binding (§7) is enforced inside the DO.** A `sign()` call whose
   target capability has `safety_class ∈ {physical_actuation, power_control}`
   MUST present a `QuorumToken` (PolicyDO-issued, §7) or be refused.
5. **Algorithm pinning.** Each SignerDO is bound to exactly one alg-pair at
   genesis: `Ed25519+ML-DSA-65` (gateway hybrid signers) — matching the
   existing `ed25519_sig(64B) || mldsa65_sig(3309B) = 3373B` wire contract.
   Other alg-pairs require a distinct SignerDO type and a new RFC.

```
┌──────────────────────┐   sign(payload, quorum_token?)
│  Gateway Worker      │ ───────────────────────────────► ┌─────────────────┐
│  (dispatch + RFC-6)  │ ◄─────────────────────────────── │  SignerDO[kid]  │
└──────────────────────┘     hybrid_sig (3373B)            │  - keypair@DOst │
        │                                                  │  - quorum check │
        │ getPublicKey() / attestKeyGen()                  │  - rate limit   │
        └──────────────────────────────────────────────────►└─────────────────┘
                                                                 │
                                                  /.well-known/jwks.json (§8)
```
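The wire contract in invariant 5 fixes the hybrid signature at 3373 bytes. As a non-normative illustration, a verifier could split the concatenation as sketched below. The helper name `splitHybridSig` and the `HybridSigParts` type are hypothetical and are not defined by this RFC.

```typescript
// Byte lengths taken from the wire contract in invariant 5.
const ED25519_SIG_LEN = 64;
const MLDSA65_SIG_LEN = 3309;
const HYBRID_SIG_LEN = ED25519_SIG_LEN + MLDSA65_SIG_LEN; // 3373

// Hypothetical container for the two halves of a hybrid signature.
interface HybridSigParts {
  ed25519: Uint8Array;
  mldsa65: Uint8Array;
}

// Hypothetical helper: splits `ed25519_sig || mldsa65_sig` into its halves.
// Rejects any input that is not exactly 3373 bytes long.
function splitHybridSig(sig: Uint8Array): HybridSigParts {
  if (sig.length !== HYBRID_SIG_LEN) {
    throw new RangeError(`expected ${HYBRID_SIG_LEN} bytes, got ${sig.length}`);
  }
  return {
    ed25519: sig.subarray(0, ED25519_SIG_LEN),
    mldsa65: sig.subarray(ED25519_SIG_LEN),
  };
}
```

The strict length check mirrors the closed-contract stance of this RFC: a signature of any other size is refused instead of being parsed on a best-effort basis.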

---

## §4. Custody Interface (NORMATIVE)

### §4.0 Deployment Locus (LOCKED 2026-05-07)

The `SignerDO` class is declared in a **separate Worker script** named
`signer-worker` (script isolation). The `mcp-gateway` Worker invokes it via a
**service binding**, not an in-isolate DO namespace binding, so that a
compromise of the gateway isolate cannot reach the DO storage layer except
through the four RPCs in §4.1–§4.4. This raises the trust boundary from
same-isolate to service-binding RPC. (Resolves `signer-do-design.md` §10 Q1.)

Per-region/per-`kid` instance binding remains `env.SIGNER_DO.idFromName(kid)`
inside the `signer-worker` script (one DO instance per regional kid; resolves
`signer-do-design.md` §10 Q3 — confirmed YES, see RFC-0018 §4).
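The one-instance-per-`kid` addressing can be illustrated outside the Workers runtime. The sketch below is non-normative: `DoId`, `DoNamespace`, `SignerStub`, `FakeSignerNamespace`, and `signerFor` are hypothetical local stand-ins that mirror only the `idFromName` + `get` shape of a Durable Object namespace. They are not the Cloudflare types themselves.

```typescript
// Minimal structural stand-ins for the namespace surface used here.
interface DoId {
  toString(): string;
}
interface DoNamespace<Stub> {
  idFromName(name: string): DoId;
  get(id: DoId): Stub;
}

// Hypothetical stub shape; the real RPC surface is defined in §4.1–§4.4.
interface SignerStub {
  kid: string;
}

// In-memory fake so the sketch runs without the Workers runtime.
class FakeSignerNamespace implements DoNamespace<SignerStub> {
  private instances = new Map<string, SignerStub>();

  idFromName(name: string): DoId {
    return { toString: () => `id:${name}` };
  }

  get(id: DoId): SignerStub {
    const key = id.toString();
    let stub = this.instances.get(key);
    if (stub === undefined) {
      stub = { kid: key.slice(3) }; // strip the "id:" prefix
      this.instances.set(key, stub);
    }
    return stub;
  }
}

// The same kid always resolves to the same instance.
function signerFor(ns: DoNamespace<SignerStub>, kid: string): SignerStub {
  return ns.get(ns.idFromName(kid));
}
```

Because the address is derived from the `kid` alone, two callers that name the same `kid` reach the same single-writer instance, and a regional `kid` gets its own.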

The four RPC signatures below are the authoritative contract. The companion
[`signer-do-design.md`](./RFC-0020-cf-native-custody/signer-do-design.md)
provides the wire-level encoding; any disagreement is resolved in favour of
this section.

> Schemas are JSON Schema Draft 2020-12. All inputs `additionalProperties: false`.

### §4.1 `sign`

```json
{
  "$id": "mcp://schemas/signer-do/sign-input@2.0.0",
  "type": "object",
  "additionalProperties": false,
  "required": ["payload_b64", "purpose"],
  "properties": {
    "payload_b64":   { "type": "string", "contentEncoding": "base64", "maxLength": 65536 },
    "purpose":       { "enum": ["runtime_token", "tenant_init", "tool_call_envelope"] },
    "quorum_token":  { "type": "string", "description": "REQUIRED iff target cap is physical_actuation/power_control (§7). PolicyDO-issued JWS." },
    "request_id":    { "type": "string", "format": "uuid", "description": "idempotency + audit correlation" }
  }
}
```

Output: `{ "kid": "<string>", "alg": "Ed25519+ML-DSA-65", "sig_b64": "<3373B base64>" }`.
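For readers who want to see the closed-schema behaviour concretely, the following non-normative sketch hand-rolls the checks implied by the `sign` input schema. The names `validateSignInput`, `SignInput`, and `UUID_RE` are hypothetical. Enforcing `format: uuid` as a hard check is also an assumption, since JSON Schema Draft 2020-12 treats `format` as an annotation unless the validator is configured otherwise.

```typescript
const PURPOSES = new Set(["runtime_token", "tenant_init", "tool_call_envelope"]);
const ALLOWED_KEYS = new Set(["payload_b64", "purpose", "quorum_token", "request_id"]);
// Assumption: a generic 8-4-4-4-12 hex UUID shape is sufficient here.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

type Purpose = "runtime_token" | "tenant_init" | "tool_call_envelope";

interface SignInput {
  payload_b64: string;
  purpose: Purpose;
  quorum_token?: string;
  request_id?: string;
}

// Hypothetical validator mirroring the schema: closed enum, bounded payload,
// and refusal of unknown fields (additionalProperties: false).
function validateSignInput(raw: unknown): SignInput {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new TypeError("input must be an object");
  }
  const obj = raw as Record<string, unknown>;
  for (const key of Object.keys(obj)) {
    if (!ALLOWED_KEYS.has(key)) throw new TypeError(`unknown field: ${key}`);
  }
  const { payload_b64, purpose, quorum_token, request_id } = obj;
  if (typeof payload_b64 !== "string" || payload_b64.length > 65536) {
    throw new TypeError("payload_b64 must be a string of at most 65536 characters");
  }
  if (typeof purpose !== "string" || !PURPOSES.has(purpose)) {
    throw new TypeError("purpose is not in the closed enum");
  }
  const out: SignInput = { payload_b64, purpose: purpose as Purpose };
  if (quorum_token !== undefined) {
    if (typeof quorum_token !== "string") throw new TypeError("quorum_token must be a string");
    out.quorum_token = quorum_token;
  }
  if (request_id !== undefined) {
    if (typeof request_id !== "string" || !UUID_RE.test(request_id)) {
      throw new TypeError("request_id must be a UUID");
    }
    out.request_id = request_id;
  }
  return out;
}
```

Whether `quorum_token` is required for a given call cannot be decided from the schema alone; that check depends on the target capability and is covered in §7.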

### §4.2 `getPublicKey`

Input: `{}` (empty, `additionalProperties: false`).
Output: JWK-set fragment for this `kid` — `{ kid, alg, status, ed25519_pub_b64, mldsa65_pub_b64, not_before, not_after? }`.
`status ∈ { "active", "rotating-in", "rotating-out", "retired" }`.

### §4.3 `rotate`

Input:
```json
{
  "additionalProperties": false,
  "required": ["new_kid", "overlap_window_s"],
  "properties": {
    "new_kid":          { "type": "string", "pattern": "^gw-sig(-1|\\.[a-z0-9]+\\.edge-signer\\.[0-9]+)$" },
    "overlap_window_s": { "type": "integer", "minimum": 86400, "maximum": 2592000, "default": 604800 }
  }
}
```

Semantics: spawns a new SignerDO at `idFromName(new_kid)`, marks itself
`rotating-out`, and marks the new DO `rotating-in`. Both publish to JWKS for
`overlap_window_s`. After the window expires, the outgoing DO transitions to
`retired` and its `sign()` returns `E_KEY_RETIRED`. Defaults: 7d overlap; min 24h; max 30d.
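The rotation bounds and the outgoing key's status transition can be sketched as follows. This is a non-normative illustration: `planRotation`, `outgoingStatusAt`, and `RotationPlan` are hypothetical names, and treating the window end as inclusive (retired at exactly `overlapEndsAt`) is an assumption this RFC does not settle.

```typescript
type KeyStatus = "active" | "rotating-in" | "rotating-out" | "retired";

// Bounds and default from the rotate input schema (seconds).
const MIN_OVERLAP_S = 86400; // 24h
const MAX_OVERLAP_S = 2592000; // 30d
const DEFAULT_OVERLAP_S = 604800; // 7d

// Same pattern as the `new_kid` field in the schema.
const KID_RE = /^gw-sig(-1|\.[a-z0-9]+\.edge-signer\.[0-9]+)$/;

interface RotationPlan {
  newKid: string;
  outgoing: KeyStatus;
  incoming: KeyStatus;
  overlapEndsAt: number; // epoch seconds
}

// Hypothetical planner: validates inputs and records the overlap deadline.
function planRotation(
  nowS: number,
  newKid: string,
  overlapWindowS: number = DEFAULT_OVERLAP_S,
): RotationPlan {
  if (!KID_RE.test(newKid)) throw new RangeError("new_kid does not match the kid pattern");
  if (
    !Number.isInteger(overlapWindowS) ||
    overlapWindowS < MIN_OVERLAP_S ||
    overlapWindowS > MAX_OVERLAP_S
  ) {
    throw new RangeError("overlap_window_s out of bounds");
  }
  return {
    newKid,
    outgoing: "rotating-out",
    incoming: "rotating-in",
    overlapEndsAt: nowS + overlapWindowS,
  };
}

// Status of the outgoing key at a given time (inclusive end is an assumption).
function outgoingStatusAt(plan: RotationPlan, nowS: number): KeyStatus {
  return nowS >= plan.overlapEndsAt ? "retired" : plan.outgoing;
}
```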

### §4.4 `attestKeyGen`

Input: `{ "ceremony_nonce": "<uuid>", "env": "dev" | "prod" }` (`additionalProperties: false`).
Idempotent on `(kid, ceremony_nonce)`. First call generates the keypair and
returns:
```json
{
  "kid": "<string>",
  "alg": "Ed25519+ML-DSA-65",
  "ed25519_pub_b64": "...",
  "mldsa65_pub_b64": "...",
  "attestation": {
    "doc_b64": "<canonical CBOR, signed by DO with ephemeral attest-key>",
    "ceremony_nonce": "<uuid>",
    "generated_at": "<rfc3339>",
    "do_id": "<hex>",
    "env": "<dev|prod>"
  }
}
```
The attest-key cert chain values are TBD W2 and depend on the CF Worker
attestation surface (§12 Q3).
Subsequent calls with the same `(kid, ceremony_nonce)` return the same blob
(idempotency); a different `ceremony_nonce` against an initialised DO returns
`E_ALREADY_INITIALISED`.
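The idempotency rule above can be shown with an in-memory sketch. It is non-normative: `GenesisState` and its `generate` callback are hypothetical, the blob is an opaque string standing in for the real response, and actual persistence would live in DO storage, not in a class field.

```typescript
interface GenesisRecord {
  ceremonyNonce: string;
  blob: string; // stands in for the full attestKeyGen response
}

// Hypothetical per-kid genesis state. One instance corresponds to one SignerDO.
class GenesisState {
  private record: GenesisRecord | undefined = undefined;
  private readonly generate: (nonce: string) => string;

  constructor(generate: (nonce: string) => string) {
    this.generate = generate;
  }

  attestKeyGen(ceremonyNonce: string): string {
    if (this.record === undefined) {
      // First call: generate once and remember the nonce that did it.
      this.record = { ceremonyNonce, blob: this.generate(ceremonyNonce) };
      return this.record.blob;
    }
    if (this.record.ceremonyNonce === ceremonyNonce) {
      // Same nonce: return the original blob without regenerating.
      return this.record.blob;
    }
    // Different nonce against an initialised DO.
    throw new Error("E_ALREADY_INITIALISED");
  }
}
```

Keying the check on the stored nonce means an operator who re-runs the ceremony by mistake gets the original output back, while an attempt to re-key under a new nonce is refused.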

---

## §5. Genesis Ceremony

**The G-2 Path-A env-var bootstrap (RFC-0020 v1 §G-2 Path-A) is REPLACED in
toto.** No `GW_HYBRID_*_SK_*` secret is ever materialised again, in any env.

### §5.1 Procedure

CEO (sole operator) runs **one `wrangler` invocation per `kid` per env**:

```
wrangler durable-objects rpc SignerDO --name=<kid> attestKeyGen \
    '{"ceremony_nonce":"<uuid>","env":"<dev|prod>"}'
```

- Idempotent on `(kid, ceremony_nonce)`. Re-running with same nonce is a no-op
  returning the original `AttestationDoc`.
- Output captured by CEO and committed to
  `release/keys/<env>/<kid>.attestation.json` (public material only — pubkeys +
  signed AttestationDoc; **no private key material**).
- Two ceremonies at minimum (one for dev, one for prod), plus one per regional
  `kid` introduced by RFC-0018.

### §5.2 Outputs

The ceremony produces *only*:

1. `ed25519_pub_b64`, `mldsa65_pub_b64` (committed to repo, served via JWKS).
2. `AttestationDoc` (committed to repo, verifiable offline by CTO/CEO).

Nothing else. There is no key-export step. There is no key-backup step. Loss
of the DO's underlying storage = key loss = forced rotation under §6.

### §5.3 Deprecation of v1 G-2 Path-A

| Artifact                              | Action                                  |
|---------------------------------------|-----------------------------------------|
| `GW_HYBRID_ED25519_SK_PEM` (dev)      | `wrangler secret delete` on ratification |
| `GW_HYBRID_MLDSA65_SK_B64` (dev)      | `wrangler secret delete` on ratification |
| `GW_HYBRID_*_SK_*` (prod)             | Never materialised under v2 (§5.1)       |

### §5.4 Rate-Limit Defaults (LOCKED 2026-05-07)

Every SignerDO enforces, per-`kid`, a `sign()` rate ceiling of:

- **Sustained:** 100 sig/sec
- **Burst:** 1,000 sig/min (token-bucket; refill 100/sec)

Over-limit → `E_SIGN_RATE_LIMITED` (HTTP 429 surfaced by the gateway). Counters
live in DO storage and reset on the bucket window; values are configurable via
DO storage key `gw/rate_limit_overrides` (operator-only; CEO unlock path per
[`signer-do-design.md`](./RFC-0020-cf-native-custody/signer-do-design.md) §6.3).
Defaults are CEO-locked; a write to `gw/rate_limit_overrides` MUST emit an
audit row.
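A token bucket matching these defaults might look like the non-normative sketch below. Two assumptions are made loudly: the burst figure is read as a bucket capacity of 1,000 tokens (the text states it as 1,000 sig/min), and the clock is injected by the caller, not read from the runtime. `TokenBucket` and `tryTake` are hypothetical names, and the state is held in memory here where the RFC places counters in DO storage.

```typescript
// Hypothetical token bucket: `capacity` tokens at most, refilled continuously.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Returns true if one sign() may proceed, false for E_SIGN_RATE_LIMITED.
  tryTake(nowMs: number): boolean {
    const elapsedS = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedS * this.refillPerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs); // never move backwards
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Assumed mapping of the §5.4 defaults onto the bucket parameters.
const bucket = new TokenBucket(1000, 100, 0);
```

Under this reading a caller can spend the full bucket at once, after which throughput settles at the refill rate of 100 signatures per second.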

---

## §6. Rotation

- Driven by `SignerDO.rotate()` (§4.3).
- **Dual-kid overlap** is mandatory: `default = 7d`, `min = 24h`, `max = 30d`.
- During overlap, JWKS (§8) serves **both** the outgoing and incoming pubkey
  entries; verifiers MUST accept either.
- Triggers (NORMATIVE):
  1. Scheduled (annual; CEO cron).
  2. Suspected compromise (ad-hoc; overlap MAY be set to `min = 24h`).
  3. Algorithm bump (requires new SignerDO type + RFC amendment; not a `rotate()`).
- `kid` value space is constrained by RFC-0018 §3 taxonomy.

---

## §7. Quorum Binding

- Capabilities with `safety_class ∈ {physical_actuation, power_control}` MUST
  set `requires_quorum: true` in their manifest entry. Per RFC-0006 **v1.1**,
  `requires_quorum: false` on these classes is no longer admissible (the v1.0
  §7.2 single-signer escape hatch is REMOVED). The gateway rejects manifests
  violating this with `E_MANIFEST_INVALID`.
- The SignerDO independently re-checks: a `sign()` whose `purpose =
  tool_call_envelope` and whose envelope targets a `requires_quorum: true`
  capability MUST carry a valid `quorum_token` field. Absence or invalidity →
  `E_QUORUM_REQUIRED`. This is *defence-in-depth* against a compromised
  dispatcher path. Post RFC-0006 v1.1, a quorum-backed `sign()` is also the
  **only** path to physical actuation (no audit-row bypass exists).
- `quorum_token` is a JWS issued by **PolicyDO** (separate DO type;
  authoritative spec deferred to **RFC-0021** — issuer DID, claim shape, alg).
  The SignerDO contract only needs the *presence and verifiability* property
  here; coupling to an unratified PolicyDO RFC is intentionally avoided.
- Cross-reference: [`rfc-0006-safety-class-enforcement.md`](./rfc-0006-safety-class-enforcement.md)
  v1.1 §3(g) (gate algorithm) and v1.1 §7 (`requires_quorum` semantics).
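The SignerDO-side re-check can be sketched as a small gate function. This is a non-normative illustration under stated assumptions: `checkQuorum`, `QuorumCheckInput`, and the `verify` callback are hypothetical, `requiresQuorum` is assumed to have been resolved from the envelope's target capability already, and token verification is left abstract because the PolicyDO token format is deferred to RFC-0021.

```typescript
interface QuorumCheckInput {
  purpose: string;
  requiresQuorum: boolean; // assumed resolved from the target capability
  quorumToken?: string;
}

// Hypothetical verifier signature; the real JWS checks belong to RFC-0021.
type VerifyQuorumToken = (token: string) => boolean;

// Throws E_QUORUM_REQUIRED when a quorum-gated call lacks a valid token.
function checkQuorum(input: QuorumCheckInput, verify: VerifyQuorumToken): void {
  const gated = input.purpose === "tool_call_envelope" && input.requiresQuorum;
  if (!gated) return;
  if (input.quorumToken === undefined || !verify(input.quorumToken)) {
    throw new Error("E_QUORUM_REQUIRED");
  }
}
```

Absence and invalidity collapse into the same error code, as the text above specifies, so a caller cannot use the response to probe which of the two conditions failed.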

---

## §8. JWKS

- Endpoint: `/.well-known/jwks.json` on the gateway Worker.
- Aggregates `getPublicKey()` output from all SignerDOs whose status is
  `active | rotating-in | rotating-out`. `retired` entries MUST NOT appear.
- `Cache-Control: max-age = floor(min(overlap_window_s)/4)`. With default 7d
  overlap → `max-age ≤ 151200` (42h). Verifiers MUST honour TTL.
- Shape: `{ "keys": [ { kid, alg, status, ed25519_pub_b64, mldsa65_pub_b64,
  not_before, not_after? } ] }`. Per-region kid expansion: see RFC-0018 §6.
- **Signed-URL delivery for a binary JWKS snapshot** (out-of-band CDN-friendly
  variant for verifier cold-start optimisation) is **deferred to RFC-0021**.
  v2 ships the JSON endpoint only.
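The aggregation and cache rules in this section reduce to a filter and one line of arithmetic. The sketch below is non-normative: `buildJwks`, `jwksMaxAge`, and `JwksEntry` are hypothetical names, and the entries are assumed to have been collected from `getPublicKey()` already.

```typescript
type KeyStatus = "active" | "rotating-in" | "rotating-out" | "retired";

interface JwksEntry {
  kid: string;
  alg: string;
  status: KeyStatus;
  ed25519_pub_b64: string;
  mldsa65_pub_b64: string;
  not_before: string;
  not_after?: string;
}

// Retired entries MUST NOT appear in the published set.
function buildJwks(entries: JwksEntry[]): { keys: JwksEntry[] } {
  return { keys: entries.filter((entry) => entry.status !== "retired") };
}

// max-age = floor(min(overlap_window_s) / 4), in seconds.
function jwksMaxAge(overlapWindowsS: number[]): number {
  if (overlapWindowsS.length === 0) {
    throw new RangeError("at least one overlap window is required");
  }
  return Math.floor(Math.min(...overlapWindowsS) / 4);
}
```

With the default 7d overlap, `jwksMaxAge([604800])` yields 151200 seconds (42h). If any signer is rotating with the 24h minimum, the shortest window dominates and the value drops to 21600 seconds (6h).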

---

## §9. Threat Model

| # | Threat                                                      | Mitigation                                                                               | Residual                                                       |
|---|-------------------------------------------------------------|------------------------------------------------------------------------------------------|----------------------------------------------------------------|
| 1 | Worker RCE → key extraction via SignerDO RPC abuse          | RFC-0006 v1.1 quorum gate (§7, no escape hatch); SignerDO per-`kid` rate-limit (§5.4: 100/sec sustained, 1k/min burst); script-isolation boundary (§4.0); audit log | **ACCEPTED-RESIDUAL** (CEO 2026-05-07: CF compromise ≡ key compromise) |
| 2 | DO storage exfil via `wrangler` (CF account compromise)     | CF account hygiene; 2FA; audit log of `wrangler` access (out-of-band)                    | **ACCEPTED-RESIDUAL** (same CEO directive)                     |
| 3 | Replay of `sign()` with stale payload                       | `request_id` UUID + DO-side dedupe window (values TBD W2); upstream nonce in payload     | None material                                                  |
| 4 | Rotation race (verifier sees new kid before JWKS refresh)   | Mandatory overlap window §6; JWKS `max-age ≤ overlap/4`; verifier MUST refetch on miss   | Brief verify-fail window bounded by JWKS TTL                   |
| 5 | Ceremony nonce reuse (operator error)                       | `attestKeyGen` idempotent on `(kid, nonce)`; second nonce on initialised DO ⇒ `E_ALREADY_INITIALISED` | None — forced by interface |
| 6 | PolicyDO compromise → bogus QuorumToken                     | Separate DO; separate signing key; out of this RFC's scope                               | Tracked in PolicyDO RFC                                        |

---

## §10. Migration Plan

| Phase  | When           | Action                                                                                  |
|--------|----------------|-----------------------------------------------------------------------------------------|
| M1     | Sprint 4 W2    | **Dev cutover.** CEO runs §5.1 ceremony for dev `kid`. Existing dev env-var keys torn down (§5.3). Gateway Worker switches to SignerDO RPC for dev. JWKS dev endpoint serves SignerDO pubkey only. |
| M2     | Sprint 4 W2    | Verifier audit (`edge/crates/transport/jwks.rs`) — confirms no hardcoded `gw-sig-1`; RFC-0018 §8 task. |
| M3     | Sprint 4 W3    | **Prod cutover.** CEO runs §5.1 ceremony for prod `kid`. Prod has *no* prior env-var keys (none ever materialised under v2). Gateway flips to SignerDO. |
| M4     | W3 + 7d        | Confirm no fallback path remains. Delete v1 (AWS KMS) feature flags / code from `workers/mcp-gateway/src`. |

**G-2 Path-A is DEPRECATED** as of RFC-0020 v2 ratification. No new env-var
secrets MAY be created.

### §10.1 CI Lint — Env-Var Key Reintroduction Block (W3 acceptance criterion)

A CI check MUST run on every PR with the following acceptance test:

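```
if grep -rE 'GATEWAY_.*_SK|GW_HYBRID_.*_SK_' workers/ \
  --include='*.ts' --include='*.js' --include='*.toml' --include='*.json' \
  --exclude-dir=node_modules; then
  exit 1
fi
```

**MUST return 0 hits** post-cutover (M3 + 7d). Any hit fails the PR. Because
`grep` exits 0 when it finds a match and 1 when it finds none, the check is
wrapped in `if ...; then exit 1; fi` so that a hit, not a clean tree, fails the step.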
Scope: `workers/` tree only (edge crates are out-of-scope for this lint;
hybrid SK material never resided there). Owner: `cloudflare-native-edge`
(W3 acceptance).

---

## §11. Non-Goals

- **FIPS-140 cryptographic boundary.** Cloudflare Workers / DOs do not
  expose a FIPS-140 validated module today. Out of scope; reopen if/when CF
  surfaces one. Customers requiring FIPS-140 will be told "not supported on
  this platform" (CEO accepted).
- **Multi-cloud / multi-account custody.** CEO single-account directive.
- **Tenant key custody.** Gateway-identity keys only.
- **Native ML-DSA-65 in CF runtime.** Currently `@noble/post-quantum` in
  WASM/JS inside the DO; native acceleration is a future amendment.

---

## §12. Open Questions

1. **PolicyDO contract details** — issuer DID, claim shape, alg-pair for
   `quorum_token`. **Deferred to RFC-0021** (do not author here). Owner:
   `agentic-architect` + `devex-protocol-sec`.
2. **DO storage durability guarantee** — what is the documented CF SLA on DO
   storage loss? Drives §6 rotation-trigger #2 framing. Owner:
   `cloudflare-native-edge`. (**values TBD W2**.)
3. **AttestationDoc cert chain** — what root does the DO's attestation key
   chain to on Cloudflare? CF does not currently publish a verifiable
   workload-attestation root; surfacing this as a known limitation.
   (**values TBD W2**.)
4. **RFC-0016 §6.5.1 amendment vs no-change memo** — does the kid-set-aware
   refresh rule (RFC-0018 §7 alias window) require a normative diff in
   RFC-0016, or is the CEO 48h SLA review (S4-D4) sufficient as a
   no-change memo? Owner: `devex-protocol-sec`.

> **Locked-and-removed (2026-05-07):** the v2 draft previously listed open
> questions on (a) SignerDO deployment locus — now §4.0 (separate Worker
> `signer-worker`); (b) per-PoP signer mapping — now §4.0 + RFC-0018 §4;
> (c) JWKS signed-URL binary snapshot — now §8 (deferred to RFC-0021);
> (d) PolicyDO contract — deferred to RFC-0021 (Q1 above is the stub);
> (e) sign() rate-limit numerics — now §5.4 (100/sec sustained, 1k/min burst);
> (f) CI lint regex — now §10.1.

---

## §13. Handoff

- **Next persona:** `☁️ cloudflare-native-edge`.
- **Expected artifact:** `rfcs/RFC-0020-cf-native-custody/signer-do-design.md`
  finalised against §4 RPC contracts; `workers/mcp-gateway/src/signer-do.ts`
  implementing same; M1 dev-cutover runbook.
- **Then:** `🛡️ devex-protocol-sec` audits §9 threat table against the
  implementation; `🦀 edge-kubelet-engineer` audits `transport/jwks.rs` for
  hardcoded kid (RFC-0018 §8).

---

## Changelog

### 2026-05-07 v2.0.0: RATIFIED. CEO auto-approve directive 2026-05-07; merged via PR #48 (0018/0020).
