# RFC-0002 — WSS Envelope v2.0 (`aethermesh.v1` subprotocol)

- **Status:** APPROVED 2026-04-21 (v2.2 corrigendum).
- **Author:** 🧠 Agentic Architect
- **Sprint:** 1 (mid-flight pivot)
- **Supersedes:** [rfc-0002-pubsub-topic-taxonomy.md](rfc-0002-pubsub-topic-taxonomy.md) (v1.0, marked SUPERSEDED).
- **Related:** [rfc-0001-mcp-json-v1.md](rfc-0001-mcp-json-v1.md) v1.2 (UNCHANGED contract; off-by-one numeric fix only), [rfc-0003-rbac-token-format.md](rfc-0003-rbac-token-format.md) v1.2 (companion amendment introduces the `device-runtime` token class consumed here).
- **Audience:** Cloud Agents (LLMs); 🦀 edge-kubelet-engineer; ☁️ cloudflare-native-edge; 🛡️ devex-protocol-sec.

---

## §1. Motivation

Cloudflare Pub/Sub has been deprecated and shut down mid-Sprint-1, invalidating the topic-tree substrate of RFC-0002 v1. The CTO-approved replacement is **WSS terminated by a Cloudflare Durable Object** (`DeviceConnectionDO`, one DO per `node_id`). This RFC redefines only the *delivery substrate*: every RFC-0001 invariant (Ed25519 manifest signing, JCS canonicalization, `additionalProperties:false`, ASCII error envelope, closed `kind`/`verb`/`safety_class` enums, content-addressable `schema_ref`) is preserved verbatim. This is a lockstep cutover; there is no MQTT/Pub-Sub dual-stack.

---

## §2. Connection Contract

### 2.1 URL pattern

```
wss://gateway-{env}.aethermesh.app/devices/connect
```

- `{env}` ∈ closed enum `{ dev, staging, prod }`.
- **NO** token in URL (CF access logs capture full URLs).
- **NO** `node_id` in URL. The `node_id` is asserted only as a routing hint in the `Sec-WebSocket-Protocol` header (§2.2) and is verified to equal the JWT `sub` claim once the auth frame arrives (§2.3). The hint selects the DO instance but confers no authority until that check passes.
- **NO** query string of any kind (rejected at gateway).
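
A gateway-side check of the URL rules above could look like the following sketch. The name `isAllowedConnectUrl` is hypothetical; the pattern encodes only the host template, the closed `{env}` enum, and the ban on query strings stated in this section.

```typescript
// Hypothetical helper; only the URL shape and the env enum come from §2.1.
const CONNECT_URL =
  /^wss:\/\/gateway-(dev|staging|prod)\.aethermesh\.app\/devices\/connect$/;

function isAllowedConnectUrl(url: string): boolean {
  // Anchored match: a query string, fragment, extra path segment,
  // or unknown env value all fail the test.
  return CONNECT_URL.test(url);
}
```

Because the pattern is anchored at both ends, a URL such as `.../devices/connect?token=x` fails without any separate query-string parsing.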

### 2.2 Subprotocol-based DO routing (clarified v2.1)

At the WSS upgrade, the gateway Worker has not yet seen the JWT (which is delivered in the first WS frame per §2.3) and therefore cannot derive the DO routing key from the validated `sub` claim. To route the still-unauthenticated upgrade to the correct `DeviceConnectionDO` instance, `aethermesh.v1` overloads the `Sec-WebSocket-Protocol` header with **two** subprotocol tokens carrying a routing hint.

- Client MUST send, on the upgrade request:

  ```
  Sec-WebSocket-Protocol: aethermesh.v1, node-<ULID26>
  ```

  - Exactly two comma-separated tokens, in this order. `<ULID26>` is the lowercase 26-char Crockford ULID `node_id` (regex `^[0-9a-hjkmnp-tv-z]{26}$`).
  - Tokens are case-sensitive. Any third token, missing token, malformed ULID, or wrong order → server rejects the upgrade with **HTTP 400**; no DO is allocated.
- Server (gateway Worker) MUST select **only** `aethermesh.v1` as the negotiated subprotocol and echo **only** `aethermesh.v1` in the 101 response's `Sec-WebSocket-Protocol` header. The `node-<ULID26>` token MUST NOT be echoed back.
- Server consumes the `node-<ULID26>` token as a routing hint only: `env.DEVICE_DO.idFromName(<ULID26>)`. The asserted ULID confers **no** authority of any kind.
- After the WS is accepted and the first auth frame arrives (§2.3), the DO MUST cryptographically verify the asserted `node_id` by comparing the upgrade-time `<ULID26>` to the JWT `sub` claim. Mismatch → close **4401** within the 5 s auth deadline (§6).
- Future protocol versions open as a **parallel first subprotocol** (`aethermesh.v2, node-<ULID26>`); no in-band version negotiation, no `hello` frame.

**Rationale.** At WS upgrade time no message body has been received yet, so the JWT — which lives in the first frame per RFC-0003 v1.2 §11 — is unavailable to the gateway when it must choose the DO. The subprotocol token carries the routing hint without introducing a new header, query string, or URL segment (all of which are forbidden for token material per §2.1 / §2.3 / RFC-0003 §9.5 W6).

**Threat note.** A malicious client could route to any DO by guessing or learning a valid `node_id`. This is acceptable because the upgrade-time ULID confers **no authority**: the auth frame's JWT signature MUST verify, and the JWT `sub` claim MUST equal the upgrade-time ULID. A wrong-DO connection is rejected at frame 1 with close 4401 within the 5 s auth deadline (§2.3, §6); no state is mutated and one audit row is written (`ws_open` with `decision=reject`).
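
The upgrade-time header rules in this subsection can be sketched as a pure function. The names `parseSubprotocolHeader` and `UpgradeDecision` are hypothetical; the sketch assumes optional whitespace around the comma is tolerated, which the text above does not state either way.

```typescript
// Hypothetical helper; not part of the RFC or of any Cloudflare API.
const LOWER_ULID = /^[0-9a-hjkmnp-tv-z]{26}$/;

type UpgradeDecision =
  | { ok: true; nodeId: string; echo: "aethermesh.v1" }
  | { ok: false; status: 400 };

function parseSubprotocolHeader(header: string | null): UpgradeDecision {
  const reject: UpgradeDecision = { ok: false, status: 400 };
  if (header === null) return reject;
  // Assumption: whitespace around the comma is trimmed before comparison.
  const tokens = header.split(",").map((t) => t.trim());
  // Exactly two tokens, version first and routing hint second.
  if (tokens.length !== 2) return reject;
  const [version, hint] = tokens;
  if (version !== "aethermesh.v1" || hint === undefined) return reject;
  if (!hint.startsWith("node-")) return reject;
  const nodeId = hint.slice("node-".length);
  // Case-sensitive lowercase ULID; uppercase input is rejected.
  if (!LOWER_ULID.test(nodeId)) return reject;
  // Only "aethermesh.v1" is echoed in the 101 response, never the hint.
  return { ok: true, nodeId, echo: "aethermesh.v1" };
}
```

On success the caller would pass `nodeId` to `env.DEVICE_DO.idFromName(...)` as a routing hint only; the value carries no authority until the auth frame is verified.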

### 2.3 Auth flow (first-frame placement, locked)

1. Server accepts the WS (after subprotocol check) and starts a **5-second auth deadline timer**.
2. Client MUST send, as the **first** frame:

   ```json
   { "type": "auth", "msg_id": "<client ULID>", "token": "<JWT>" }
   ```

   - `token` MUST be a **device-runtime** JWT per RFC-0003 v1.2 §11. Provisioning JWTs (24 h class) are explicitly rejected on WS — they are exchanged once at `POST /v1/devices/{node_id}/runtime-token` for a runtime token before WS open.
   - `token` MUST appear ONLY in this frame's `token` field. It MUST NOT appear in the URL, a query string, or any HTTP header. The `Sec-WebSocket-Protocol` header of §2.2 carries only the subprotocol name and the routing hint, never the token.
3. Server verifies in this exact order, hard-rejecting on first failure:
   1. JWT signature (Ed25519, `kid` resolved against JWKS per RFC-0003 §3).
   2. `exp` not past, `iat` not future, TTL clamp `exp - iat ≤ 3600`.
   3. `token_class == "device-runtime"`.
   4. `scope` set contains `device:connect`.
   5. `sub` matches `^[0-9a-hjkmnp-tv-z]{26}$` (lowercase ULID).
   6. `jti` not in revocation table.
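4. The DO confirms its own routing: `claims.sub` MUST equal the upgrade-time `<ULID26>` from §2.2, so that `env.DEVICE_DO.idFromName(claims.sub)` names the DO instance that already holds the socket. Mismatch → close **4401**.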
5. DO replies with:

   ```json
   { "type": "auth_ack", "msg_id": "<server ULID>", "in_reply_to": "<client msg_id>" }
   ```

6. On any auth failure or auth-deadline expiry: server closes with the close code per §6 and writes one audit row (`ws_open` with `decision=reject`).

After `auth_ack`, the connection is in the **AUTHED** state and §3–§9 govern all subsequent frames.
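
The ordered claim checks of step 3 (after signature verification, which is assumed to have already succeeded) might be sketched as below. The `RuntimeClaims` shape, the failure labels, and the representation of `scope` as a string array are assumptions for illustration; the final comparison against the upgrade-time ULID follows §2.2, and its position after the revocation check is also an assumption.

```typescript
// Hypothetical claim shape; field names mirror the checks in §2.3 step 3.
interface RuntimeClaims {
  sub: string;
  exp: number; // seconds since epoch (JWT NumericDate)
  iat: number; // seconds since epoch
  token_class: string;
  scope: string[]; // assumption: scope is already split into a list
  jti: string;
}

const SUB_ULID = /^[0-9a-hjkmnp-tv-z]{26}$/;

// Returns a label for the first failing check, or null when all pass.
// Every non-null result maps to close code 4401. Labels are illustrative.
function firstClaimFailure(
  c: RuntimeClaims,
  nowSec: number,
  upgradeNodeId: string,
  isRevoked: (jti: string) => boolean,
): string | null {
  if (c.exp <= nowSec) return "expired";
  if (c.iat > nowSec) return "iat_in_future";
  if (c.exp - c.iat > 3600) return "ttl_too_long";
  if (c.token_class !== "device-runtime") return "wrong_class";
  if (c.scope.indexOf("device:connect") === -1) return "missing_scope";
  if (!SUB_ULID.test(c.sub)) return "bad_sub";
  if (isRevoked(c.jti)) return "revoked";
  // §2.2: the validated sub must equal the upgrade-time routing hint.
  if (c.sub !== upgradeNodeId) return "sub_mismatch";
  return null;
}
```

Returning on the first failure mirrors the "hard-rejecting on first failure" wording, so a token that is both expired and revoked is reported as expired.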

---

## §3. Message Envelope (Draft 2020-12)

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "mcp://schemas/ws.envelope@2.0.0",
  "title": "WSEnvelope",
  "type": "object",
  "additionalProperties": false,
  "required": ["type", "msg_id"],
  "properties": {
    "type": {
      "type": "string",
      "enum": [
        "auth", "heartbeat", "announce", "telemetry", "event", "cmd_ack", "resume",
        "auth_ack", "ack", "cmd", "resume_ack", "error"
      ],
      "description": "Closed enum. Unknown values rejected with `error` envelope and connection close (4400)."
    },
    "msg_id": {
      "type": "string",
      "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$",
      "description": "ULID. Per-direction namespace (see §8). Client assigns its own; server assigns its own; never overlap."
    },
    "in_reply_to": {
      "type": "string",
      "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$",
      "description": "Echoes the OTHER side's msg_id. REQUIRED on auth_ack, ack, resume_ack, cmd_ack; OPTIONAL on error; FORBIDDEN elsewhere (see §5)."
    },
    "token": {
      "type": "string",
      "description": "JWT. REQUIRED on `auth` only; FORBIDDEN on every other type."
    },
    "last_acked_msg_id": {
      "type": "string",
      "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$",
      "description": "REQUIRED on `resume` only; FORBIDDEN elsewhere."
    },
    "missing_cmds": {
      "type": "array",
      "maxItems": 100,
      "items": { "$ref": "#/$defs/CmdFrame" },
      "description": "REQUIRED on `resume_ack` only; FORBIDDEN elsewhere."
    },
    "payload": {
      "description": "Type-specific body; validated against the per-type schema in §5."
    }
  },
  "$defs": {
    "CmdFrame": {
      "type": "object",
      "additionalProperties": false,
      "required": ["type", "msg_id", "payload"],
      "properties": {
        "type":   { "const": "cmd" },
        "msg_id": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
        "payload": { "$ref": "mcp://schemas/ws.cmd.payload@1.0.0" }
      }
    }
  }
}
```

The full per-type required-field combinations are normatively expressed by §5 (one schema per `type`); §3 is the umbrella shape used for cheap pre-dispatch validation at DO ingress.

---

## §4. Closed `type` Enum — Required-Fields Matrix

| `type`        | Direction | `msg_id` | `in_reply_to`           | `token` | `payload`              | Other required             |
|---------------|-----------|:--------:|:-----------------------:|:-------:|:----------------------:|----------------------------|
| `auth`        | C → S     | required | forbidden               | required| forbidden              | —                          |
| `heartbeat`   | C → S     | required | forbidden               | forbidden| forbidden             | —                          |
| `announce`    | C → S     | required | forbidden               | forbidden| required (Manifest)    | —                          |
| `telemetry`   | C → S     | required | forbidden               | forbidden| required (Sample)      | —                          |
| `event`       | C → S     | required | forbidden               | forbidden| required (EventBody)   | —                          |
| `cmd_ack`     | C → S     | required | required (server cmd)   | forbidden| required (CmdAckBody) | —                          |
| `resume`      | C → S     | required | forbidden               | forbidden| forbidden             | `last_acked_msg_id` required |
| `auth_ack`    | S → C     | required | required (client auth)  | forbidden| forbidden             | —                          |
| `ack`         | S → C     | required | required (client msg)   | forbidden| forbidden             | —                          |
| `cmd`         | S → C     | required | forbidden               | forbidden| required (CmdPayload) | —                          |
| `resume_ack`  | S → C     | required | required (client resume)| forbidden| forbidden             | `missing_cmds` required (may be empty array) |
| `error`       | S → C     | required | optional (offending msg)| forbidden| required (ErrorBody)  | —                          |

Any frame whose `type` is not in this enum → server emits one `error` (`E_PROTOCOL_UNKNOWN_FRAME`) and closes with code **4400**.
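
The matrix above translates fairly directly into a lookup table. The sketch below is one possible encoding, with hypothetical names (`MATRIX`, `checkFrameShape`); it checks only field presence and the `msg_id` pattern, not the per-type payload schemas of §5.

```typescript
// Per-type rows of the §4 matrix: fields beyond `type` and `msg_id`.
// Any field not listed for a type is forbidden on that type.
interface Row { required: string[]; optional: string[] }

const MATRIX = new Map<string, Row>([
  ["auth",       { required: ["token"], optional: [] }],
  ["heartbeat",  { required: [], optional: [] }],
  ["announce",   { required: ["payload"], optional: [] }],
  ["telemetry",  { required: ["payload"], optional: [] }],
  ["event",      { required: ["payload"], optional: [] }],
  ["cmd_ack",    { required: ["in_reply_to", "payload"], optional: [] }],
  ["resume",     { required: ["last_acked_msg_id"], optional: [] }],
  ["auth_ack",   { required: ["in_reply_to"], optional: [] }],
  ["ack",        { required: ["in_reply_to"], optional: [] }],
  ["cmd",        { required: ["payload"], optional: [] }],
  ["resume_ack", { required: ["in_reply_to", "missing_cmds"], optional: [] }],
  ["error",      { required: ["payload"], optional: ["in_reply_to"] }],
]);

const MSG_ID = /^[0-9A-HJKMNP-TV-Z]{26}$/;

// Returns null when the frame satisfies the matrix, else a short reason.
function checkFrameShape(frame: Record<string, unknown>): string | null {
  const type = frame["type"];
  const row = typeof type === "string" ? MATRIX.get(type) : undefined;
  if (row === undefined) return "unknown type"; // E_PROTOCOL_UNKNOWN_FRAME, 4400
  const msgId = frame["msg_id"];
  if (typeof msgId !== "string" || !MSG_ID.test(msgId)) return "bad msg_id";
  for (const field of row.required) {
    if (!(field in frame)) return "missing " + field;
  }
  const allowed = new Set(["type", "msg_id", ...row.required, ...row.optional]);
  for (const key of Object.keys(frame)) {
    if (!allowed.has(key)) return "forbidden " + key;
  }
  return null;
}
```

Using a `Map` rather than a plain object keeps inherited property names such as `constructor` from being mistaken for known frame types.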

### §4.1 Field shape: `cmd_ack` and `error` (corrigendum, v2.2)

This subsection restates normatively, to remove any room for misreading, that **`cmd_ack` and `error` use NESTED `payload`**, identical in shape-discipline to every other payload-bearing envelope (`announce`, `telemetry`, `event`, `cmd`). Their body fields never sit at the top level of an envelope alongside `type` / `msg_id` / `in_reply_to`; the only type-specific fields allowed at the top level are the control fields that the §3 umbrella schema names explicitly (`token`, `last_acked_msg_id`, `missing_cmds`). The §3 umbrella schema's `payload` slot is the **only** legal carrier for type-specific bodies.

**Rationale.** (a) Internal consistency — every envelope that carries domain data already nests it under `payload`; flattening `cmd_ack` or `error` would make them the only two exceptions and break `additionalProperties:false` discipline at the umbrella schema. (b) Information theory — `cmd_ack.payload.ok` is required because `type:"cmd_ack"` only signals "the device received and processed the cmd"; the `ok` boolean is what distinguishes execution success from execution failure (with structured `result` vs `error`). Likewise `error.payload.{code,message,suggested_fix}` is required because the `type:"error"` discriminator alone carries no diagnostic content. (c) Rule of least surprise — a reader of §3–§5 expects the `payload` slot to hold the body for every type that has one.

**Canonical `cmd_ack` shape (C → S):**

```json
{
  "type": "cmd_ack",
  "msg_id": "<client ULID>",
  "in_reply_to": "<server cmd msg_id>",
  "payload": {
    "ok": true,
    "result": { }
  }
}
```

Required top-level fields: `type`, `msg_id`, `in_reply_to`, `payload`. Required `payload` fields: `ok` (boolean). Conditionally required: `payload.error` when `ok=false`; `payload.result` MAY appear when `ok=true`. `payload.error` MUST NOT appear when `ok=true`. Top-level `code` / `message` / `result` / `ok` / `error` are FORBIDDEN — they belong inside `payload`.

**Canonical `error` shape (S → C):**

```json
{
  "type": "error",
  "msg_id": "<server ULID>",
  "in_reply_to": "<offending client msg_id, if any>",
  "payload": {
    "code": "E_PROTOCOL_UNKNOWN_FRAME",
    "message": "Unknown envelope type.",
    "suggested_fix": "Use one of the closed enum values listed in RFC-0002 section 3."
  }
}
```

Required top-level fields: `type`, `msg_id`, `payload`. Optional top-level: `in_reply_to` (present iff the error is in reply to a specific client frame). Required `payload` fields: `code`, `message`, `suggested_fix` (all ASCII, server-rendered from fixed templates per RFC-0001 §4 — never device-supplied). Optional `payload`: `retry_after_ms`, `correlation_id` (per RFC-0001 §4). Top-level `code` / `message` / `suggested_fix` are FORBIDDEN — they belong inside `payload`.

Validators on both sides MUST reject any frame that places these body fields at the top level (`additionalProperties:false` already enforces this at the umbrella schema; this subsection makes it explicit for human readers).
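
As an illustration of the `cmd_ack` payload rules stated above (`ok` required, `error` required when `ok=false` and forbidden when `ok=true`), a hand-written check could look like this. The function name and reason strings are hypothetical, and the sketch deliberately does not validate the inner shape of `result` or `error`.

```typescript
const CMD_ACK_PAYLOAD_KEYS = new Set(["ok", "result", "error"]);

// Returns null when `payload` obeys the cmd_ack rules, else a short reason.
function checkCmdAckPayload(payload: unknown): string | null {
  if (typeof payload !== "object" || payload === null || Array.isArray(payload)) {
    return "payload must be an object";
  }
  const body = payload as Record<string, unknown>;
  // additionalProperties:false, so only ok / result / error may appear.
  for (const key of Object.keys(body)) {
    if (!CMD_ACK_PAYLOAD_KEYS.has(key)) return "unexpected payload." + key;
  }
  const ok = body["ok"];
  if (typeof ok !== "boolean") return "payload.ok must be a boolean";
  const hasError = "error" in body;
  if (!ok && !hasError) return "payload.error is required when ok=false";
  if (ok && hasError) return "payload.error is forbidden when ok=true";
  return null;
}
```

A flattened frame that puts `code` next to `ok` inside `payload` is rejected by the key allow-list before the `ok` / `error` rule is even evaluated.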

---

## §5. Per-Type Schemas

All schemas: Draft 2020-12, `additionalProperties: false`, ASCII strings only where shown.

### 5.1 `auth` (C → S)

```json
{
  "$id": "mcp://schemas/ws.auth@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "token"],
  "properties": {
    "type":   { "const": "auth" },
    "msg_id": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "token":  { "type": "string", "minLength": 16, "maxLength": 4096,
                "pattern": "^[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+$" }
  }
}
```

### 5.2 `auth_ack` (S → C)

```json
{
  "$id": "mcp://schemas/ws.auth_ack@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "in_reply_to"],
  "properties": {
    "type":        { "const": "auth_ack" },
    "msg_id":      { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "in_reply_to": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" }
  }
}
```

### 5.3 `heartbeat` (C → S)

```json
{
  "$id": "mcp://schemas/ws.heartbeat@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id"],
  "properties": {
    "type":   { "const": "heartbeat" },
    "msg_id": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" }
  }
}
```

Heartbeat is an application-level frame (not a WS PING). It refreshes the idle timer (§9) and counts as one inbound message against the §9 rate cap. Server does NOT `ack` heartbeats.

### 5.4 `announce` (C → S)

```json
{
  "$id": "mcp://schemas/ws.announce@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "payload"],
  "properties": {
    "type":    { "const": "announce" },
    "msg_id":  { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "payload": { "$ref": "mcp://schemas/manifest@1.1.0" }
  }
}
```

`payload` is the **SignedManifest** as defined in RFC-0001 §2 (manifest_version `1.1.0`, full `node_attestation`, `expires_at_ms` REQUIRED). Server (DO) MUST verify the Ed25519 signature against the JWT-bound `node_id` BEFORE persisting; mismatch → `error` (`E_ATTESTATION_FAILED`) and connection close (4401).

### 5.5 `telemetry` (C → S)

```json
{
  "$id": "mcp://schemas/ws.telemetry@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "payload"],
  "properties": {
    "type":    { "const": "telemetry" },
    "msg_id":  { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "payload": { "$ref": "mcp://schemas/system.metrics.sample@1.0.0" }
  }
}
```

For Sprint 1 the `payload` schema is the `MetricsSample` from RFC-0001 §5.4. Future kinds add their own sample schemas; the union is enforced at the DO via the projected manifest's `schema_ref` set.

### 5.6 `event` (C → S)

```json
{
  "$id": "mcp://schemas/ws.event@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "payload"],
  "properties": {
    "type":    { "const": "event" },
    "msg_id":  { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "payload": {
      "type": "object", "additionalProperties": false,
      "required": ["event_kind", "ts_ms"],
      "properties": {
        "event_kind": { "type": "string",
                        "enum": ["going_offline", "manifest_changed", "rate_limited_self"] },
        "ts_ms":      { "type": "integer", "minimum": 1700000000000 },
        "detail":     { "type": "object", "additionalProperties": false }
      }
    }
  }
}
```

Closed `event_kind` enum; new event kinds require an RFC bump.

### 5.7 `cmd` (S → C)

```json
{
  "$id": "mcp://schemas/ws.cmd@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "payload"],
  "properties": {
    "type":    { "const": "cmd" },
    "msg_id":  { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "payload": { "$ref": "mcp://schemas/ws.cmd.payload@1.0.0" }
  }
}
```

```json
{
  "$id": "mcp://schemas/ws.cmd.payload@1.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["tool", "arguments"],
  "properties": {
    "tool":         { "type": "string", "maxLength": 64,
                      "pattern": "^[a-z0-9_]+(\\.[a-z0-9_]+){3}$" },
    "arguments":    { "type": "object", "additionalProperties": false },
    "commit_token": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$",
                      "description": "Two-phase-commit token; REQUIRED for safety_class=physical_actuation per RFC-0001 §3. Sprint 1 unused (no physical kinds projected)." }
  }
}
```

`tool` MUST match the projection rule in RFC-0001 §3 (≤ 64 chars, four dot-joined segments). `arguments` is type-shaped by the projected tool's input schema; the DO MUST validate `arguments` against the tool's `schema_ref` before delivery.
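
For a quick feel of what the `tool` pattern admits, the two constraints (length and four dot-joined segments) can be checked as follows. The function name and the sample tool names in the usage note are hypothetical.

```typescript
// Same pattern as the schema above: one segment plus exactly three more.
const TOOL_PATTERN = /^[a-z0-9_]+(\.[a-z0-9_]+){3}$/;

function isValidToolName(tool: string): boolean {
  // maxLength 64 is checked separately because the pattern has no length bound.
  return tool.length <= 64 && TOOL_PATTERN.test(tool);
}
```

Under this check a name like `system.metrics.sample.read` passes, while `system.metrics.read` (three segments) and `System.metrics.sample.read` (uppercase) do not.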

### 5.8 `cmd_ack` (C → S)

```json
{
  "$id": "mcp://schemas/ws.cmd_ack@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "in_reply_to", "payload"],
  "properties": {
    "type":        { "const": "cmd_ack" },
    "msg_id":      { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "in_reply_to": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$",
                     "description": "Server cmd msg_id." },
    "payload": {
      "type": "object", "additionalProperties": false,
      "required": ["ok"],
      "properties": {
        "ok":     { "type": "boolean" },
        "result": { "type": "object", "additionalProperties": false },
        "error":  { "$ref": "mcp://schemas/error@1.0.0" }
      },
      "allOf": [
        { "if": { "properties": { "ok": { "const": false } } },
          "then": { "required": ["error"] } },
        { "if": { "properties": { "ok": { "const": true } } },
          "then": { "not": { "required": ["error"] } } }
      ]
    }
  }
}
```

### 5.9 `ack` (S → C)

```json
{
  "$id": "mcp://schemas/ws.ack@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "in_reply_to"],
  "properties": {
    "type":        { "const": "ack" },
    "msg_id":      { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "in_reply_to": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" }
  }
}
```

Sent by the DO ONLY after the originating `announce` / `telemetry` / `event` is durably persisted (manifest → DO storage; telemetry → enqueued to telemetry sink; event → enqueued to audit). Heartbeats and cmd_acks are NOT acked.

### 5.10 `resume` (C → S)

```json
{
  "$id": "mcp://schemas/ws.resume@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "last_acked_msg_id"],
  "properties": {
    "type":              { "const": "resume" },
    "msg_id":            { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "last_acked_msg_id": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$",
                           "description": "ULID of the last server `cmd` the client persisted as cmd_acked. Server scans DO `cmd_queue` for entries with msg_id > this ULID." }
  }
}
```

A `resume` frame MAY ONLY appear immediately after `auth_ack`. It is OPTIONAL; a fresh client with empty outbox may skip it and proceed directly to `announce`.

### 5.11 `resume_ack` (S → C)

```json
{
  "$id": "mcp://schemas/ws.resume_ack@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "in_reply_to", "missing_cmds"],
  "properties": {
    "type":         { "const": "resume_ack" },
    "msg_id":       { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "in_reply_to":  { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "missing_cmds": {
      "type": "array",
      "maxItems": 100,
      "items": { "$ref": "mcp://schemas/ws.cmd@2.0.0" }
    }
  }
}
```

### 5.12 `error` (S → C)

```json
{
  "$id": "mcp://schemas/ws.error@2.0.0",
  "type": "object", "additionalProperties": false,
  "required": ["type", "msg_id", "payload"],
  "properties": {
    "type":        { "const": "error" },
    "msg_id":      { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "in_reply_to": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$" },
    "payload":     { "$ref": "mcp://schemas/error@1.0.0" }
  }
}
```

`payload` is the unmodified RFC-0001 §4 error envelope (`code`, `message`, `suggested_fix`, optional `retry_after_ms`, optional `correlation_id`). `message` and `suggested_fix` are ASCII-only (regex `^[\x20-\x7E]*$`) and rendered server-side from fixed templates per `code`; client-supplied or device-supplied strings are NEVER interpolated.
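
A minimal sketch of template-only rendering follows. The table contents are illustrative (the real templates belong to RFC-0001 §4), and `renderErrorBody` is a hypothetical name; the point is that the function takes a `code` and nothing else, so no device-supplied text can reach the output.

```typescript
const ASCII_PRINTABLE = /^[\x20-\x7E]*$/;

interface ErrorBody { code: string; message: string; suggested_fix: string }

// Illustrative template table; only the code name and message text for
// E_PROTOCOL_UNKNOWN_FRAME echo this RFC, the rest is an assumption.
const TEMPLATES = new Map<string, { message: string; suggested_fix: string }>([
  ["E_PROTOCOL_UNKNOWN_FRAME", {
    message: "Unknown envelope type.",
    suggested_fix: "Use one of the closed enum values for the type field.",
  }],
]);

// Renders an error body purely from the template table.
function renderErrorBody(code: string): ErrorBody {
  const template = TEMPLATES.get(code);
  if (template === undefined) throw new Error("no template for code " + code);
  // Guard against a non-ASCII character slipping into a template.
  if (!ASCII_PRINTABLE.test(template.message) ||
      !ASCII_PRINTABLE.test(template.suggested_fix)) {
    throw new Error("template for " + code + " is not printable ASCII");
  }
  return { code, message: template.message, suggested_fix: template.suggested_fix };
}
```

The ASCII guard is worth running at startup or in tests, since characters such as the section sign (U+00A7) fall outside `\x20-\x7E` and are easy to paste into a template by accident.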

---

## §6. WebSocket Close Codes (custom 4xxx)

All close codes are server-initiated unless noted. The client MUST treat any 4xxx code as fatal-for-this-connection and MUST NOT immediately reconnect with the same token if the code is 4401 (auth failure).

| Code | Name                  | Trigger                                                                 | Client retry policy                                |
|-----:|-----------------------|-------------------------------------------------------------------------|----------------------------------------------------|
| 4400 | invalid_frame         | Schema validation failure on any frame; unknown `type`; malformed JSON. | Reconnect with backoff; treat as protocol bug.     |
| 4401 | auth_failed           | JWT invalid / expired / wrong scope / wrong class / sub mismatch / **auth deadline (5 s) exceeded**. | Refresh device-runtime token via control-plane; do NOT retry with same token. |
| 4408 | idle_timeout          | No app-level frame (incl. heartbeat) received for **90 s**.             | Reconnect with backoff.                            |
| 4413 | frame_too_large       | Inbound frame > **64 KiB**.                                             | Reconnect with backoff; fix client-side framing.   |
| 4429 | rate_limited          | Inbound rate > **20 msgs/sec** (default), OR the `resume_ack.missing_cmds` array would exceed 100 entries. | Backoff per `Retry-After`-equivalent (next reconnect must wait ≥ 5 s). |
| 4499 | server_shutdown       | DO eviction, gateway redeploy, or graceful drain.                       | Reconnect with backoff; server-side state is preserved. |

WS standard codes (1000, 1001, 1006, 1009, 1011) MAY also appear from the underlying stack; clients MUST treat 1006 (abnormal closure) as a transport-level failure and reconnect with backoff.
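
On the client, the retry column of the table might reduce to a small dispatch like the one below. The names `planRetry` and `RetryPlan` are hypothetical, the backoff schedule itself is left out because this RFC does not define one, and the fallback for codes outside the table is an assumption.

```typescript
type RetryAction = "refresh_token" | "backoff";

interface RetryPlan { action: RetryAction; minDelayMs: number }

// Maps a close code to the client retry policy in the §6 table.
function planRetry(closeCode: number): RetryPlan {
  switch (closeCode) {
    case 4401:
      // Never retry with the same token; obtain a fresh device-runtime token.
      return { action: "refresh_token", minDelayMs: 0 };
    case 4429:
      // The next reconnect must wait at least 5 s.
      return { action: "backoff", minDelayMs: 5000 };
    default:
      // 4400, 4408, 4413, 4499 and 1006 reconnect with backoff.
      // Treating every other code the same way is an assumption.
      return { action: "backoff", minDelayMs: 0 };
  }
}
```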

---

## §7. Acks & Replay

### 7.1 Server-side ack obligation

The DO MUST emit exactly one `ack` per client `announce | telemetry | event` after persistence completes:

- `announce` → after `manifest` is written to DO storage AND projected schema validated.
- `telemetry` → after the sample is enqueued to the telemetry sink (Cloudflare Queue, per CTO decision).
- `event` → after the event is enqueued to `AUDIT_QUEUE`.

`ack.in_reply_to` carries the client `msg_id`. `ack.msg_id` is a fresh server-assigned ULID.

`heartbeat` and `cmd_ack` are NOT acked (acking them would create an infinite ping-pong).

### 7.2 Client outbox

Client retains every un-acked `announce | telemetry | event` in a local outbox keyed by `msg_id` in ULID order. On reconnect (after `auth_ack` and OPTIONAL `resume`/`resume_ack` exchange), client replays outbox entries **in `msg_id` ascending order**. Server deduplication is by `(node_id, msg_id)`; replays of an already-persisted `msg_id` are silently re-acked (idempotent).
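
An in-memory sketch of the outbox described above is shown below. The class and method names are hypothetical, and a real client would presumably need durable storage to survive restarts, which this sketch omits. Replay order relies on ULIDs of uniform case sorting lexicographically in creation order.

```typescript
class Outbox {
  // msg_id -> serialized frame, for every frame not yet acked.
  private readonly pending = new Map<string, string>();

  add(msgId: string, frame: string): void {
    this.pending.set(msgId, frame);
  }

  // Called with `ack.in_reply_to`; a repeated ack is a harmless no-op.
  onAck(inReplyTo: string): void {
    this.pending.delete(inReplyTo);
  }

  // Frames to replay after reconnect, in ascending msg_id order.
  replay(): string[] {
    return Array.from(this.pending.entries())
      .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
      .map((entry) => entry[1]);
  }
}
```

Because the server re-acks an already-persisted `msg_id`, replaying an entry whose original `ack` was lost in transit is safe; the second `ack` simply removes it from the map.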

### 7.3 Resume protocol

1. After `auth_ack`, client MAY send `resume` with `last_acked_msg_id` = the ULID of the last server `cmd` it has cmd_acked AND persisted locally.
2. DO scans its `cmd_queue` for entries with `msg_id > last_acked_msg_id`.
3. If the scan would yield > 100 entries, the DO closes with **4429** and the client must reconnect. Cmds beyond the 100-entry window are not redeliverable via resume; the client must treat them as missed and rely on the operator-level reconciliation flow, which is out of scope for Sprint 1.
4. Otherwise: DO sends `resume_ack` with `missing_cmds` = the (possibly empty) array of those `cmd` frames in `msg_id` order.
5. Client processes each, emitting `cmd_ack` for each, before sending any new client-originated frame.

Cmds older than **24 h** are evicted from DO storage and unrecoverable via resume.
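
The scan in steps 2 to 4 can be sketched over an in-memory array standing in for the DO `cmd_queue`. The names `planResume` and `ResumeOutcome` are hypothetical, and the 24 h eviction is assumed to have been applied by storage before the scan runs.

```typescript
const RESUME_WINDOW = 100;

interface CmdFrame { type: "cmd"; msg_id: string; payload: unknown }

type ResumeOutcome =
  | { kind: "resume_ack"; missing_cmds: CmdFrame[] }
  | { kind: "close"; code: 4429 };

function planResume(queue: CmdFrame[], lastAckedMsgId: string): ResumeOutcome {
  // ULIDs of uniform case compare lexicographically in creation order.
  const missing = queue
    .filter((cmd) => cmd.msg_id > lastAckedMsgId)
    .sort((a, b) => (a.msg_id < b.msg_id ? -1 : a.msg_id > b.msg_id ? 1 : 0));
  // More than 100 entries cannot be redelivered via resume.
  if (missing.length > RESUME_WINDOW) return { kind: "close", code: 4429 };
  return { kind: "resume_ack", missing_cmds: missing };
}
```

An empty `missing_cmds` array is a valid outcome and still produces a `resume_ack`, matching the "possibly empty" wording in step 4.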

---

## §8. Server-Assigned `msg_id` Rule

- The `msg_id` namespace is **per-direction**. Client-originated frames carry client-assigned ULIDs; server-originated frames carry server-assigned ULIDs. The two namespaces never overlap and are never compared for equality.
- A server msg_id MUST be a fresh ULID generated server-side at the moment of frame emission. The client MUST NOT echo a server msg_id back as its own `msg_id` field on any subsequent frame.
- Request/response correlation uses the `in_reply_to` field, which carries the OTHER side's msg_id:
  - `auth_ack.in_reply_to` = client `auth.msg_id`.
  - `ack.in_reply_to` = client `announce|telemetry|event.msg_id`.
  - `resume_ack.in_reply_to` = client `resume.msg_id`.
  - `cmd_ack.in_reply_to` = server `cmd.msg_id`.
- Rationale: per-direction namespaces eliminate forged-correlation attacks (a malicious client cannot forge a server msg_id to replay an `ack`) and remove the trust-the-client-clock concern that v1 footnoted.

---

## §9. Limits & Timers

| Knob                              | Value           | Enforcement point      | Violation → close code |
|-----------------------------------|-----------------|------------------------|-----------------------|
| Max frame size (inbound)          | **64 KiB**      | Gateway + DO ingress   | 4413                  |
| Auth deadline after WS open       | **5 s**         | DO timer               | 4401                  |
| Client heartbeat cadence          | **30 s**        | Client (server measures via idle timer) | (idle → 4408) |
| Idle timeout (no inbound frame)   | **90 s**        | DO timer               | 4408                  |
| Inbound rate cap (per connection) | **20 msgs/sec** | DO sliding-window counter | 4429               |
| Resume scan window                | **100 cmds**    | DO storage scan        | 4429                  |
| DO `cmd_queue` retention          | **24 h**        | DO storage TTL         | (silent eviction)     |
| Device-runtime JWT TTL            | **≤ 1 h**       | Verifier (RFC-0003 §11)| 4401 on next auth     |

Hibernation (CF WS Hibernation API) is permitted by the DO between any two messages within the idle window; storage state is preserved across wakes.
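
One way to realize the "DO sliding-window counter" row is a timestamp queue, sketched below. The class name and the choice to keep per-frame timestamps are assumptions; the table specifies only the cap (20 msgs/sec) and the resulting close code.

```typescript
class SlidingWindowLimiter {
  private readonly stamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 20, windowMs = 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a frame arriving at `nowMs` is within the cap.
  // A false result corresponds to closing the connection with 4429.
  allow(nowMs: number): boolean {
    // Drop timestamps that have aged out of the window.
    let oldest = this.stamps[0];
    while (oldest !== undefined && nowMs - oldest >= this.windowMs) {
      this.stamps.shift();
      oldest = this.stamps[0];
    }
    if (this.stamps.length >= this.limit) return false;
    this.stamps.push(nowMs);
    return true;
  }
}
```

Heartbeats count against the same limiter as every other inbound frame, so a client sending one heartbeat every 30 s consumes a negligible share of the budget.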

---

## §10. Removed (vs RFC-0002 v1)

The following v1 surfaces are deleted and have no replacement at the transport layer:

- The entire `$devices/{node_id}/...` topic tree (`announce`, `telemetry`, `heartbeat`, `cmd`, `ack`, `cmd/_dlq`).
- MQTT-style retained-message semantics. Late-joining cloud agents now read the manifest via a synchronous HTTP path (`GET /v1/devices/{node_id}/manifest`) backed by DO storage; this is documented in the gateway plan, not in this RFC.
- MQTT QoS levels (0/1) and the LWT (Last Will and Testament) primitive. Replaced by the §7 app-level ack-and-replay protocol; offline detection is the DO observing `onclose` (graceful) or socket-level RST (abrupt).
- Wildcard subscription rules (`$devices/+/announce`, `$devices/+/+`, etc.). Cross-node fan-out for control-plane consumers is a separate Worker concern, not part of this transport contract.
- Broker HMAC-signed webhook envelope (`PUBSUB_WEBHOOK_SECRET`, `WebhookPayloadSchema`). The Pub/Sub → Worker webhook hop no longer exists.
- The v1 envelope fields `topic_kind`, `correlation_id` (renamed to `in_reply_to` with stricter scoping), and the topic-segment-vs-envelope `node_id` cross-check (now enforced via JWT `sub` ≡ DO routing key).

---

## §11. Open Questions

None (re-confirmed 2026-04-21 at v2.1). All transport-layer ambiguities are resolved by CTO-locked decisions baked into §2, §6, §7, §9 and the companion RFC-0003 v1.2 §11.

---

## §12. Changelog

- **v2.2 — 2026-04-21 — corrigendum:** nail down cmd_ack and error field shapes; both implementations had diverged from each other and from the under-specified v2.1 text. Canonical shape for both is NESTED `payload` (re-stated normatively in new §4.1). NON-BREAKING — matches the shape already implemented by 🦀 edge transport and ☁️ Worker domain layer; no code changes required, spec text only.
- **v2.1 — 2026-04-21 — clarification:** codified subprotocol-based DO routing pattern (`Sec-WebSocket-Protocol` carries `aethermesh.v1, node-<ULID>`); ratification of ☁️ implementation. NON-BREAKING.
- **v2.0.0 — 2026-04-21:** Breaking. Replaces the MQTT/Pub-Sub topic tree with a WSS envelope terminated by `DeviceConnectionDO` (1 DO per `node_id`). Pinned WS subprotocol `aethermesh.v1`; first-frame JWT auth with 5 s deadline; 64 KiB frame cap; 90 s idle; 20 msg/s rate cap; app-level ack + outbox replay; per-direction msg_id namespaces; resume protocol with 100-cmd / 24 h windows. RFC-0001 contract preserved verbatim.

## v2 Cross-Reference (additive 2026-05-03)

RFC-0012, RFC-0013, and RFC-0014 (v1.0, 2026-05-03) supersede portions of this RFC. A v3.0 amendment is scheduled for end-of-sprint-3-extended (2026-05-30).
