1. Overview

A client opens a WebTransport session to the platform. The QUIC handshake establishes transport-level encryption (TLS 1.3) and yields a long-lived session. The first application-layer act is for the client to open a bidirectional control stream and send an auth handshake. The server validates, returns its capability set, and the session is live. Thereafter, every interaction with the platform is one of:
  • An op — a single typed request/response. The client opens a new bidirectional op stream, writes the envelope and the request body, reads the response, closes the stream.
  • A subscription — a long-lived op that receives server-pushed events on a dedicated subscription stream (a unidirectional server-to-client stream parented by an op stream) until the client cancels.
  • Bulk data movement — Avro Object Container Files transferred on one or more data streams (unidirectional, parented by an op stream).
  • Datagrams — fire-and-forget client telemetry; not used for product traffic.
When the session ends (network drop, explicit GracefulShutdown control message, or auth-token expiry), the client reconnects with a fresh handshake. There is no mid-session reconfiguration.

2. Session lifecycle

2.1 QUIC and WebTransport handshake

Standard. The server presents a TLS 1.3 certificate; the client validates per browser PKI (web) or per pinned cert (native). The WebTransport upgrade succeeds; a QUIC session is live.

2.2 Control stream

Immediately after the session is up, the client opens a single bidirectional WebTransport stream. This is the control stream. There is exactly one per session; opening a second control stream is a protocol error (OpError::Conflict) and the offending stream is closed by the server.

2.3 Auth handshake

The first frame on the control stream is ClientHello:
struct ClientHello {
    supported_protocol_versions: Vec<u32>,   // descending preference
    client_capabilities: CapabilitySet,
    auth: AuthToken,                          // WorkOS-signed JWT
    client_metadata: ClientMetadata,          // product, version, OS
}

struct CapabilitySet(pub BitFlags<Capability>);

enum Capability {
    // v1.0 starter set
    SubscriptionResume,
    BulkRecordImport,
    BulkRecordExport,
    AuditTailSubscription,
    // ...
}

struct ClientMetadata {
    client_name: String,                      // "platform-cli", "web-leptos"
    client_version: String,                   // semver
    user_agent: Option<String>,               // web only
    locale: Option<String>,
}
The server validates the JWT against WorkOS JWKS (cached per the existing auth module), resolves it to a principal and tenant, picks the highest mutually-supported protocol_version, intersects the capability sets, and replies:
struct ServerHello {
    selected_protocol_version: u32,
    server_capabilities: CapabilitySet,       // intersection of client ∩ server
    session_id: SessionId,                    // for log correlation; opaque to client
    server_metadata: ServerMetadata,
    auth_expires_at: Timestamp,               // when the bound token expires
}
If validation fails, the server replies with OpError::Auth(...) on the control stream and closes the session. There is no retry within a session; the client must reconnect.
The handshake budget is 5 seconds wall-clock: the server closes any session that hasn’t completed the handshake within that window. (OpError::Auth(HandshakeTimeout) is best-effort; the session may be torn down without a final frame.)
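The negotiation step described above (highest mutually supported version, intersected capability sets) can be sketched as follows. This is a minimal illustration under stated assumptions: CapabilitySet is modelled as a plain u32 bitmask rather than BitFlags<Capability>, the bit positions are invented for the example, and select_version is a hypothetical helper name.

```rust
/// Capability bits. The discriminants are illustrative, not real wire values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
enum Capability {
    SubscriptionResume = 1 << 0,
    BulkRecordImport = 1 << 1,
    BulkRecordExport = 1 << 2,
    AuditTailSubscription = 1 << 3,
}

/// Stand-in for CapabilitySet(BitFlags<Capability>): a plain bitmask.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CapabilitySet(u32);

impl CapabilitySet {
    fn of(caps: &[Capability]) -> Self {
        CapabilitySet(caps.iter().fold(0, |acc, c| acc | (*c as u32)))
    }

    /// The negotiated set is what both sides advertise.
    fn intersect(self, other: Self) -> Self {
        CapabilitySet(self.0 & other.0)
    }

    fn contains(self, cap: Capability) -> bool {
        (self.0 & (cap as u32)) != 0
    }
}

/// Highest version present in both lists, or None when there is no overlap.
fn select_version(client: &[u32], server: &[u32]) -> Option<u32> {
    client.iter().copied().filter(|v| server.contains(v)).max()
}
```

A None from select_version corresponds to a failed handshake; which OpError variant the server uses in that case is not specified here.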

2.4 Heartbeat

After the handshake, the client and server exchange Ping/Pong frames on the control stream every 30 seconds. The server sends the first Ping; the client must respond with Pong within 10 seconds or the server closes the session. The same rule applies in the other direction: the client may send Ping, and the server must answer within 10 seconds. Ping and Pong are zero-payload variants of ControlFrame (defined below).

2.5 Server-pushed events

Subscription events flow on dedicated unidirectional server-to-client streams (see § 6). Other server-initiated control traffic — ServerNotice, capability changes during a session (none in v1), graceful-shutdown notices — flows on the control stream as variants of ServerControlFrame.

2.6 Graceful shutdown

The server signals graceful shutdown by sending ServerControlFrame::Shutdown { reason, drain_deadline_ms } on the control stream. Clients have until drain_deadline_ms (default 30 000) to finish in-flight ops. New op streams opened after the shutdown notice are rejected with OpError::PreconditionFailed(SessionDraining). After the deadline, the server closes the QUIC session. Client-initiated shutdown is ClientControlFrame::Goodbye. Same drain semantics.

2.7 Token expiry

When the auth token expires (auth_expires_at from ServerHello), the server completes in-flight ops on a best-effort basis but rejects new op-stream creation with OpError::Auth(TokenExpired). The client must reconnect with a fresh token. There is no mid-session token rotation in v1.
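Sections 2.6 and 2.7 both come down to a check made when a new op stream is opened. A minimal sketch of that admission check follows; SessionState, Reject, and admit_new_op are hypothetical names, timestamps are plain milliseconds, and the order of the two checks is an assumption.

```rust
/// Why a new op stream is refused. In the protocol these map to
/// OpError::Auth(TokenExpired) and OpError::PreconditionFailed(SessionDraining).
#[derive(Debug, PartialEq, Eq)]
enum Reject {
    TokenExpired,
    SessionDraining,
}

/// Hypothetical per-session state kept by the server.
struct SessionState {
    draining: bool,          // set once a Shutdown or Goodbye frame was exchanged
    auth_expires_at_ms: u64, // ServerHello.auth_expires_at as epoch milliseconds
}

/// Decide whether a newly opened op stream may proceed.
/// In-flight ops are not affected; this only gates new streams.
fn admit_new_op(state: &SessionState, now_ms: u64) -> Result<(), Reject> {
    if now_ms >= state.auth_expires_at_ms {
        return Err(Reject::TokenExpired);
    }
    if state.draining {
        return Err(Reject::SessionDraining);
    }
    Ok(())
}
```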

3. Wire format and framing

3.1 Framing on a stream

Every WebTransport stream carries length-prefixed Postcard frames:
+--------+------------------+
| varint |    payload       |
+--------+------------------+
  len = N      N bytes
The varint is Postcard’s standard varint encoding (LEB128-style). The payload is a Postcard-serialized Rust value. The frame type is implied by the stream’s role and position:
  • Control stream, position 0 (client→server): ClientHello.
  • Control stream, position 0 (server→client): ServerHello.
  • Control stream, position ≥1: ClientControlFrame or ServerControlFrame (enum dispatched per direction).
  • Op stream, position 0 (client→server): OpEnvelope.
  • Op stream, position 1 (client→server): Op (enum).
  • Op stream, position 0 (server→client): Result<OpResponse, OpError>.
  • Op stream, position ≥1 (server→client): typed continuation frames for ops that produce multiple frames (rare; see § 6 for subscriptions, which use dedicated subscription streams instead).
  • Data stream: a single Avro Object Container File (not Postcard).
The framing is not self-describing. The expected frame type at each position is fixed by the protocol’s grammar (this document, mirrored by the Rust types in src/protocol/). A frame whose payload does not decode as a valid Postcard value of the expected type closes the stream with OpError::Internal(WireFormatError).
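The length-prefixed framing in § 3.1 can be sketched with the standard library alone. The functions below implement an unsigned LEB128 varint and wrap an already-serialized payload; they do not perform Postcard serialization itself, and all four function names are illustrative rather than taken from src/protocol/.

```rust
/// Append `value` as an unsigned LEB128 varint (7 data bits per byte,
/// high bit set on every byte except the last).
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode a varint from the front of `input`.
/// Returns (value, bytes consumed), or None if the input is truncated
/// or runs past 10 bytes. Excess high bits in a 10th byte are discarded.
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Prefix a serialized payload with its length.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(payload.len() + 2);
    write_varint(payload.len() as u64, &mut out);
    out.extend_from_slice(payload);
    out
}

/// Split one frame off the front of `input`, returning (payload, rest).
/// None means the buffer does not yet hold a complete frame.
fn decode_frame(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (len, used) = read_varint(input)?;
    let available = (input.len() - used) as u64;
    if len > available {
        return None;
    }
    let end = used + len as usize;
    Some((&input[used..end], &input[end..]))
}
```

A reader would call decode_frame in a loop over its receive buffer and hand each payload to the Postcard decoder for the type expected at that stream position.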

3.2 The op envelope

struct OpEnvelope {
    protocol_version: u32,                    // must equal ServerHello.selected_protocol_version
    capabilities_used: CapabilitySet,         // must be ⊆ ServerHello.server_capabilities
    trace_context: Option<TraceContext>,      // W3C Trace Context (traceparent + tracestate)
    idempotency_key: Option<IdempotencyKey>,  // 128-bit; required for mutating ops
    deadline: Option<DurationMillis>,         // client-imposed soft deadline
}
  • protocol_version mismatch closes the stream with OpError::PreconditionFailed(VersionMismatch). The client should treat this as a session-level error and reconnect.
  • capabilities_used not a subset of the negotiated set: OpError::PreconditionFailed(CapabilityNotNegotiated { missing }).
  • trace_context follows W3C Trace Context: traceparent + optional tracestate. Generated by the client; the server propagates it to internal spans and to audit events as correlation_id (per ADR-0007).
  • idempotency_key is a 128-bit value supplied by the client. The server stores (idempotency_key, principal, op_kind) → response for 24 hours. Re-submitting an op with the same key returns the cached response without re-executing. Required on every mutating op; optional on reads. Storage of the cache and its eviction policy live in docs/OPERATIONS.md once Phase 3 lands.
  • deadline is advisory: the server may abort the op when the deadline passes (OpError::PreconditionFailed(DeadlineExceeded)); it is not a guarantee.
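The envelope rules above can be sketched as a single validation function. This is an illustration, not the server's code: capability sets are u32 bitmasks, the envelope is trimmed to the fields being checked, and MissingIdempotencyKey is a hypothetical error name, since the text does not say which OpError a missing key produces.

```rust
#[derive(Debug, PartialEq, Eq)]
enum EnvelopeError {
    VersionMismatch,
    CapabilityNotNegotiated { missing: u32 },
    MissingIdempotencyKey, // hypothetical name
}

/// Trimmed-down envelope: only the fields validated here.
struct OpEnvelope {
    protocol_version: u32,
    capabilities_used: u32,        // bitmask stand-in for CapabilitySet
    idempotency_key: Option<u128>, // 128-bit client-supplied key
}

/// What the handshake agreed on (from ServerHello).
struct Negotiated {
    protocol_version: u32,
    capabilities: u32,
}

fn validate(env: &OpEnvelope, session: &Negotiated, mutating: bool) -> Result<(), EnvelopeError> {
    if env.protocol_version != session.protocol_version {
        return Err(EnvelopeError::VersionMismatch);
    }
    // Bits the op wants to use that were never negotiated.
    let missing = env.capabilities_used & !session.capabilities;
    if missing != 0 {
        return Err(EnvelopeError::CapabilityNotNegotiated { missing });
    }
    if mutating && env.idempotency_key.is_none() {
        return Err(EnvelopeError::MissingIdempotencyKey);
    }
    Ok(())
}
```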

3.3 Stream cancellation

The client cancels an op by closing its op stream’s write side. The server observes the close, aborts the op (best-effort), and closes its write side. There are no cancellation responses — the close itself is the cancel. Server-initiated cancellation (timeout, shutdown) closes the stream from the server side with OpError::* written first when possible.

3.4 Maximum frame size

OpEnvelope, Op, OpResponse, and OpError are each capped at 64 KiB serialized. Larger payloads (record bodies, query results) MUST flow on data streams as Avro. Exceeding 64 KiB on a control or op stream closes the stream with OpError::PreconditionFailed(PayloadTooLarge). Subscription event frames are capped at 256 KiB to accommodate fan-out scenarios; oversized events are split across multiple EventChunk frames (see § 6).
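Splitting an oversized event across EventChunk frames can be sketched as below. The struct mirrors the EventChunk variant of SubscriptionFrame with Vec<u8> standing in for Bytes; split_event and reassemble are hypothetical helpers, and max_chunk is assumed to be chosen somewhat below 256 KiB so that the chunk header still fits within the frame cap.

```rust
/// Mirrors SubscriptionFrame::EventChunk, with Vec<u8> standing in for Bytes.
struct EventChunk {
    sequence: u64,
    chunk_index: u32,
    chunk_count: u32,
    partial_body: Vec<u8>,
}

/// Split a serialized event body into chunks of at most `max_chunk` bytes.
/// `max_chunk` must be non-zero; an empty body yields no chunks.
fn split_event(sequence: u64, body: &[u8], max_chunk: usize) -> Vec<EventChunk> {
    let pieces: Vec<&[u8]> = body.chunks(max_chunk).collect();
    let chunk_count = pieces.len() as u32;
    pieces
        .into_iter()
        .enumerate()
        .map(|(i, piece)| EventChunk {
            sequence,
            chunk_index: i as u32,
            chunk_count,
            partial_body: piece.to_vec(),
        })
        .collect()
}

/// Rebuild the body on the receiving side. Returns None if chunks are
/// missing, out of order, or belong to different events.
fn reassemble(chunks: &[EventChunk]) -> Option<Vec<u8>> {
    let first = chunks.first()?;
    if chunks.len() != first.chunk_count as usize {
        return None;
    }
    let mut body = Vec::new();
    for (i, chunk) in chunks.iter().enumerate() {
        let consistent = chunk.sequence == first.sequence
            && chunk.chunk_count == first.chunk_count
            && chunk.chunk_index as usize == i;
        if !consistent {
            return None;
        }
        body.extend_from_slice(&chunk.partial_body);
    }
    Some(body)
}
```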

4. Op model

4.1 The Op enum

enum Op {
    // Models and fields
    CreateModel(CreateModel),
    GetModel(GetModel),
    ListModels(ListModels),
    UpdateModel(UpdateModel),
    TombstoneModel(TombstoneModel),
    PublishModelVersion(PublishModelVersion),
    ListModelVersions(ListModelVersions),
    GetModelVersion(GetModelVersion),
    AddField(AddField),
    UpdateField(UpdateField),
    TombstoneField(TombstoneField),

    // Records
    CreateRecord(CreateRecord),
    GetRecord(GetRecord),
    UpdateRecord(UpdateRecord),
    DeleteRecord(DeleteRecord),
    ListRecords(ListRecords),

    // Bulk
    QueryRecords(QueryRecords),               // server returns data stream
    ImportRecords(ImportRecords),             // client sends data stream
    ExportModel(ExportModel),                 // server returns multi-stream bundle

    // Subscriptions
    SubscribeRecords(SubscribeRecords),
    SubscribeModels(SubscribeModels),
    SubscribeAuditTail(SubscribeAuditTail),

    // Attachments
    InitiateAttachmentUpload(InitiateAttachmentUpload),
    CompleteAttachmentUpload(CompleteAttachmentUpload),
    RequestAttachmentDownload(RequestAttachmentDownload),

    // Audit
    GetAuditEvent(GetAuditEvent),
    ListAuditEvents(ListAuditEvents),

    // Introspection
    DescribeProtocol(DescribeProtocol),       // returns server schema digest
    DescribeAvroSchema(DescribeAvroSchema),   // returns Avro schema for a given target
}
The same enum is the source of truth for both server (src/protocol/op.rs) and client (in the separately-versioned client library). Variant addition is capability-gated (see § 8). Adding a field to an existing variant is a wire-breaking change and requires a major-version bump.

4.2 The OpResponse enum

enum OpResponse {
    Empty,                                    // for ops with no return body
    Model(Model),
    ModelList { items: Vec<Model>, next_cursor: Option<Cursor> },
    Record(Record),
    RecordList { items: Vec<Record>, next_cursor: Option<Cursor> },
    DataStreamRef {
        stream_id: DataStreamId,
        schema_fingerprint: SchemaFingerprint,
        expected_record_count: Option<u64>,
    },
    DataStreamAccepted { stream_id: DataStreamId },
    SubscriptionAccepted {
        subscription_id: SubscriptionId,
        resume_token: ResumeToken,
        event_stream_id: StreamId,
    },
    AttachmentUploadInitiated(AttachmentUploadInitiation),
    AttachmentDownloadGranted(AttachmentDownloadGrant),
    AuditEvent(AuditEvent),
    AuditEventList { items: Vec<AuditEvent>, next_cursor: Option<Cursor> },
    ProtocolDescription(ProtocolDescription),
    AvroSchema(AvroSchemaDocument),
}
A response variant is paired with its Op variant by convention; the typechecker on the server enforces the mapping. A client that receives an unexpected response variant closes the stream and treats it as OpError::Internal(WireFormatError).

4.3 Cursor pagination

Cursor-based pagination (carried over conceptually from ADR-0012). A Cursor is opaque to the client — a 16-byte tag plus a Postcard-encoded server-internal payload. The server signs the cursor with a per-tenant HMAC to detect tampering; tampered cursors return OpError::PreconditionFailed(InvalidCursor). Page size: default 50, max 250. Limits are server-enforced; the client’s limit parameter is advisory.
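Because the client's limit is advisory, the server ends up clamping it. A minimal sketch of that clamp follows; effective_page_size is a hypothetical name, and treating a requested limit of 0 as 1 is an assumption the text does not state.

```rust
const DEFAULT_PAGE_SIZE: u32 = 50;
const MAX_PAGE_SIZE: u32 = 250;

/// Resolve the page size the server will actually use.
/// A missing limit falls back to the default; anything above the
/// maximum is silently reduced rather than rejected.
fn effective_page_size(requested: Option<u32>) -> u32 {
    requested
        .unwrap_or(DEFAULT_PAGE_SIZE)
        .clamp(1, MAX_PAGE_SIZE)
}
```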

4.4 Idempotency

See § 3.2. Idempotency keys are required for every mutating op variant (Create*, Update*, Delete*, Tombstone*, Publish*, Add*, Import*, Complete*, Initiate*). Read ops and subscription ops do not require keys; their idempotency is structural.

5. V1 op catalogue

Op bodies are structurally simple: a transcription of the request parameters into a Postcard struct. Below is the v1 inventory with notes; exact field-by-field shapes live in src/protocol/op.rs and are documented in this design doc (the sole canonical spec since the docs/protocol.md collapse per ADR-0054).

5.1 Model and field ops

| Op | Mutating | Notes |
| --- | --- | --- |
| CreateModel | yes | Returns the created Model with system-derived slug. |
| GetModel | no | By slug. |
| ListModels | no | Cursor-paginated; optional filter by lifecycle. |
| UpdateModel | yes | Pretty name only; slug is immutable. |
| TombstoneModel | yes | Soft delete; slug enters tombstone set. |
| PublishModelVersion | yes | Promotes a draft to active; per ADR-0014’s evolution semantics. |
| ListModelVersions | no | History of a model. |
| GetModelVersion | no | A specific version. |
| AddField | yes | Adds a field to a model; creates a new draft model version if one isn’t open. |
| UpdateField | yes | Pretty name, validation rules, cardinality, metadata. Slug is immutable. |
| TombstoneField | yes | Field slug enters tombstone set; can be undone only via replay admin op (out of band). |

5.2 Record ops

| Op | Mutating | Notes |
| --- | --- | --- |
| CreateRecord | yes | Returns the created record with system-projected metadata. |
| GetRecord | no | By model slug + record id. |
| UpdateRecord | yes | Partial; body is a sparse map of field_slug → value. |
| DeleteRecord | yes | Soft-delete; tombstones the record id. Hard delete is admin-only (out of band). |
| ListRecords | no | Cursor-paginated; optional Fx-predicate filter; returns Vec<Record> inline up to a server-enforced byte cap (~128 KiB), beyond which the response is a DataStreamRef to an Avro stream. |

5.3 Bulk ops

| Op | Direction | Notes |
| --- | --- | --- |
| QueryRecords | server → client data stream | Returns a DataStreamRef; records flow on the unidirectional data stream as an Avro OCF whose writer schema is the model’s Avro derivation at the version specified in the query. |
| ImportRecords | client → server data stream | Client opens a uni stream and writes an Avro OCF, then sends Op::ImportRecords on the op stream. The server reads the OCF to EOF, processes every record (no fail-fast), commits the valid ones, and returns ImportResult { accepted, rejected, failures } on the op stream. One record.created audit event per accepted row. Capability-gated by BulkRecordImport. v1 has no per-record idempotency; see ADR-0025. |
| ExportModel | server → multiple data streams | Capability-gated by BulkRecordExport (optional ZstdOcfCodec for compression). Server enumerates the model’s versions that have matching records (filtered by include_deleted), opens one uni stream per non-empty version (records as Avro OCF, ascending version order), then a final manifest uni stream (platform.bundle.ExportBundleManifest: model slug, tenant, export timestamp, codec, per-version stream IDs + fingerprints + record counts). Returns OpResponse::ExportBundle listing every stream. Per-stream cap: 64 MiB; oversize → PayloadTooLarge. |

5.4 Subscription ops

| Op | Notes |
| --- | --- |
| SubscribeRecords | Filtered by model and optional Fx predicate (is_match). Returns SubscriptionAccepted with a resume_token. Events flow on a dedicated server→client unidirectional stream. |
| SubscribeModels | Per-tenant model and field lifecycle events. |
| SubscribeAuditTail | Per-tenant audit-log tail; requires the audit_tail.read permission on the principal (governance integration). |
Subscriptions are covered in § 6.

5.5 Attachment ops

Per ADR-0015:
| Op | Notes |
| --- | --- |
| InitiateAttachmentUpload | Client supplies sha256, byte_size, mime_type. Server returns presigned S3 upload URL, attachment row in pending state. |
| CompleteAttachmentUpload | Client confirms the S3 upload. Server transitions the attachment row to pending for scanning (or rejects if the sha256 doesn’t match). |
| RequestAttachmentDownload | Server gates on scan_state = clean per ADR-0015; returns a presigned S3 download URL or an OpError::PreconditionFailed(AttachmentNotClean). |

5.6 Audit ops

| Op | Notes |
| --- | --- |
| GetAuditEvent | By event id. |
| ListAuditEvents | Cursor-paginated; filtered by record, principal, time range, event type. Beyond an inline-byte cap, response is a DataStreamRef. |

5.7 Introspection

| Op | Notes |
| --- | --- |
| DescribeProtocol | Returns a Postcard-serialized ProtocolDescription: protocol version, supported capabilities, op-kind list, version digest. Drives client-side compatibility checks and AI tooling. |
| DescribeAvroSchema | Returns the Avro schema for a given target (Model { slug, version }, AuditEventType, etc.). Drives client code generation. |

6. Subscriptions and live updates

Subscriptions are first-class. The model:
  1. Client sends a Subscribe* op on an op stream.
  2. Server returns SubscriptionAccepted { subscription_id, resume_token, event_stream_id } and opens a unidirectional server→client stream identified by event_stream_id.
  3. The op stream remains open. The server may push subscription-control messages (lease renewal, server-side cancellation) on the op stream.
  4. Events flow on the event stream as length-prefixed Postcard SubscriptionFrame values.
  5. The client cancels by closing the op stream’s write side. The server observes, closes the event stream, and the subscription is gone.

6.1 SubscriptionFrame

enum SubscriptionFrame {
    Accepted { subscription_id: SubscriptionId, resume_token: ResumeToken },
    Event {
        sequence: u64,                        // monotonically increasing per subscription
        correlation_id: CorrelationId,        // matches the originating audit event
        occurred_at: Timestamp,
        body: SubscriptionEvent,
    },
    EventChunk {
        sequence: u64,
        chunk_index: u32,
        chunk_count: u32,
        partial_body: Bytes,                  // raw Postcard chunk of SubscriptionEvent
    },
    Heartbeat { server_now: Timestamp },      // every 30s in idle subscriptions
    LeaseRenewed { expires_at: Timestamp },
    EndOfStream { reason: EndOfStreamReason },
}
The Accepted frame mirrors OpResponse::SubscriptionAccepted for symmetry on the event stream and is always frame 0.

6.2 Resume semantics

Subscriptions survive client reconnects. A Subscribe* op with resume_from: Some(resume_token) resumes from the position encoded in the token. The server:
  1. Validates the token (HMAC-signed per tenant; tamper detection identical to cursors).
  2. Locates the position in the audit log.
  3. Replays events from position + 1 until caught up, then transitions to live tailing.
If the resume position is older than the subscription’s gap window (default 60 minutes; configurable per subscription kind), the resume fails with OpError::PreconditionFailed(ResumeGapExceeded) and the client must reseed from a snapshot op (ListRecords + new subscription with no resume_from). Resume tokens are opaque, capped at 256 bytes, and embed: subscription_kind, last_delivered_sequence, audit_position, tenant_id, hmac.
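The gap-window decision can be sketched as a pure function. Token validation and the audit-log lookup are out of scope here; plan_resume and ResumeDecision are hypothetical names, and the age of the resume position is assumed to have been computed already.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq)]
enum ResumeDecision {
    /// Replay from this audit position, then switch to live tailing.
    Replay { from_position: u64 },
    /// Maps to OpError::PreconditionFailed(ResumeGapExceeded).
    GapExceeded,
}

/// Decide how to serve a resume request once the token has been validated.
/// `position_age` is how far in the past the token's audit position lies.
fn plan_resume(audit_position: u64, position_age: Duration, gap_window: Duration) -> ResumeDecision {
    if position_age > gap_window {
        ResumeDecision::GapExceeded
    } else {
        ResumeDecision::Replay { from_position: audit_position + 1 }
    }
}
```

On GapExceeded the client falls back to the reseed path described above: a snapshot read followed by a fresh subscription with no resume_from.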

6.3 Deduplication

Every Event frame carries the originating audit event’s correlation_id. Clients use this to deduplicate across reconnects when the server replayed events the client already saw.
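Client-side deduplication by correlation_id can be as small as a set of seen identifiers. In this sketch CorrelationId is assumed to fit in a u128 and the set is unbounded; a real client would presumably evict entries older than the resume gap window.

```rust
use std::collections::HashSet;

/// Tracks correlation ids already delivered to the application.
struct Deduper {
    seen: HashSet<u128>,
}

impl Deduper {
    fn new() -> Self {
        Deduper { seen: HashSet::new() }
    }

    /// Returns true the first time an id is offered and false for replays,
    /// so the caller can drop events it has already processed.
    fn accept(&mut self, correlation_id: u128) -> bool {
        self.seen.insert(correlation_id)
    }
}
```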

6.4 Filtering

SubscribeRecords accepts an optional Fx predicate (per ADR-0017). The predicate is evaluated by the server against each candidate event before delivery. Predicates are subject to the same static-analysis budget as visibility rules (per ADR-0017’s per-surface budgets). Volatile predicates (reading now) are permitted but flagged in the subscription’s subscription.opened audit event.

6.5 Subscription leases

A subscription has a server-side lease (default 4 hours). The server sends LeaseRenewed ten minutes before expiry; the lease auto-renews if the client is responsive. If the lease expires without a heartbeat for 60 seconds, the server tears down the subscription (EndOfStream { reason: LeaseExpired }). Leases bound the server’s per-tenant subscription resource usage. Per-tenant subscription concurrency limits live in docs/OPERATIONS.md.

6.6 Backpressure

WebTransport stream flow control bounds the event stream’s send buffer. If the client cannot keep up (its receive window is full), the server queues events up to a per-subscription buffer cap (default 10 000 events). When the cap is reached, the server tears down the subscription (EndOfStream { reason: ClientTooSlow }) and the client must resume from the last acknowledged position. Acknowledgement is implicit: an event is considered acknowledged when the client’s QUIC stream window advances past it.
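The per-subscription buffer cap can be sketched as a bounded queue. EventBuffer and PushOutcome are hypothetical names, events are represented only by their sequence numbers, and the cap is read as "the queue never holds more than cap events", which is one interpretation of the text.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq, Eq)]
enum PushOutcome {
    Queued,
    /// Caller should end the subscription with EndOfStream { reason: ClientTooSlow }.
    ClientTooSlow,
}

/// Events waiting for the client's flow-control window to open.
struct EventBuffer {
    queue: VecDeque<u64>, // sequence numbers stand in for full events
    cap: usize,
}

impl EventBuffer {
    fn new(cap: usize) -> Self {
        EventBuffer { queue: VecDeque::new(), cap }
    }

    /// Queue an event, or report that the client has fallen too far behind.
    fn push(&mut self, sequence: u64) -> PushOutcome {
        if self.queue.len() >= self.cap {
            return PushOutcome::ClientTooSlow;
        }
        self.queue.push_back(sequence);
        PushOutcome::Queued
    }

    /// Take up to `n` events off the front once the stream is writable again.
    fn drain(&mut self, n: usize) -> Vec<u64> {
        let n = n.min(self.queue.len());
        self.queue.drain(..n).collect()
    }
}
```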

6.7 Server-pushed lifecycle

When a model’s schema changes mid-subscription (a new model version is published, a field is tombstoned), the server emits a SubscriptionEvent::ModelSchemaChanged { new_version } frame. The client may continue or reseed depending on its compatibility tolerance — the platform does not silently rewrite events to a different schema.

7. Bulk data: Avro

7.1 Object Container Files

Bulk data streams carry a single Avro Object Container File (OCF) per the Avro spec. The OCF format:
[magic 'Obj\x01']
[header: { meta: { 'avro.schema': <writer schema JSON>,
                   'avro.codec': <codec>,
                   'platform.model_slug': <slug>,
                   'platform.model_version': <u32>,
                   'platform.tenant_id': <tenant id>,
                   'platform.trace_id': <trace id> },
           sync_marker: <16 bytes> }]
[block]*
  • The writer schema is the canonical Avro representation of the target model version (records) or the relevant audit-event variant (audit exports).
  • The codec is null (uncompressed) in v1. Zstd codec is a capability-gated future addition.
  • Platform-specific metadata is embedded under the platform.* prefix. Avro readers ignore unknown metadata; platform readers use it for verification.

7.2 Avro schema derivation

Avro schemas are derived from the platform’s primitive set (per ADR-0014) by src/protocol/avro/schema.rs. The mapping is fixed and tested:
| Platform primitive | Avro type | Notes |
| --- | --- | --- |
| text | string | UTF-8, NFC at the boundary. |
| integer | long | i64. |
| decimal | bytes with logicalType: decimal | Precision and scale per field. |
| boolean | boolean | |
| date | int with logicalType: date | Days since 1970-01-01. |
| time | int with logicalType: time-millis | Sub-second precision dropped to millis. |
| datetime | long with logicalType: timestamp-micros | UTC. |
| duration | fixed(12) with logicalType: duration | Per Avro spec. |
| reference | record { slug: string, model_slug: string } | The reference handle’s public shape (no SurrealDB record id). |
| select | enum of the declared option set, OR string if the option set is too large (>200) | Resolved at schema-derivation time. |
| attachment | record { object_key, sha256, byte_size, mime_type, ... } | The attachment handle per glossary. |
| geopoint | record { lat: double, lon: double } | |
| geoshape | string (WKT) | Avro lacks a native geo type; WKT is the wire form. |
| address | record per address sub-shape | |
| lookup | as the projected primitive | The lookup is transparent on the wire. |
| json | string (canonical JSON) | The platform does not introspect the JSON. |
Cardinality:
  • single → the inner type.
  • list → array<inner> with platform.cardinality: list, platform.ordered: true|false in field metadata.
  • set → array<inner> with platform.cardinality: set (Avro lacks a native set; uniqueness is a platform invariant).
Field-level metadata: every record field carries Avro field-level metadata (doc, default, aliases) plus a platform.* block (sensitivity, retention class, tags).
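To make the mapping concrete, the sketch below renders an Avro field definition as a JSON string for a few primitives and each cardinality. It is not the code in src/protocol/avro/schema.rs: the enums are cut down to five primitives, field names are assumed to be plain slugs that need no JSON escaping, and the exact placement of the platform.* keys on the field object is an assumption.

```rust
enum Primitive {
    Text,
    Integer,
    Boolean,
    Date,
    Datetime,
}

enum Cardinality {
    Single,
    List { ordered: bool },
    Set,
}

/// Avro type for a single value of the primitive, as a JSON fragment.
fn avro_type(p: &Primitive) -> &'static str {
    match p {
        Primitive::Text => r#""string""#,
        Primitive::Integer => r#""long""#,
        Primitive::Boolean => r#""boolean""#,
        Primitive::Date => r#"{"type":"int","logicalType":"date"}"#,
        Primitive::Datetime => r#"{"type":"long","logicalType":"timestamp-micros"}"#,
    }
}

/// Full field definition, wrapping the inner type in an array for list/set.
fn field_json(name: &str, p: &Primitive, c: &Cardinality) -> String {
    let inner = avro_type(p);
    match c {
        Cardinality::Single => format!(r#"{{"name":"{}","type":{}}}"#, name, inner),
        Cardinality::List { ordered } => format!(
            r#"{{"name":"{}","type":{{"type":"array","items":{}}},"platform.cardinality":"list","platform.ordered":{}}}"#,
            name, inner, ordered
        ),
        Cardinality::Set => format!(
            r#"{{"name":"{}","type":{{"type":"array","items":{}}},"platform.cardinality":"set"}}"#,
            name, inner
        ),
    }
}
```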

7.3 Schema evolution

Avro’s reader/writer schema resolution gives the platform forward and backward compatibility for free when the schema change is compatible (additive with defaults). The platform’s model-version evolution rules (per ADR-0014) are designed to produce compatible Avro schemas in the common case. The protocol commits to the following:
  • Every Record carries the model_version it was written under (platform.model_version metadata in the Avro file).
  • The server publishes Avro schemas for every model version via DescribeAvroSchema. Clients cache these by (model_slug, model_version) pair.
  • When a client deserializes a record at a version newer than its cached schema, it fetches the new schema before proceeding. Avro reader/writer schema resolution handles the deserialization.

7.4 Codec policy

Compression is not in v1. Adding the zstd codec is a capability-gated future change (Capability::AvroZstdCodec). Streaming compression on the QUIC layer is the better long-term answer if/when it’s needed — deferred to a future ADR.

7.5 Conversion coverage and deferred primitives

The schema derivation in § 7.2 maps every platform primitive. The runtime value conversion in src/protocol/avro/record.rs ships a subset and will grow with the C.2 bulk-op slices. This section catalogues the deferred primitives, their technical requirements, and their target phase, so the work isn’t lost between commits and reviewers can see the full picture in one place.
The shipped subset (C.2.2) covers domain types whose JSON representation maps cleanly to a single apache_avro::types::Value variant: Text, Integer, Boolean, Date, Time, Datetime, Json, Reference, Select. Each works for all three cardinalities (single required, single optional via ["null", T] union, list/set as array<inner>).
Deferred primitives report a structured RecordConvertError::UnsupportedPrimitive { slug, kind }. The error names the field so callers can surface actionable diagnostics. Each entry below lists what’s required to lift the restriction:

Decimal { precision, scale }

  • Domain shape. serde_json::Value::String containing the decimal literal (e.g. "123.45"). JSON numbers lose precision past f64 so the string form is canonical; the integer-and-scale form is also acceptable on input. Per ADR-0014.
  • Avro target. Value::Decimal(apache_avro::Decimal::from(bytes)) where bytes is the big-endian two’s-complement representation of the unscaled integer, with redundant sign-extension bytes trimmed (Avro spec § “Decimal logical type”).
  • Implementation requirements.
    1. Parse the JSON string via rust_decimal::Decimal::from_str (the crate is already a pinned dep for Fx; reuse).
    2. Validate that the parsed scale matches the field schema’s declared scale; mismatch → Validation error.
    3. Validate that the unscaled magnitude fits in precision digits; overflow → Validation error.
    4. Convert the i128 mantissa to a Vec<u8> of big-endian bytes, stripping leading 0x00 (for positives) or 0xFF (for negatives) bytes that don’t change the value but bloat the payload. Avro’s canonical encoding requires the minimal-bytes form.
    5. Reverse direction: reconstruct i128 from the byte slice (sign-extend if shorter than 16 bytes), format with the schema’s scale, return as JSON string.
  • Target phase. C.2.3 or C.2.5 — whichever bulk op first carries a decimal-typed field through. Self-contained; can also land opportunistically as a small commit.
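Steps 4 and 5 above (trimmed big-endian two's-complement bytes and the reverse) can be sketched with the standard library alone. The sketch starts from an i128 unscaled value and leaves out parsing, scale validation, and the rust_decimal and apache_avro types; both function names are illustrative.

```rust
/// Encode an unscaled integer as big-endian two's complement,
/// trimmed to the shortest form that still carries the sign.
fn unscaled_to_bytes(unscaled: i128) -> Vec<u8> {
    let full = unscaled.to_be_bytes();
    let mut start = 0;
    // A leading byte is redundant only when it is pure sign extension of
    // the byte that follows: 0x00 before a byte with the high bit clear,
    // or 0xFF before a byte with the high bit set.
    while start < full.len() - 1 {
        let next_negative = (full[start + 1] & 0x80) != 0;
        let redundant = (full[start] == 0x00 && !next_negative)
            || (full[start] == 0xFF && next_negative);
        if !redundant {
            break;
        }
        start += 1;
    }
    full[start..].to_vec()
}

/// Decode trimmed bytes back to the unscaled integer by sign-extending
/// to 16 bytes. Returns None for empty input or more than 16 bytes.
fn bytes_to_unscaled(bytes: &[u8]) -> Option<i128> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    let fill = if (bytes[0] & 0x80) != 0 { 0xFF } else { 0x00 };
    let mut full = [fill; 16];
    full[16 - bytes.len()..].copy_from_slice(bytes);
    Some(i128::from_be_bytes(full))
}
```

For example, the decimal literal "123.45" at scale 2 has the unscaled value 12345, which encodes to the two bytes 0x30 0x39. Note that 128 needs a leading 0x00 to stay positive, which is why the trim checks the following byte rather than stripping every 0x00.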

Duration

  • Domain shape. Glossary: “ISO 8601 duration; stored as nanoseconds.” So the canonical domain value is an i64 nanosecond count, surfaced as serde_json::Value::Number(i64) or as the ISO string form for human readability.
  • Avro target. Value::Duration(apache_avro::Duration::new(months, days, millis)) where each component is a u32 little-endian. Total wire size 12 bytes.
  • Implementation requirements.
    1. Decide canonical input form: number-of-nanoseconds (precise) vs ISO 8601 string (round-trip-able with calendar units). Avro’s duration cannot represent both nanoseconds AND months — the month component is calendar-aware and ambiguous in milliseconds. The platform’s domain stores nanoseconds, so the conversion is lossy on the calendar dimension: we set months = 0, days = 0, compute millis = nanoseconds / 1_000_000, and truncate sub-ms precision. Document the loss in the wire spec.
    2. Reverse direction: months × ~30.44 days + days + millis as nanoseconds. Months > 0 from a non-platform writer would produce a precision-incompatible value; flag as a Validation error on decode if encountered (platform never writes months > 0).
  • Target phase. C.2.3 or later. Lower priority than decimal — duration fields are rare in practice.
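The lossy nanosecond mapping described above can be sketched directly against the 12-byte fixed layout (three little-endian u32 values: months, days, millis), without the apache_avro types. Two behaviours here are assumptions rather than stated policy: negative durations are rejected, and so is any value whose millisecond count exceeds u32::MAX (about 49.7 days), since the text sets days = 0 on encode.

```rust
const NANOS_PER_MILLI: i64 = 1_000_000;
const MILLIS_PER_DAY: i64 = 86_400_000;

/// Encode a nanosecond count as Avro's fixed(12) duration:
/// months = 0, days = 0, millis = nanos / 1_000_000 (sub-ms truncated).
fn nanos_to_avro_duration(nanos: i64) -> Option<[u8; 12]> {
    if nanos < 0 {
        return None; // the Avro components are unsigned
    }
    let millis = nanos / NANOS_PER_MILLI;
    if millis > i64::from(u32::MAX) {
        return None; // would not fit the millis component
    }
    let mut out = [0u8; 12];
    out[8..].copy_from_slice(&(millis as u32).to_le_bytes());
    Some(out)
}

/// Decode back to nanoseconds. A non-zero months component is rejected,
/// matching the Validation error described above; days count as 24 hours.
fn avro_duration_to_nanos(bytes: &[u8; 12]) -> Option<i64> {
    let word = |i: usize| u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
    let (months, days, millis) = (word(0), word(4), word(8));
    if months != 0 {
        return None;
    }
    let total_millis = i64::from(days) * MILLIS_PER_DAY + i64::from(millis);
    total_millis.checked_mul(NANOS_PER_MILLI)
}
```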

Geopoint

  • Domain shape. serde_json::Value::Object with lat: f64 and lon: f64.
  • Avro target. Value::Record([("lat", Double(lat)), ("lon", Double(lon))]). Schema-side already renders correctly (§ 7.2).
  • Implementation requirements.
    1. Extract lat and lon as as_f64(); missing or non-numeric → TypeMismatch.
    2. Reverse direction: pairs into JSON object.
  • Target phase. C.2.3. Trivial; the only reason it’s deferred is that the C.2.2 substrate didn’t need it. Add when first geopoint field appears in an e2e test.

Geoshape

  • Domain shape. serde_json::Value::String containing WKT (Well-Known Text).
  • Avro target. Value::String(wkt). Schema-side renders as bare string.
  • Implementation requirements. One match arm. The only reason it was deferred in C.2.2 is that this list of “complex types not shipped” originally included geoshape by mistake — it’s actually trivial. Lift in C.2.3 as a one-line addition.
  • Target phase. C.2.3.

Address

  • Domain shape. serde_json::Value::Object with optional fields per the address sub-shape (line1, line2, locality, region, postcode, country).
  • Avro target. Value::Record([(name, Value::Union(0|1, …))*]) where each field is ["null", "string"]. Schema-side already renders this (§ 7.2).
  • Implementation requirements.
    1. For each known sub-field, look up in JSON object; convert present strings to Union(1, String(s)) and missing or null to Union(0, Null).
    2. Unknown sub-fields in the JSON should be ignored or surface as a Validation warning (open question; default: ignore).
    3. Reverse direction: walk the Avro Record pairs and rebuild the JSON object, omitting fields that decoded as null (matches the glossary’s “optional address sub-fields” semantics).
  • Target phase. C.2.3 or C.2.5. Address fields are real workloads but not on the critical path for the first end-to-end QueryRecords.

Attachment

  • Domain shape. serde_json::Value::Object with the full AttachmentHandle per the glossary entry: object_key, sha256, byte_size, mime_type, original_filename, scan_state, scan_at, scanner_version, uploaded_by, uploaded_at.
  • Avro target. A 10-field Value::Record(...). Schema-side already renders the type.
  • Implementation requirements.
    1. Type-mapping table: strings for textual fields, Long for byte_size, TimestampMicros for uploaded_at, and a nullable union for scan_at + scanner_version. The scan_state enum could remain a string (the platform’s enum is pending | clean | infected | expired, all valid Avro symbols) or upgrade to an Avro enum in a follow-up.
    2. Reverse direction: pairs → JSON object; map missing optionals to absent JSON keys (consistent with the existing HTTP wire form produced by crate::attachments).
  • Target phase. C.2.5 (ImportRecords) or Phase E (attachment ops), whichever first ships records carrying an attachment field through bulk transport.

Lookup { via_field_slug, project_field_slug }

  • Domain shape. Computed at read time from a referenced record. The stored record has no value of its own for a lookup field; the API materialises it during projection. So on the wire, the lookup field’s value is whatever the projected source field’s value is.
  • Avro target. “As the projected primitive” (design doc § 7.2). Requires knowing the target model’s field type for project_field_slug, which lives in a different ModelVersion.
  • Implementation requirements.
    1. Schema-side: schema_for_model_version currently emits string for lookup with a // TODO comment. The proper form requires: a. Looking up the referenced model (via the via_field_slug’s Reference { model_slug }). b. Fetching its current published version. c. Finding project_field_slug in that version’s field_specs. d. Emitting the projected field’s primitive schema in place of the lookup.
    2. This requires the derivation function to take a &dyn ModelStore (or equivalent) so it can resolve the reference. That’s a signature change to schema_for_model_version — the caller becomes async, and the function becomes recursive (lookups pointing at lookups would loop; guard with a depth limit).
    3. Cycle detection: (model_slug, version) → field_slug visits; loop → Validation error at publish time, not at read time.
    4. Value-side conversion uses whatever the projected primitive resolves to at materialisation time (which the existing crate::records code already handles for HTTP).
  • Target phase. Future ADR. Lookups are a v1.x concern; the v1.0 protocol can ship without them by emitting string for the schema and an UnsupportedPrimitive error for the value. The recursive derivation deserves its own ADR (signature change, async derivation, cycle detection policy).
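
The resolution steps and the cycle guard described above can be sketched as follows. This is a minimal, synchronous illustration under stated assumptions, not the platform's derivation function: FieldKind, ResolveError, the Models map (standing in for a ModelStore that returns each model's current published version), resolve_lookup, and the depth limit of 8 are all hypothetical names and values. The version dimension of the (model_slug, version) visit key is omitted for brevity.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified field kinds; hypothetical stand-ins for the real field specs.
#[derive(Clone, Debug, PartialEq)]
enum FieldKind {
    /// Avro primitive name, e.g. "string" or "long".
    Primitive(&'static str),
    Reference { model_slug: &'static str },
    Lookup { via_field_slug: &'static str, project_field_slug: &'static str },
}

#[derive(Debug, PartialEq)]
enum ResolveError {
    UnknownModel,
    UnknownField,
    ViaFieldNotReference,
    NotPrimitive,
    Cycle,
    DepthExceeded,
}

/// model_slug -> (field_slug -> kind). Stands in for a ModelStore lookup of
/// each model's current published version (assumption).
type Models = HashMap<&'static str, HashMap<&'static str, FieldKind>>;

/// Assumed value; the text only asks for "a depth limit".
const MAX_LOOKUP_DEPTH: usize = 8;

/// Follows a chain of lookups until it reaches a primitive field.
fn resolve_lookup(
    models: &Models,
    model: &'static str,
    field: &'static str,
) -> Result<&'static str, ResolveError> {
    let mut visited: HashSet<(&str, &str)> = HashSet::new();
    let (mut model, mut field) = (model, field);
    for _ in 0..MAX_LOOKUP_DEPTH {
        // A repeated (model, field) visit means the lookups form a loop.
        if !visited.insert((model, field)) {
            return Err(ResolveError::Cycle);
        }
        let fields = models.get(model).ok_or(ResolveError::UnknownModel)?;
        match fields.get(field).ok_or(ResolveError::UnknownField)? {
            FieldKind::Primitive(p) => return Ok(*p),
            // Projecting a reference is left unspecified here (assumption).
            FieldKind::Reference { .. } => return Err(ResolveError::NotPrimitive),
            FieldKind::Lookup { via_field_slug, project_field_slug } => {
                match fields.get(*via_field_slug).ok_or(ResolveError::UnknownField)? {
                    FieldKind::Reference { model_slug } => {
                        model = *model_slug;
                        field = *project_field_slug;
                    }
                    _ => return Err(ResolveError::ViaFieldNotReference),
                }
            }
        }
    }
    Err(ResolveError::DepthExceeded)
}
```

In this sketch a Cycle or DepthExceeded result would be surfaced as a Validation error at publish time, in line with requirement 3.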

Summary table

| Primitive | C.2.2 status | Effort | Target phase |
| --- | --- | --- | --- |
| Decimal | Deferred | Moderate (rust_decimal byte encoding) | C.2.3 / C.2.5 |
| Duration | Deferred | Moderate (lossy month/day mapping) | C.2.3 |
| Geopoint | Deferred | Trivial (two doubles) | C.2.3 |
| Geoshape | Deferred (mis-classified) | Trivial (one match arm) | C.2.3 |
| Address | Deferred | Moderate (nested record) | C.2.3 / C.2.5 |
| Attachment | Deferred | Moderate (10-field record + timestamps) | C.2.5 / Phase E |
| Lookup | Deferred | Significant (recursive schema + ADR) | Future ADR |
All other primitives (the simple-subset above) are shipped.

8. Versioning and evolution

8.1 Major version

The protocol’s major version is a u32 advertised in ServerHello.selected_protocol_version. v1 = 1. Major bumps:
  • Are ADR-gated. A new ADR justifies the bump, supersedes ADR-0023, and documents what changed.
  • Are reserved for wire-framing changes, breaking semantic redefinitions, or removals that the capability system cannot express.
  • Are rare. The platform commits to at most one major bump every 18 months, except in response to a security or correctness issue.
When the platform supports multiple major versions concurrently (typical during a deprecation window), ClientHello.supported_protocol_versions lists the client’s options in descending preference; the server picks the highest mutually supported version.
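
The selection rule can be sketched as a small pure function. The function name and the slice-based signature are assumptions made for illustration; only the "highest version present in both lists" rule comes from the text above.

```rust
/// Returns the highest major version that appears in both lists, or None
/// when the client and server share no version.
fn select_protocol_version(client_supported: &[u32], server_supported: &[u32]) -> Option<u32> {
    client_supported
        .iter()
        .copied()
        .filter(|v| server_supported.contains(v))
        .max()
}
```

A None result corresponds to the case in § 8.2 where the server answers with OpError::Auth(ProtocolVersionUnsupported).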

8.2 Deprecation window

When a major version is to be retired, the platform commits to:
  • A minimum 6-month notice before removal, signaled via ServerNotice { version_deprecation: { version, sunset_at } } on the control stream of every session that handshakes onto the deprecated version.
  • A Warning header equivalent in ServerHello.server_metadata.warnings for the deprecated version.
  • Public communication via the same channel as ADR publication (release notes; downstream client repos).
After sunset, sessions advertising only the retired version receive OpError::Auth(ProtocolVersionUnsupported) and the QUIC session is closed.

8.3 Capabilities

Capabilities are the day-to-day evolution unit. Each is a named bit in CapabilitySet. Adding a capability:
  • Is NOT ADR-gated. It is a code change with a glossary update if the capability introduces a new concept.
  • Is recorded in src/protocol/capabilities.rs with a doc comment explaining its semantics, the op or behavior it gates, and the date it was added.
  • Triggers a contract test snapshot update (new capability in the digest).
Removing a capability:
  • Requires a deprecation window of 3 months during which the capability is advertised but server-side enforcement is “ignored, with warning.”
  • After removal, clients still requesting it get a ServerNotice { capability_removed: { name } } and the capability is intersected out of the negotiated set.
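
As an illustration of "intersected out of the negotiated set", the sketch below models CapabilitySet as a bitset. The bit positions, the Negotiation struct, and the negotiate function are hypothetical; the real definitions live in src/protocol/capabilities.rs and may differ.

```rust
/// Capability bitset. Bit positions are illustrative, not the real wire values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct CapabilitySet(u64);

impl CapabilitySet {
    const SUBSCRIPTION_RESUME: Self = Self(1 << 0);
    const BULK_RECORD_IMPORT: Self = Self(1 << 1);
    const FX_PREDICATE_FILTER: Self = Self(1 << 2);

    fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }
}

/// Result of a handshake negotiation (hypothetical shape).
struct Negotiation {
    /// Capabilities both sides agreed on.
    negotiated: CapabilitySet,
    /// Requested by the client but not offered by the server; a removed
    /// capability would show up here and trigger a ServerNotice.
    unavailable: CapabilitySet,
}

fn negotiate(requested: CapabilitySet, offered: CapabilitySet) -> Negotiation {
    Negotiation {
        negotiated: CapabilitySet(requested.0 & offered.0),
        unavailable: CapabilitySet(requested.0 & !offered.0),
    }
}
```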

8.4 Field-level evolution within Postcard variants

Postcard variants are wire-frozen. Adding a field to an existing variant (e.g., adding force: bool to DeleteRecord) is a breaking change. The platform’s discipline:
  • DO NOT add fields to existing variants. Instead, add a new variant (e.g., ForceDeleteRecord) gated by a capability.
  • DO NOT rename fields. Postcard’s wire format depends on struct layout, not field names, but the discipline holds for clarity.
  • DO add new variants to existing enums freely; capability-gate them.
The dead_code lint and a custom tests/contract_protocol.rs snapshot test enforce this — adding a field to an existing variant changes the snapshot and fails CI without an explicit reset.

8.5 The capability inventory (v1.0)

v1.0 ships with the following capabilities:
| Capability | Purpose |
| --- | --- |
| SubscriptionResume | Subscribe* ops accept a resume_from: Option<ResumeToken>. |
| BulkRecordImport | ImportRecords op accepted. |
| BulkRecordExport | QueryRecords and ExportModel ops accepted. |
| AuditTailSubscription | SubscribeAuditTail op accepted (also gated on principal permission). |
| FxPredicateFilter | SubscribeRecords accepts an Fx predicate filter. |
| IdempotencyKeyCache | Server promises 24-hour idempotency-key replay protection. |
| ZstdOcfCodec | Server may emit zstd-compressed Avro Object Container Files on data streams (currently ExportModel; other bulk ops keep the null codec unless they opt in). Per docs/design/protocol.md § 7.1. |
Future-reserved (not implemented in v1.0):
| Capability | Purpose |
| --- | --- |
| ArrowIpcDataStreams | Data streams may use Apache Arrow IPC instead of Avro. (Per ADR-0023, deferred.) |
| BatchOps | Multiple ops in a single op-stream envelope. |

9. Error model

Every op returns Result<OpResponse, OpError> over the wire. OpError is closed:
enum OpError {
    Auth(AuthError),
    Validation { errors: Vec<FieldError>, trace_id: TraceId },
    NotFound { kind: ResourceKind, slug: SlugRef },
    RateLimited { retry_after_ms: u32, scope: RateScope },
    PreconditionFailed(PreconditionDetail),
    Conflict(ConflictKind),
    Internal { trace_id: TraceId, kind: InternalErrorKind },
}

9.1 Variant semantics

Auth(AuthError) — the token is invalid or expired, the principal lacks permission for the op, or the handshake failed (timeout or unsupported protocol version). Sub-cases:
enum AuthError {
    TokenInvalid,
    TokenExpired,
    PrincipalLacksPermission { required: PermissionName },
    HandshakeTimeout,
    ProtocolVersionUnsupported,
}
Validation { errors, trace_id } — one or more field-level rule failures, per ADR-0017. FieldError carries the field slug, rule name, and human-readable message:
struct FieldError {
    field_slug: SlugRef,
    rule_name: Option<String>,                // None if structural (cardinality, primitive)
    message: String,
}
NotFound { kind, slug } — the named resource does not exist. ResourceKind is an enum (Model, Field, Record, ModelVersion, Subscription, AuditEvent, Attachment).
RateLimited { retry_after_ms, scope } — covered in § 11.
PreconditionFailed(PreconditionDetail) — every protocol-level precondition is here:
enum PreconditionDetail {
    DeadlineExceeded,
    VersionMismatch { client_sent: u32, server_active: u32 },
    CapabilityNotNegotiated { missing: CapabilityName },
    InvalidCursor,
    InvalidResumeToken,
    ResumeGapExceeded,
    PayloadTooLarge,
    SessionDraining,
    AttachmentNotClean,                       // ADR-0015
    SchemaIncompatible { details: String },
}
Conflict(ConflictKind) — concurrent modification, slug collision, idempotency-key reuse with a different request body:
enum ConflictKind {
    IdempotencyKeyMismatch,
    SlugAlreadyExists { slug: SlugRef },
    ModelVersionInUse,
    DuplicateControlStream,
}
Internal { trace_id, kind } — server-side errors. InternalErrorKind is a coarse enum (Storage, Projection, WireFormat, Unexpected); details live in the trace, not on the wire.

9.2 Audit correlation

Every error variant carries something correlatable to an audit event:
  • Validation and Internal carry trace_id, which is the correlation_id of an audit event emitted at the time of the failure.
  • NotFound, Conflict, PreconditionFailed, RateLimited carry the resource slug or scope where the variant has one, plus an implicit trace_id available in the spans for the op (the operations runbook documents how to find it).
  • Auth errors emit an auth.* audit event before the error is written; the trace is correlatable via the op envelope’s trace_context.

9.3 Client retry guidance

The client library implements:
  • Retry-able errors: RateLimited (with the supplied delay), Internal { kind: Storage } (exponential backoff, capped at 30 s, max 3 attempts).
  • Not retry-able: Auth, Validation, NotFound, PreconditionFailed, Conflict, Internal { kind: WireFormat }.
Retry policy is enforced client-side; the server does no retry-rejection beyond the rate-limit window.
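
A minimal sketch of this retry classification follows. It uses a trimmed-down OpError without payloads, and several details are assumptions rather than part of the contract: the retry_delay name, the 500 ms base delay, the doubling factor, and the choice to treat Internal { kind: Projection } and Internal { kind: Unexpected } as not retryable (the list above does not mention them).

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum InternalErrorKind {
    Storage,
    Projection,
    WireFormat,
    Unexpected,
}

/// Trimmed-down OpError: payloads not needed for the retry decision are omitted.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpError {
    Auth,
    Validation,
    NotFound,
    RateLimited { retry_after_ms: u32 },
    PreconditionFailed,
    Conflict,
    Internal { kind: InternalErrorKind },
}

const BASE_BACKOFF: Duration = Duration::from_millis(500); // assumed base delay
const MAX_BACKOFF: Duration = Duration::from_secs(30);
const MAX_ATTEMPTS: u32 = 3;

/// `attempt` is the number of attempts already made (1 after the first failure).
/// Returns how long to wait before the next attempt, or None to give up.
fn retry_delay(err: &OpError, attempt: u32) -> Option<Duration> {
    match err {
        // The server supplies the delay; the client waits at least that long.
        OpError::RateLimited { retry_after_ms } => {
            Some(Duration::from_millis(u64::from(*retry_after_ms)))
        }
        // Exponential backoff, capped at 30 s, at most 3 attempts in total.
        OpError::Internal { kind: InternalErrorKind::Storage } if attempt < MAX_ATTEMPTS => {
            let factor = 2u32.saturating_pow(attempt.saturating_sub(1));
            Some(BASE_BACKOFF.saturating_mul(factor).min(MAX_BACKOFF))
        }
        // Everything else is treated as not retryable in this sketch.
        _ => None,
    }
}
```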

10. Observability

10.1 Spans

Every op stream produces one span (platform.protocol.op) covering envelope decode → op execution → response delivery. Span attributes:
  • protocol.op.kind — the Op variant name.
  • protocol.session.id — opaque session id.
  • protocol.principal.id — principal slug.
  • protocol.tenant.id — tenant slug.
  • protocol.idempotency.key — present when the envelope supplies one.
  • protocol.cursor.present — boolean.
  • protocol.error.variant — set when the op fails; the OpError variant name.
  • protocol.bytes_in / protocol.bytes_out — frame sizes.
The span’s trace context is propagated from OpEnvelope.trace_context if present, otherwise a new trace is started. Child spans cover storage operations, Fx evaluation, audit writes. Subscriptions produce a long-lived span (platform.protocol.subscription) plus a child span per delivered event.

10.2 Metrics

Prometheus-shaped, scraped via the private HTTP listener (which remains for operational endpoints; see § 13):
| Metric | Type | Labels |
| --- | --- | --- |
| protocol_sessions_active | gauge | tenant, client_name, protocol_version |
| protocol_session_handshakes_total | counter | tenant, outcome (success, auth_failed, version_mismatch, timeout) |
| protocol_op_streams_total | counter | tenant, op_kind, outcome (ok, error_variant) |
| protocol_op_latency_seconds | histogram | tenant, op_kind |
| protocol_op_bytes_in / _bytes_out | histogram | tenant, op_kind |
| protocol_data_stream_bytes | counter | tenant, direction, op_kind |
| protocol_subscription_events_total | counter | tenant, subscription_kind, outcome (delivered, dropped) |
| protocol_subscription_lag_seconds | gauge | tenant, subscription_id |
| protocol_rate_limit_rejections_total | counter | tenant, scope, op_kind |

10.3 Client telemetry datagrams

Datagrams are unreliable, fire-and-forget. Schema:
struct ClientTelemetryDatagram {
    session_id: SessionId,
    trace_context: Option<TraceContext>,
    kind: ClientTelemetryKind,
    payload: Bytes,                           // bounded at 1 KiB
}

enum ClientTelemetryKind {
    UiPerfMark,                               // payload: Postcard(UiPerfMark)
    UiError,                                  // payload: Postcard(UiError)
}
The server logs telemetry datagrams to the observability sink without acknowledgement. Datagrams that fail to deserialize are silently dropped (a coarse counter protocol_telemetry_datagrams_invalid_total exists for visibility).

10.4 Logging policy

Per-op logs at INFO level on completion (success or failure). Per-subscription-event logs at DEBUG. Auth handshakes at INFO. WebTransport-session lifecycle (open/close) at INFO. The audit log per ADR-0007 is separate from operational logs and remains authoritative.

11. Rate limiting

Three layers, enforced server-side, surfaced as OpError::RateLimited:

11.1 Per-session

  • Max concurrent op streams: 32 (configurable per principal class).
  • Max concurrent data streams: 4 (configurable).
  • Bandwidth: governed by QUIC stream flow control naturally; no application-layer cap in v1.
Violating the concurrency cap returns OpError::RateLimited { scope: RateScope::Session, retry_after_ms: <wait for an op slot> }. The retry window is short — typically when the next in-flight op completes.

11.2 Per-principal

  • Ops/sec: 100 (configurable).
  • Bulk-data bytes/sec: 10 MiB/s (configurable).
Token-bucket algorithm; the server returns the retry_after_ms computed from the bucket’s recovery rate.
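
One way to derive retry_after_ms from a bucket's recovery rate is sketched below. It is an illustration, not the server's limiter: the TokenBucket type, the burst parameter, the integer "millitoken" accounting, and the explicit millisecond clock (used so the behaviour is deterministic) are all assumptions.

```rust
/// One op costs 1000 millitokens, so a rate of N ops/sec refills
/// N millitokens per millisecond and no floating point is needed.
const COST_MILLI: u64 = 1000;

struct TokenBucket {
    rate_per_sec: u64,
    capacity_milli: u64,
    tokens_milli: u64,
    last_ms: u64,
}

impl TokenBucket {
    /// `burst` is the number of ops that may be issued back to back (assumed knob).
    fn new(rate_per_sec: u64, burst: u64, now_ms: u64) -> Self {
        let capacity_milli = burst.saturating_mul(COST_MILLI);
        Self {
            rate_per_sec: rate_per_sec.max(1), // avoid division by zero below
            capacity_milli,
            tokens_milli: capacity_milli,
            last_ms: now_ms,
        }
    }

    /// Ok(()) when a token was taken; Err(retry_after_ms) otherwise.
    fn try_acquire(&mut self, now_ms: u64) -> Result<(), u32> {
        // Refill for the time elapsed since the last call, up to capacity.
        let elapsed = now_ms.saturating_sub(self.last_ms);
        self.tokens_milli = self
            .tokens_milli
            .saturating_add(elapsed.saturating_mul(self.rate_per_sec))
            .min(self.capacity_milli);
        self.last_ms = self.last_ms.max(now_ms);

        if self.tokens_milli >= COST_MILLI {
            self.tokens_milli -= COST_MILLI;
            return Ok(());
        }
        // Time until one full token is available, rounded up.
        let missing = COST_MILLI - self.tokens_milli;
        let wait_ms = (missing + self.rate_per_sec - 1) / self.rate_per_sec;
        Err(wait_ms.min(u64::from(u32::MAX)) as u32)
    }
}
```

With the default of 100 ops/sec, an empty bucket reports a 10 ms retry hint, which is the time needed to recover one token.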

11.3 Per-tenant

  • Aggregate ops/sec: 1 000 (configurable per tenant tier).
  • Monthly op quota: per-tenant, enforced via the projection store; resets monthly.
The tenant scope is the largest — when a RateLimited rejection is in scope Tenant, the retry window may be minutes.

11.4 The retry-hint contract

OpError::RateLimited { retry_after_ms, scope } carries:
  • retry_after_ms: the smallest valid retry delay. Clients SHOULD wait at least this long; the server MAY further reject retries that arrive earlier.
  • scope: identifies which layer triggered the rejection. Clients use this to surface meaningful errors to users (e.g., “your tenant has reached its monthly quota”).
Limits are tunable per environment via app.config.rate_limits (see docs/OPERATIONS.md once Phase 3 lands).

12. Security

12.1 Transport security

  • QUIC’s TLS 1.3 is mandatory. The server presents a certificate from the tenant’s CA bundle (browser PKI by default; pinned roots for native clients).
  • Forward secrecy is on by default in QUIC; no configuration knob.
  • 0-RTT is disabled in v1 — the security/replay-protection tradeoff is not worth the latency savings for our op shape.

12.2 Authentication

  • WorkOS-issued JWTs, signed by the tenant’s WorkOS organization’s signing key.
  • The platform validates against the WorkOS JWKS endpoint (per ADR-0009’s .well-known bootstrap pattern).
  • JWKS keys are cached per existing auth module policy; rotation is handled out-of-band.
  • The JWT carries: sub (principal id), org_id (WorkOS org id, mapped to tenant), exp (expiry), iat, aud (the platform’s audience identifier), plus claims for permissions.
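
The claim checks that follow signature verification might look like the sketch below. It assumes the JWT has already been verified against the JWKS and decoded; the Claims and Principal structs, the permissions claim name, the org_to_tenant map, and the choice to report an audience mismatch or unknown org as TokenInvalid are all illustrative assumptions.

```rust
use std::collections::HashMap;

/// Decoded JWT claims. Signature verification against the JWKS is assumed
/// to have happened before this point.
struct Claims {
    sub: String,
    org_id: String,
    exp: u64, // seconds since the Unix epoch
    aud: String,
    permissions: Vec<String>, // hypothetical claim name
}

#[derive(Debug, PartialEq)]
enum AuthError {
    TokenInvalid,
    TokenExpired,
}

#[derive(Debug, PartialEq)]
struct Principal {
    id: String,
    tenant: String,
    permissions: Vec<String>,
}

/// Maps verified claims to a principal bound to a tenant.
fn principal_from_claims(
    claims: &Claims,
    expected_aud: &str,
    now_unix: u64,
    org_to_tenant: &HashMap<String, String>,
) -> Result<Principal, AuthError> {
    if claims.exp <= now_unix {
        return Err(AuthError::TokenExpired);
    }
    if claims.aud != expected_aud {
        return Err(AuthError::TokenInvalid);
    }
    // The WorkOS org id must map to a known tenant.
    let tenant = org_to_tenant
        .get(&claims.org_id)
        .ok_or(AuthError::TokenInvalid)?;
    Ok(Principal {
        id: claims.sub.clone(),
        tenant: tenant.clone(),
        permissions: claims.permissions.clone(),
    })
}
```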

12.3 Authorization

Per-op authorization happens server-side using the principal’s permission claims. The protocol does not have a generic “permission denied” code separate from OpError::Auth(PrincipalLacksPermission { required }). Glossary terms used in permission names: model.write, model.read, record.write, record.read, audit_tail.read, subscription.audit, attachment.upload, attachment.download. The full permission catalogue is documented in docs/OPERATIONS.md § Authorization.

12.4 Tenant isolation

Every op’s tenant is fixed at handshake. The server’s storage and query layers (per ADR-0010) enforce tenant scoping on every read and write; the protocol layer relies on this without re-checking.

12.5 Replay protection

QUIC’s transport layer is replay-safe by default (0-RTT disabled in v1; § 12.1). At the application layer, idempotency keys (§ 3.2) bound replay attacks on mutating ops.
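
How an idempotency-key cache bounds replays can be sketched as follows. § 3.2 is the authority on the actual behaviour; this sketch assumes that a repeated key with the same request body replays the stored response, that a repeated key with a different body maps to ConflictKind::IdempotencyKeyMismatch, and that entries expire after 24 hours. The IdempotencyCache type, the Outcome enum, and the caller-supplied body_digest are hypothetical.

```rust
use std::collections::HashMap;

const IDEMPOTENCY_TTL_MS: u64 = 24 * 60 * 60 * 1000;

#[derive(Debug, PartialEq)]
enum Outcome {
    /// First sighting of the key: execute the op, then call `record`.
    Execute,
    /// Same key and same body: return the stored response without re-executing.
    Replay(Vec<u8>),
    /// Same key, different body: reject with IdempotencyKeyMismatch.
    Mismatch,
}

struct Entry {
    body_digest: u64,
    response: Vec<u8>,
    stored_at_ms: u64,
}

#[derive(Default)]
struct IdempotencyCache {
    entries: HashMap<String, Entry>,
}

impl IdempotencyCache {
    fn check(&mut self, key: &str, body_digest: u64, now_ms: u64) -> Outcome {
        if let Some(entry) = self.entries.get(key) {
            if now_ms.saturating_sub(entry.stored_at_ms) < IDEMPOTENCY_TTL_MS {
                return if entry.body_digest == body_digest {
                    Outcome::Replay(entry.response.clone())
                } else {
                    Outcome::Mismatch
                };
            }
        }
        // Unknown or expired key: drop any stale entry and let the op run.
        self.entries.remove(key);
        Outcome::Execute
    }

    fn record(&mut self, key: &str, body_digest: u64, response: Vec<u8>, now_ms: u64) {
        let entry = Entry { body_digest, response, stored_at_ms: now_ms };
        self.entries.insert(key.to_owned(), entry);
    }
}
```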

12.6 Threat model summary

| Threat | Mitigation |
| --- | --- |
| Wire eavesdropping | QUIC/TLS 1.3 with forward secrecy. |
| Token theft | Short-lived JWTs; WorkOS revocation; mid-session expiry forces reconnect. |
| Cross-tenant data leak | Tenant bound at handshake; storage layer enforces scope (per ADR-0010). |
| Replay of mutating ops | Idempotency keys + 24h server-side cache. |
| Denial of service | Three-layer rate limiting (§ 11); QUIC stream flow control; per-session resource caps. |
| Malicious subscription consumers | Per-tenant subscription concurrency caps; lease-based teardown; Fx predicate budget. |
| Wire-format fuzzing | Length-prefix bounds; Postcard’s strict-deserialization mode; max-frame-size caps. |
| Schema-evolution-driven confusion | Avro schemas signed into OCF metadata; client validates against DescribeAvroSchema. |

13. Audit-log integration

Per ADR-0007, the audit log is the platform’s source of truth. Every mutating op produces exactly one audit event, except ImportRecords, which produces one per accepted record (§ 13.1); subscription delivery and audit-tail subscriptions are projections of the log.

13.1 Op-to-event mapping

The mutating Op variants map 1:1 onto the audit event types defined in docs/audit-event.schema.json:
| Op variant | Audit event type |
| --- | --- |
| CreateModel | model.created |
| UpdateModel | model.updated |
| TombstoneModel | model.tombstoned |
| PublishModelVersion | model_version.published |
| AddField | field.added |
| UpdateField | field.updated |
| TombstoneField | field.tombstoned |
| CreateRecord | record.created |
| UpdateRecord | record.updated |
| DeleteRecord | record.deleted |
| ImportRecords (per accepted record) | record.created or record.updated |
| InitiateAttachmentUpload | attachment.upload_initiated |
| CompleteAttachmentUpload | attachment.upload_completed |
| RequestAttachmentDownload | attachment.download_granted (when permitted) |
The op envelope’s trace_context becomes the audit event’s correlation_id. The op’s principal becomes the audit event’s actor. The op’s idempotency_key (if present) is recorded on the audit event under causation_id (matching ADR-0007’s existing convention).
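
The mapping and the field correspondences can be sketched as below. The MutatingOp enum, the AuditEvent struct, and the string-typed ids are simplifications assumed for illustration; the authoritative shape is docs/audit-event.schema.json. ImportRecords is left out because it emits one event per accepted record rather than one per op.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum MutatingOp {
    CreateModel,
    UpdateModel,
    TombstoneModel,
    PublishModelVersion,
    AddField,
    UpdateField,
    TombstoneField,
    CreateRecord,
    UpdateRecord,
    DeleteRecord,
    InitiateAttachmentUpload,
    CompleteAttachmentUpload,
    RequestAttachmentDownload,
}

/// Event type names taken from the op-to-event table.
fn audit_event_type(op: MutatingOp) -> &'static str {
    match op {
        MutatingOp::CreateModel => "model.created",
        MutatingOp::UpdateModel => "model.updated",
        MutatingOp::TombstoneModel => "model.tombstoned",
        MutatingOp::PublishModelVersion => "model_version.published",
        MutatingOp::AddField => "field.added",
        MutatingOp::UpdateField => "field.updated",
        MutatingOp::TombstoneField => "field.tombstoned",
        MutatingOp::CreateRecord => "record.created",
        MutatingOp::UpdateRecord => "record.updated",
        MutatingOp::DeleteRecord => "record.deleted",
        MutatingOp::InitiateAttachmentUpload => "attachment.upload_initiated",
        MutatingOp::CompleteAttachmentUpload => "attachment.upload_completed",
        MutatingOp::RequestAttachmentDownload => "attachment.download_granted",
    }
}

/// Simplified audit event; ids are plain strings in this sketch.
#[derive(Debug, PartialEq)]
struct AuditEvent {
    event_type: &'static str,
    actor: String,
    correlation_id: String,
    causation_id: Option<String>,
}

fn audit_event_for(
    op: MutatingOp,
    principal: &str,
    trace_context: &str,
    idempotency_key: Option<&str>,
) -> AuditEvent {
    AuditEvent {
        event_type: audit_event_type(op),
        actor: principal.to_owned(),
        correlation_id: trace_context.to_owned(),
        causation_id: idempotency_key.map(str::to_owned),
    }
}
```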

13.2 Protocol-layer audit events

The protocol layer emits a small set of audit events that are not tied to a mutating op:
| Event type | When |
| --- | --- |
| session.opened | After successful handshake. |
| session.closed | On session teardown (any reason). |
| auth.handshake_failed | On any handshake failure. |
| subscription.opened | On SubscriptionAccepted. |
| subscription.closed | On EndOfStream. |
| protocol.version_deprecated_warning | On ServerNotice { version_deprecation: ... } (one per session). |
| protocol.rate_limit_rejection | On every OpError::RateLimited (sampled in high-volume environments). |
These additions to docs/audit-event.schema.json are sketched here; the exact wire shape is committed in Phase 3 implementation per the change-flow playbook.

13.3 Subscription-as-projection

SubscribeRecords, SubscribeModels, SubscribeAuditTail are all projections of the audit log. The implementation lives in src/projections/ (per ADR-0007 and ADR-0008) — the protocol layer is the delivery channel only. A subscription is, internally, a cursor over the audit log filtered by tenant + subscription kind + Fx predicate. The cursor’s position is the resume token’s audit_position.
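
The "cursor over the audit log" idea can be illustrated with an in-memory slice. This is a sketch under assumptions, not the src/projections/ implementation: the AuditEvent fields, matching the subscription kind by event-type prefix, treating the resume position as exclusive, and using a plain closure in place of an Fx predicate are all choices made for the example.

```rust
#[derive(Debug, Clone, PartialEq)]
struct AuditEvent {
    position: u64,
    tenant: String,
    event_type: String,
}

struct ResumeToken {
    audit_position: u64,
}

/// Returns the events a subscription would deliver: those after the resume
/// position that match the tenant, the subscription kind, and the predicate.
fn events_after<'a>(
    log: &'a [AuditEvent],
    tenant: &str,
    kind_prefix: &str,
    resume_from: Option<&ResumeToken>,
    predicate: impl Fn(&AuditEvent) -> bool,
) -> Vec<&'a AuditEvent> {
    let after = resume_from.map(|t| t.audit_position);
    log.iter()
        // Resume strictly after the last position the client saw (assumption).
        .filter(|e| after.map_or(true, |pos| e.position > pos))
        .filter(|e| e.tenant == tenant && e.event_type.starts_with(kind_prefix))
        // Stand-in for Fx predicate evaluation.
        .filter(|e| predicate(*e))
        .collect()
}
```

A client that last saw audit_position 1 and resumes a record subscription would receive only later record.* events for its own tenant.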

15. Client library shape

The client library is a separate repository (per ADR-0001) consuming the platform protocol. It is not implemented in this design — but the platform commits to the following surface so that the client library can be a thin wrapper:
  • One async type per op variant, returning Future<Output = Result<OpResponse, OpError>>.
  • A subscription type with recv() -> Future<Output = Option<Result<SubscriptionEvent, OpError>>> and cancel().
  • A session type owning the WebTransport session and the control stream; ops and subscriptions are created on the session.
  • Capability checks at session-construction time; clients can ask “is FxPredicateFilter available?” without trying an op.
The client library uses the xwt crate to abstract WebTransport across native and Wasm. The xwt-internal API is not part of the platform’s contract.