Rate limits and quotas

Every accepted ingest request flows through two gates: a per-second rate limit (token bucket) and a monthly event quota. Both are per-workspace. They are independent — a workspace can be rate-limited without exhausting its quota, and vice versa.

Per-second rate limit

A token bucket with refill rate RATE_LIMIT_RPS and bucket capacity RATE_LIMIT_BURST. Defaults: 100 rps, 200 burst.
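
As a rough illustration of the mechanism, the sketch below shows a token bucket with the default refill rate and capacity. It is not the service's actual implementation: the class name, the `allow` method, and the choice to start with a full bucket are assumptions made for the example.

```python
import time
from typing import Optional


class TokenBucket:
    """Illustrative token bucket; names and structure are hypothetical."""

    def __init__(self, rate_rps: float = 100.0, burst: float = 200.0,
                 now: Optional[float] = None) -> None:
        self.rate_rps = rate_rps  # tokens added per second (RATE_LIMIT_RPS)
        self.burst = burst        # maximum tokens held (RATE_LIMIT_BURST)
        self.tokens = burst       # assumption: the bucket starts full
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill is capped at the bucket capacity.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_rps)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller would answer with 429 here
```

With the defaults, a full bucket absorbs a burst of 200 requests at once; after that, requests are admitted at the refill rate of 100 per second. The optional `now` argument exists only to make the sketch deterministic in tests.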

When the bucket is empty, the request returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 1
X-Ratelimit-Reason: per_second_rate_limit
Content-Type: application/json

{ "error": { "code": "rate_limit_exceeded", "message": "per-second rate limit exceeded" } }

The SDK retry orchestrator treats 429 as transient and backs off (see Retry). The Retry-After header is always 1 for this gate: at the default refill rate of 100 rps an empty bucket regains a token within 10 ms, so a one-second wait is conservative.

Monthly quota

A per-workspace monthly event ceiling. Default: 10,000,000 events per calendar month. Two thresholds:

Soft ceiling (QUOTA_SOFT_PCT, default 80%): the request still succeeds but a warning header is attached so operators can set up alerts. Header: X-Ratelimit-Reason: monthly_quota_soft.
Hard ceiling (100%): subsequent requests return:

HTTP/1.1 402 Payment Required
X-Ratelimit-Reason: monthly_quota_exceeded
Content-Type: application/json

{ "error": { "code": "monthly_quota_exceeded", "message": "workspace monthly event quota exhausted" } }
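
The two thresholds can be sketched as a small classification over the month-to-date event count. This is a minimal sketch under stated assumptions: the function and enum names are hypothetical, and whether the boundaries are inclusive (usage exactly at 80% or exactly at 100%) is an assumption, not something this page specifies.

```python
from enum import Enum


class QuotaState(Enum):
    OK = "ok"
    SOFT = "monthly_quota_soft"          # request succeeds, warning header attached
    EXCEEDED = "monthly_quota_exceeded"  # request rejected with 402


def quota_state(used: int, limit: int = 10_000_000, soft_pct: int = 80) -> QuotaState:
    """Classify month-to-date usage before accepting one more event."""
    # Assumption: the hard ceiling applies once usage has reached the limit.
    if used >= limit:
        return QuotaState.EXCEEDED
    # Integer arithmetic avoids float rounding at the soft boundary.
    if used * 100 >= limit * soft_pct:
        return QuotaState.SOFT
    return QuotaState.OK
```

With the defaults, the soft ceiling sits at 8,000,000 events (80% of 10,000,000).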

The SDK retry orchestrator treats 402 as permanent for the rest of the month: retrying cannot succeed while the cause is a quota that does not reset until midnight on the 1st. This is a deliberate choice to prevent retry storms from sites whose plan is over its limit.
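
The difference between the two gates, from a client's point of view, might look like the sketch below. It is not the SDK's retry orchestrator (see Retry for that); `retry_delay_seconds` is a hypothetical helper that only covers the two status codes described on this page.

```python
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return seconds to wait before retrying, or None when retrying is pointless."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == 429:
        try:
            return float(lowered.get("retry-after", "1"))
        except ValueError:
            return 1.0  # unparseable header: fall back to the documented value
    # 402 (monthly quota exhausted) is permanent until the quota resets.
    # Other statuses are outside the scope of this sketch and are not retried here.
    return None
```

A real client would typically combine the returned delay with its own backoff and jitter rather than sleeping for exactly this value.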

Per-workspace overrides

Defaults set via env are platform-wide. Per-workspace overrides live in the workspaces row:

Column	Purpose
`rate_limit_rps`	Per-second refill rate. NULL falls back to env default.
`rate_limit_burst`	Bucket capacity. NULL falls back to env default.
`quota_monthly_events`	Monthly hard ceiling. NULL falls back to env default.

Operators set these via direct SQL in v1.0 (a workspaces dashboard endpoint that exposes these fields lands in v1.1).
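
The NULL fallback can be pictured as follows. This is a sketch only: `effective_limits` and `EffectiveLimits` are hypothetical names, the row is modelled as a plain mapping with `None` standing in for NULL, and `QUOTA_MONTHLY_EVENTS` is an assumed env variable name (this page names RATE_LIMIT_RPS and RATE_LIMIT_BURST but not the quota variable).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EffectiveLimits:
    rate_limit_rps: int
    rate_limit_burst: int
    quota_monthly_events: int


def effective_limits(row: Mapping[str, Optional[int]],
                     env: Mapping[str, str] = os.environ) -> EffectiveLimits:
    """Resolve per-workspace limits, falling back to platform-wide env defaults."""

    def pick(column: str, env_name: str, default: int) -> int:
        value = row.get(column)
        # Compare against None explicitly so an override of 0 is not discarded.
        if value is not None:
            return int(value)
        return int(env.get(env_name, default))

    return EffectiveLimits(
        rate_limit_rps=pick("rate_limit_rps", "RATE_LIMIT_RPS", 100),
        rate_limit_burst=pick("rate_limit_burst", "RATE_LIMIT_BURST", 200),
        # QUOTA_MONTHLY_EVENTS is a hypothetical env name used for illustration.
        quota_monthly_events=pick("quota_monthly_events", "QUOTA_MONTHLY_EVENTS", 10_000_000),
    )
```

For example, a row with `rate_limit_rps` set to 500 and the other two columns NULL would resolve to 500 rps with the platform-wide burst and quota.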

What gets counted

Every accepted event counts against the monthly quota, including events that are subsequently dropped by the bot or internal-traffic filter: in Drop mode they are first “accepted” by the rate limiter and only then dropped by the filter, so they still count. DLQ rows count as well.

What does NOT count:

Requests rejected at auth / consent (4xx never reaches the counter).
Health and metrics endpoints (/healthz, /metrics); they bypass the limiter entirely.

Performance

The token-bucket check is benchmarked at p99 < 1 µs, well inside the 5 ms ingest budget. The quota counter is incremented once per accepted event inside the same Postgres transaction as the storage write — no extra round trip.

Monitoring

Alerts to consider:

Rate-limit drops sustained for more than 5 min for a single workspace — usually a rogue client or an under-tuned default. The syntarie_events_dropped_total{reason="rate_limit"} metric carries no per-workspace label by design (cardinality), so you must cross-check against operator logs to identify the workspace.
Soft ceiling exceeded — alert per-workspace and contact the customer. Catching this before the hard ceiling avoids end-of-month surprises.
Hard ceiling exceeded — alert per-workspace; the customer’s events are not flowing.

The structured-log line that fires on each rate-limit decision carries the workspace id at debug level for forensic analysis.