Observability (Prometheus)
The collector exposes metrics in the Prometheus text exposition format at GET /metrics. Operators
scrape it from inside the same private network. The endpoint has no authentication,
and it bypasses the rate limiter so that it stays reachable when
limits trip.
Endpoint
```http
GET /metrics
Content-Type: text/plain; version=0.0.4; charset=utf-8
```
A 200 response carries the exposition. A 503 with an empty body means the
metrics recorder is not initialised, which happens in test fixtures that build
state without going through main.rs. Production should never see a 503; alert
on it as a deployment bug.
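A deployment smoke test can turn the 200/503 rule into a check. The sketch below uses only the Python standard library. The helper names (`classify_metrics_response`, `probe`) and the base URL you pass in are hypothetical and not part of the collector. It assumes /metrics is reachable from wherever the check runs.

```python
import urllib.error
import urllib.request


def classify_metrics_response(status: int, content_type: str) -> str:
    """Map a GET /metrics response to a verdict (hypothetical helper)."""
    if status == 200 and content_type.startswith("text/plain"):
        return "ok"
    if status == 503:
        # Recorder not initialised: expected only in test fixtures.
        return "deployment-bug"
    return "unexpected"


def probe(base_url: str, timeout: float = 2.0) -> str:
    """Fetch <base_url>/metrics and classify the result.

    Connection failures (urllib.error.URLError) propagate to the caller.
    """
    try:
        with urllib.request.urlopen(base_url + "/metrics", timeout=timeout) as resp:
            return classify_metrics_response(resp.status, resp.headers.get("Content-Type", ""))
    except urllib.error.HTTPError as err:
        # urlopen raises for non-2xx statuses; the status code is on the exception.
        return classify_metrics_response(err.code, err.headers.get("Content-Type", ""))
```

For example, `probe("http://collector-1.private:8080")` should return `"ok"` against a correctly deployed collector.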
Cardinality rules
Labels on Prometheus metrics are bounded to closed-set categoricals:
| Label | Allowed values |
|---|---|
| endpoint | events, s2s |
| reason | accepted, accepted_consent, bot, internal, consent_required, rate_limit, quota, schema_invalid |
| method | POST, GET |
There are no per-tenant, per-user, per-anon, or per-site labels. A
workspace_id label would push cardinality into the millions; the
platform refuses it on principle. Per-tenant breakdowns go through the
audit log and analytics queries instead.
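These rules are mechanical enough to check in a test, for example against the label set a new metric would carry. The sketch below is a hypothetical helper, not the collector's own enforcement mechanism, which is not described here. It hard-codes the allowed sets from the table above. It skips `le`, the label that Prometheus histogram bucket series carry by design and that the fixed bucket list already bounds.

```python
from __future__ import annotations

# Closed label sets, copied from the cardinality table.
ALLOWED_LABELS: dict[str, frozenset[str]] = {
    "endpoint": frozenset({"events", "s2s"}),
    "reason": frozenset({
        "accepted", "accepted_consent", "bot", "internal",
        "consent_required", "rate_limit", "quota", "schema_invalid",
    }),
    "method": frozenset({"POST", "GET"}),
}
# Added to histogram bucket series by the client library; bounded by the buckets.
IMPLICIT_LABELS = frozenset({"le"})


def label_violations(labels: dict[str, str]) -> list[str]:
    """Describe every label that breaks the cardinality rules (hypothetical helper)."""
    problems = []
    for name, value in sorted(labels.items()):
        if name in IMPLICIT_LABELS:
            continue
        allowed = ALLOWED_LABELS.get(name)
        if allowed is None:
            # e.g. workspace_id: an unbounded, per-tenant label.
            problems.append(f"label {name!r} is not allowed")
        elif value not in allowed:
            problems.append(f"value {value!r} is not allowed for {name!r}")
    return problems
```

An empty list means the label set stays inside the closed sets. Anything else should fail the build rather than reach production.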
The metric set
Counters:
| Metric | Labels | Meaning |
|---|---|---|
| syntarie_events_received_total | endpoint | Every request, including rejects. |
| syntarie_events_accepted_total | endpoint | Events that reached storage. |
| syntarie_events_dropped_total | endpoint, reason | Events dropped at any gate. |
| syntarie_events_quarantined_total | endpoint | Events sent to the DLQ. |
| syntarie_events_deduped_total | endpoint | Events suppressed by the dedup window. |
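Because the counters are plain cumulative values, a single scrape is enough for a quick sanity check, such as the share of received events that reached storage. The sketch below assumes the text exposition format served by GET /metrics. The parser is deliberately simplified: it ignores timestamps and does not handle escaped quotes or a `}` inside label values. `parse_exposition` and `acceptance_ratio` are hypothetical names. These totals cover the whole process lifetime, so prefer `rate()` in PromQL for dashboards.

```python
from __future__ import annotations

import re

# A sample line such as: syntarie_events_received_total{endpoint="events"} 1234
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text: str) -> dict[tuple[str, frozenset], float]:
    """Map (metric name, frozenset of (label, value) pairs) to the sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = frozenset(LABEL_RE.findall(raw_labels or ""))
        samples[(name, labels)] = float(value)  # float() also accepts +Inf and NaN
    return samples


def acceptance_ratio(samples: dict, endpoint: str) -> float | None:
    """accepted_total / received_total for one endpoint, or None if nothing was received."""
    key = frozenset({("endpoint", endpoint)})
    received = samples.get(("syntarie_events_received_total", key), 0.0)
    accepted = samples.get(("syntarie_events_accepted_total", key), 0.0)
    return accepted / received if received else None
```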
Histograms:
| Metric | Labels | Buckets |
|---|---|---|
| syntarie_ingest_duration_seconds | endpoint | 1 ms / 2 ms / 5 ms / 10 ms / 25 ms / 50 ms / 100 ms |
| syntarie_validation_duration_seconds | endpoint | 100 µs / 500 µs / 1 ms / 5 ms |
| syntarie_storage_write_duration_seconds | endpoint | 1 ms / 5 ms / 10 ms / 50 ms / 100 ms |
Gauges:
| Metric | Labels | Meaning |
|---|---|---|
| syntarie_storage_pool_open_connections | none | Open Postgres connections. |
| syntarie_storage_pool_idle_connections | none | Idle Postgres connections. |
| syntarie_dlq_size | none | DLQ row count, sampled every 60 s. |
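Every label is a closed set, so the worst-case number of series the collector can export is small and can be computed. The sketch below makes three assumptions:

- Every label combination is eventually exported.
- The buckets are exposed in seconds, as the `_seconds` suffix implies.
- Each classic histogram contributes, per label combination, one `_bucket` series per bound plus `le="+Inf"`, one `_sum` series and one `_count` series.

The `method` label is not attached to any metric listed here, so it is not counted. The names in the sketch are illustrative.

```python
from math import prod

# Number of allowed values per label, from the cardinality table.
LABEL_SIZES = {"endpoint": 2, "reason": 8}

COUNTERS = {
    "syntarie_events_received_total": ["endpoint"],
    "syntarie_events_accepted_total": ["endpoint"],
    "syntarie_events_dropped_total": ["endpoint", "reason"],
    "syntarie_events_quarantined_total": ["endpoint"],
    "syntarie_events_deduped_total": ["endpoint"],
}
# Bucket upper bounds in seconds (the `le` label values).
HISTOGRAMS = {
    "syntarie_ingest_duration_seconds": (["endpoint"], [0.001, 0.002, 0.005, 0.01, 0.025, 0.05, 0.1]),
    "syntarie_validation_duration_seconds": (["endpoint"], [0.0001, 0.0005, 0.001, 0.005]),
    "syntarie_storage_write_duration_seconds": (["endpoint"], [0.001, 0.005, 0.01, 0.05, 0.1]),
}
GAUGES = [
    "syntarie_storage_pool_open_connections",
    "syntarie_storage_pool_idle_connections",
    "syntarie_dlq_size",
]


def combos(labels):
    """Label-value combinations for a label list; prod([]) == 1 for unlabelled metrics."""
    return prod(LABEL_SIZES[name] for name in labels)


def max_series():
    """Worst-case series count exported by one collector instance."""
    total = sum(combos(labels) for labels in COUNTERS.values())
    for labels, buckets in HISTOGRAMS.values():
        # One _bucket per bound, plus le="+Inf", plus _sum and _count.
        total += combos(labels) * (len(buckets) + 1 + 2)
    return total + len(GAUGES)
```

For the tables above this gives 77 series per collector instance. Prometheus attaches `job` and `instance` labels at scrape time, so two collectors yield twice that. A single `workspace_id` label would multiply the total by the number of workspaces instead.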
Two metrics every operator should alert on
1. p99 ingest latency past the 5 ms budget.
```promql
histogram_quantile(
  0.99,
  sum(rate(syntarie_ingest_duration_seconds_bucket[5m])) by (le)
) > 0.005
```
When this fires, the cause is almost always either a slow Postgres write
or a saturated CPU. Check syntarie_storage_write_duration_seconds
first.
2. Sustained drops at any gate.
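```promql
sum(rate(syntarie_events_dropped_total[5m])) by (reason) > 10
```
A short burst is normal (a release, a synthetic test). A drop rate above about 10 events/s for a single reason, averaged over five minutes, usually points to a configuration issue: a rate limit set too low, a misconfigured consent flag, or an overly strict schema bundle. To require the condition to persist before paging, give the alerting rule a `for:` duration.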
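You can approximate both alert expressions offline, for instance to replay saved scrapes while tuning thresholds. The sketch below is illustrative and has two parts:

- `histogram_quantile` reproduces the linear interpolation that PromQL applies to classic histograms with positive bucket bounds. It takes cumulative `(upper_bound, count)` pairs ending in `+Inf`.
- `drop_rates_by_reason` takes two hypothetical scrape snapshots of `syntarie_events_dropped_total`, keyed by `(endpoint, reason)`. Unlike `rate()`, it does not extrapolate to the window edges, so its values can differ slightly from what Prometheus reports.

```python
from __future__ import annotations

import math
from collections import defaultdict


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Quantile from cumulative (upper_bound, count) buckets; the last bound is +Inf."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    finite = buckets[:-1]
    # First finite bucket whose cumulative count reaches the rank.
    b = next((i for i, (_, count) in enumerate(finite) if count >= rank), len(finite))
    if b == len(finite):
        return finite[-1][0]  # rank lies in the +Inf bucket: highest finite bound
    start, prev_count = (0.0, 0.0) if b == 0 else finite[b - 1]
    end, count = finite[b]
    # Interpolate linearly inside the bucket that holds the rank.
    return start + (end - start) * (rank - prev_count) / (count - prev_count)


def drop_rates_by_reason(prev: dict, curr: dict, seconds: float = 300.0) -> dict[str, float]:
    """Approximate sum(rate(syntarie_events_dropped_total[5m])) by (reason)."""
    rates: dict[str, float] = defaultdict(float)
    for key, value in curr.items():
        if key not in prev:
            continue  # a rate needs two samples of the same series
        before = prev[key]
        # A lower value means the counter reset (e.g. a restart); as in PromQL,
        # count the post-reset value as the increase.
        increase = value - before if value >= before else value
        rates[key[1]] += increase / seconds
    return dict(rates)


def alerting_reasons(rates: dict[str, float], threshold: float = 10.0) -> list[str]:
    """Reasons whose summed drop rate exceeds the threshold, in events/s."""
    return sorted(reason for reason, rate in rates.items() if rate > threshold)
```

The interpolation has one consequence for the first alert. Because 5 ms is itself a bucket bound, the alert fires exactly when fewer than 99% of observations land in the `le="0.005"` bucket. Only the cumulative count at that bound decides whether it trips.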
What about the query API?
The query API does not expose Prometheus metrics in v1.0. It is on the v1.1 roadmap. For now, monitor the API at the platform level (CPU, RSS, HTTP error rate) and rely on Postgres slow-query logs for read-side latency analysis.
Scraping
The scrape endpoint is private. Operators should firewall port 8080 at
the network layer and expose only /events and /s2s/events to the
public internet (for example, through a load balancer or reverse proxy that
forwards just those paths). A typical Prometheus scrape config:
```yaml
scrape_configs:
  - job_name: 'syntarie-collector'
    scrape_interval: 15s
    # metrics_path defaults to /metrics
    static_configs:
      - targets:
          - 'collector-1.private:8080'
          - 'collector-2.private:8080'
```
What gets logged
Independently of Prometheus, the collector emits structured (JSON) logs for every accept / reject decision. These are operator-facing and may expand in any release; do not parse them programmatically. They are intended for log aggregators, not for alerting (which uses Prometheus).