Error taxonomy

Every error response carries error.code, error.type, error.message, and (where applicable) error.param. Codes are stable; messages may evolve.
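As a minimal sketch of reading these fields on the client side, the Python below parses an error body into a small record. The `ApiError` class and the `parse_error` helper are illustrative names, not part of any official SDK; the sample body is the validation_error example shown later on this page.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiError:
    """Illustrative container for the documented error fields."""
    type: str
    code: str
    message: str
    param: Optional[str] = None  # only present where applicable


def parse_error(body: str) -> ApiError:
    """Parse a JSON error response body into an ApiError."""
    err = json.loads(body)["error"]
    return ApiError(
        type=err["type"],
        code=err["code"],
        message=err["message"],
        param=err.get("param"),  # absent for errors not tied to a parameter
    )


sample = (
    '{"error": {"type": "validation_error", "code": "invalid_model", '
    '"param": "model", "message": "..."}}'
)
parsed = parse_error(sample)
print(parsed.type, parsed.code, parsed.param)
```

Because codes are stable and messages may evolve, branch on `parsed.code` or `parsed.type` and treat `parsed.message` as display text only.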

Top-level error types

| Type | HTTP | Meaning | Retry? |
| --- | --- | --- | --- |
| auth_error | 401, 403 | Invalid, revoked, or scope-insufficient API key | No — fix the key |
| validation_error | 400 | Malformed request | No — fix the request |
| rate_limit_error | 429 | Rate limit exceeded | Yes, after Retry-After |
| inference_error | 502 | Upstream inference provider failed | Yes, with backoff |
| region_unavailable | 503 | Pinned region is degraded | Conditionally — see below |
| policy_error | 403 | Request blocked by Article III screening | No — review use case |
| methodology_error | 500 | Receipt generation failed (the inference may have succeeded) | Yes |
| internal_error | 500 | Unspecified gateway failure | Yes, with backoff |
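The Retry? column above can be transcribed into a lookup that client code consults before retrying. This is a sketch only: the guidance labels ("no", "backoff", and so on) and the `retry_guidance` helper are made-up names, and treating unknown types as non-retryable is an assumption rather than documented behavior.

```python
# Retry guidance per error.type, transcribed from the table above.
# The label strings are illustrative, not values returned by the API.
RETRY_GUIDANCE = {
    "auth_error": "no",
    "validation_error": "no",
    "rate_limit_error": "after_retry_after",
    "inference_error": "backoff",
    "region_unavailable": "conditional",
    "policy_error": "no",
    "methodology_error": "yes",
    "internal_error": "backoff",
}


def retry_guidance(error_type: str) -> str:
    """Return the retry guidance label for an error.type value."""
    # Assumption: an unrecognized type is not retried automatically.
    return RETRY_GUIDANCE.get(error_type, "no")


print(retry_guidance("inference_error"))
print(retry_guidance("policy_error"))
```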

Specific error codes

auth_error

{"error": {"type": "auth_error", "code": "invalid_api_key", "message": "..."}}

Codes:

  • invalid_api_key — key not found or malformed
  • revoked_api_key — key was revoked; check audit log
  • expired_api_key — key past expiry (Audit Plus / Enterprise feature; the default Audit tier does not auto-expire)
  • insufficient_scope — key is valid but lacks the scope for this endpoint

validation_error

The error.param field tells you which parameter is invalid:

{"error": {"type": "validation_error", "code": "invalid_model", "param": "model", "message": "..."}}

Codes:

  • invalid_model — model is not supported on this region/account
  • invalid_region — region code does not exist
  • invalid_tier — requested tier is not supported on this route
  • prompt_too_long — prompt exceeds the model's context window
  • invalid_temperature / invalid_top_p / etc. — standard sampler validations

rate_limit_error

{"error": {"type": "rate_limit_error", "code": "requests_per_second_exceeded", "message": "..."}}

Codes:

  • requests_per_second_exceeded — consult Retry-After
  • daily_budget_exceeded — daily budget exhausted; requests are rejected until the budget resets at midnight UTC
  • concurrent_streams_exceeded — close idle streams or upgrade
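The three codes call for different waits, which can be sketched as follows. The helper names, the 1-second fallback when Retry-After is missing, and the choice to return `None` for concurrent_streams_exceeded are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_next_utc_midnight(now: datetime) -> float:
    """Seconds from `now` (a timezone-aware datetime) until the next 00:00 UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


def rate_limit_wait(
    code: str, retry_after: Optional[float], now: datetime
) -> Optional[float]:
    """Return seconds to wait, or None when waiting alone will not help."""
    if code == "requests_per_second_exceeded":
        # Assumed fallback of 1 second if the Retry-After header is absent.
        return retry_after if retry_after is not None else 1.0
    if code == "daily_budget_exceeded":
        return seconds_until_next_utc_midnight(now)
    # concurrent_streams_exceeded: close idle streams or upgrade instead.
    return None


now = datetime(2024, 1, 1, 23, 0, 0, tzinfo=timezone.utc)
print(rate_limit_wait("daily_budget_exceeded", None, now))  # 3600.0
```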

policy_error

{"error": {"type": "policy_error", "code": "dual_use_screen_block", "message": "..."}}

Codes:

  • dual_use_screen_block — request blocked by Article III dual-use screening; if you believe this is in error, contact audit@vettedinference.com
  • acceptable_use_violation — request matches a violation pattern in our published Acceptable Use Policy
  • customer_suspended — your account is suspended pending review

region_unavailable

{"error": {"type": "region_unavailable", "code": "region_degraded", "message": "..."}}

Codes:

  • region_degraded — the pinned region is in incident mode; consult status.vettedinference.com
  • region_saturated — the pinned region is at capacity; retry with backoff or accept default routing
  • region_decommissioned — the requested region is no longer offered (rare; advance notice given)
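The conditional retry guidance for region_unavailable can be sketched as a per-code decision. The `RegionAction` enum and its members are hypothetical names; in particular, holding off on automatic retries for region_degraded until the status page clears is an assumption, since the text above only says to consult the status page.

```python
from enum import Enum


class RegionAction(Enum):
    """Illustrative client-side actions, not API values."""
    RETRY_WITH_BACKOFF = "retry_with_backoff"  # or accept default routing
    CHECK_STATUS_PAGE = "check_status_page"
    CHANGE_REGION = "change_region"
    UNKNOWN = "unknown"


def region_action(code: str) -> RegionAction:
    """Map a region_unavailable error code to a suggested client action."""
    if code == "region_saturated":
        return RegionAction.RETRY_WITH_BACKOFF
    if code == "region_degraded":
        # Assumption: wait for the incident to clear before retrying.
        return RegionAction.CHECK_STATUS_PAGE
    if code == "region_decommissioned":
        # The region is no longer offered, so retrying cannot succeed.
        return RegionAction.CHANGE_REGION
    return RegionAction.UNKNOWN


print(region_action("region_saturated").value)
```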

methodology_error

If receipt generation fails but the inference succeeded, the response carries the completion plus a degraded receipt with tier: "degraded" and an explanation. This is the rare-but-real case where you got an answer but we could not produce a confident estimate. Treat the response as a normal completion, and flag the degraded receipt to your audit pipeline for recalculation.
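One way to handle this case is to pass the completion through unchanged and queue the receipt for later recalculation. The sketch below assumes the receipt sits under a top-level `receipt` key with an `explanation` field; those field names, the `completion` key, and the in-memory `recalc_queue` are illustrative assumptions, and only tier: "degraded" comes from the description above.

```python
from typing import Any, Dict, List


def is_degraded(response: Dict[str, Any]) -> bool:
    """True when the response carries a receipt with tier "degraded"."""
    receipt = response.get("receipt") or {}  # "receipt" key is assumed
    return receipt.get("tier") == "degraded"


def handle_response(
    response: Dict[str, Any], recalc_queue: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return the response unchanged, queuing degraded receipts for audit."""
    if is_degraded(response):
        recalc_queue.append(response["receipt"])
    return response


recalc_queue: List[Dict[str, Any]] = []
degraded = {
    "completion": "...",  # hypothetical field name for the model output
    "receipt": {"tier": "degraded", "explanation": "..."},
}
handle_response(degraded, recalc_queue)
print(len(recalc_queue))  # 1
```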

Idempotency

For chat completions and embeddings, set the Idempotency-Key header to a stable client-generated UUID. Identical requests sent with the same key within 24 hours return the cached response (and the same receipt). This is the recommended pattern for retry logic that is subject to financial controls.

curl https://api.vettedinference.com/v1/chat/completions \
  -H "Authorization: Bearer $VETTED_API_KEY" \
  -H "Idempotency-Key: 5e8c3a2f-4b9d-4e7a-b3f2-a1d5e9c7b8e2" \
  -H "Content-Type: application/json" \
  -d '...'
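
A retry wrapper in Python that follows the Retry? column of the table above:

import random
import time

# RateLimitError, InferenceError, and InternalError are assumed to be the
# exception classes exposed by your client library.

def call_with_retry(fn, max_attempts=4):
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_attempts - 1:
                raise
            # Honor Retry-After when provided; otherwise back off exponentially.
            wait = e.retry_after or (2 ** attempt)
            time.sleep(wait)
        except (InferenceError, InternalError):
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with up to one second of jitter.
            time.sleep(2 ** attempt + random.random())
        # auth_error, validation_error, policy_error: not caught, so they
        # propagate immediately without a retry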
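
When retries are combined with the Idempotency-Key header, the key should be generated once per logical request and reused on every attempt, so that a retry within the 24-hour window returns the cached response. The sketch below illustrates that idea with a stand-in transport: `send`, `new_idempotent_headers`, `send_with_same_key`, and the simulated failure are all hypothetical, and backoff is omitted to keep the example short.

```python
import uuid
from typing import Any, Callable, Dict, List


def new_idempotent_headers(api_key: str) -> Dict[str, str]:
    """Build headers with a fresh Idempotency-Key for one logical request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": str(uuid.uuid4()),
        "Content-Type": "application/json",
    }


def send_with_same_key(
    send: Callable[[Dict[str, Any], Dict[str, str]], Any],
    payload: Dict[str, Any],
    headers: Dict[str, str],
    max_attempts: int = 3,
) -> Any:
    """Call send(payload, headers) up to max_attempts times, reusing headers."""
    for attempt in range(max_attempts):
        try:
            return send(payload, headers)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise


# Stand-in transport that drops the first attempt and records each key it sees.
seen_keys: List[str] = []


def flaky_send(payload: Dict[str, Any], headers: Dict[str, str]) -> Dict[str, bool]:
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) < 2:
        raise ConnectionError("simulated network drop")
    return {"ok": True}


headers = new_idempotent_headers("test-key")
result = send_with_same_key(flaky_send, {"placeholder": "payload"}, headers)
print(result, seen_keys[0] == seen_keys[1])
```

Generating the key inside the retry loop would defeat the purpose, since each attempt would then look like a new request.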