## Token bucket
Rate limiting is based on a token bucket mechanism. Each tenant has a bucket of tokens that is consumed by requests and replenished continuously over time.

| Parameter | Value |
|---|---|
| Maximum pool size | 500 tokens |
| Refill rate | 4 tokens / second |
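The refill behavior can be sketched as a minimal client-side token bucket. The capacity and refill rate come from the table above; the class and method names are illustrative, not part of the API:

```python
import time

class TokenBucket:
    """Minimal token bucket sketch: capacity 500, refill 4 tokens/second."""

    def __init__(self, capacity=500, refill_rate=4.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_consume(self, n=1):
        """Refill based on elapsed time, then spend n tokens if available."""
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

bucket = TokenBucket()
print(bucket.try_consume())     # full bucket: accepted
print(bucket.try_consume(500))  # pool no longer full: rejected
```

Because tokens flow back continuously, a client that stays at or below 4 requests per second never exhausts the pool; the 500-token capacity only absorbs short bursts.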
## Concurrency limits
In addition to rate limiting, the API limits the number of requests that can be processed simultaneously for a given tenant.

| Parameter | Value |
|---|---|
| Maximum concurrent requests | 32 per tenant |
| Queue capacity | 128 additional requests |
## Response headers
Every API response includes headers describing your current rate limit status:

| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum number of requests allowed |
| `X-RateLimit-Remaining` | Requests remaining before the limit is reached |
| `X-RateLimit-Rate-Amount` | Number of requests replenished per interval |
| `X-RateLimit-Rate-Interval` | Duration of each replenishment interval, in seconds |
| `X-RateLimit-Retry-After` | Seconds to wait before a request can be accepted (0 when within limits) |
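A client can read these headers after each response to decide whether to slow down. A sketch, using a plain dict to stand in for real response headers (the values and the threshold of 50 are illustrative):

```python
# Illustrative header values as a client might receive them.
headers = {
    "X-RateLimit-Limit": "500",
    "X-RateLimit-Remaining": "37",
    "X-RateLimit-Rate-Amount": "4",
    "X-RateLimit-Rate-Interval": "1",
    "X-RateLimit-Retry-After": "0",
}

remaining = int(headers["X-RateLimit-Remaining"])
retry_after = int(headers["X-RateLimit-Retry-After"])

# Slow down before hitting the limit rather than after.
if retry_after > 0:
    print(f"wait {retry_after}s before the next request")
elif remaining < 50:
    print("approaching the limit; reduce request rate")
```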
## Exceeding the limit
When the limit is reached, the API responds with HTTP 429 Too Many Requests and includes a `Retry-After` header indicating how long to wait:
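A rejected request might look like this (the values and response body are illustrative; the exact body is not specified here):

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Remaining: 0
X-RateLimit-Retry-After: 12
```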
## Best practices
### Use response headers to self-throttle

Monitor `X-RateLimit-Remaining` and reduce your request rate as it decreases. This is more reliable than reacting to 429 errors after the fact.

### Limit concurrency

Keep parallel requests below 10 to avoid contention and minimize impact on human users of the same tipee instance.
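One way to keep parallelism bounded is a fixed-size worker pool. A minimal sketch; the 8-worker cap is an illustrative choice under the "below 10" guideline, and `call_api` is a stand-in, not a real endpoint:

```python
from concurrent.futures import ThreadPoolExecutor

# Cap workers below the recommended parallelism of 10.
MAX_PARALLEL = 8

def call_api(item):
    """Stand-in for a real API request; names here are illustrative."""
    return {"item": item, "status": 200}

# At most MAX_PARALLEL requests are in flight at any moment,
# regardless of how many items are queued.
with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
    results = list(pool.map(call_api, range(20)))

print(len(results))  # all 20 items processed, never more than 8 at once
```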
### Respect Retry-After

When you receive a 429 response, always wait for the full duration specified in the header before retrying.

### Spread requests over time

A steady flow of requests uses the refill rate more efficiently than large bursts followed by idle periods.
### Implement exponential backoff

For transient errors (5xx) or repeated rate limit responses, increase the delay between retries rather than retrying immediately.