feat: auto failover APIs with LK Cloud #686
retries in alternative datacenters on 5xx and transport failures
🦋 Changeset detected. Latest commit: d092da9. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
Add auto failover APIs for LK Cloud in livekit-server-sdk.
CI depends on livekit/livekit#4627
  return { origins, ttl };
}

function parseMaxAge(cacheControl: string | null): number {
nit: it would be nice to have some unit tests for this (can be added in a follow-up).
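A rough idea of what table-driven tests could look like. The body of `parseMaxAge` is not visible in this excerpt, so the function below is a hypothetical stand-in (it assumes the result is in seconds and falls back to 0 when `max-age` is missing or unparsable); the real behaviour in the PR may differ.

```typescript
// Hypothetical stand-in: the PR's actual parseMaxAge body is not shown here.
// Assumption: returns max-age in seconds, or 0 when missing or unparsable.
function parseMaxAge(cacheControl: string | null): number {
  if (!cacheControl) return 0;
  const match = /(?:^|,)\s*max-age\s*=\s*(\d+)\s*(?:,|$)/i.exec(cacheControl);
  return match ? Number(match[1]) : 0;
}

// Table of [header value, expected seconds] pairs.
const maxAgeCases: Array<[string | null, number]> = [
  [null, 0],
  ["", 0],
  ["max-age=60", 60],
  ["public, max-age=300", 300],
  ["no-store", 0],
  ["MAX-AGE=15, must-revalidate", 15],
  ["max-age=abc", 0],
];

for (const [input, expected] of maxAgeCases) {
  const actual = parseMaxAge(input);
  if (actual !== expected) {
    throw new Error(`parseMaxAge(${JSON.stringify(input)}) = ${actual}, expected ${expected}`);
  }
}
```

The same table could be dropped into whichever test runner the repo already uses.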
 * cached list when available, otherwise an empty list. Forwards `headers` so a
 * valid token, and any test directives, reach the discovery endpoint.
 */
export async function regionOrigins(origin: URL, headers: unknown): Promise<string[]> {
It might be worth wrapping this in an origin-keyed mutex so that only one regionOrigins request per origin is processed at a time.
Requests queued behind it should then resolve immediately from the cache, as long as at least some TTL is set.
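A minimal sketch of what I have in mind, not tied to the actual implementation. `RegionCache`, `Fetcher`, and the injected clock are hypothetical names for illustration, and the TTL is assumed to be in seconds. Each origin gets its own promise chain, so callers for the same origin run one after another and the later ones hit the cache.

```typescript
// Hypothetical types and names; only the locking idea matters here.
type Fetcher = (origin: string) => Promise<{ origins: string[]; ttl: number }>;

class RegionCache {
  private cache = new Map<string, { origins: string[]; expiresAt: number }>();
  // One promise chain per origin acts as the origin-keyed mutex.
  private locks = new Map<string, Promise<unknown>>();
  private fetcher: Fetcher;
  private now: () => number;

  constructor(fetcher: Fetcher, now: () => number = () => Date.now()) {
    this.fetcher = fetcher;
    this.now = now;
  }

  async get(origin: string): Promise<string[]> {
    const previous = this.locks.get(origin) ?? Promise.resolve();
    // Start only after the previous holder for this origin has finished.
    const run = previous.then(() => this.load(origin));
    // The stored tail never rejects, so one failure does not poison the queue.
    const tail = run.catch(() => undefined);
    this.locks.set(origin, tail);
    try {
      return await run;
    } finally {
      // Drop the entry once nobody else is queued behind this call.
      if (this.locks.get(origin) === tail) this.locks.delete(origin);
    }
  }

  private async load(origin: string): Promise<string[]> {
    const hit = this.cache.get(origin);
    if (hit && hit.expiresAt > this.now()) return hit.origins;
    const { origins, ttl } = await this.fetcher(origin);
    // Assumption: ttl is in seconds; a ttl of 0 means "do not cache".
    if (ttl > 0) this.cache.set(origin, { origins, expiresAt: this.now() + ttl * 1000 });
    return origins;
  }
}

// Usage: three concurrent lookups for one origin trigger a single fetch.
let fetchCount = 0;
const regionCache = new RegionCache(async () => {
  fetchCount += 1;
  return { origins: ["https://a.example", "https://b.example"], ttl: 60 };
});
const lookups = Promise.all([
  regionCache.get("https://cloud.example"),
  regionCache.get("https://cloud.example"),
  regionCache.get("https://cloud.example"),
]);
lookups.then(() => console.log(`fetches: ${fetchCount}`));
```

Sharing a single in-flight promise per origin would also work and is a bit simpler; the chain above just matches the mutex wording more literally.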
const timeout = options.waitUntilAnswered
  ? (options.timeout ?? DEFAULT_RINGING_TIMEOUT_SECONDS)
  : options.timeout;
🔴 WhatsApp call acceptance times out before ringing finishes when user sets a custom ring duration
The HTTP request timeout for waiting WhatsApp calls ignores the user-supplied ringing duration (options.timeout ?? DEFAULT_RINGING_TIMEOUT_SECONDS at ConnectorClient.ts:221-222) and always defaults to 30 seconds, so a call whose ringing window is set longer (e.g. 60 s) will be aborted by the SDK before the phone can be answered.
Impact: Users who set a custom ringing timeout and wait for the call to be answered will see spurious timeout errors.
Timeout computation differs between WhatsApp and SIP paths
The SIP path in SipClient.ts:762-764 correctly uses dialRequestTimeout(opts.timeout, opts.ringingTimeout) (from dialTimeout.ts:30-36), which computes Math.max(timeout ?? floor, floor) where floor = ringingTimeout + 2. This guarantees the HTTP request outlasts the ringing window by at least 2 seconds.
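For illustration, the computation described above can be sketched as follows. This is reconstructed from the description only, not copied from dialTimeout.ts; the fallback to a 30 s ringing window when `ringingTimeout` is unset and the constant name `RINGING_MARGIN_SECONDS` are assumptions.

```typescript
// Assumed default, based on the 30 s figure quoted in this review.
const DEFAULT_RINGING_TIMEOUT_SECONDS = 30;
// Margin by which the HTTP request should outlast the ringing window.
const RINGING_MARGIN_SECONDS = 2;

/**
 * Sketch: returns an HTTP request timeout (seconds) that is never shorter
 * than the ringing window plus the margin.
 */
function dialRequestTimeout(timeout?: number, ringingTimeout?: number): number {
  const floor = (ringingTimeout ?? DEFAULT_RINGING_TIMEOUT_SECONDS) + RINGING_MARGIN_SECONDS;
  return Math.max(timeout ?? floor, floor);
}

console.log(dialRequestTimeout(undefined, 60)); // 62
console.log(dialRequestTimeout(10, 60)); // 62, raised to the floor
console.log(dialRequestTimeout(90, 60)); // 90, already above the floor
console.log(dialRequestTimeout()); // 32
```

Under these assumptions a caller-supplied timeout is only ever raised, never lowered.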
The WhatsApp path at ConnectorClient.ts:221-222 ignores options.ringingTimeout entirely:
const timeout = options.waitUntilAnswered
? (options.timeout ?? DEFAULT_RINGING_TIMEOUT_SECONDS)
: options.timeout;
- If user sets ringingTimeout: 60 but not timeout, the HTTP timeout is 30 s while the server may wait 60 s.
- Even with defaults (both 30 s), there is no 2 s margin: the request can abort just as the call is answered.
The AcceptWhatsAppCallOptions.timeout JSDoc at ConnectorClient.ts:91-96 explicitly promises "is raised, if needed, to stay above ringingTimeout" but the implementation does not fulfil that contract.
- const timeout = options.waitUntilAnswered
-   ? (options.timeout ?? DEFAULT_RINGING_TIMEOUT_SECONDS)
-   : options.timeout;
+ const timeout = options.waitUntilAnswered
+   ? dialRequestTimeout(options.timeout, options.ringingTimeout)
+   : options.timeout;
retries in alternative datacenters on 5xx and transport failures
also removed legacy camel-case, which was not needed since we switched to protobuf-es