
Provider endpoint pools

Admin-managed pools for routing one logical model across multiple provider targets.

Status

Implementation status

Provider endpoint pools are now implemented as a DB-managed routing foundation. Existing provider endpoints and direct model usage remain unchanged. Pools are an optional layer admins can create after provider endpoints and model targets already exist.

The current release includes the admin pool UI, schema-backed pool config, runtime pool expansion, local fallback state, DB-backed member runtime state, pre-response retry for non-stream requests, persistent cooldown/circuit outcomes from 429/5xx/timeout failures, and deterministic selector tests. Pool route metadata is recorded in existing routing reason/audit fields, and the admin page shows member health summary data. In-flight counters include a stale-window guard so leaked counts from a crashed worker stop blocking target selection.
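
The stale-window guard on in-flight counters can be pictured with a minimal sketch. The field names, the function name, and the 120-second window below are illustrative assumptions, not the actual Conflux schema or defaults.

```python
from dataclasses import dataclass

# Hypothetical value; the real stale window is not stated in this page.
STALE_WINDOW_SECONDS = 120.0


@dataclass
class MemberRuntimeState:
    """Assumed shape of one pool member's runtime row."""
    in_flight: int          # requests currently counted against the member
    last_updated: float     # unix seconds of the last counter change


def effective_in_flight(state: MemberRuntimeState, now: float,
                        stale_window: float = STALE_WINDOW_SECONDS) -> int:
    """Return the counter the selector should trust.

    If nothing has touched the counter within the stale window, assume the
    count leaked from a crashed worker and treat the member as idle.
    """
    if now - state.last_updated > stale_window:
        return 0
    return max(state.in_flight, 0)
```

With this shape, a member whose counter was last updated long ago no longer looks permanently busy, while a recently updated counter is used as-is.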

Admin workflow

  1. Add provider endpoints in Routing & Providers, the same way as today.
  2. Import or add concrete provider model targets.
  3. Open the Endpoint pools tab, create a pool from existing targets, and assign one public logical model ID.
  4. Configure strategy, weights, priorities, per-member concurrency, sticky sessions, retry attempts, and enabled state.
  5. Expose the pool through workspace and API key policy like any other public model.

Pool configuration must be stored in the database and edited from the web UI. Changing targets, weights, concurrency, cooldowns, or enabled state must not require restarting Conflux services.

Model contract

One logical ID: Users call a stable model such as conflux/local-coder. Admin logs show which concrete target handled the request.
Existing targets: Pool members reference existing provider endpoint and model registry rows. Secrets stay owned by the provider layer.
Conservative capability: The pool advertises the safe intersection of target capabilities unless the admin explicitly enforces a stricter minimum.
Policy visible: /v1/models, workspace policy, API key policy, and selectors expose the logical pool model only when policy allows it.
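
As an illustration of the "safe intersection" rule, the sketch below computes the capabilities a pool could advertise from its members. The capability names and the function name are assumptions made for the example, and the admin-enforced stricter minimum is deliberately not modeled.

```python
from typing import Iterable, Set


def advertised_capabilities(member_capabilities: Iterable[Iterable[str]]) -> Set[str]:
    """Return only the capabilities that every pool member supports.

    An empty pool advertises nothing.
    """
    capability_sets = [set(caps) for caps in member_capabilities]
    if not capability_sets:
        return set()
    return set.intersection(*capability_sets)


# Example with made-up capability names: only the shared ones survive.
pool_caps = advertised_capabilities([
    {"chat", "tools", "vision"},
    {"chat", "tools"},
])
```

Advertising the intersection means any request the pool accepts can be served by whichever member the selector picks.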

Routing policy

Pool routing should filter candidates first, then choose from eligible targets. The first production default is conservative, while advanced strategies are present and can be enabled per pool.

Eligibility filter: Remove disabled, unavailable, over-concurrency, capability-mismatch, policy-denied, cooldown, and circuit-open targets.
Baseline selection: Use least-in-flight or priority failover for early rollout. Both are explainable, deterministic, covered by tests, and protected from stale in-flight counter leaks.
Rate-limit cooldown: 429, 5xx, and timeout outcomes update DB-backed runtime state so limited targets are avoided across API processes.
Advanced policies: Weighted routing, latency preference, sticky sessions, and per-member concurrency are implemented in the selector foundation.
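
The filter-then-select order can be sketched as follows. This is a simplified illustration, not the Conflux selector: the `Target` fields are assumed, only some of the eligibility checks are modeled (capability and policy checks are omitted), and the tie-break on priority and name is one possible way to keep the choice deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Target:
    """Assumed view of one pool member at selection time."""
    name: str
    enabled: bool = True
    in_flight: int = 0
    max_concurrency: int = 4
    cooldown_until: float = 0.0   # unix seconds; 0 means no cooldown
    circuit_open: bool = False
    priority: int = 0             # lower value is preferred on ties


def eligible(targets: List[Target], now: float) -> List[Target]:
    """Step 1: drop targets that must not receive traffic right now."""
    return [
        t for t in targets
        if t.enabled
        and not t.circuit_open
        and t.cooldown_until <= now
        and t.in_flight < t.max_concurrency
    ]


def pick_least_in_flight(targets: List[Target], now: float) -> Optional[Target]:
    """Step 2: choose the eligible target with the fewest in-flight requests."""
    candidates = eligible(targets, now)
    if not candidates:
        return None
    # Tuple key makes ties deterministic: in-flight, then priority, then name.
    return min(candidates, key=lambda t: (t.in_flight, t.priority, t.name))
```

Because filtering happens first, a target in cooldown is never chosen even if it has the lowest in-flight count.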

Failover boundary

Retry before stream, not after

Conflux may retry an eligible alternate target before a response is committed. It must not switch providers after the first streamed byte or token is emitted, because that can duplicate work or break the response.

Failover must not cross compliance, data-residency, local-only, internal-only, or capability boundaries.
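
A minimal sketch of the retry boundary for streaming responses, assuming the target list has already been filtered for eligibility and for the compliance and capability boundaries above. `UpstreamError`, `open_stream`, and `stream_with_failover` are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterator, List, Optional


class UpstreamError(Exception):
    """Hypothetical error for a 429, 5xx, or timeout from a target."""


def stream_with_failover(
    targets: List[str],
    open_stream: Callable[[str], Iterator[str]],
    max_attempts: int,
) -> Iterator[str]:
    """Try alternates only until the first chunk has been received."""
    last_error: Optional[UpstreamError] = None
    for target in targets[:max_attempts]:
        try:
            stream = open_stream(target)
            first_chunk = next(stream)  # nothing sent to the client yet
        except UpstreamError as exc:
            last_error = exc            # safe to try the next target
            continue
        except StopIteration:
            return                      # empty but successful response
        # The response is committed from here on. Any later error
        # propagates to the caller; there is no provider switch.
        yield first_chunk
        yield from stream
        return
    raise last_error or UpstreamError("no eligible target")
```

The key design point is that the `try` block covers only the connection and the first chunk, so a mid-stream failure can never trigger a second provider call.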

Examples

Ollama local pool
Public model: conflux/local-coder
Targets:
- ollama-host / qwen2.5-coder:32b
- ollama-a    / qwen2.5-coder:32b
- ollama-b    / qwen2.5-coder:32b
Policy:
- least_in_flight
- max concurrency per target
- cooldown on 429/5xx
- no cloud fallback unless an admin explicitly adds one

Cloud multi-key pool
Public model: conflux/sonnet-prod
Targets:
- openrouter-prod-a / provider-model-id
- openrouter-prod-b / provider-model-id
Policy:
- weighted_round_robin 50/50
- respect Retry-After
- cool down limited keys
- fail over only before streaming starts
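
One way to "respect Retry-After" when cooling down a limited key is sketched below. The HTTP Retry-After header may carry either a number of seconds or an HTTP date; the 30-second fallback and the function name are assumptions, not Conflux defaults.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Hypothetical fallback used when the header is missing or unparseable.
DEFAULT_COOLDOWN_SECONDS = 30.0


def cooldown_seconds(retry_after: Optional[str], now: datetime,
                     default: float = DEFAULT_COOLDOWN_SECONDS) -> float:
    """Translate a Retry-After header value into a cooldown duration.

    `now` must be a timezone-aware datetime.
    """
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():
        return float(value)             # delta-seconds form
    try:
        when = parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max((when - now).total_seconds(), 0.0)
```

The resulting duration would be added to the current time and stored as the member's cooldown deadline, so other API processes also skip the limited key.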

Ollama URLs

Ollama can be used as a local gateway endpoint with no API key. The URL is resolved by the Conflux server, not by the browser, so it must be reachable from wherever Conflux runs.

Same host:
http://127.0.0.1:11434/v1

Docker Compose service named ollama:
http://ollama:11434/v1

Conflux in container, Ollama on host:
http://host.docker.internal:11434/v1

Remote private Ollama:
https://ollama.internal.example.com/v1

Do not expose Ollama publicly

Do not put an unauthenticated Ollama endpoint on the public internet. Use a private network, firewall, reverse proxy authentication, or Conflux-only Docker network access.

Testing gate

Provider endpoint pool changes must pass automated tests that prove load-balancing behavior before deployment.

Policy tests: Eligibility, least-in-flight, max concurrency, weighted routing, latency preference, sticky behavior, and cooldowns.
Fake providers: Integration tests must simulate success, streaming, timeout, 429, 5xx, malformed response, and disconnect cases.
Hot reload: Tests must prove pool edits affect new requests without restarting services and do not affect in-flight requests.
Production smoke: Deploy behind a feature flag, create an internal pool, run stream and non-stream prompts, then run failure drills.
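
The hot-reload requirement can be expressed as a small test around immutable config snapshots. This is a sketch of one pattern that satisfies the property, not a description of how Conflux loads pool config; `PoolConfig` and `PoolRegistry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PoolConfig:
    """Immutable snapshot of a pool as read from the database."""
    version: int
    targets: Tuple[str, ...]


class PoolRegistry:
    """Holds the latest snapshot; each request captures one at start."""

    def __init__(self, config: PoolConfig) -> None:
        self._current = config

    def snapshot(self) -> PoolConfig:
        return self._current

    def publish(self, config: PoolConfig) -> None:
        # Replace the reference; snapshots already handed out are untouched.
        self._current = config


def test_edit_applies_to_new_requests_only() -> None:
    registry = PoolRegistry(PoolConfig(1, ("ollama-a", "ollama-b")))
    in_flight_view = registry.snapshot()          # request started before edit
    registry.publish(PoolConfig(2, ("ollama-a",)))  # admin removes a target
    assert in_flight_view.targets == ("ollama-a", "ollama-b")
    assert registry.snapshot().targets == ("ollama-a",)
```

Because snapshots are frozen and only the reference is swapped, an edit cannot change the target list of a request that is already running.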