rtb-ai v0.1: Multi-provider AI client
Status: DRAFT — awaiting review before TDD/implementation.
Parent contract: rust-tool-base.md and the v0.3 scope addendum 2026-05-01-v0.3-scope.md.
Replaces: the rtb-ai v0.1 stub (21-line placeholder).
1. Goal
Ship a typed, async, redaction-aware AI client that:
- Unifies five concrete providers behind one AiClient (Anthropic, OpenAI, Gemini, Ollama, OpenAI-compatible).
- Uses genai as the multi-provider backbone but drops down to a direct reqwest-on-Anthropic-Messages path for features genai does not yet surface (prompt caching, extended thinking, citations).
- Defaults to Claude 4.7 (Opus 4.7 / Sonnet 4.6 / Haiku 4.5) per CLAUDE.md.
- Implements structured output via schemars-derived JSON Schema sent in the request and jsonschema validation on the response.
- Sources its API key through rtb-credentials::Resolver so tools authored on RTB can wire their secret-resolution policy in one place.
- Honours the framework's redaction policy at every point a free-form string crosses an out-of-process boundary (logs, errors, telemetry).
Anthropic agents (multi-step tool-use loops with sub-agents) are explicitly deferred to a v0.3.x point release. The v0.3 surface is chat + structured output + caching + thinking + citations; agents land cleanly once that ships.
2. Public API shape
2.1 Crate root
pub use client::{AiClient, ChatRequest, ChatResponse, ChatStream};
pub use config::{Config, Provider};
pub use error::AiError;
pub use message::{Citation, ContentBlock, Message, Role, Usage};
pub use thinking::ThinkingMode;
pub mod client;
pub mod config;
pub mod error;
pub mod message;
pub mod thinking;
/// Validate a user-supplied base URL — HTTPS-only by default,
/// rejects userinfo + placeholder hosts. Mirrors
/// `rtb_vcs::http`'s base-url policy.
pub fn validate_base_url(url: &url::Url, allow_insecure: bool) -> Result<(), AiError>;
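The base-URL policy described in the doc comment can be sketched without external crates. This is a simplified stand-in, not the crate's implementation: it takes a raw string instead of `url::Url`, the function name `check_base_url` and the `PLACEHOLDER_HOSTS` list are hypothetical, and rejecting placeholder hosts even when `allow_insecure` is set is an assumption.

```rust
/// Hosts treated as placeholders in this sketch (the RFC 2606 example
/// domains). The real list is an assumption.
const PLACEHOLDER_HOSTS: [&str; 3] = ["example.com", "example.org", "example.net"];

/// Simplified stand-in for `validate_base_url`, operating on a raw
/// string so the sketch needs no external crates.
pub fn check_base_url(raw: &str, allow_insecure: bool) -> Result<(), String> {
    let (scheme, rest) = raw
        .split_once("://")
        .ok_or_else(|| format!("not an absolute URL: {raw}"))?;

    // HTTPS only, unless the test-only escape hatch is set.
    match scheme.to_ascii_lowercase().as_str() {
        "https" => {}
        "http" if allow_insecure => {}
        other => return Err(format!("scheme `{other}` is not allowed")),
    }

    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()
        .unwrap_or("");

    // Reject `user:password@host` so credentials never ride in the URL.
    if authority.contains('@') {
        return Err("userinfo in base URL is not allowed".to_string());
    }

    // Drop an optional `:port` suffix (IPv6 literals are not handled here).
    let host = authority
        .rsplit_once(':')
        .map_or(authority, |(h, _port)| h)
        .to_ascii_lowercase();

    if host.is_empty() {
        return Err("base URL has no host".to_string());
    }
    if PLACEHOLDER_HOSTS.contains(&host.as_str()) {
        return Err(format!("placeholder host `{host}` is not allowed"));
    }
    Ok(())
}
```

A production version would presumably lean on `url::Url` for parsing, which also covers IPv6 literals and percent-encoding that this sketch ignores.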
2.2 Config
#[derive(Debug, Clone, serde::Deserialize)]
pub struct Config {
    /// Which provider to target. Picks the wire protocol and the
    /// auth header shape.
    pub provider: Provider,
    /// Model identifier (provider-specific). When empty, defaults
    /// to the provider's flagship model (Anthropic: `"claude-opus-4-7"`).
    pub model: String,
    /// Override the provider's default endpoint. `None` uses the
    /// vendor's documented production URL.
    pub base_url: Option<url::Url>,
    /// API key, resolved at config-build time via
    /// [`rtb_credentials::Resolver`]. Stays in `SecretString` until
    /// the per-request `Authorization` header is composed.
    pub api_key: secrecy::SecretString,
    /// Per-request timeout. Defaults to 60s in `Config::default`.
    pub timeout: std::time::Duration,
    /// Test-only escape hatch: when `true`, `validate_base_url`
    /// accepts `http://` and `127.0.0.1` endpoints (wiremock
    /// integration). `#[serde(skip)]` so config files can't downgrade.
    #[serde(skip)]
    pub allow_insecure_base_url: bool,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Deserialize, serde::Serialize)]
#[serde(rename_all = "lowercase")]
pub enum Provider {
    /// Anthropic Cloud: uses the direct-`reqwest` path so prompt
    /// caching / extended thinking / citations work.
    Anthropic,
    /// Self-hosted Anthropic-compatible (Claude Code Local, etc.).
    AnthropicLocal,
    /// OpenAI Cloud, via `genai`.
    OpenAi,
    /// OpenAI-compatible endpoints (Together, Fireworks, vLLM, …), via `genai`.
    OpenAiCompatible,
    /// Google Gemini, via `genai`.
    Gemini,
    /// Local Ollama, via `genai`.
    Ollama,
}
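Because the enum uses `#[serde(rename_all = "lowercase")]`, each variant name is lowercased as a whole, with no separators inserted. The std-only sketch below mirrors that mapping with a hand-written `FromStr` impl, purely to show which strings a config file would need; the real crate would rely on the serde derive instead.

```rust
use std::str::FromStr;

/// Local copy of the spec's `Provider` enum, without the serde derives.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Anthropic,
    AnthropicLocal,
    OpenAi,
    OpenAiCompatible,
    Gemini,
    Ollama,
}

impl FromStr for Provider {
    type Err = String;

    /// Accepts exactly the names that `rename_all = "lowercase"` produces.
    /// Matching is case-sensitive, as it is with serde.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "anthropic" => Ok(Self::Anthropic),
            "anthropiclocal" => Ok(Self::AnthropicLocal),
            "openai" => Ok(Self::OpenAi),
            "openaicompatible" => Ok(Self::OpenAiCompatible),
            "gemini" => Ok(Self::Gemini),
            "ollama" => Ok(Self::Ollama),
            other => Err(format!("unknown provider `{other}`")),
        }
    }
}
```

Note that spellings such as `openai-compatible` or `anthropic_local` would not match under this rule; if those are wanted, per-variant `#[serde(rename = "…")]` or `#[serde(alias = "…")]` attributes would be one way to get them.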
2.3 AiClient
impl AiClient {
    /// Build a client. Validates `base_url`, builds a `reqwest::Client`
    /// with HTTPS enforcement + the configured timeout, and (for
    /// `genai`-backed providers) stamps the corresponding `genai::Client`.
    ///
    /// # Errors
    /// Returns [`AiError::InvalidConfig`] on a bad base URL, empty
    /// API key, or unsupported provider+model combination.
    pub fn new(config: Config) -> Result<Self, AiError>;

    /// One-shot chat completion.
    pub async fn chat(&self, req: ChatRequest) -> Result<ChatResponse, AiError>;

    /// Streaming chat completion. Yields `ChatStreamEvent` items:
    /// `Token(String)`, `ThinkingToken(String)` (Anthropic only),
    /// `Done(Usage)`, `Error(AiError)`.
    pub async fn chat_stream(&self, req: ChatRequest) -> Result<ChatStream, AiError>;

    /// Structured output: sends `T`'s JSON Schema with the request,
    /// validates the model's reply against it before deserialising.
    pub async fn chat_structured<T>(&self, req: ChatRequest) -> Result<T, AiError>
    where
        T: serde::de::DeserializeOwned + schemars::JsonSchema;
}
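To illustrate how a caller might consume the four `ChatStreamEvent` variants named in the `chat_stream` doc comment, here is a std-only sketch. It uses a synchronous iterator as a stand-in for the async `ChatStream`, a trimmed `Usage`, and a `String` in place of `AiError`; the `Collected` struct and `collect_events` function are hypothetical helpers, not part of the proposed API.

```rust
/// Trimmed stand-in for the spec's `Usage`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u32,
    pub output_tokens: u32,
}

/// Mirrors the variants listed in the `chat_stream` doc comment.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ChatStreamEvent {
    Token(String),
    ThinkingToken(String),
    Done(Usage),
    /// The spec carries `AiError` here; a `String` stands in.
    Error(String),
}

/// Hypothetical accumulator for a finished stream.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Collected {
    pub text: String,
    pub thinking: String,
    pub usage: Option<Usage>,
}

/// Folds events into the answer text, the thinking text, and the final
/// usage report. Stops at `Done`; the first `Error` aborts the fold.
pub fn collect_events<I>(events: I) -> Result<Collected, String>
where
    I: IntoIterator<Item = ChatStreamEvent>,
{
    let mut out = Collected::default();
    for event in events {
        match event {
            ChatStreamEvent::Token(t) => out.text.push_str(&t),
            ChatStreamEvent::ThinkingToken(t) => out.thinking.push_str(&t),
            ChatStreamEvent::Done(usage) => {
                out.usage = Some(usage);
                break;
            }
            ChatStreamEvent::Error(e) => return Err(e),
        }
    }
    Ok(out)
}
```

Keeping thinking tokens in a separate buffer matches the spec's intent that they surface separately from regular tokens; a CLI such as `docs ask` could then print only `text` to stdout.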
2.4 ChatRequest
#[derive(Debug, Clone, Default)]
pub struct ChatRequest {
    pub system: Option<String>,
    pub messages: Vec<Message>,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    /// Anthropic-only: enables prompt caching at every stable point
    /// (system prompt + tools + first turn). Ignored on non-Anthropic
    /// providers.
    pub cache_control: bool,
    /// Anthropic-only: extended-thinking budget. `None` disables.
    /// Ignored on non-Anthropic providers.
    pub thinking: Option<thinking::ThinkingMode>,
}
2.5 ChatResponse
#[derive(Debug, Clone)]
pub struct ChatResponse {
    pub message: Message,
    pub usage: Usage,
    /// Populated only on the Anthropic-direct path when the assistant
    /// output uses the citation feature.
    pub citations: Vec<Citation>,
}
#[derive(Debug, Clone, Copy, Default)]
pub struct Usage {
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub cache_creation_input_tokens: u32,
    pub cache_read_input_tokens: u32,
}
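The four `Usage` counters are easiest to read together. Assuming the fields follow Anthropic's documented accounting, where `input_tokens` excludes tokens written to or read from the cache, the total prompt size is the sum of the three input counters. The helper below is a hypothetical illustration, not part of the proposed API.

```rust
/// Local copy of the spec's `Usage` struct.
#[derive(Debug, Clone, Copy, Default)]
pub struct Usage {
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub cache_creation_input_tokens: u32,
    pub cache_read_input_tokens: u32,
}

/// Total prompt tokens for one request: uncached input plus tokens
/// written to the cache plus tokens served from it. Widened to `u64`
/// so the sum cannot overflow.
pub fn total_input_tokens(usage: &Usage) -> u64 {
    u64::from(usage.input_tokens)
        + u64::from(usage.cache_creation_input_tokens)
        + u64::from(usage.cache_read_input_tokens)
}
```

On `genai`-backed providers the two cache counters would presumably stay at zero, in which case the total reduces to `input_tokens`.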
2.6 AiError
#[non_exhaustive], Clone-derivable (no Box<dyn std::error::Error> fields):
#[derive(Debug, Clone, thiserror::Error, miette::Diagnostic)]
#[non_exhaustive]
pub enum AiError {
    #[error("invalid AI client config: {0}")]
    #[diagnostic(code(rtb::ai::config))]
    InvalidConfig(String),
    #[error("provider error: {0}")]
    #[diagnostic(code(rtb::ai::provider))]
    Provider(String),
    #[error("HTTP transport: {0}")]
    #[diagnostic(code(rtb::ai::transport))]
    Transport(String),
    #[error("response did not validate against schema: {0}")]
    #[diagnostic(code(rtb::ai::schema))]
    SchemaValidation(String),
    #[error("response was not valid JSON for the requested type: {0}")]
    #[diagnostic(code(rtb::ai::deserialize))]
    Deserialize(String),
    #[error("rate limited by {host} (retry-after: {retry_after:?})")]
    #[diagnostic(code(rtb::ai::rate_limited))]
    RateLimited { host: String, retry_after: Option<std::time::Duration> },
}
Every String payload has been through rtb_redact::string before storage.
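A minimal sketch of how an HTTP status could map onto the `Provider` and `RateLimited` variants, with the body redacted before it is stored. Everything here is illustrative: `AiErrorSketch` is a cut-down copy of the enum, `map_status` is a hypothetical helper, and `redact` is a toy stand-in for `rtb_redact::string` that only masks the word following `Bearer`. The sketch parses `Retry-After` in its delta-seconds form only; the HTTP-date form yields `None`.

```rust
use std::time::Duration;

/// Cut-down stand-in for two `AiError` variants.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AiErrorSketch {
    Provider(String),
    RateLimited { host: String, retry_after: Option<Duration> },
}

/// Toy stand-in for `rtb_redact::string`: masks the whitespace-separated
/// word that follows "Bearer" (case-insensitive). The real redactor's
/// rules are not specified in this document.
fn redact(input: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    let mut mask_next = false;
    for word in input.split_whitespace() {
        if mask_next {
            out.push("[REDACTED]");
            mask_next = false;
        } else {
            out.push(word);
            mask_next = word.eq_ignore_ascii_case("bearer");
        }
    }
    out.join(" ")
}

/// Maps a response status to an error, or `None` for non-error statuses.
pub fn map_status(
    status: u16,
    host: &str,
    retry_after_header: Option<&str>,
    body: &str,
) -> Option<AiErrorSketch> {
    match status {
        // 429 is checked first so it does not fall into the generic arm.
        429 => Some(AiErrorSketch::RateLimited {
            host: host.to_string(),
            retry_after: retry_after_header
                .and_then(|v| v.trim().parse::<u64>().ok())
                .map(Duration::from_secs),
        }),
        // Any other 4xx / 5xx: keep the body, but only after redaction.
        400..=599 => Some(AiErrorSketch::Provider(format!(
            "HTTP {status}: {}",
            redact(body)
        ))),
        _ => None,
    }
}
```

Redacting at construction time, rather than at display time, is what lets the enum stay `Clone` with plain `String` payloads while still being safe to log.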
3. Anthropic-direct path
When Config::provider is Anthropic or AnthropicLocal, every method goes through a direct-reqwest implementation against POST /v1/messages (or the local equivalent). This unlocks four features genai does not yet expose:
- Prompt caching: automatic when ChatRequest::cache_control = true. Cache breakpoints are inserted at the system prompt, the tool list (when present), and the first user message, the three "stable" points.
- Extended thinking: ChatRequest::thinking = Some(ThinkingMode::Budget(N)) adds the thinking block. Streaming surfaces ChatStreamEvent::ThinkingToken(String) separately from regular Token(String).
- Citations: populated on ChatResponse::citations when the model uses citation outputs.
- Future: managed agents. Out of scope for v0.3, but the direct-reqwest path is what unlocks them later.
Non-Anthropic providers go through genai. The cache_control / thinking fields are silently ignored on those paths.
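The routing rule in this section can be sketched as a pair of small functions. The `Backend` enum and both function names are hypothetical, and the thinking budget is modelled as a plain `u32` because the payload type of `ThinkingMode::Budget` is not given here.

```rust
/// Local copy of the spec's `Provider` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Anthropic,
    AnthropicLocal,
    OpenAi,
    OpenAiCompatible,
    Gemini,
    Ollama,
}

/// Hypothetical name for the two request paths described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// Direct `reqwest` calls against the Anthropic Messages endpoint.
    AnthropicDirect,
    /// Everything else goes through `genai`.
    Genai,
}

/// Picks the request path for a provider. The match is exhaustive, so
/// adding a provider variant forces a routing decision at compile time.
pub fn backend_for(provider: Provider) -> Backend {
    match provider {
        Provider::Anthropic | Provider::AnthropicLocal => Backend::AnthropicDirect,
        Provider::OpenAi
        | Provider::OpenAiCompatible
        | Provider::Gemini
        | Provider::Ollama => Backend::Genai,
    }
}

/// Returns `(cache_control, thinking_budget)` as they would reach the
/// wire: passed through on the direct path, dropped on the `genai` path.
pub fn effective_options(
    provider: Provider,
    cache_control: bool,
    thinking_budget: Option<u32>,
) -> (bool, Option<u32>) {
    match backend_for(provider) {
        Backend::AnthropicDirect => (cache_control, thinking_budget),
        Backend::Genai => (false, None),
    }
}
```

Since the Anthropic-only fields are dropped silently, a caller that depends on caching or thinking may want to check the configured provider up front instead of inferring it from the response.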
4. Cross-cutting changes (folded into this PR)
4.1 Resolver::with_platform_default()
rtb-credentials gains a one-line constructor:
impl Resolver {
    /// Convenience: build a `Resolver` over `KeyringStore::new()`
    /// (the platform-native default). Equivalent to
    /// `Resolver::new(Arc::new(KeyringStore::new()))`.
    #[must_use]
    pub fn with_platform_default() -> Self;
}
impl Default for Resolver { /* same */ }
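The shape of this constructor can be shown with stand-in types. The `CredentialStore` trait, its `name` method, and `store_name` are hypothetical and exist only so the sketch compiles on its own; the real `rtb-credentials` types will differ.

```rust
use std::sync::Arc;

/// Hypothetical store abstraction; the real trait lives in `rtb-credentials`.
pub trait CredentialStore {
    fn name(&self) -> &'static str;
}

/// Stand-in for the platform keyring store.
pub struct KeyringStore;

impl KeyringStore {
    pub fn new() -> Self {
        KeyringStore
    }
}

impl CredentialStore for KeyringStore {
    fn name(&self) -> &'static str {
        "keyring"
    }
}

/// Stand-in resolver holding a shared store.
pub struct Resolver {
    store: Arc<dyn CredentialStore>,
}

impl Resolver {
    pub fn new(store: Arc<dyn CredentialStore>) -> Self {
        Self { store }
    }

    /// Same expansion the doc comment states:
    /// `Resolver::new(Arc::new(KeyringStore::new()))`.
    #[must_use]
    pub fn with_platform_default() -> Self {
        Self::new(Arc::new(KeyringStore::new()))
    }

    /// Illustrative accessor so the sketch can be checked.
    pub fn store_name(&self) -> &'static str {
        self.store.name()
    }
}

impl Default for Resolver {
    /// Delegates, so `Default` and the named constructor cannot drift apart.
    fn default() -> Self {
        Self::with_platform_default()
    }
}
```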
4.2 rtb-docs: docs ask hookup
rtb-docs::ai::AiAnswerStream impl backed by rtb_ai::AiClient lands in the same PR (gated on rtb-docs's ai Cargo feature). The CLI surface is docs ask <question>; the tokens stream to stdout (per O5 — TUI is reserved for docs browse).
4.3 rtb-update: PAT auth
rtb_update::command::build_provider resolves the PAT via Resolver::with_platform_default() against a new ToolMetadata::release_credential: Option<CredentialRef>. When unset, the provider runs unauthenticated (today's behaviour). Backward-compatible.
4.4 rtb-app::ReleaseSource: six-variant expansion
Add Bitbucket / Gitea / Codeberg variants to match rtb_vcs::ReleaseSourceConfig. The release_source_to_config mapper inside rtb-update adds the three branches; the existing #[non_exhaustive] fallback can then go away (or stay for forward-compat).
5. Test plan (TDD)
Every method gets a unit-level T# criterion. HTTP-bound tests use wiremock.
- T1 - AiClient::new rejects an http:// base_url unless allow_insecure_base_url.
- T2 - AiClient::new rejects an empty API key.
- T3 - Config::default() returns Anthropic + Claude Opus 4.7.
- T4 - validate_base_url rejects userinfo (https://user:pw@…).
- T5 - validate_base_url rejects placeholder hosts (example.com).
- T6 - chat against a wiremock Anthropic Messages endpoint produces the expected request shape (system / messages / cache_control header) and parses the response.
- T7 - chat against a wiremock OpenAI endpoint produces the OpenAI request shape via genai.
- T8 - chat_stream yields Token events for an SSE stream from a wiremock server.
- T9 - chat_structured::<T> validates the response against T's schema; a schema mismatch surfaces AiError::SchemaValidation.
- T10 - Error responses (4xx / 5xx) map to AiError::Provider with the body redacted via rtb_redact::string.
- T11 - Rate-limit responses (429 + Retry-After) map to AiError::RateLimited with the duration parsed.
- T12 - Anthropic prompt caching: cache_control = true adds cache_control blocks at system + tools + first message; the request body matches a snapshot.
- T13 - Extended thinking: thinking = Some(ThinkingMode::Budget(N)) adds the thinking request block; streaming exposes ThinkingToken events.
- T14 - Citation parsed from a sample Anthropic response with citations.
- T15 - AiError is Clone (compile-time check).
BDD scenario:
- S1 - "Given a configured AI client, When I ask a question, Then I receive a streamed response and a final usage report."
Coverage gate ≥ 90% on this crate per the v0.1 standing requirement.
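A compile-time check like T15 needs no test body at all. One common pattern, sketched here against a cut-down stand-in enum (`AiErrorSketch` is not the real type), is a generic function with the required bound, instantiated in an unnamed constant so the build fails if the bound is ever lost.

```rust
/// Cut-down stand-in for `AiError`; only the derives matter here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AiErrorSketch {
    InvalidConfig(String),
    Transport(String),
}

/// Compiles only when `T: Clone`. It is never called at runtime.
fn assert_clone<T: Clone>() {}

// Naming the instantiation forces the bound to be checked during
// compilation. Adding a non-`Clone` field (for example a
// `Box<dyn std::error::Error>`) to the enum would break the build here.
const _: fn() = assert_clone::<AiErrorSketch>;
```

The same pattern extends to other marker requirements, for instance `Send + Sync`, if the error needs to cross task boundaries.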
6. Security requirements
- HTTPS-only on base_url unless allow_insecure_base_url (test-only).
- Config::api_key is SecretString. Debug renders [REDACTED]. The exposed value is built into the per-request Authorization header and immediately discarded.
- Every AiError::*(String) payload runs through rtb_redact::string so leaked URLs / tokens / headers in the upstream error never reach our telemetry.
- Logging at INFO level emits the endpoint hostname only, never the path, query string, or any header value.
- Provider DEBUG logging (tracing::debug!) gates behind tracing filters by default; tools that opt in still get redacted bodies.
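The secret-handling requirement can be illustrated with a small std-only wrapper. `ApiKey`, `ConfigSketch`, and `auth_header` are hypothetical stand-ins for `secrecy::SecretString`, `Config`, and the request-building code, and the `Bearer` header shape is an assumption, since the header format is provider-specific.

```rust
use std::fmt;

/// Stand-in for `secrecy::SecretString`: the value is only reachable
/// through an explicit `expose` call.
pub struct ApiKey(String);

impl ApiKey {
    pub fn new(value: &str) -> Self {
        ApiKey(value.to_string())
    }

    /// The single place where the raw value leaves the wrapper.
    fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for ApiKey {
    /// Never prints the key, so a derived `Debug` on any containing
    /// struct is safe to log.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Cut-down config; the derived `Debug` delegates to `ApiKey`'s impl.
#[derive(Debug)]
pub struct ConfigSketch {
    pub model: String,
    pub api_key: ApiKey,
}

/// Composes the header value per request. The caller is expected to hand
/// it straight to the HTTP client and drop it afterwards.
pub fn auth_header(key: &ApiKey) -> String {
    format!("Bearer {}", key.expose())
}
```

Formatting a `ConfigSketch` with `{:?}` shows the model name but only `[REDACTED]` for the key, which is the property the second requirement above relies on.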
7. Non-goals for v0.1
- Anthropic managed agents: deferred to v0.3.x (per scope O3).
- Function calling / tool use for non-Anthropic providers: genai exposes this; we'll re-wrap in a v0.3.x or v0.4 release once the basic chat loop is solid.
- Embeddings: genai::Client::embed exists; wrapping is one more pass and it's its own scope. Defer.
- Token-level cost accounting: Usage reports raw token counts; pricing tables are a separate concern.
- Multi-turn conversation persistence: caller manages ChatRequest::messages history.
8. Approval gate
This addendum is implemented when (a) status flips to APPROVED, (b) T1–T15 + S1 land green with ≥ 90% line coverage, (c) docs ask reaches a non-AiDisabled exit on the example tool, (d) rtb-update gains optional PAT auth via release_credential, (e) rtb-app::ReleaseSource expands to six variants.