rtb-ai v0.1 — Multi-provider AI client

Status: DRAFT — awaiting review before TDD/implementation. Parent contract: rust-tool-base.md and the v0.3 scope addendum 2026-05-01-v0.3-scope.md. Replaces: the rtb-ai v0.1 stub (21-line placeholder).


1. Goal

Ship a typed, async, redaction-aware AI client that:

  • Unifies five concrete providers behind one AiClient (Anthropic, OpenAI, Gemini, Ollama, OpenAI-compatible).
  • Uses genai as the multi-provider backbone but drops down to a direct reqwest-on-Anthropic-Messages path for features genai does not yet surface (prompt caching, extended thinking, citations).
  • Defaults to Claude 4.7 (Opus 4.7 / Sonnet 4.6 / Haiku 4.5) per CLAUDE.md.
  • Implements structured output via schemars-derived JSON Schema sent in the request and jsonschema validation on the response.
  • Sources its API key through rtb-credentials::Resolver so tools authored on RTB can wire their secret-resolution policy in one place.
  • Honours the framework's redaction policy at every point a free-form string crosses an out-of-process boundary (logs, errors, telemetry).

Anthropic agents (multi-step tool-use loops with sub-agents) are explicitly deferred to a v0.3.x point release. The v0.3 surface is chat + structured output + caching + thinking + citations; agents land cleanly once that ships.

2. Public API shape

2.1 Crate root

pub use client::{AiClient, ChatRequest, ChatResponse, ChatStream};
pub use config::{Config, Provider};
pub use error::AiError;
pub use message::{Citation, ContentBlock, Message, Role, Usage};
pub use thinking::ThinkingMode;

pub mod client;
pub mod config;
pub mod error;
pub mod message;
pub mod thinking;

/// Validate a user-supplied base URL — HTTPS-only by default,
/// rejects userinfo + placeholder hosts. Mirrors
/// `rtb_vcs::http`'s base-url policy.
pub fn validate_base_url(url: &url::Url, allow_insecure: bool) -> Result<(), AiError>;
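
The policy described in the doc comment above could be sketched roughly as follows. This is a std-only illustration, not the crate's implementation: `BaseUrlParts`, `check_base_url`, and the `PLACEHOLDER_HOSTS` list are hypothetical stand-ins (the real function takes a `url::Url` and returns `AiError`), and rejecting userinfo even when `allow_insecure` is set is an assumption.

```rust
/// Hypothetical stand-in for the parts of a parsed URL the policy inspects.
/// The real signature takes a `url::Url`.
pub struct BaseUrlParts<'a> {
    pub scheme: &'a str,
    pub username: &'a str,
    pub password: Option<&'a str>,
    pub host: Option<&'a str>,
}

/// Assumed placeholder list; the spec only names `example.com`.
const PLACEHOLDER_HOSTS: &[&str] = &["example.com"];

/// Sketch of the base-URL policy. Errors are plain strings here;
/// the real function would return `AiError::InvalidConfig`.
pub fn check_base_url(url: &BaseUrlParts<'_>, allow_insecure: bool) -> Result<(), String> {
    // HTTPS-only unless the test-only escape hatch is set.
    match url.scheme {
        "https" => {}
        "http" if allow_insecure => {}
        other => return Err(format!("scheme `{other}` is not allowed")),
    }
    // Credentials embedded in the URL are always rejected (assumption).
    if !url.username.is_empty() || url.password.is_some() {
        return Err("userinfo in base URL is not allowed".to_string());
    }
    let host = url.host.ok_or_else(|| "base URL has no host".to_string())?;
    if PLACEHOLDER_HOSTS.iter().any(|p| host.eq_ignore_ascii_case(p)) {
        return Err(format!("placeholder host `{host}` is not allowed"));
    }
    Ok(())
}
```

With this shape, an `http://127.0.0.1` endpoint passes only when `allow_insecure` is `true`, which matches the wiremock use case described for `Config::allow_insecure_base_url`.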

2.2 Config

#[derive(Debug, Clone)]
pub struct Config {
    /// Which provider to target. Picks the wire protocol and the
    /// auth header shape.
    pub provider: Provider,
    /// Model identifier — provider-specific. When empty, defaults
    /// to the provider's flagship model (Anthropic: `"claude-opus-4-7"`).
    pub model: String,
    /// Override the provider's default endpoint. `None` uses the
    /// vendor's documented production URL.
    pub base_url: Option<url::Url>,
    /// API key, resolved at config-build time via
    /// [`rtb_credentials::Resolver`]. Stays in `SecretString` until
    /// the per-request `Authorization` header is composed.
    pub api_key: secrecy::SecretString,
    /// Per-request timeout. Defaults to 60s in `Config::default`.
    pub timeout: std::time::Duration,
    /// Test-only escape hatch: when `true`, `validate_base_url`
    /// accepts `http://` and `127.0.0.1` endpoints (wiremock
    /// integration). `#[serde(skip)]` so config files can't downgrade.
    #[serde(skip)]
    pub allow_insecure_base_url: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Deserialize, serde::Serialize)]
#[serde(rename_all = "lowercase")]
pub enum Provider {
    /// Anthropic Cloud — uses the direct-`reqwest` path so prompt
    /// caching / extended thinking / citations work.
    Anthropic,
    /// Self-hosted Anthropic-compatible (Claude Code Local, etc.).
    AnthropicLocal,
    /// OpenAI Cloud — via `genai`.
    OpenAi,
    /// OpenAI-compatible endpoints (Together, Fireworks, vLLM, …) — via `genai`.
    OpenAiCompatible,
    /// Google Gemini — via `genai`.
    Gemini,
    /// Local Ollama — via `genai`.
    Ollama,
}

2.3 AiClient

impl AiClient {
    /// Build a client. Validates `base_url`, builds a `reqwest::Client`
    /// with HTTPS enforcement + the configured timeout, and (for
    /// `genai`-backed providers) stamps the corresponding `genai::Client`.
    ///
    /// # Errors
    /// Returns [`AiError::InvalidConfig`] on a bad base URL, empty
    /// API key, or unsupported provider+model combination.
    pub fn new(config: Config) -> Result<Self, AiError>;

    /// One-shot chat completion.
    pub async fn chat(&self, req: ChatRequest) -> Result<ChatResponse, AiError>;

    /// Streaming chat completion. Yields `ChatStreamEvent` items —
    /// `Token(String)`, `ThinkingToken(String)` (Anthropic only),
    /// `Done(Usage)`, `Error(AiError)`.
    pub async fn chat_stream(&self, req: ChatRequest) -> Result<ChatStream, AiError>;

    /// Structured output: sends `T`'s JSON Schema with the request,
    /// validates the model's reply against it before deserialising.
    pub async fn chat_structured<T>(&self, req: ChatRequest) -> Result<T, AiError>
    where
        T: serde::de::DeserializeOwned + schemars::JsonSchema;
}
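
To illustrate how a caller might consume the events named in the `chat_stream` doc comment, here is a minimal sketch. It assumes a synchronous iterator in place of the real `ChatStream` (which is presumably an async stream), simplifies the error payload to `String`, and uses a reduced `Usage`; `collect_answer` is a hypothetical helper, not part of the proposed API.

```rust
/// Reduced stand-in for the crate's `Usage`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Usage {
    pub input_tokens: u32,
    pub output_tokens: u32,
}

/// Event shape taken from the `chat_stream` doc comment.
/// The error payload is simplified to `String` for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ChatStreamEvent {
    Token(String),
    ThinkingToken(String),
    Done(Usage),
    Error(String),
}

/// Hypothetical helper: fold events into the visible answer text plus
/// the final usage report, stopping at `Done` or the first `Error`.
pub fn collect_answer<I>(events: I) -> Result<(String, Option<Usage>), String>
where
    I: IntoIterator<Item = ChatStreamEvent>,
{
    let mut text = String::new();
    let mut usage = None;
    for event in events {
        match event {
            ChatStreamEvent::Token(t) => text.push_str(&t),
            // Thinking tokens arrive separately and are not part of the answer.
            ChatStreamEvent::ThinkingToken(_) => {}
            ChatStreamEvent::Done(u) => {
                usage = Some(u);
                break;
            }
            ChatStreamEvent::Error(e) => return Err(e),
        }
    }
    Ok((text, usage))
}
```

Keeping `ThinkingToken` as its own variant lets a caller such as `docs ask` print only `Token` text to stdout while still being free to log or display thinking output elsewhere.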

2.4 ChatRequest

#[derive(Debug, Clone, Default)]
pub struct ChatRequest {
    pub system: Option<String>,
    pub messages: Vec<Message>,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    /// Anthropic-only: enables prompt caching at every stable point
    /// (system prompt + tools + first turn). Ignored on non-Anthropic
    /// providers.
    pub cache_control: bool,
    /// Anthropic-only: extended-thinking budget. `None` disables.
    /// Ignored on non-Anthropic providers.
    pub thinking: Option<thinking::ThinkingMode>,
}
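
The "stable point" rule in the `cache_control` doc comment can be sketched as a small planning function. Everything here is illustrative: `Breakpoint`, `cache_breakpoints`, and the `has_tools` parameter are hypothetical names (the v0.3 `ChatRequest` has no tools field), and the sketch only decides where breakpoints would go, not how they are serialised.

```rust
/// Reduced stand-in for the crate's `Role`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
}

/// Hypothetical marker for where a cache breakpoint would be placed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Breakpoint {
    System,
    Tools,
    /// Index into `ChatRequest::messages`.
    Message(usize),
}

/// Sketch: pick the stable points (system prompt, tool list, first user
/// message) that are actually present in the request.
pub fn cache_breakpoints(
    cache_control: bool,
    has_system: bool,
    has_tools: bool,
    roles: &[Role],
) -> Vec<Breakpoint> {
    if !cache_control {
        return Vec::new();
    }
    let mut out = Vec::new();
    if has_system {
        out.push(Breakpoint::System);
    }
    if has_tools {
        out.push(Breakpoint::Tools);
    }
    if let Some(i) = roles.iter().position(|r| *r == Role::User) {
        out.push(Breakpoint::Message(i));
    }
    out
}
```

A request with a system prompt, no tools, and a user message at index 0 would yield `[System, Message(0)]` under this sketch; with `cache_control = false` the plan is empty.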

2.5 ChatResponse

#[derive(Debug, Clone)]
pub struct ChatResponse {
    pub message: Message,
    pub usage: Usage,
    /// Populated only on the Anthropic-direct path when the assistant
    /// output uses the citation feature.
    pub citations: Vec<Citation>,
}

#[derive(Debug, Clone, Copy, Default)]
pub struct Usage {
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub cache_creation_input_tokens: u32,
    pub cache_read_input_tokens: u32,
}

2.6 AiError

AiError is #[non_exhaustive] and Clone-derivable (it has no Box<dyn std::error::Error> fields):

#[derive(Debug, Clone, thiserror::Error, miette::Diagnostic)]
#[non_exhaustive]
pub enum AiError {
    #[error("invalid AI client config: {0}")]
    #[diagnostic(code(rtb::ai::config))]
    InvalidConfig(String),

    #[error("provider error: {0}")]
    #[diagnostic(code(rtb::ai::provider))]
    Provider(String),

    #[error("HTTP transport: {0}")]
    #[diagnostic(code(rtb::ai::transport))]
    Transport(String),

    #[error("response did not validate against schema: {0}")]
    #[diagnostic(code(rtb::ai::schema))]
    SchemaValidation(String),

    #[error("response was not valid JSON for the requested type: {0}")]
    #[diagnostic(code(rtb::ai::deserialize))]
    Deserialize(String),

    #[error("rate limited by {host} (retry-after: {retry_after:?})")]
    #[diagnostic(code(rtb::ai::rate_limited))]
    RateLimited { host: String, retry_after: Option<std::time::Duration> },
}

Every String payload has been through rtb_redact::string before storage.
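
One way to make that guarantee hard to bypass is to route every construction through helpers that redact first. The sketch below assumes this approach; `redact` is a deliberately naive stand-in for `rtb_redact::string` (whose real behaviour is not specified here), and the `provider` / `transport` constructors are hypothetical.

```rust
/// Naive, hypothetical stand-in for `rtb_redact::string`: masks any
/// space-separated word that starts with `sk-`.
fn redact(input: &str) -> String {
    input
        .split(' ')
        .map(|w| if w.starts_with("sk-") { "[REDACTED]" } else { w })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Reduced stand-in for `AiError` (two variants only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AiError {
    Provider(String),
    Transport(String),
}

impl AiError {
    /// Hypothetical constructor: the payload is redacted before storage.
    pub fn provider(raw: &str) -> Self {
        Self::Provider(redact(raw))
    }

    /// Hypothetical constructor: the payload is redacted before storage.
    pub fn transport(raw: &str) -> Self {
        Self::Transport(redact(raw))
    }
}
```

Because the stored payload is already a redacted `String`, cloning the error or rendering it through `Display` cannot reintroduce the original secret.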

3. Anthropic-direct path

When Config::provider is Anthropic or AnthropicLocal, every method goes through a direct-reqwest implementation against POST /v1/messages (or the local equivalent). This unlocks three features genai does not yet expose and keeps the door open for a fourth:

  1. Prompt caching: automatic when ChatRequest::cache_control = true. Cache breakpoints are inserted at the system prompt, the tool list (when present), and the first user message (the three "stable" points).
  2. Extended thinking: ChatRequest::thinking = Some(ThinkingMode::Budget(N)) adds the thinking block. Streaming surfaces ChatStreamEvent::ThinkingToken(String) separately from regular Token(String).
  3. Citations: populated on ChatResponse::citations when the model uses citation outputs.
  4. Future: managed agents. Out of scope for v0.3, but the direct-reqwest path is what unlocks them later.

Non-Anthropic providers go through genai. The cache_control / thinking fields are silently ignored on those paths.
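
The routing rule in this section can be sketched as a small mapping from `Provider` to backend. The `Backend` enum and both method names are hypothetical; only the provider variants and the split between the direct path and `genai` come from this document.

```rust
/// Provider variants as listed in section 2.2 (serde attributes omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Anthropic,
    AnthropicLocal,
    OpenAi,
    OpenAiCompatible,
    Gemini,
    Ollama,
}

/// Hypothetical internal marker for which transport handles a request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// Direct `reqwest` calls against the Anthropic Messages endpoint.
    AnthropicDirect,
    /// Everything else goes through `genai`.
    Genai,
}

impl Provider {
    /// Sketch of the routing rule described above.
    pub const fn backend(self) -> Backend {
        match self {
            Self::Anthropic | Self::AnthropicLocal => Backend::AnthropicDirect,
            Self::OpenAi | Self::OpenAiCompatible | Self::Gemini | Self::Ollama => Backend::Genai,
        }
    }

    /// `cache_control` and `thinking` only take effect on the direct path.
    pub const fn honours_anthropic_fields(self) -> bool {
        matches!(self.backend(), Backend::AnthropicDirect)
    }
}
```

An exhaustive `match` without a wildcard arm means that adding a seventh provider later would fail to compile until its backend is chosen explicitly.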

4. Cross-cutting changes (folded into this PR)

4.1 Resolver::with_platform_default()

rtb-credentials gains a one-line constructor:

impl Resolver {
    /// Convenience: build a `Resolver` over `KeyringStore::new()`
    /// (the platform-native default). Equivalent to
    /// `Resolver::new(Arc::new(KeyringStore::new()))`.
    #[must_use]
    pub fn with_platform_default() -> Self;
}

impl Default for Resolver { /* same */ }
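
A minimal sketch of how the constructor and the `Default` impl could relate, using stand-in types. The `CredentialStore` trait, its methods, the no-op `KeyringStore`, and `store_name` are all assumptions made for illustration; the real `rtb-credentials` types may differ.

```rust
use std::sync::Arc;

/// Hypothetical store trait; the real trait's shape is not given here.
pub trait CredentialStore: Send + Sync {
    fn get(&self, service: &str, account: &str) -> Option<String>;
    fn name(&self) -> &'static str;
}

/// Stand-in for the platform keyring store; this one never finds anything.
pub struct KeyringStore;

impl KeyringStore {
    pub fn new() -> Self {
        Self
    }
}

impl CredentialStore for KeyringStore {
    fn get(&self, _service: &str, _account: &str) -> Option<String> {
        None
    }

    fn name(&self) -> &'static str {
        "keyring"
    }
}

pub struct Resolver {
    store: Arc<dyn CredentialStore>,
}

impl Resolver {
    pub fn new(store: Arc<dyn CredentialStore>) -> Self {
        Self { store }
    }

    /// Equivalent to `Resolver::new(Arc::new(KeyringStore::new()))`.
    #[must_use]
    pub fn with_platform_default() -> Self {
        Self::new(Arc::new(KeyringStore::new()))
    }

    /// Hypothetical accessor, only here so the sketch can be checked.
    pub fn store_name(&self) -> &'static str {
        self.store.name()
    }
}

impl Default for Resolver {
    /// Delegates so the two entry points cannot drift apart.
    fn default() -> Self {
        Self::with_platform_default()
    }
}
```

Having `Default` delegate to the named constructor keeps a single source of truth for what "platform default" means.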

4.2 rtb-docs: docs ask hookup

An rtb_docs::ai::AiAnswerStream implementation backed by rtb_ai::AiClient lands in the same PR (gated on rtb-docs's ai Cargo feature). The CLI surface is docs ask <question>; tokens stream to stdout (per O5, the TUI is reserved for docs browse).

4.3 rtb-update — PAT auth

rtb_update::command::build_provider resolves the PAT via Resolver::with_platform_default() against a new ToolMetadata::release_credential: Option<CredentialRef>. When unset, the provider runs unauthenticated (today's behaviour). Backward-compatible.

4.4 rtb-app::ReleaseSource — six-variant expansion

Add Bitbucket / Gitea / Codeberg variants to match rtb_vcs::ReleaseSourceConfig. The release_source_to_config mapper inside rtb-update adds the three branches; the existing #[non_exhaustive] fallback can then go away (or stay for forward-compat).

5. Test plan (TDD)

Every method gets a unit-level T# criterion. HTTP-bound tests use wiremock.

  • T1: AiClient::new rejects an http:// base_url unless allow_insecure_base_url.
  • T2: AiClient::new rejects an empty API key.
  • T3: Config::default() returns Anthropic + Claude Opus 4.7.
  • T4: validate_base_url rejects userinfo (https://user:pw@…).
  • T5: validate_base_url rejects placeholder hosts (example.com).
  • T6: chat against a wiremock Anthropic Messages endpoint produces the expected request shape (system / messages / cache_control header) and parses the response.
  • T7: chat against a wiremock OpenAI endpoint produces the OpenAI request shape via genai.
  • T8: chat_stream yields Token events for an SSE stream from a wiremock server.
  • T9: chat_structured::<T> validates the response against T's schema; a schema mismatch surfaces AiError::SchemaValidation.
  • T10: Error responses (4xx / 5xx) map to AiError::Provider with the body redacted via rtb_redact::string.
  • T11: Rate-limit responses (429 + Retry-After) map to AiError::RateLimited with the duration parsed.
  • T12: Anthropic prompt caching: cache_control = true adds cache_control blocks at system + tools + first message; the request body matches a snapshot.
  • T13: Extended thinking: thinking = Some(ThinkingMode::Budget(N)) adds the thinking request block; streaming exposes ThinkingToken events.
  • T14: Citation parsed from a sample Anthropic response with citations.
  • T15: AiError is Clone (compile-time check).

BDD scenario:

  • S1: "Given a configured AI client, When I ask a question, Then I receive a streamed response and a final usage report."

Coverage gate ≥ 90% on this crate per the v0.1 standing requirement.
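
As an illustration of what T10 and T11 exercise, the status mapping might look like the sketch below. It handles only the delta-seconds form of `Retry-After` (the HTTP-date form is left out), omits the redaction step, and uses a reduced `AiError`; `parse_retry_after` and `map_status` are hypothetical names.

```rust
use std::time::Duration;

/// Parse the delta-seconds form of `Retry-After` (e.g. "30").
/// The HTTP-date form is not handled in this sketch and yields `None`.
pub fn parse_retry_after(value: &str) -> Option<Duration> {
    value.trim().parse::<u64>().ok().map(Duration::from_secs)
}

/// Reduced stand-in for `AiError` (two variants only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AiError {
    RateLimited { host: String, retry_after: Option<Duration> },
    Provider(String),
}

/// Hypothetical mapper from an HTTP status to an error, if any.
/// The real path would redact `body` before storing it.
pub fn map_status(
    status: u16,
    host: &str,
    retry_after: Option<&str>,
    body: &str,
) -> Option<AiError> {
    match status {
        // 429 is checked first so it is not swallowed by the 4xx arm.
        429 => Some(AiError::RateLimited {
            host: host.to_string(),
            retry_after: retry_after.and_then(parse_retry_after),
        }),
        400..=599 => Some(AiError::Provider(body.to_string())),
        _ => None,
    }
}
```

A missing or unparseable header still produces `RateLimited`, just with `retry_after: None`, which is why the field is an `Option` in section 2.6.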

6. Security requirements

  • HTTPS-only on base_url unless allow_insecure_base_url (test-only).
  • Config::api_key is SecretString. Debug renders [REDACTED]. The exposed value is built into the per-request Authorization header and immediately discarded.
  • Every AiError::*(String) payload runs through rtb_redact::string so leaked URLs / tokens / headers in the upstream error never reach our telemetry.
  • Logging at INFO level emits the endpoint hostname only — never the path, query string, or any header value.
  • Provider DEBUG logging (tracing::debug!) is gated behind tracing filters and is off by default; tools that opt in still get redacted bodies.
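
The `Debug` requirement on `Config::api_key` can be illustrated with a std-only stand-in. `Secret` below is a hypothetical substitute for `secrecy::SecretString` (whose exact `Debug` output may differ), and this `Config` is reduced to two fields; the point is only that a derived `Debug` on the outer struct never prints the key.

```rust
use std::fmt;

/// Hypothetical stand-in for `secrecy::SecretString`.
pub struct Secret(String);

impl Secret {
    pub fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }

    /// Explicit access, so every use of the raw value is easy to audit.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    /// Never prints the wrapped value.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Reduced stand-in for the crate's `Config`.
#[derive(Debug)]
pub struct Config {
    pub model: String,
    pub api_key: Secret,
}
```

Formatting such a `Config` with `{:?}` shows the model name and `[REDACTED]` in place of the key, so an accidental `tracing::debug!(?config)` does not leak it.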

7. Non-goals for v0.1

  • Anthropic managed agents: deferred to v0.3.x (per scope O3).
  • Function calling / tool use for non-Anthropic providers: genai exposes this; we'll re-wrap in a v0.3.x or v0.4 release once the basic chat loop is solid.
  • Embeddings: genai::Client::embed exists; wrapping it is a separate pass with its own scope. Deferred.
  • Token-level cost accounting: Usage reports raw token counts; pricing tables are a separate concern.
  • Multi-turn conversation persistence: the caller manages ChatRequest::messages history.

8. Approval gate

This addendum is implemented when (a) status flips to APPROVED, (b) T1–T15 + S1 land green with ≥ 90% line coverage, (c) docs ask reaches a non-AiDisabled exit on the example tool, (d) rtb-update gains optional PAT auth via release_credential, (e) rtb-app::ReleaseSource expands to six variants.