
Guardrails Architecture Diagram Progression

Published Confluence page for Project Palisade.

  • Confluence page ID: 6104317968
  • Parent folder ID: 6018662571
  • Remote version: 1
  • Last remote update: 2026-07-01T13:49:46.191Z
  • Sync status: Published to Confluence.

Purpose

This document proposes a stepped set of Mermaid diagrams for explaining Project Palisade from the highest product level down into runtime architecture choices.

The intent is to give the team a shared visual language for how Palisade has been defined:

  • Multimodal input comes in.
  • Policy and safety configuration decide which checks run.
  • Palisade produces a structured JSON compliance report.
  • When remediation is enabled and applicable, Palisade also produces a remediated output.
  • The evaluation platform tests the same runtime contract before enforcement.

These diagrams are discussion artifacts, not final implementation blueprints.

Current Assumptions From Existing Docs

  • Palisade is the edge guardrails runtime, not the offline evaluation platform.
  • The runtime supports both proxy-style integration and explicit check API integration.
  • Policy profiles compose baseline safety rules with app-specific modules.
  • Modules can be deterministic, LLM-based, or backed by specialist models.
  • The evaluation PoC is local-first with MLflow. It targets a mock today, and a Cloudflare Palisade endpoint is planned as the future target.
  • Cloudflare Workers should orchestrate requests, policy, and service calls. Workers are not the right place for arbitrary custom model binaries.
  • Supported catalog models can use Workers AI. Custom or specialist ML models should be called through approved HTTPS endpoints such as an AWS sandbox, Replicate, or a future approved container/model runtime.

Diagram 1: Product-Level Guardrail Flow

This is the simplest team-facing view. It shows Palisade as a single guardrail box that accepts multimodal input and always returns a JSON report, plus a remediated output when remediation is enabled.

Step flow:

  1. A calling application sends text, image, audio, or another supported input.
  2. Safety configuration and the selected policy profile define what Palisade is allowed and required to run.
  3. Palisade normalizes the input, routes it to enabled modules, and aggregates module results.
  4. Palisade always returns the JSON report. A remediated output is returned only when policy allows remediation and the finding is eligible.
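A minimal TypeScript sketch of this contract follows. The type names (`PalisadeRequest`, `Finding`, `ComplianceReport`, `PalisadeResponse`), their fields, and the `buildResponse` helper are hypothetical; the existing docs do not define the report schema yet. The sketch encodes the one invariant the steps state: the report is always present, and `remediatedOutput` is attached only when policy allows remediation and the findings are eligible. Requiring every finding to be eligible is an assumption made here for simplicity.

```typescript
// Hypothetical shapes for the product-level contract; field names are illustrative only.
type Modality = "text" | "image" | "audio" | "video";

interface PalisadeRequest {
  modality: Modality;
  content: string; // text, or a reference to binary content (assumption)
  policyProfileId: string;
}

interface Finding {
  moduleId: string;
  severity: "low" | "medium" | "high";
  remediationEligible: boolean;
}

interface ComplianceReport {
  policyProfileId: string;
  verdict: "pass" | "flag" | "block";
  findings: Finding[];
}

interface PalisadeResponse {
  report: ComplianceReport; // always returned
  remediatedOutput?: string; // present only when policy allows and findings are eligible
}

// Attach a remediated output only when policy allows remediation and every
// finding is eligible (assumed rule); otherwise return the report alone.
function buildResponse(
  report: ComplianceReport,
  remediationAllowed: boolean,
  remediate: (findings: Finding[]) => string,
): PalisadeResponse {
  const eligible =
    remediationAllowed &&
    report.findings.length > 0 &&
    report.findings.every((f) => f.remediationEligible);
  return eligible
    ? { report, remediatedOutput: remediate(report.findings) }
    : { report };
}
```

Keeping the report mandatory and the remediation optional in the type itself means a caller can never receive a remediated output without the report that explains it.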

Diagram 2: Evaluation Platform Around The Same Contract

This view shows MLflow as the evaluation tool around Palisade. The evaluator validates reports and remediation output with deterministic checks, LLM-as-judge scoring, and sampled human review.

Step flow:

  1. The evaluation owner chooses a dataset, rubric, and policy profile.
  2. MLflow serves as the evaluator's system of record for experiments, runs, parameters, artifacts, and metrics.
  3. The runner iterates case by case so every dataset row is evaluated independently.
  4. The target adapter calls the system under evaluation, which can be a mock, a local implementation, or the Cloudflare Palisade endpoint.
  5. The validation layer checks the Palisade outputs with deterministic rules, LLM-as-judge scoring, and sampled human review.
  6. Metrics and failure artifacts are logged back to MLflow as validation evidence for the tested target, policy, module, and rubric versions.
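The case-by-case loop can be sketched as follows. Everything here is hypothetical: `EvalCase`, `TargetAdapter`, and the single deterministic rule stand in for the real dataset schema and rubric, and the returned `metrics` and `failures` are what would be logged to MLflow. The MLflow logging call itself is deliberately left out, and LLM-as-judge scoring and sampled human review are omitted.

```typescript
// Hypothetical evaluation types; the real dataset and rubric schema may differ.
interface EvalCase {
  id: string;
  input: string;
  expectedVerdict: "pass" | "flag" | "block";
}

interface TargetOutput {
  verdict: "pass" | "flag" | "block";
  reportValid: boolean; // did the target return a schema-valid JSON report?
}

// The adapter hides whether the target is a mock, a local build, or the Cloudflare endpoint.
type TargetAdapter = (input: string) => TargetOutput;

interface CaseResult {
  caseId: string;
  passed: boolean;
  reason?: string;
}

// Deterministic validation only; judge scoring and human review are not shown.
function validate(c: EvalCase, out: TargetOutput): CaseResult {
  if (!out.reportValid) return { caseId: c.id, passed: false, reason: "invalid report" };
  if (out.verdict !== c.expectedVerdict) {
    return { caseId: c.id, passed: false, reason: `expected ${c.expectedVerdict}, got ${out.verdict}` };
  }
  return { caseId: c.id, passed: true };
}

// Evaluate each row independently so one failing case cannot affect the others.
function runEvaluation(cases: EvalCase[], target: TargetAdapter) {
  const results: CaseResult[] = [];
  for (const c of cases) {
    try {
      results.push(validate(c, target(c.input)));
    } catch (err) {
      results.push({ caseId: c.id, passed: false, reason: `target error: ${String(err)}` });
    }
  }
  const passed = results.filter((r) => r.passed).length;
  return {
    metrics: { caseCount: cases.length, passRate: cases.length ? passed / cases.length : 0 },
    failures: results.filter((r) => !r.passed), // candidate failure artifacts
  };
}
```

Swapping a mock adapter for one that calls a deployed endpoint leaves the runner and validation unchanged, which is the property this diagram relies on.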

Diagram 3: Input Normalization And Reusable Artifacts

Normalization converts each modality into a consistent artifact shape before checks run. Some normalized artifacts can be returned to the caller only when policy allows it.

Step flow:

  1. Palisade receives raw text, audio, image, or future video input.
  2. Each modality goes through the smallest useful normalization path.
  3. Audio can produce a transcript through STT. Images can be resized or filtered. Video can be split into sampled frames and passed to the image analysis path.
  4. Normalized artifacts feed the guardrail modules.
  5. Policy decides whether any normalized artifact, such as a transcript, can be returned to the caller for reuse.
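A sketch of the normalization step, assuming a hypothetical `NormalizedArtifact` shape and injected `stt` and `sampleFrames` functions; none of these are named Palisade components. It shows every modality mapping onto one artifact shape, video reusing the image path through sampled frames, and a policy flag (hypothetical `returnTranscripts`) gating whether a transcript may be returned to the caller.

```typescript
// Hypothetical raw input and artifact shapes.
type RawInput =
  | { kind: "text"; text: string }
  | { kind: "audio"; bytes: Uint8Array }
  | { kind: "image"; bytes: Uint8Array }
  | { kind: "video"; bytes: Uint8Array };

interface NormalizedArtifact {
  kind: "text" | "transcript" | "image";
  data: string | Uint8Array;
  source: RawInput["kind"];
  returnable: boolean; // may this artifact be returned to the caller for reuse?
}

interface NormalizerDeps {
  stt: (audio: Uint8Array) => string; // speech-to-text service (assumed, injected)
  sampleFrames: (video: Uint8Array) => Uint8Array[]; // frame sampler (assumed, injected)
}

interface ArtifactPolicy {
  returnTranscripts: boolean;
}

function normalize(input: RawInput, deps: NormalizerDeps, policy: ArtifactPolicy): NormalizedArtifact[] {
  switch (input.kind) {
    case "text":
      return [{ kind: "text", data: input.text.trim(), source: "text", returnable: false }];
    case "audio":
      // Audio becomes a transcript; policy decides whether the caller may reuse it.
      return [{ kind: "transcript", data: deps.stt(input.bytes), source: "audio", returnable: policy.returnTranscripts }];
    case "image":
      // Resizing or filtering would happen here; this sketch passes bytes through unchanged.
      return [{ kind: "image", data: input.bytes, source: "image", returnable: false }];
    case "video":
      // Sampled frames go down the same path as images.
      return deps.sampleFrames(input.bytes).map(
        (frame): NormalizedArtifact => ({ kind: "image", data: frame, source: "video", returnable: false }),
      );
  }
}
```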

Diagram 4: Composable Guardrail Runtime

This view opens the Palisade box. The policy profile controls which modules run, how strict they are, and whether remediation is allowed.

Step flow:

  1. The request is normalized into artifacts and metadata.
  2. The policy planner combines the baseline safety configuration, app policy profile, and module catalog.
  3. The orchestrator runs only the modules enabled by the execution plan.
  4. Module outputs are normalized into a common result format and aggregated into the JSON report.
  5. If the verdict and policy allow remediation, the remediation router runs. Otherwise the report is returned without a remediation payload.
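The planner and orchestrator can be sketched as below. The `PolicyLayer` fields, the default threshold, and the "strictest verdict wins" aggregation are assumptions for illustration, since the profile format is still an open decision. In this sketch app thresholds override baseline ones; a real profile format might instead forbid loosening the baseline. Remediation is allowed only when both layers allow it.

```typescript
type Verdict = "pass" | "flag" | "block";

interface ModuleResult { moduleId: string; verdict: Verdict; detail: string }
type GuardrailModule = (text: string, threshold: number) => ModuleResult;

interface PolicyLayer {
  enabledModules: string[];
  thresholds?: Record<string, number>; // per-module strictness overrides
  allowRemediation?: boolean;
}

interface ExecutionPlan {
  steps: { moduleId: string; threshold: number }[];
  allowRemediation: boolean;
}

// Merge baseline and app layers into a plan; every module must exist in the catalog.
function planExecution(
  baseline: PolicyLayer,
  app: PolicyLayer,
  catalog: Record<string, GuardrailModule>,
  defaultThreshold = 0.5,
): ExecutionPlan {
  const ids = Array.from(new Set([...baseline.enabledModules, ...app.enabledModules]));
  for (const id of ids) {
    if (!(id in catalog)) throw new Error(`module not in catalog: ${id}`);
  }
  const thresholds = { ...baseline.thresholds, ...app.thresholds };
  return {
    steps: ids.map((moduleId) => ({ moduleId, threshold: thresholds[moduleId] ?? defaultThreshold })),
    allowRemediation: (baseline.allowRemediation ?? false) && (app.allowRemediation ?? false),
  };
}

const rank: Record<Verdict, number> = { pass: 0, flag: 1, block: 2 };

// Run only the planned modules and aggregate with "strictest verdict wins".
function runPlan(plan: ExecutionPlan, catalog: Record<string, GuardrailModule>, text: string) {
  const results = plan.steps.map((s) => catalog[s.moduleId](text, s.threshold));
  const verdict = results.reduce<Verdict>((v, r) => (rank[r.verdict] > rank[v] ? r.verdict : v), "pass");
  return { verdict, results, remediationEligible: verdict !== "pass" && plan.allowRemediation };
}
```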

Diagram 5: LLM-Based Module Path On Cloudflare

This is the Cloudflare-first path for LLM guardrail checks. The Worker remains the orchestrator. LLM calls go through the approved model path, with AI Gateway used for routing, observability, and control where appropriate.

Step flow:

  1. Every execution gets an execution_id. A conversation_id is optional and should be accepted only for an approved continuation flow.
  2. The Worker creates or loads a Durable Object for bounded execution/session state.
  3. The LLM harness assembles the system prompt, current input, bounded conversation context, output schema, and policy profile.
  4. The model call runs through AI Gateway to Workers AI or another approved provider.
  5. The module returns structured JSON to the report builder and emits telemetry that avoids leaking sensitive prompt or response content.

Durable Objects should be treated as bounded coordination state, not long-term unrestricted memory.
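The harness and bounded-state rules can be sketched without Cloudflare APIs. `BoundedSession` is a plain in-memory stand-in for what a Durable Object would hold (it does not use the Durable Objects API), and `HarnessInput`, `maxTurns`, and the message layout are assumptions. The sketch shows an execution ID on every call, a conversation ID honored only for approved continuations, context trimmed to a fixed number of turns, and telemetry that records identifiers and sizes rather than content.

```typescript
interface Turn { role: "user" | "assistant"; content: string }

// In-memory stand-in for bounded Durable Object state (not the Durable Objects API).
class BoundedSession {
  private turns: Turn[] = [];
  private readonly maxTurns: number;
  constructor(maxTurns: number) {
    this.maxTurns = maxTurns;
  }
  append(turn: Turn): void {
    this.turns.push(turn);
    // Drop the oldest turns so stored context never exceeds the bound.
    if (this.turns.length > this.maxTurns) this.turns = this.turns.slice(-this.maxTurns);
  }
  context(): Turn[] {
    return [...this.turns];
  }
}

interface HarnessInput {
  systemPrompt: string;
  input: string;
  outputSchema: object;
  policyProfileId: string;
  conversationId?: string;
  continuationApproved: boolean;
}

function assembleLlmRequest(h: HarnessInput, sessions: Map<string, BoundedSession>, executionId: string) {
  // Only approved continuation flows may reuse conversation context.
  const session = h.conversationId && h.continuationApproved ? sessions.get(h.conversationId) : undefined;
  const messages = [
    {
      role: "system",
      content: `${h.systemPrompt}\nPolicy profile: ${h.policyProfileId}\nRespond with JSON matching: ${JSON.stringify(h.outputSchema)}`,
    },
    ...(session?.context() ?? []),
    { role: "user", content: h.input },
  ];
  // Telemetry carries identifiers and sizes only, never prompt or response text.
  const telemetry = {
    executionId,
    conversationId: session ? h.conversationId : undefined,
    messageCount: messages.length,
    inputChars: h.input.length,
  };
  return { messages, telemetry };
}
```

In a Worker, the caller could generate `executionId` with `crypto.randomUUID()`, which the Workers runtime provides, and the session map would be replaced by the Durable Object that owns the conversation.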

Diagram 6: Remediation Internals

Remediation is a second controlled workflow after detection. It uses the original input, normalized artifacts, and JSON report findings to decide whether and how to produce a safer output.

Step flow:

  1. Remediation receives the original input, normalized artifacts, report findings, policy profile, and remediation config.
  2. Policy decides whether the finding is eligible for remediation.
  3. The remediation router chooses the appropriate strategy for the modality and finding.
  4. Palisade re-checks the remediated output before returning it.
  5. The JSON report explains what was fine, what changed, and whether remediation passed. If no safe remediation exists, Palisade returns the report without a remediated output.
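A sketch of this workflow for a single text case, assuming a hypothetical strategy registry keyed by modality and finding type. The one strategy shown (masking email addresses) is illustrative only, chosen because its result is easy to re-check. The key behavior is the re-check: the remediated output goes back through detection, and if it still fails, only the report notes are returned.

```typescript
interface Finding { type: string; modality: "text" | "image" | "audio"; eligible: boolean }

interface RemediationConfig { allowRemediation: boolean }

type Strategy = (input: string) => string;
type Detector = (input: string) => Finding[];

// Hypothetical registry: "<modality>:<finding type>" -> remediation strategy.
const strategies: Record<string, Strategy> = {
  "text:email": (s) => s.replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[redacted email]"),
};

interface RemediationOutcome {
  remediatedOutput?: string;
  reportNotes: string[]; // what was fine, what changed, whether remediation passed
}

function remediate(input: string, findings: Finding[], cfg: RemediationConfig, detect: Detector): RemediationOutcome {
  if (!cfg.allowRemediation) return { reportNotes: ["remediation disabled by policy"] };
  if (findings.length === 0) return { reportNotes: ["no findings; nothing to remediate"] };
  let output = input;
  const notes: string[] = [];
  for (const f of findings) {
    const strategy = strategies[`${f.modality}:${f.type}`];
    // No eligible, known strategy means no safe remediation: report only.
    if (!f.eligible || !strategy) {
      return { reportNotes: [`no safe remediation for ${f.modality}:${f.type}`] };
    }
    output = strategy(output);
    notes.push(`applied ${f.modality}:${f.type} remediation`);
  }
  // Re-check the remediated output before returning it.
  if (detect(output).length > 0) {
    return { reportNotes: [...notes, "re-check failed; remediated output withheld"] };
  }
  return { remediatedOutput: output, reportNotes: [...notes, "re-check passed"] };
}
```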

Diagram 7: Non-LLM And Specialist Model Path

This view covers deterministic rules, machine learning classifiers, computer vision models, audio classifiers, and other specialist checks. The Palisade Worker is the Cloudflare entry point and HTTP router. Workers AI can be used only when a required model is available in the supported catalog; custom or specialist models should be called through approved external HTTPS endpoints.

Step flow:

  1. The caller sends a request to the Palisade Worker, together with the set of modules selected by policy.
  2. The Worker acts as the Cloudflare guardrail service: it orchestrates checks and routes model calls through fetch() over HTTPS.
  3. Deterministic and lightweight checks run directly in the Worker when practical.
  4. Specialist custom models run behind approved external HTTPS services. AI Gateway can sit in the path when useful for observability, retries, rate limits, or fallback. Workers AI is an option only for models in the supported catalog.
  5. Every check returns a normalized module result so the report builder does not depend on provider-specific shapes.
  6. Remediation uses the same policy-controlled decision path as other module types.
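The routing rule in steps 3 and 4 and the normalized result in step 5 can be sketched as pure functions. The catalog fields (`kind`, `workersAiModel`, `endpoint`) and the two provider response shapes are hypothetical stand-ins for whatever an external HTTPS service or a Workers AI catalog model actually returns. The `fetch()` call itself is left out so the example runs without network access, and the `https://` prefix check is only a placeholder for a real approved-endpoint allowlist.

```typescript
type ModuleKind = "deterministic" | "specialist";

interface CatalogEntry {
  id: string;
  kind: ModuleKind;
  workersAiModel?: string; // set only if the model is in the supported catalog
  endpoint?: string; // approved external HTTPS endpoint for custom models
}

type ExecutionPath =
  | { via: "worker" }
  | { via: "workers-ai"; model: string }
  | { via: "https"; url: string };

function choosePath(m: CatalogEntry): ExecutionPath {
  if (m.kind === "deterministic") return { via: "worker" };
  if (m.workersAiModel) return { via: "workers-ai", model: m.workersAiModel };
  // Placeholder for an approved-endpoint allowlist check.
  if (m.endpoint && m.endpoint.startsWith("https://")) return { via: "https", url: m.endpoint };
  throw new Error(`no approved execution path for module ${m.id}`);
}

// Normalized module result so the report builder never sees provider-specific shapes.
interface ModuleResult { moduleId: string; label: string; score: number; provider: string }

// Two hypothetical provider response shapes.
type ProviderA = { predictions: { label: string; confidence: number }[] };
type ProviderB = { result: { class: string; probability: number } };

function normalizeResult(moduleId: string, provider: string, raw: ProviderA | ProviderB): ModuleResult {
  if ("predictions" in raw) {
    if (raw.predictions.length === 0) throw new Error(`empty predictions from ${provider}`);
    // Keep the highest-confidence prediction.
    const top = raw.predictions.reduce((a, b) => (b.confidence > a.confidence ? b : a));
    return { moduleId, label: top.label, score: top.confidence, provider };
  }
  return { moduleId, label: raw.result.class, score: raw.result.probability, provider };
}
```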

Diagram 8: Module Harness, Memory, State, And Evidence

This view shows the configuration and state around each module. It applies to LLM modules, deterministic modules, specialist model modules, and remediation modules.

Step flow:

  1. The profile compiler turns policy, safety baseline, and module catalog entries into an execution plan.
  2. The harness binds module configuration, prompt or model-card versions, schemas, thresholds, and timeout behavior.
  3. Durable Objects can provide bounded request, session, or streaming coordination when policy allows it.
  4. The module executes and emits a normalized result.
  5. The result contributes to the report and to validation evidence used by the evaluation platform.
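The profile compiler and harness binding can be sketched as below, assuming a hypothetical catalog entry that carries a prompt or model-card version, a schema version, a default threshold, and a timeout. Restricting thresholds to the range 0 to 1 is also an assumption. The point of the sketch is that every bound module records exactly which versions it ran with, so one record can feed both the report and the evaluation evidence.

```typescript
interface ModuleCatalogEntry {
  id: string;
  kind: "llm" | "deterministic" | "specialist" | "remediation";
  artifactVersion: string; // prompt version or model-card version
  schemaVersion: string;
  defaultThreshold: number;
  timeoutMs: number;
}

interface ProfileModuleConfig { id: string; threshold?: number; timeoutMs?: number }

interface BoundModule extends ModuleCatalogEntry { threshold: number; profileVersion: string }

// Bind profile settings onto catalog entries, rejecting unknown modules and bad thresholds.
function bindModules(profileVersion: string, configs: ProfileModuleConfig[], catalog: ModuleCatalogEntry[]): BoundModule[] {
  const byId = new Map(catalog.map((e) => [e.id, e] as const));
  return configs.map((c) => {
    const entry = byId.get(c.id);
    if (!entry) throw new Error(`unknown module: ${c.id}`);
    const threshold = c.threshold ?? entry.defaultThreshold;
    if (threshold < 0 || threshold > 1) throw new Error(`threshold out of range for ${c.id}`);
    return { ...entry, threshold, timeoutMs: c.timeoutMs ?? entry.timeoutMs, profileVersion };
  });
}

interface EvidenceRecord {
  moduleId: string;
  artifactVersion: string;
  schemaVersion: string;
  profileVersion: string;
  threshold: number;
  outcome: "pass" | "flag" | "block" | "timeout";
}

// Each execution emits a record usable by both the report and the evaluation platform.
function toEvidence(m: BoundModule, outcome: EvidenceRecord["outcome"]): EvidenceRecord {
  return {
    moduleId: m.id,
    artifactVersion: m.artifactVersion,
    schemaVersion: m.schemaVersion,
    profileVersion: m.profileVersion,
    threshold: m.threshold,
    outcome,
  };
}
```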

Open Decisions

| Decision | Why it matters |
| --- | --- |
| First runtime surface: proxy, check API, or both | Determines the first integration contract and test surface |
| First modalities | Text is likely first, but image and audio shape the module contract early |
| Custom model hosting path | The choice between Workers AI for supported catalog models and an external HTTPS model service for custom/specialist models affects latency, compliance, and cost |
| Policy profile format | Policies must compile into a clear execution plan without becoming too complex |
| Remediation scope | Each modality needs clear rules for when remediation is allowed versus blocked |
| Memory and session state | Memory should be opt-in, bounded, policy-controlled, and data-classification aware |
| Evaluation acceptance thresholds | The team needs measurable success criteria before enforcement |
| Cloudflare approved services | BAA, data localization, telemetry, and storage approvals affect the deployable architecture |

Near-Term Recommendation

For the first MVP slice, keep the implementation target small:

  1. Define the JSON report contract.
  2. Define one policy profile format that enables a small set of text modules.
  3. Implement deterministic checks plus one LLM-based judge/check path.
  4. Add remediation only for one safe, bounded text case.
  5. Validate the behavior through the existing MLflow evaluation PoC before expanding to image, audio, memory, or custom model hosting.