1. Getting Started

RednBlue is a robustness testing platform for AI models. It runs 100% locally on your infrastructure: no model weights, no images, no prompts ever leave your machine. Only encrypted aggregated metrics are transmitted to generate the report and certificate.

4 steps to your first audit

  1. Install: pip install rednblue (Python 3.10+)
  2. Register: dashboard.rednblue.io (free account)
  3. Buy a token: from €100 (3 runs + 1 report)
  4. Test: rnb preview (audit in 5-20 min)

First audit, step by step

  1. Install the CLI:
    pip install rednblue
  2. Create an account at dashboard.rednblue.io/register.
  3. Purchase a token: in the dashboard, go to Credits → Buy and choose the tier that matches your needs.
  4. Export your token:
    # Linux / macOS
    export RNB_TOKEN=RB-XXXXXX-XXXXXX
    
    # Windows
    set RNB_TOKEN=RB-XXXXXX-XXXXXX
  5. Run your first test:
    # Image classifier
    rnb preview --model resnet50.pth \
        --input ./images/ \
        --submit
  6. Download your report: in the dashboard, under Tests, download the PDF report and the certificate.
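
Before launching the CLI from a script, it can help to confirm that RNB_TOKEN is actually visible to the process. The sketch below is a hypothetical helper, not part of the rednblue package. Its pattern assumes that real tokens follow the RB-XXXXXX-XXXXXX placeholder shown above, with two groups of six uppercase letters or digits; the documentation does not confirm the character set.

```python
import os
import re

# Assumed shape, taken from the RB-XXXXXX-XXXXXX placeholder above.
# The real character set and group length are not documented.
TOKEN_PATTERN = re.compile(r"RB-[A-Z0-9]{6}-[A-Z0-9]{6}")


def check_token(env=os.environ):
    """Return (ok, message) describing the state of RNB_TOKEN."""
    token = env.get("RNB_TOKEN")
    if token is None:
        return False, "RNB_TOKEN is not set"
    if token != token.strip():
        return False, "RNB_TOKEN has leading or trailing whitespace"
    if not TOKEN_PATTERN.fullmatch(token):
        return False, "RNB_TOKEN does not match the assumed RB-XXXXXX-XXXXXX shape"
    return True, "RNB_TOKEN looks well-formed"


if __name__ == "__main__":
    ok, message = check_token()
    print(message)
```

The whitespace check is there because a trailing space after a `set` command on Windows becomes part of the value.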

2. Installation

Prerequisites

  • Python 3.10+ (recommended 3.11 or 3.12)
  • pip 23+
  • RAM: 8 GB minimum, 16 GB recommended
  • GPU: optional but strongly recommended for vision (CUDA 11.8+ or MPS on Apple Silicon)
  • Disk space: 2 GB
  • Internet connection: required only for results submission (the audit itself runs 100% locally)
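
Two of these prerequisites can be checked with the Python standard library alone. The sketch below is an illustration, not a RednBlue tool: the function names are hypothetical, and RAM and GPU checks are left out because the standard library has no portable way to query them.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)      # from the prerequisites list
MIN_FREE_DISK_GB = 2      # from the prerequisites list


def python_ok(version_info=sys.version_info):
    """True when the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def disk_ok(path=".", min_gb=MIN_FREE_DISK_GB):
    """True when the filesystem holding `path` has at least `min_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= min_gb * 1024**3


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("2 GB free disk:", disk_ok())
```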

Linux / macOS

Standard installation in a single command:

pip install rednblue

Recommended: use a virtual environment.

python3 -m venv rednblue-env
source rednblue-env/bin/activate
pip install rednblue
rnb --version

Windows

Note for Windows

Some optional dependencies (desktop UI) require Microsoft C++ Build Tools on Windows. To avoid this, install the CLI only:

# CLI only (recommended)
pip install rednblue

# With desktop UI
# requires Microsoft C++ Build Tools
pip install rednblue[ui]

If you see the error "Microsoft Visual C++ 14.0 or greater is required", see section 9, Troubleshooting.

Verify the installation

rnb --version
# Expected output: rnb 3.0.0

rnb --help
# Lists available commands

GPU Setup

RednBlue auto-detects your GPU. To use CUDA, install the PyTorch build that matches your CUDA version:

# CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121

# Verify availability
python -c "import torch; print(torch.cuda.is_available())"

To force the device: --device auto | cpu | cuda | mps.
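
How auto picks a device is not documented. The sketch below shows one plausible resolution order (CUDA, then MPS, then CPU) so you can predict which value to force. The order and both function names are assumptions; only torch.cuda.is_available() and torch.backends.mps.is_available() are real PyTorch calls.

```python
def resolve_device(requested="auto", cuda_available=False, mps_available=False):
    """Map a --device value to a concrete device name.

    Assumed priority for "auto": cuda, then mps, then cpu.
    """
    if requested != "auto":
        return requested
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_backends():
    """Return (cuda_available, mps_available); both False without PyTorch."""
    try:
        import torch
    except ImportError:
        return False, False
    # torch.backends.mps is absent on older PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    return torch.cuda.is_available(), bool(mps and mps.is_available())


if __name__ == "__main__":
    print(resolve_device("auto", *detect_backends()))
```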

3. Core Concepts

Zero-Knowledge

Your model, data, and prompts never leave your machine. RednBlue only receives encrypted aggregated metrics (AES-256 + HMAC-SHA256).

  • 100% local tests
  • No weights transmitted
  • GDPR/CNIL compliant
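
To make the HMAC-SHA256 half of that scheme concrete, here is a minimal sketch using Python's standard hmac module. It is not RednBlue's wire format: the metric names, the key, and the JSON serialisation are all illustrative assumptions, and the AES-256 encryption step is omitted because the standard library does not provide it.

```python
import hashlib
import hmac
import json


def sign_metrics(metrics, key):
    """Serialise metrics deterministically and return (payload, hex_tag)."""
    payload = json.dumps(metrics, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify_metrics(payload, tag, key):
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


# Illustrative values only: aggregated scores, no weights, pixels, or text.
key = b"demo-key-not-a-real-secret"
payload, tag = sign_metrics({"noise_resilience": 0.93, "pass_rate": 0.88}, key)
print(verify_metrics(payload, tag, key))          # True
print(verify_metrics(payload + b" ", tag, key))   # False: payload was altered
```

The tag lets the receiver detect any change to the metrics in transit; it says nothing about confidentiality, which is the job of the encryption layer.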

Tokens

One token = 3 CLI runs (for iteration) + 1 report download + 1 certificate download.

  • Freelancer: from €100
  • Enterprise: from €500

Tiers

Two tiers: Freelancer (fast assessment suite) and Enterprise (full suite, regulatory certification).

  • Freelancer: fast iteration during development
  • Enterprise: production certification

Grading

  • 🥇 GOLD — ≥ 90% (robust)
  • 🥈 SILVER — ≥ 75% (solid)
  • 🥉 BRONZE — ≥ 60% (acceptable)
  • FAIL — < 60% (no certificate)

FAIL: a technical report is generated, but no certificate is issued.
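
The thresholds above translate directly into code. This sketch assumes a single overall score expressed as a percentage; how RednBlue aggregates per-dimension results into that score is not documented, and the function names are hypothetical.

```python
def grade(score_percent):
    """Map an overall robustness score (0-100) to the grades listed above."""
    if score_percent >= 90:
        return "GOLD"
    if score_percent >= 75:
        return "SILVER"
    if score_percent >= 60:
        return "BRONZE"
    return "FAIL"


def certificate_issued(score_percent):
    """A certificate is issued for every grade except FAIL."""
    return grade(score_percent) != "FAIL"


for score in (92.5, 75.0, 60.0, 59.9):
    print(score, grade(score), certificate_issued(score))
```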

Certificates

Each audit produces a cryptographically signed certificate, valid 12 months, publicly verifiable.

  • PDF (A4 landscape)
  • Model SHA-256 hash
  • Verification URL
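
To compare a model file against the hash printed on a certificate, you can compute its SHA-256 locally. The sketch below assumes the certificate hash is taken over the raw bytes of the model file, which the documentation implies but does not spell out; file_sha256 is a hypothetical helper name.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_sha256("resnet50.pth") returns a 64-character hex string to compare with the value on the certificate.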

Reports

Ten-page technical report: per-dimension results, improvement recommendations, and executive summary.

4. Vision Models

Two model types are supported: image classifiers and object detectors. For each, RednBlue evaluates multiple robustness dimensions covering modern assessment methods from the academic literature (gradient-based perturbations, black-box methods, structured attacks).

4.1 Image Classifiers

Supported architectures: ResNet, VGG, DenseNet, MobileNet, EfficientNet, ConvNeXt, SqueezeNet, ShuffleNet, AlexNet, GoogLeNet, ViT, timm.

# Freelancer audit
rnb preview --model resnet50.pth --input ./images/ --submit

# Enterprise audit (full suite)
rnb preview --model resnet50.pth --input ./images/ --submit --tier enterprise

Assessed Dimensions

  • Noise Resilience — stability against sensor noise and compression artefacts
  • Input Perturbation Defense — resistance to subtle, calculated modifications
  • Classification Consistency — stability under gradient-based stress
  • Targeted Evasion Defense — resistance to deliberate misclassification attempts
  • Black-Box Resilience — resistance without access to model internals
  • Iterative Stress Tolerance — defense against sustained multi-step pressure
  • Query-Limited Defense — resistance with restricted interaction budget
  • Decision Boundary Stability — decision boundary robustness

The Freelancer tier assesses a subset of dimensions; the Enterprise tier assesses them all at multiple intensity levels.
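
As an intuition for what a dimension such as Noise Resilience measures, the toy sketch below counts how often a classifier's label survives random Gaussian noise on its input. It is not RednBlue's metric, whose definition is not published: the stability measure, the noise model, and toy_classifier are all stand-ins chosen to keep the example runnable with the standard library only.

```python
import random


def prediction_stability(predict, inputs, sigma, trials=20, seed=0):
    """Fraction of noisy copies whose predicted label matches the clean one."""
    rng = random.Random(seed)
    unchanged = 0
    total = 0
    for features in inputs:
        clean_label = predict(features)
        for _ in range(trials):
            noisy = [value + rng.gauss(0.0, sigma) for value in features]
            unchanged += int(predict(noisy) == clean_label)
            total += 1
    return unchanged / total


def toy_classifier(features):
    """Hypothetical stand-in for a real model: label 1 if the mean is positive."""
    return int(sum(features) / len(features) > 0)


samples = [[0.9, 1.1, 1.0], [-1.0, -0.8, -1.2], [0.05, -0.02, 0.01]]
for sigma in (0.0, 0.1, 1.0):
    print(sigma, prediction_stability(toy_classifier, samples, sigma))
```

With sigma set to 0 the inputs are unchanged, so the stability is exactly 1.0; samples close to the decision boundary, like the third one, are the first to flip as sigma grows.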

4.2 Object Detectors

Supported versions: YOLOv5, YOLOv8, YOLOv9, YOLOv10, YOLOv11 (via Ultralytics API).

# Freelancer
rnb preview --model-type yolo --model best.pt --input ./images/ --submit

# Enterprise
rnb preview --model-type yolo --model best.pt --input ./images/ --submit \
    --tier enterprise --device cuda

Assessed Dimensions

  • Noise Resilience — stability against environmental interference
  • Input Perturbation Defense — resistance to imperceptible modifications
  • Detection Consistency — detection reliability under stress
  • Targeted Evasion Defense — resistance to forced disappearances/appearances
  • Multi-Object Stability — accuracy on crowded scenes
  • Object Persistence — detection persistence under perturbation
  • Black-Box Resilience — resistance without model access
  • Iterative Stress Tolerance — multi-step resistance
  • Query-Limited Defense — query-efficient defense

The audit includes a built-in gradient-masking sanity check, which verifies that gradient-based and black-box methods give consistent results.
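
One common form of such a consistency check relies on the observation that a black-box attack should not clearly outperform a gradient-based one, since the latter has strictly more information. The sketch below encodes that heuristic. It is an assumption about what a check of this kind can look like, not RednBlue's implementation, and the function name and tolerance value are hypothetical.

```python
def gradient_masking_suspected(gradient_success, black_box_success, tolerance=0.05):
    """Flag results where a black-box attack beats a gradient-based one.

    Both arguments are attack success rates in [0, 1]. A black-box rate that
    exceeds the gradient-based rate by more than `tolerance` suggests that the
    gradients are uninformative rather than that the model is robust.
    """
    for name, rate in (("gradient_success", gradient_success),
                       ("black_box_success", black_box_success)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {rate}")
    return black_box_success > gradient_success + tolerance


print(gradient_masking_suspected(0.10, 0.60))   # True: black-box is far stronger
print(gradient_masking_suspected(0.80, 0.55))   # False: the expected ordering
```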

5. LLM, RAG & Agents

RednBlue assesses LLM, RAG, and autonomous agent pipelines along dimensions aligned with OWASP LLM Top 10 and MITRE ATLAS standards.

5.1 Pipeline Auto-Detection

RednBlue parses your Python code, identifies the system type (plain LLM, RAG, autonomous agent), and recommends the matching assessment suite.

# Auto-detect + full audit
rnb llm --file my_chatbot.py --auto

# Direct API provider
rnb llm --provider openai --model gpt-4 --api-key $OPENAI_KEY

# Custom HTTP endpoint
rnb llm --endpoint https://api.example.com/chat \
    --auth-header "Authorization: Bearer $TOKEN"

Supported providers: OpenAI, Anthropic, Gemini, Mistral, Cohere, and any compatible HTTP endpoint.
Recognised frameworks: major chaining and vector storage frameworks.
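
The general idea of static detection can be sketched with Python's ast module: parse the file, collect its imports, and match them against known framework names. RednBlue's real rules are not documented, so the module lists and the three-way classification below are illustrative assumptions only.

```python
import ast

# Hypothetical hint lists; RednBlue's real detection rules are not documented.
AGENT_HINTS = {"langgraph", "autogen", "crewai"}
RAG_HINTS = {"chromadb", "faiss", "pinecone", "llama_index"}


def imported_modules(source):
    """Return the top-level module names imported by a Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def guess_system_type(source):
    """Classify a pipeline as 'agent', 'rag', or 'llm' from its imports alone."""
    modules = imported_modules(source)
    if modules & AGENT_HINTS:
        return "agent"
    if modules & RAG_HINTS:
        return "rag"
    return "llm"


print(guess_system_type("import openai"))                    # llm
print(guess_system_type("import openai\nimport chromadb"))   # rag
```

Because the source is only parsed and never executed, this kind of inspection is safe to run on untrusted pipeline files.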

5.2 Assessed Dimensions

Multiple dimension families, each tested across several scenarios to cover different levels of sophistication.

  • Guardrail Resistance — maintenance of safety guardrails against adversarial queries
  • System Prompt Confidentiality — protection against internal instruction leakage
  • Indirect Injection Defense — resistance to injections via external content
  • Tokenisation Robustness — stability against character-level manipulations
  • RAG Corpus Integrity — detection of poisoning and contextual manipulation
  • Agent Tool Safety — resistance to tool hijacking and privilege escalation
  • Goal Stability — maintenance of the initial objective across long executions
  • Backdoor Detection — identification of trigger-activated behaviours
  • Extraction Resistance — protection against model reconstruction attempts

5.3 Usage Examples

Chatbot

rnb llm --file chatbot.py --auto

RAG

rnb llm --file rag_app.py --auto

Agent

rnb llm --file agent.py --auto

6. CLI Reference

rnb preview

Runs a vision robustness audit (classifier or detector).

rnb preview [OPTIONS]

--model PATH              Path to model file (.pt, .pth, .onnx)
--model-type TYPE         classifier (default) | yolo
--input PATH              Test image folder
--tier TIER               freelancer (default) | enterprise
--device DEVICE           auto (default) | cpu | cuda | mps
--submit                  Submit results to the dashboard
--verbose                 Verbose output
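
When the audit is driven from a Python script, building the argument list programmatically avoids shell-quoting mistakes. The sketch below uses only the flags listed above; build_preview_command is a hypothetical helper, not part of the rednblue package.

```python
import shlex


def build_preview_command(model, input_dir, model_type="classifier",
                          tier="freelancer", device="auto",
                          submit=False, verbose=False):
    """Assemble an `rnb preview` argument list from the options listed above."""
    command = ["rnb", "preview",
               "--model", str(model),
               "--model-type", model_type,
               "--input", str(input_dir),
               "--tier", tier,
               "--device", device]
    if submit:
        command.append("--submit")
    if verbose:
        command.append("--verbose")
    return command


command = build_preview_command("best.pt", "./images/", model_type="yolo",
                                tier="enterprise", device="cuda", submit=True)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```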

rnb llm

Runs an LLM, RAG, or agent audit.

rnb llm [OPTIONS]

--file PATH               Python file containing the pipeline
--provider NAME           openai | anthropic | gemini | mistral | cohere
--model NAME              Model name
--api-key STRING          API key (or via env variable)
--endpoint URL            Custom HTTP endpoint
--auto                    Auto-detect the suite
--tier TIER               freelancer | enterprise

rnb status

Shows token and active session status.

rnb status              # current token status
rnb status --sessions   # list sessions

rnb ui

Launches the desktop UI (requires pip install rednblue[ui]).

rnb ui

7. Tokens & Pricing

Token Economics

Each token grants:

  • 3 CLI runs — to iterate on settings
  • 1 PDF report download
  • 1 certificate download

The token is consumed at report download. The certificate can be re-downloaded while the token is valid.
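
The token rules can be pictured as a small state machine. The sketch below is a mental model, not RednBlue's code: it assumes that runs are refused once the report has been downloaded, which is an interpretation of "consumed", and it does not model certificate downloads or expiry.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Toy model of the token rules described above."""
    runs_left: int = 3
    report_downloaded: bool = False

    def use_run(self):
        """Spend one CLI run, if the token still allows it."""
        if self.report_downloaded:
            raise RuntimeError("token consumed: report already downloaded")
        if self.runs_left == 0:
            raise RuntimeError("no CLI runs left on this token")
        self.runs_left -= 1

    def download_report(self):
        """Downloading the report consumes the token."""
        self.report_downloaded = True


token = Token()
token.use_run()
token.use_run()
print(token.runs_left)   # 1
token.download_report()
```

The practical consequence is to finish iterating on settings before downloading the report.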

Pricing

  • Vision Freelancer: €100 (development iteration)
  • LLM: €200 (conversational pipelines)
  • Agents / RAG: €300 (autonomous systems)
  • Enterprise: from €500 (production certification)

All prices excluding taxes. Free re-certification within 30 days if the model is improved.

8. Regulatory Compliance

RednBlue reports and certificates are designed to serve as independent technical evidence in regulatory compliance processes. RednBlue is not a regulatory certification body; we provide the technical evidence you present to your auditor or regulator.

EU AI Act

Article 15: robustness, accuracy and cybersecurity requirements for high-risk AI systems.

NIST AI RMF

AI Risk Management Framework 1.0 — Govern, Map, Measure, Manage functions.

ISO/IEC 42001

AI Management System — requirements for organisations using AI.

UK DSIT

AI Cyber Security Code of Practice (UK Department for Science, Innovation and Technology).

Canada AIDA

Artificial Intelligence and Data Act — high-impact systems.

Singapore MAIGF

Model AI Governance Framework — IMDA/PDPC guidance.

9. Troubleshooting

"Microsoft Visual C++ 14.0 or greater is required" (Windows)

Cause: an optional dependency (desktop UI) requires a C++ compiler on Windows.

Solution:

# Option 1: install CLI only (recommended)
pip install rednblue

# Option 2: install the Build Tools
# https://visualstudio.microsoft.com/visual-cpp-build-tools/
# then:
pip install rednblue[ui]

"CUDA not available" / GPU not detected

# Verify CUDA
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"

# Reinstall PyTorch with CUDA
pip uninstall torch torchvision -y
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Force CPU if no GPU
rnb preview --device cpu ...

"Token validation failed" / "Token consumed"

Possible causes:

  • Token expired (1 year since purchase)
  • Token already consumed (report downloaded)
  • RNB_TOKEN variable incorrectly set
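
Check the token status, then re-export the variable if needed. Keep comments on their own line: on Windows, anything typed after the value on a set line becomes part of the value.

rnb status

# Linux / macOS
export RNB_TOKEN=RB-XXXXXX-XXXXXX

# Windows
set RNB_TOKEN=RB-XXXXXX-XXXXXX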

"Submit failed" / network error

# Connectivity test
curl -I https://api.rednblue.io/health

# If behind a corporate proxy
export HTTPS_PROXY=http://proxy.company.com:8080
rnb preview ... --submit
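
The same connectivity test can be run from Python, for example inside a pre-flight script. The sketch below sends a HEAD request, as curl -I does, to the health URL shown above. api_reachable is a hypothetical helper, and treating any 2xx status as healthy is an assumption. urllib reads HTTPS_PROXY from the environment by default, so the proxy setting above applies here too.

```python
import urllib.error
import urllib.request

HEALTH_URL = "https://api.rednblue.io/health"  # from the curl command above


def api_reachable(url=HEALTH_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True when a HEAD request to `url` answers with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False
```

Calling api_reachable() before a run with --submit separates network problems from audit problems; the opener parameter exists only so the function can be exercised without network access.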

"CUDA out of memory"

# Reduce batch size
rnb preview ... --batch-size 1

# Force CPU
rnb preview ... --device cpu

"Unsupported model architecture"

Verify that your model appears in the list of supported architectures (section 4.1). For Hugging Face models, avoid repositories under the timm/ namespace: they are not directly loadable as classifiers.

10. FAQ

Does RednBlue see my model?

No. The model, images, and prompts stay entirely on your machine. RednBlue only receives aggregated numeric metrics (pass rates, scores, SHA-256 file hash) — never the weights, pixels, or text.

Is the RednBlue certificate recognised by regulators?

The certificate is independent technical evidence. It is used to support your compliance file (EU AI Act, NIST, ISO 42001, etc.). RednBlue is not a regulatory certification body — we are the equivalent of an independent testing firm.

How long does an audit take?

  • Classifier (Freelancer): 2-5 min
  • Classifier (Enterprise): 5-15 min
  • Detector (Freelancer): 3-8 min
  • Detector (Enterprise): 15-40 min, depending on the GPU
  • LLM: 2-10 min, depending on the number of API calls

Can I re-run the test after improving my model?

Yes. Free re-certification within 30 days if you address the dimensions identified as sensitive in your report. Beyond that, a new token is required.

Which model formats are supported?

Vision: .pt, .pth (PyTorch), .onnx, Hugging Face hub (except timm/), torchvision, timm.
YOLO: .pt (Ultralytics v5/v8/v9/v10/v11).
LLM: Python file (pipeline), API provider (OpenAI, Anthropic, Gemini, Mistral, Cohere), HTTP endpoint.

Is an academic license available?

Yes. Free academic license for researchers and students — contact us at contact@rednblue.io.

How do I verify a certificate's authenticity?

Every certificate carries a public verification URL. The recipient can access it and verify the model hash, audit date, and cryptographic signature.

What if my model receives a FAIL grade?

You receive a detailed technical report (sensitive dimensions are identified) but no certificate is issued until the minimum threshold is reached. The report guides you on how to improve the model.

Can I integrate RednBlue into my CI/CD?

Yes. The CLI runs in headless mode with JSON output and actionable return codes. An official GitHub Action is in development.

What is the difference between Freelancer and Enterprise?

Freelancer: fast assessment suite for iteration during development. Enterprise: full multi-intensity suite, designed for regulatory certification and production deployment.

Need Help?

Our team responds within 24 business hours.

support@rednblue.io | GitHub Issues | Dashboard