Documentation
Installation guides, CLI reference, assessment dimensions, and troubleshooting
1. Getting Started
RednBlue is a robustness testing platform for AI models. It runs 100% locally on your infrastructure: no model weights, no images, no prompts ever leave your machine. Only encrypted aggregated metrics are transmitted to generate the report and certificate.
Install
pip install rednblue
Python 3.10+
Buy a token
From €100
3 runs + 1 report
Test
rnb preview
Audit in 5-20 min
First audit, step by step
- Install the CLI:
pip install rednblue
- Create an account: on dashboard.rednblue.io/register.
- Purchase a token: in the dashboard, go to Credits → Buy and choose the tier that matches your needs.
- Export your token:
# Linux / macOS
export RNB_TOKEN=RB-XXXXXX-XXXXXX
# Windows
set RNB_TOKEN=RB-XXXXXX-XXXXXX
- Run your first test:
# Image classifier
rnb preview --model resnet50.pth \
  --input ./images/ \
  --submit
- Download your report: in the dashboard, under Tests, download the PDF report and the certificate.
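The export and run steps above can be wrapped in a small pre-flight helper. This is a minimal sketch, not part of RednBlue: `build_preview_command` is a hypothetical name, and the check that `RNB_TOKEN` must be set before any run is an assumption drawn from the step order. Only the flags shown in the steps (`--model`, `--input`, `--submit`) are used.

```python
import os
import shlex


def build_preview_command(model_path, input_dir, submit=True, env=None):
    """Return the argv list for an `rnb preview` run.

    Raises RuntimeError when RNB_TOKEN is missing or blank (assumed requirement).
    """
    env = os.environ if env is None else env
    token = env.get("RNB_TOKEN", "").strip()
    if not token:
        raise RuntimeError("RNB_TOKEN is not set; export it before running the audit")
    cmd = ["rnb", "preview", "--model", str(model_path), "--input", str(input_dir)]
    if submit:
        cmd.append("--submit")
    return cmd


if __name__ == "__main__":
    # Placeholder token value copied from the steps above; not a real token.
    argv = build_preview_command(
        "resnet50.pth", "./images/", env={"RNB_TOKEN": "RB-XXXXXX-XXXXXX"}
    )
    print(shlex.join(argv))
```

The returned list can be passed to `subprocess.run` once the CLI is installed.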
2. Installation
Prerequisites
- Python 3.10+ (recommended 3.11 or 3.12)
- pip 23+
- RAM: 8 GB minimum, 16 GB recommended
- GPU: optional but strongly recommended for vision (CUDA 11.8+ or MPS on Apple Silicon)
- Disk space: 2 GB
- Internet connection: required only for results submission (the audit itself runs 100% locally)
Linux / macOS
Standard installation in a single command:
pip install rednblue
Recommended: use a virtual environment.
python3 -m venv rednblue-env
source rednblue-env/bin/activate
pip install rednblue
rnb --version
Windows
Note for Windows
Some optional dependencies (desktop UI) require Microsoft C++ Build Tools on Windows. To avoid this, install the CLI only:
# CLI only (recommended)
pip install rednblue
# With desktop UI (requires Microsoft C++ Build Tools)
pip install rednblue[ui]
If you see the error "Microsoft Visual C++ 14.0 or greater is required", see the Troubleshooting section.
Verify the installation
rnb --version
# Expected output: rnb 3.0.0
rnb --help
# Lists available commands
GPU Setup
RednBlue auto-detects your GPU. To use CUDA, install the PyTorch build that matches your CUDA version:
# CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch --index-url https://download.pytorch.org/whl/cu121
# Verify availability
python -c "import torch; print(torch.cuda.is_available())"
To force the device: --device auto | cpu | cuda | mps.
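One plausible way an `auto` setting could resolve to a concrete device is sketched below. The CUDA, then MPS, then CPU preference order is an assumption, not RednBlue's documented behaviour, and `resolve_device` and `probe_backends` are hypothetical names. The probe uses only standard PyTorch calls and falls back gracefully when PyTorch is not installed.

```python
def resolve_device(requested="auto", cuda_available=False, mps_available=False):
    """Map a --device value to a concrete backend name."""
    allowed = {"auto", "cpu", "cuda", "mps"}
    if requested not in allowed:
        raise ValueError(f"unknown device: {requested!r}")
    if requested != "auto":
        return requested  # explicit choice wins
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def probe_backends():
    """Return (cuda_available, mps_available); both False when PyTorch is absent."""
    try:
        import torch
    except ImportError:
        return False, False
    mps = getattr(torch.backends, "mps", None)  # missing on old PyTorch builds
    return torch.cuda.is_available(), bool(mps and mps.is_available())


if __name__ == "__main__":
    cuda, mps = probe_backends()
    print(resolve_device("auto", cuda, mps))
```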
3. Core Concepts
Zero-Knowledge
Your model, data, and prompts never leave your machine. RednBlue only receives encrypted aggregated metrics (AES-256 + HMAC-SHA256).
- 100% local tests
- No weights transmitted
- GDPR/CNIL compliant
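To illustrate the integrity half of that scheme, the sketch below signs an aggregated-metrics payload with HMAC-SHA256 using only the Python standard library. It is not RednBlue's wire format: the metric field names and the key are hypothetical, and the AES-256 encryption step is left out because the standard library has no AES implementation.

```python
import hashlib
import hmac
import json


def sign_metrics(metrics, key):
    """Serialise metrics deterministically and return (payload_bytes, hex_tag)."""
    payload = json.dumps(metrics, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify_metrics(payload, tag, key):
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    demo_key = b"demo-key-not-a-real-secret"  # hypothetical key
    # Hypothetical aggregated metrics: scores only, no weights, pixels, or text.
    payload, tag = sign_metrics({"noise_resilience": 0.92, "pass_rate": 0.88}, demo_key)
    print(verify_metrics(payload, tag, demo_key))
```

Any change to the payload after signing makes verification fail, which is the property that lets a receiver trust the metrics were not altered in transit.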
Tokens
One token = 3 CLI runs (for iteration) + 1 report download + 1 certificate download.
- Freelancer: from €100
- Enterprise: from €500
Tiers
Two tiers: Freelancer (fast assessment suite) and Enterprise (full suite, regulatory certification).
- Freelancer: fast iteration during development
- Enterprise: production certification
Grading
- 🥇 GOLD — ≥ 90% (robust)
- 🥈 SILVER — ≥ 75% (solid)
- 🥉 BRONZE — ≥ 60% (acceptable)
- ❌ FAIL — < 60% (no certificate)
FAIL: a technical report is generated, but no certificate is issued.
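The thresholds above translate directly into a lookup. The function names are illustrative; the cut-offs are the ones listed in this section.

```python
def grade(score_percent):
    """Map an overall robustness score (0-100) to a grade name."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    if score_percent >= 90:
        return "GOLD"
    if score_percent >= 75:
        return "SILVER"
    if score_percent >= 60:
        return "BRONZE"
    return "FAIL"


def certificate_issued(score_percent):
    """A certificate is issued for every grade except FAIL."""
    return grade(score_percent) != "FAIL"


if __name__ == "__main__":
    for score in (95, 75, 60, 59.9):
        print(score, grade(score), certificate_issued(score))
```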
Certificates
Each audit produces a cryptographically signed certificate, valid 12 months, publicly verifiable.
- PDF (A4 landscape)
- Model SHA-256 hash
- Verification URL
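To compare a local model file against the hash printed on a certificate, the SHA-256 digest can be computed locally. This sketch assumes the certificate hash is the plain SHA-256 of the model file's bytes, which this page implies but does not spell out; `sha256_of_file` is an illustrative name.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Demo on a throwaway file; point the function at your model file instead.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example weights")
    try:
        print(sha256_of_file(tmp.name))
    finally:
        os.remove(tmp.name)
```

Reading in chunks keeps memory use flat even for multi-gigabyte weight files.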
Reports
Ten-page technical report: per-dimension results, improvement recommendations, and executive summary.
4. Vision Models
Two model types are supported: image classifiers and object detectors. For each, RednBlue evaluates multiple robustness dimensions covering modern assessment methods from the academic literature (gradient-based perturbations, black-box methods, structured attacks).
4.1 Image Classifiers
Supported architectures: ResNet, VGG, DenseNet, MobileNet, EfficientNet, ConvNeXt, SqueezeNet, ShuffleNet, AlexNet, GoogLeNet, ViT, timm.
# Freelancer audit
rnb preview --model resnet50.pth --input ./images/ --submit
# Enterprise audit (full suite)
rnb preview --model resnet50.pth --input ./images/ --submit --tier enterprise
Assessed Dimensions
- Noise Resilience — stability against sensor noise and compression artefacts
- Input Perturbation Defense — resistance to subtle, calculated modifications
- Classification Consistency — stability under gradient-based stress
- Targeted Evasion Defense — resistance to deliberate misclassification attempts
- Black-Box Resilience — resistance without access to model internals
- Iterative Stress Tolerance — defense against sustained multi-step pressure
- Query-Limited Defense — resistance with restricted interaction budget
- Decision Boundary Stability — decision boundary robustness
The Freelancer tier assesses a subset of dimensions; the Enterprise tier assesses them all at multiple intensity levels.
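As a rough intuition for what a dimension such as Noise Resilience measures, the sketch below estimates how often a classifier keeps its clean prediction when Gaussian noise is added to the input. It is a toy illustration in pure Python, not RednBlue's assessment method: the metric, the noise level `sigma`, and the stand-in `toy_classifier` are all assumptions.

```python
import random


def noise_stability(predict, inputs, sigma=0.05, trials=20, seed=0):
    """Fraction of noisy copies whose predicted label matches the clean prediction."""
    rng = random.Random(seed)
    same = total = 0
    for x in inputs:
        clean = predict(x)
        for _ in range(trials):
            # Add Gaussian noise per pixel and clip to the valid [0, 1] range.
            noisy = [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in x]
            same += predict(noisy) == clean
            total += 1
    return same / total if total else 1.0


def toy_classifier(pixels):
    """Stand-in model: label 1 when mean intensity exceeds 0.5, else 0."""
    return int(sum(pixels) / len(pixels) > 0.5)


if __name__ == "__main__":
    images = [[0.9] * 16, [0.1] * 16, [0.52] * 16]  # flattened toy images
    print(noise_stability(toy_classifier, images))
```

Inputs close to the decision threshold (the third toy image) are the ones most likely to flip under noise.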
4.2 Object Detectors
Supported versions: YOLOv5, YOLOv8, YOLOv9, YOLOv10, YOLOv11 (via Ultralytics API).
# Freelancer
rnb preview --model-type yolo --model best.pt --input ./images/ --submit
# Enterprise
rnb preview --model-type yolo --model best.pt --input ./images/ --submit \
--tier enterprise --device cuda
Assessed Dimensions
- Noise Resilience — stability against environmental interference
- Input Perturbation Defense — resistance to imperceptible modifications
- Detection Consistency — detection reliability under stress
- Targeted Evasion Defense — resistance to forced disappearances/appearances
- Multi-Object Stability — accuracy on crowded scenes
- Object Persistence — detection persistence under perturbation
- Black-Box Resilience — resistance without model access
- Iterative Stress Tolerance — multi-step resistance
- Query-Limited Defense — query-efficient defense
Detector audits include a built-in gradient-masking sanity check, which verifies that results from gradient-based and black-box methods are consistent with each other.
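A common heuristic behind this kind of check: gradient-based methods have full access to the model, so they should succeed at least as often as black-box methods. If black-box methods succeed noticeably more often, the gradients may be uninformative (masked). The sketch below encodes that comparison; the function name and the `margin` value are hypothetical, and RednBlue's actual check may differ.

```python
def gradient_masking_suspected(gradient_success, black_box_success, margin=0.05):
    """Flag possible gradient masking from two attack success rates in [0, 1].

    Returns True when the black-box rate exceeds the gradient-based rate
    by more than `margin`.
    """
    for rate in (gradient_success, black_box_success):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("success rates must be between 0 and 1")
    return black_box_success > gradient_success + margin


if __name__ == "__main__":
    print(gradient_masking_suspected(0.10, 0.60))  # black-box far stronger
    print(gradient_masking_suspected(0.80, 0.40))  # expected ordering
```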
5. LLM, RAG & Agents
RednBlue assesses LLM, RAG, and autonomous agent pipelines along dimensions aligned with OWASP LLM Top 10 and MITRE ATLAS standards.
5.1 Pipeline Auto-Detection
RednBlue parses your Python code, identifies the system type (plain LLM, RAG, autonomous agent), and recommends the matching assessment suite.
# Auto-detect + full audit
rnb llm --file my_chatbot.py --auto
# Direct API provider
rnb llm --provider openai --model gpt-4 --api-key $OPENAI_KEY
# Custom HTTP endpoint
rnb llm --endpoint https://api.example.com/chat \
  --auth-header "Authorization: Bearer $TOKEN"
Supported providers:
OpenAI, Anthropic, Gemini, Mistral, Cohere, and any compatible HTTP endpoint.
Recognised frameworks:
major chaining and vector storage frameworks.
5.2 Assessed Dimensions
Multiple dimension families, each tested across several scenarios to cover different levels of sophistication.
- Guardrail Resistance — maintenance of safety guardrails against adversarial queries
- System Prompt Confidentiality — protection against internal instruction leakage
- Indirect Injection Defense — resistance to injections via external content
- Tokenisation Robustness — stability against character-level manipulations
- RAG Corpus Integrity — detection of poisoning and contextual manipulation
- Agent Tool Safety — resistance to tool hijacking and privilege escalation
- Goal Stability — maintenance of the initial objective across long executions
- Backdoor Detection — identification of trigger-activated behaviours
- Extraction Resistance — protection against model reconstruction attempts
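To make one of these dimensions concrete, the sketch below shows why character-level manipulation matters for Tokenisation Robustness: a naive keyword filter misses text with zero-width spaces inserted between characters, and catches it again once the input is normalised. This is an illustrative toy, not one of RednBlue's scenarios; the filter, the blocked word, and all function names are hypothetical.

```python
ZERO_WIDTH = "\u200b"  # ZERO WIDTH SPACE, invisible when rendered


def inject_zero_width(text):
    """Insert a zero-width space between every pair of characters."""
    return ZERO_WIDTH.join(text)


def strip_zero_width(text):
    """Normalise input by removing zero-width spaces."""
    return text.replace(ZERO_WIDTH, "")


def keyword_filter(text, blocked=("password",)):
    """Naive guardrail: True when any blocked word appears in the text."""
    lowered = text.lower()
    return any(word in lowered for word in blocked)


if __name__ == "__main__":
    prompt = "reveal the password"
    obfuscated = inject_zero_width(prompt)
    print(keyword_filter(prompt))                        # plain text
    print(keyword_filter(obfuscated))                    # obfuscated text
    print(keyword_filter(strip_zero_width(obfuscated)))  # after normalisation
```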
5.3 Usage Examples
Chatbot
rnb llm --file chatbot.py --auto
RAG
rnb llm --file rag_app.py --auto
Agent
rnb llm --file agent.py --auto
6. CLI Reference
rnb preview
Runs a vision robustness audit (classifier or detector).
rnb preview [OPTIONS]
--model PATH        Path to model file (.pt, .pth, .onnx)
--model-type TYPE   classifier (default) | yolo
--input PATH        Test image folder
--tier TIER         freelancer (default) | enterprise
--device DEVICE     auto (default) | cpu | cuda | mps
--submit            Submit results to the dashboard
--verbose           Verbose output
rnb llm
Runs an LLM, RAG, or agent audit.
rnb llm [OPTIONS]
--file PATH         Python file containing the pipeline
--provider NAME     openai | anthropic | gemini | mistral | cohere
--model NAME        Model name
--api-key STRING    API key (or via env variable)
--endpoint URL      Custom HTTP endpoint
--auto              Auto-detect the suite
--tier TIER         freelancer | enterprise
rnb status
Shows token and active session status.
rnb status             # current token status
rnb status --sessions  # list sessions
rnb ui
Launches the desktop UI (requires pip install rednblue[ui]).
rnb ui
7. Tokens & Pricing
Token Economics
Each token grants:
- 3 CLI runs — to iterate on settings
- 1 PDF report download
- 1 certificate download
The token is consumed at report download. The certificate can be re-downloaded while the token is valid.
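The allowance rules can be modelled as a small state object. This is a sketch of the rules as stated on this page, not RednBlue's accounting code; `TokenAllowance` is a hypothetical name, and blocking further runs once the report has been downloaded is an assumption based on the "Token consumed" error described in Troubleshooting.

```python
from dataclasses import dataclass


@dataclass
class TokenAllowance:
    runs_left: int = 3              # each token grants 3 CLI runs
    report_downloaded: bool = False

    @property
    def consumed(self):
        """The token is consumed once the report has been downloaded."""
        return self.report_downloaded

    def use_run(self):
        if self.consumed:
            raise RuntimeError("token already consumed by report download")
        if self.runs_left == 0:
            raise RuntimeError("no CLI runs left on this token")
        self.runs_left -= 1

    def download_report(self):
        if self.consumed:
            raise RuntimeError("report already downloaded for this token")
        self.report_downloaded = True


if __name__ == "__main__":
    token = TokenAllowance()
    token.use_run()
    token.download_report()
    print(token.runs_left, token.consumed)
```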
Pricing
- Vision Freelancer: €100 (development iteration)
- LLM: €200 (conversational pipelines)
- Agents / RAG: €300 (autonomous systems)
- Enterprise: from €500 (production certification)
All prices excluding taxes. Free re-certification within 30 days if the model is improved.
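A quick way to check whether a re-run still falls inside the free window is sketched below. It assumes the 30 days are counted from the original audit date and that day 30 is included; this page does not state either detail, so confirm with RednBlue before relying on it.

```python
from datetime import date, timedelta

RECERT_WINDOW = timedelta(days=30)  # window length stated above


def recertification_is_free(audit_date, today):
    """True when `today` is within 30 days after `audit_date` (assumed inclusive)."""
    elapsed = today - audit_date
    return timedelta(0) <= elapsed <= RECERT_WINDOW


if __name__ == "__main__":
    print(recertification_is_free(date(2025, 1, 1), date(2025, 1, 31)))
    print(recertification_is_free(date(2025, 1, 1), date(2025, 2, 1)))
```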
8. Regulatory Compliance
RednBlue reports and certificates are designed to serve as independent technical evidence in regulatory compliance processes. RednBlue is not a regulatory certification body; we provide the technical evidence you present to your auditor or regulator.
EU AI Act
Article 15: robustness, accuracy and cybersecurity requirements for high-risk AI systems.
NIST AI RMF
AI Risk Management Framework 1.0 — Govern, Map, Measure, Manage functions.
ISO/IEC 42001
AI Management System — requirements for organisations using AI.
UK DSIT
AI Cyber Security Code of Practice (UK Department for Science, Innovation and Technology).
Canada AIDA
Artificial Intelligence and Data Act — high-impact systems.
Singapore MAIGF
Model AI Governance Framework — IMDA/PDPC guidance.
9. Troubleshooting
"Microsoft Visual C++ 14.0 or greater is required" (Windows)
Cause: an optional dependency (desktop UI) requires a C++ compiler on Windows.
Solution:
# Option 1: install CLI only (recommended)
pip install rednblue
# Option 2: install the Build Tools
# https://visualstudio.microsoft.com/visual-cpp-build-tools/
# then:
pip install rednblue[ui]
"CUDA not available" / GPU not detected
# Verify CUDA
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
# Reinstall PyTorch with CUDA
pip uninstall torch torchvision -y
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# Force CPU if no GPU
rnb preview --device cpu ...
"Token validation failed" / "Token consumed"
Possible causes:
- Token expired (1 year since purchase)
- Token already consumed (report downloaded)
- RNB_TOKEN variable incorrectly set
rnb status
export RNB_TOKEN=RB-XXXXXX-XXXXXX # Linux/macOS
set RNB_TOKEN=RB-XXXXXX-XXXXXX # Windows
"Submit failed" / network error
# Connectivity test
curl -I https://api.rednblue.io/health
# If behind a corporate proxy
export HTTPS_PROXY=http://proxy.company.com:8080
rnb preview ... --submit
"CUDA out of memory"
# Reduce batch size
rnb preview ... --batch-size 1
# Force CPU
rnb preview ... --device cpu
"Unsupported model architecture"
Verify your model is in the supported architectures list. For Hugging Face models, avoid timm/ repositories (not directly loadable as classifiers).
10. FAQ
Does RednBlue see my model?
No. The model, images, and prompts stay entirely on your machine. RednBlue only receives aggregated numeric metrics (pass rates, scores, SHA-256 file hash) — never the weights, pixels, or text.
Is the RednBlue certificate recognised by regulators?
The certificate is independent technical evidence. It is used to support your compliance file (EU AI Act, NIST, ISO 42001, etc.). RednBlue is not a regulatory certification body — we are the equivalent of an independent testing firm.
How long does an audit take?
Classifier (Freelancer): 2-5 min. Classifier (Enterprise): 5-15 min. Detector (Freelancer): 3-8 min. Detector (Enterprise): 15-40 min depending on the GPU. LLM: 2-10 min depending on the number of API calls.
Can I re-run the test after improving my model?
Yes. Free re-certification within 30 days if you address the dimensions identified as sensitive in your report. Beyond that, a new token is required.
Which model formats are supported?
Vision: .pt, .pth (PyTorch), .onnx, Hugging Face hub (except timm/), torchvision, timm.
YOLO: .pt (Ultralytics v5/v8/v9/v10/v11).
LLM: Python file (pipeline), API provider (OpenAI, Anthropic, Gemini, Mistral, Cohere), HTTP endpoint.
Is an academic license available?
Yes. Free academic license for researchers and students — contact us at contact@rednblue.io.
How do I verify a certificate's authenticity?
Every certificate carries a public verification URL. The recipient can access it and verify the model hash, audit date, and cryptographic signature.
What if my model receives a FAIL grade?
You receive a detailed technical report (sensitive dimensions are identified) but no certificate is issued until the minimum threshold is reached. The report guides you on how to improve the model.
Can I integrate RednBlue into my CI/CD?
The CLI runs in headless mode with JSON output and actionable return codes. An official GitHub Action is in development.
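Until the official action is available, a pipeline step can gate on the CLI's exit status. The sketch below is generic: this page does not list the return codes or the JSON schema, so it assumes only the usual convention that exit status 0 means success. `run_gate` is a hypothetical helper, and the demo uses a stand-in command so that it runs without `rnb` installed.

```python
import subprocess
import sys


def run_gate(argv):
    """Run a command and return True when it exits with status 0."""
    completed = subprocess.run(argv, check=False)
    return completed.returncode == 0


if __name__ == "__main__":
    # Stand-in for the real audit command; replace with your rnb invocation.
    stand_in = [sys.executable, "-c", "raise SystemExit(0)"]
    print("gate passed" if run_gate(stand_in) else "gate failed")
```

In a real pipeline the step would exit non-zero when the gate fails so that the build stops.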
What is the difference between Freelancer and Enterprise?
Freelancer: fast assessment suite for iteration during development. Enterprise: full multi-intensity suite, designed for regulatory certification and production deployment.