SOVEREIGN AI INFRASTRUCTURE
Technology
A production-grade stack, fully managed.
Numero6 is built on a curated selection of best-in-class open-source technologies—chosen for performance, auditability, and the complete absence of vendor lock-in. Every layer of the stack is deployed, configured, monitored, and maintained by our engineering team. You consume the capabilities; we ensure they operate at enterprise standard.
Open-source by design
Our technology approach is deliberate. We do not depend on opaque black-box platforms or closed vendor ecosystems that restrict visibility and create long-term lock-in. Instead, Numero6 is designed around open infrastructure, open deployment patterns, and modular components.
This architectural principle allows our engineering team to inspect, harden, and optimize every layer of the stack without being constrained by a single commercial vendor’s roadmap or pricing model. For our customers, this means absolute transparency, operational flexibility, and a clear path to long-term platform stability.
This architectural principle allows our engineering team to inspect, harden, and optimize every layer of the stack without being constrained by a single commercial vendor’s roadmap or pricing model. For our customers, this means absolute transparency, operational flexibility, and a clear path to long-term platform stability.

The Numero6 platform combines a modern AI interface layer, robust model-serving infrastructure, orchestration components, encrypted storage, and secure virtualization into a unified service architecture.
Deployment Architecture
Core technology stack
A production-grade stack curated for performance, auditability, and absolute data sovereignty
| Subsystem | Operational Role | Node Status & Tech |
|---|
| AI Interface & API Gateway | Conversational UI, OpenAI-compatible API, RAG, Pipelines, RBAC |
Open WebUI
|
| Model Runtime (Standard) | Model loading, dedicated inference, multi-model orchestration |
Ollama |
| Model Runtime (High Concurrency) | Production API-scale inference with PagedAttention |
vLLM
|
| Model Runtime (Optimized) | High-throughput, low-latency enterprise serving |
NVIDIA TensorRT / Triton
|
| Web Search Engine | Private, self-hosted meta-search aggregation |
SearXNG
|
| Agentic Tool Integration | External tools, APIs, and business system connectivity |
MCP
|
| GPU Compute | Dedicated NVIDIA GPU hardware per customer instance |
Nvidia GPUs
|
| Virtualisation Layer | Isolated, dedicated VM per customer |
Proxmox VE
|
| Storage & Observability | ZFS encryption, plus metrics, traces, and logs for infrastructure health |
ZFS, OTel, Grafana
|
Dedicated by default, scalable when needed
Each Numero6 customer receives a dedicated virtual machine on our Proxmox infrastructure, with exclusive access to one or more NVIDIA GPU cards. There is no shared GPU tenancy. Your GPU VRAM, compute, and network resources are allocated solely to your instance, providing deterministic performance characteristics and complete workload isolation.
For many customers, this primary dedicated deployment is the ideal foundation: stable, secure, and operationally straightforward.
For many customers, this primary dedicated deployment is the ideal foundation: stable, secure, and operationally straightforward.
Scaling for high throughput & concurrency
When operational requirements grow—such as demanding API workloads, multi-model concurrency, or enterprise-scale RAG pipelines—the architecture can easily scale beyond a single runtime node.
vLLM Nodes: for high-throughput scenarios, additional vLLM nodes can be provisioned and integrated with your primary Open WebUI instance. These nodes aggregate multiple compute resources under a single, unified interface, allowing you to serve specific models scaled precisely to your concurrency requirements.
TensorRT-LLM & Triton: for the most performance-sensitive production environments requiring optimized low-latency serving, Numero6 architectures can incorporate NVIDIA TensorRT-LLM alongside the Triton Inference Server. This provides a highly specialized GPU inference pipeline without abandoning the core principles of dedicated isolation.
The platform is designed to evolve. It can transition smoothly from a simpler dedicated deployment into an advanced distributed serving architecture, all while maintaining a unified access layer and strict governance boundaries.
Open-source commitment
Every component of the Numero6 stack is open-source software. This is not incidental—it is a deliberate architectural choice. Open-source software is auditable, has no vendor-controlled kill switch, and allows our engineering team to customize, patch, and optimize at every layer. Your service is not subject to arbitrary pricing changes, sudden terms-of-service revisions, or deprecation decisions by any commercial AI vendor. You benefit from openness and flexibility without inheriting the operational burden.