SOVEREIGN AI INFRASTRUCTURE

Technology

A production-grade stack, fully managed.

Numero6 is built on a curated selection of best-in-class open-source technologies—chosen for performance, auditability, and the complete absence of vendor lock-in. Every layer of the stack is deployed, configured, monitored, and maintained by our engineering team. You consume the capabilities; we ensure they operate at enterprise standard.

Our technology approach is deliberate. We do not depend on opaque black-box platforms or closed vendor ecosystems that restrict visibility and create long-term lock-in. Instead, Numero6 is designed around open infrastructure, open deployment patterns, and modular components.

This architectural principle allows our engineering team to inspect, harden, and optimize every layer of the stack without being constrained by a single commercial vendor’s roadmap or pricing model. For our customers, this means absolute transparency, operational flexibility, and a clear path to long-term platform stability.

Image

The Numero6 platform combines a modern AI interface layer, robust model-serving infrastructure, orchestration components, encrypted storage, and secure virtualization into a unified service architecture.

Deployment Architecture

Core technology stack

A production-grade stack curated for performance, auditability, and absolute data sovereignty

Subsystem	Operational Role	Node Status & Tech

AI Interface & API Gateway	Conversational UI, OpenAI-compatible API, RAG, Pipelines, RBAC	Open WebUI
Model Runtime (Standard)	Model loading, dedicated inference, multi-model orchestration	Ollama
Model Runtime (High Concurrency)	Production API-scale inference with PagedAttention	vLLM
Model Runtime (Optimized)	High-throughput, low-latency enterprise serving	NVIDIA TensorRT / Triton
Web Search Engine	Private, self-hosted meta-search aggregation	SearXNG
Agentic Tool Integration	External tools, APIs, and business system connectivity	MCP
GPU Compute	Dedicated NVIDIA GPU hardware per customer instance	Nvidia GPUs
Virtualisation Layer	Isolated, dedicated VM per customer	Proxmox VE
Storage & Observability	ZFS encryption, plus metrics, traces, and logs for infrastructure health	ZFS, OTel, Grafana

Each Numero6 customer receives a dedicated virtual machine on our Proxmox infrastructure, with exclusive access to one or more NVIDIA GPU cards. There is no shared GPU tenancy. Your GPU VRAM, compute, and network resources are allocated solely to your instance, providing deterministic performance characteristics and complete workload isolation.

For many customers, this primary dedicated deployment is the ideal foundation: stable, secure, and operationally straightforward.

Scaling for high throughput & concurrency

When operational requirements grow—such as demanding API workloads, multi-model concurrency, or enterprise-scale RAG pipelines—the architecture can easily scale beyond a single runtime node.

vLLM Nodes: for high-throughput scenarios, additional vLLM nodes can be provisioned and integrated with your primary Open WebUI instance. These nodes aggregate multiple compute resources under a single, unified interface, allowing you to serve specific models scaled precisely to your concurrency requirements.

TensorRT-LLM & Triton: for the most performance-sensitive production environments requiring optimized low-latency serving, Numero6 architectures can incorporate NVIDIA TensorRT-LLM alongside the Triton Inference Server. This provides a highly specialized GPU inference pipeline without abandoning the core principles of dedicated isolation.

The platform is designed to evolve. It can transition smoothly from a simpler dedicated deployment into an advanced distributed serving architecture, all while maintaining a unified access layer and strict governance boundaries.

Open-source commitment

Every component of the Numero6 stack is open-source software. This is not incidental—it is a deliberate architectural choice. Open-source software is auditable, has no vendor-controlled kill switch, and allows our engineering team to customize, patch, and optimize at every layer. Your service is not subject to arbitrary pricing changes, sudden terms-of-service revisions, or deprecation decisions by any commercial AI vendor. You benefit from openness and flexibility without inheriting the operational burden.

SOVEREIGN AI INFRASTRUCTURE

Technology

Open-source by design

Core technology stack

Dedicated by default, scalable when needed

Scaling for high throughput & concurrency