Skip to main content
SOVEREIGN AI INFRASTRUCTURE

Technology

A production-grade stack, fully managed.
 
Numero6 is built on a curated selection of best-in-class open-source technologies—chosen for performance, auditability, and the complete absence of vendor lock-in. Every layer of the stack is deployed, configured, monitored, and maintained by our engineering team. You consume the capabilities; we ensure they operate at enterprise standard.

Open-source by design

Our technology approach is deliberate. We do not depend on opaque black-box platforms or closed vendor ecosystems that restrict visibility and create long-term lock-in. Instead, Numero6 is designed around open infrastructure, open deployment patterns, and modular components.

This architectural principle allows our engineering team to inspect, harden, and optimize every layer of the stack without being constrained by a single commercial vendor’s roadmap or pricing model. For our customers, this means absolute transparency, operational flexibility, and a clear path to long-term platform stability.
technology_page_01.jpg
The Numero6 platform combines a modern AI interface layer, robust model-serving infrastructure, orchestration components, encrypted storage, and secure virtualization into a unified service architecture.

Deployment Architecture

Core technology stack

A production-grade stack curated for performance, auditability, and absolute data sovereignty

Subsystem Operational Role Node Status & Tech
AI Interface & API Gateway Conversational UI, OpenAI-compatible API, RAG, Pipelines, RBAC
Open WebUI
Model Runtime (Standard) Model loading, dedicated inference, multi-model orchestration
Ollama
Model Runtime (High Concurrency) Production API-scale inference with PagedAttention
vLLM
Model Runtime (Optimized) High-throughput, low-latency enterprise serving
NVIDIA TensorRT / Triton
Web Search Engine Private, self-hosted meta-search aggregation
SearXNG
Agentic Tool Integration External tools, APIs, and business system connectivity
MCP
GPU Compute Dedicated NVIDIA GPU hardware per customer instance
Nvidia GPUs
Virtualisation Layer Isolated, dedicated VM per customer
Proxmox VE
Storage & Observability ZFS encryption, plus metrics, traces, and logs for infrastructure health
ZFS, OTel, Grafana

Dedicated by default, scalable when needed

Each Numero6 customer receives a dedicated virtual machine on our Proxmox infrastructure, with exclusive access to one or more NVIDIA GPU cards. There is no shared GPU tenancy. Your GPU VRAM, compute, and network resources are allocated solely to your instance, providing deterministic performance characteristics and complete workload isolation.

For many customers, this primary dedicated deployment is the ideal foundation: stable, secure, and operationally straightforward.

Scaling for high throughput & concurrency

When operational requirements grow—such as demanding API workloads, multi-model concurrency, or enterprise-scale RAG pipelines—the architecture can easily scale beyond a single runtime node.

vLLM Nodes: for high-throughput scenarios, additional vLLM nodes can be provisioned and integrated with your primary Open WebUI instance. These nodes aggregate multiple compute resources under a single, unified interface, allowing you to serve specific models scaled precisely to your concurrency requirements.
TensorRT-LLM & Triton: for the most performance-sensitive production environments requiring optimized low-latency serving, Numero6 architectures can incorporate NVIDIA TensorRT-LLM alongside the Triton Inference Server. This provides a highly specialized GPU inference pipeline without abandoning the core principles of dedicated isolation.

The platform is designed to evolve. It can transition smoothly from a simpler dedicated deployment into an advanced distributed serving architecture, all while maintaining a unified access layer and strict governance boundaries.

Open-source commitment

Every component of the Numero6 stack is open-source software. This is not incidental—it is a deliberate architectural choice. Open-source software is auditable, has no vendor-controlled kill switch, and allows our engineering team to customize, patch, and optimize at every layer. Your service is not subject to arbitrary pricing changes, sudden terms-of-service revisions, or deprecation decisions by any commercial AI vendor. You benefit from openness and flexibility without inheriting the operational burden.