Overview
LuMay ships as Docker containers. The same images run in every deployment model: LuMay-managed SaaS, your cloud subscription, your private VPC, your on-prem Kubernetes cluster, or a hybrid of these. Governance, observability, and the agent runtime are identical across all models. What changes is who manages the infrastructure and where the data lives.
This matters for regulated enterprises. Your compliance team shouldn't have to assess a different security architecture depending on where the product is deployed. With LuMay, the same RBAC, RLS, audit logging, and PII controls that apply in the SaaS model apply identically in the on-prem model - because they're embedded in the application, not in the infrastructure configuration.
[Figure: the same shape repeated for four deployment targets: LuMay SaaS, Customer Cloud, Private Cloud, On-Prem K8s.]
Deployment Models
| Model | Who manages infra | Where data lives | Best for |
|---|---|---|---|
| LuMay SaaS | LuMay | LuMay's Azure tenancy | Fast start; teams that don't want to manage infrastructure |
| Customer cloud (BYOC) | LuMay manages apps; you own the subscription | Your Azure or AWS subscription | Data-residency requirements; shared responsibility model |
| Private cloud / VPC | Your ops team or LuMay | Your VPC with network isolation | Regulated industries with strict network controls |
| On-prem Kubernetes | Your ops team | Your data centre | Full air-gap capability; defence, government, high-security environments |
| Hybrid | Split between LuMay and customer | Management plane in cloud; data plane on-prem | Enterprises with mixed environments and evolving data strategies |
Build and Delivery Stack
| Component | Technology |
|---|---|
| Container build | Docker multi-stage builds (digest-pinned base images, non-root user) |
| Container registry | Azure Container Registry (ACR) - used across all deployment models |
| Default runtime | Azure Container Apps (ACA) with auto-scaling - used for SaaS and BYOC |
| Alternative runtime | Any Kubernetes distribution for private cloud, on-prem, and hybrid |
| CI/CD | GitHub Actions with GitVersion for semantic versioning |
| Database migrations | Alembic - run post-deploy as a separate step, not baked into container startup |
| Storage | Persistent volume claims for call recordings and voice samples |
| Observability | OpenTelemetry Collector → Grafana Tempo (traces), Prometheus (metrics), Loki (logs) |
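The table lists Alembic migrations as a separate post-deploy step rather than part of container startup. A minimal sketch of that ordering follows. The function names, the `alembic.ini` path, and the `rollout_ok` flag are assumptions for illustration, not LuMay's actual pipeline. Only `alembic -c <config> upgrade head` is the standard Alembic CLI call.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

# A runner takes a command and returns its exit code. Injecting it keeps the
# step testable without a database or an Alembic installation.
Runner = Callable[[Sequence[str]], int]


def migration_command(config_path: str = "alembic.ini") -> list[str]:
    """Build the Alembic CLI call that upgrades the schema to the latest revision."""
    # "alembic.ini" is an assumed path; the source does not name the config file.
    return ["alembic", "-c", config_path, "upgrade", "head"]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and report its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def post_deploy(rollout_ok: bool, runner: Runner = _run,
                config_path: str = "alembic.ini") -> str:
    """Run migrations only after the new containers have rolled out.

    Hypothetical pipeline step: the containers themselves never call Alembic
    on startup, so replicas do not race each other to apply a migration.
    """
    if not rollout_ok:
        return "skipped"
    exit_code = runner(migration_command(config_path))
    return "migrated" if exit_code == 0 else "failed"
```

Keeping the migration outside container startup means a scaled-out service with several replicas applies each schema change once, and a failed migration surfaces as a failed pipeline step instead of a crash-looping container.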
Scaling
| Service | Min replicas | Max replicas | Scale trigger |
|---|---|---|---|
| Management API | 2 | 10 | CPU > 70% or p95 request latency > 500 ms |
| Voice Agent Engine | 1 | 20 | Concurrent WebSocket connections (active_calls_total metric) |
The Management API maintains a minimum of 2 replicas for availability. The Voice Agent Engine maintains a minimum of 1 replica to prevent cold-start latency on inbound calls - scale-to-zero is disabled for the voice service.
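The scaling rules above can be sketched as a small decision function. This illustrates the table's logic and is not the platform's autoscaler configuration. The replica bounds and thresholds come from the table. `calls_per_replica` is a hypothetical capacity figure that the source does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePolicy:
    min_replicas: int
    max_replicas: int


# Bounds taken from the scaling table.
MANAGEMENT_API = ScalePolicy(min_replicas=2, max_replicas=10)
VOICE_AGENT_ENGINE = ScalePolicy(min_replicas=1, max_replicas=20)


def clamp(desired: int, policy: ScalePolicy) -> int:
    """Keep a desired replica count inside the policy's bounds."""
    return max(policy.min_replicas, min(policy.max_replicas, desired))


def management_api_should_scale_out(cpu_percent: float, p95_latency_ms: float) -> bool:
    """Either trigger from the table is enough: CPU > 70% or p95 latency > 500 ms."""
    return cpu_percent > 70 or p95_latency_ms > 500


def voice_engine_replicas(active_calls: int, calls_per_replica: int) -> int:
    """Replicas needed for the current active_calls_total value.

    calls_per_replica is an assumed per-replica capacity, not a documented number.
    The lower bound of 1 means the service never scales to zero.
    """
    if calls_per_replica <= 0:
        raise ValueError("calls_per_replica must be positive")
    desired = math.ceil(active_calls / calls_per_replica)
    return clamp(desired, VOICE_AGENT_ENGINE)
```

With an assumed capacity of 25 calls per replica, zero active calls still yields one replica, and any load above 500 concurrent calls is capped at the 20-replica maximum.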
Related
- Architecture - the six-layer model that is constant across all deployment models
- Security & Governance - how tenant isolation and data residency work in each model
- Analytics & ROI - the same observability stack runs in every deployment model
- Platform overview - all four engines and four trust pillars