Draft - Confidential

SW-SAAS-ARCH-001

Swedwise SaaS Platform - Technical Architecture

Version: 1.0
Owner: Technical Lead
Effective Date: 2025-01-15
Review Date: 2026-01-15
Platform: Swedwise SaaS Platform
Classification: Confidential

Executive Summary

This document describes the technical architecture of the Swedwise SaaS Platform, deployed on a Kubernetes-based infrastructure in a Swedish data center. The architecture is designed for multi-tenancy, high availability, security, and scalability to support enterprise-grade SaaS service components.

Platform Characteristics:

Multi-tenant SaaS architecture with strict data isolation
Kubernetes orchestration for automatic scaling and resilience
99.9% availability SLA with redundant components
Swedish data residency for GDPR compliance
ISO 27001 certified security controls

Service Components:
The platform hosts multiple service components, each documented in a separate technical architecture addendum:

Component	Document ID	Description
Communications	SW-SAAS-ARCH-COMP-001	OpenText Exstream document generation
Notifications	SW-SAAS-ARCH-COMP-002	Multi-channel notification delivery (Email, SMS)
[Future]	-	Additional service components

1. High-Level Architecture

1.1. Architecture Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                           CUSTOMER LAYER                                 │
│                                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                 │
│  │   Customer   │  │   Customer   │  │   Customer   │                 │
│  │   Tenant A   │  │   Tenant B   │  │   Tenant C   │                 │
│  │              │  │              │  │              │                 │
│  │  Users/Apps  │  │  Users/Apps  │  │  Users/Apps  │                 │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘                 │
└─────────┼──────────────────┼──────────────────┼────────────────────────┘
          │                  │                  │
          └──────────────────┴──────────────────┘
                             │
                   ┌─────────▼─────────┐
                   │   Internet/VPN    │
                   └─────────┬─────────┘
                             │
┌────────────────────────────▼──────────────────────────────────────────┐
│                     NETWORK SECURITY LAYER                             │
│                                                                        │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │  Fortinet Next-Gen Firewall (IDS/IPS)                        │   │
│  │  - DDoS Protection                                            │   │
│  │  - Threat Intelligence                                        │   │
│  │  - SSL/TLS Inspection                                         │   │
│  └──────────────────────┬────────────────────────────────────────┘   │
└─────────────────────────┼──────────────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────────────┐
│                   LOAD BALANCING & INGRESS LAYER                       │
│                                                                        │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  Kubernetes Ingress Controllers (Redundant)                    │  │
│  │  - TLS Termination                                             │  │
│  │  - Layer 7 Routing                                             │  │
│  │  - Rate Limiting                                               │  │
│  └────────────────────────┬───────────────────────────────────────┘  │
└─────────────────────────┼──────────────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────────────┐
│                     KUBERNETES CLUSTER LAYER                           │
│                      (OpenText Experience Cloud)                       │
│                                                                        │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                     CONTROL PLANE                              │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐                     │  │
│  │  │   etcd   │  │   API    │  │Scheduler │                     │  │
│  │  │  (HA)    │  │  Server  │  │Controller│                     │  │
│  │  └──────────┘  └──────────┘  └──────────┘                     │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                        │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                     WORKER NODES (Redundant)                   │  │
│  │                                                                │  │
│  │  ┌─────────────────────────────────────────────────────────┐  │  │
│  │  │  APPLICATION POD LAYER                                  │  │  │
│  │  │                                                         │  │  │
│  │  │  ┌───────────────┐  ┌───────────────┐  ┌─────────────┐│  │  │
│  │  │  │  Service      │  │  Service      │  │  Tenant     ││  │  │
│  │  │  │  Component A  │  │  Component B  │  │  Management ││  │  │
│  │  │  │  (Pods)       │  │  (Pods)       │  │  Services   ││  │  │
│  │  │  │               │  │               │  │             ││  │  │
│  │  │  │  Multi-tenant │  │  Multi-tenant │  │             ││  │  │
│  │  │  └───────┬───────┘  └───────┬───────┘  └──────┬──────┘│  │  │
│  │  └──────────┼──────────────────┼─────────────────┼───────┘  │  │
│  │             │                  │                 │          │  │
│  │  ┌──────────▼──────────────────▼─────────────────▼───────┐  │  │
│  │  │  SHARED SERVICES LAYER                                │  │  │
│  │  │                                                        │  │  │
│  │  │  ┌────────────┐  ┌────────────┐  ┌────────────────┐  │  │  │
│  │  │  │ Identity & │  │   API      │  │   Integration  │  │  │  │
│  │  │  │   Auth     │  │  Gateway   │  │     Broker     │  │  │  │
│  │  │  │  (SSO/MFA) │  │            │  │                │  │  │  │
│  │  │  └────────────┘  └────────────┘  └────────────────┘  │  │  │
│  │  └────────────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────────────┐
│                       DATA PERSISTENCE LAYER                           │
│                                                                        │
│  ┌────────────────┐  ┌────────────────┐  ┌─────────────────────┐    │
│  │  PostgreSQL    │  │   Document     │  │   Object Storage    │    │
│  │  Cluster (HA)  │  │   Database     │  │   (S3-compatible)   │    │
│  │                │  │   (Per-Tenant) │  │                     │    │
│  │  - Tenant Meta │  │                │  │  - Generated Docs   │    │
│  │  - Config DB   │  │  - Templates   │  │  - Assets/Media     │    │
│  │  - User Data   │  │  - Job History │  │  - Archived Output  │    │
│  └────────────────┘  └────────────────┘  └─────────────────────┘    │
└────────────────────────────────────────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────────────┐
│                   MONITORING & OBSERVABILITY LAYER                     │
│                                                                        │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌──────────────┐   │
│  │ Prometheus │  │  Grafana   │  │    ELK     │  │  Alerting    │   │
│  │  Metrics   │  │ Dashboards │  │   Logs     │  │   Manager    │   │
│  └────────────┘  └────────────┘  └────────────┘  └──────────────┘   │
└────────────────────────────────────────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────────────┐
│                     BACKUP & DISASTER RECOVERY                         │
│                                                                        │
│  Primary DC (Sweden)  ───────────────────▶  Secondary DC (Sweden)     │
│  - Every 6 hours                             - Async Replication       │
│  - 7 days retention                          - DR Site                 │
└────────────────────────────────────────────────────────────────────────┘

1.2. Platform Technology Stack

Layer	Technology	Purpose
Orchestration	Kubernetes	Container orchestration, auto-scaling, self-healing
Container Runtime	Docker	Application containerization
Database	PostgreSQL (HA cluster)	Relational data storage
Object Storage	S3-compatible storage	Document, template, and asset storage
Cache	Redis Cluster	Session management, caching
Message Queue	RabbitMQ/Kafka	Asynchronous job processing
Load Balancing	Kubernetes Ingress / NGINX	Traffic distribution, SSL/TLS termination
Firewall	Fortinet Next-Gen Firewall	Network security, IDS/IPS
Monitoring	Prometheus + Grafana	Metrics collection and visualization
Logging	ELK Stack (Elasticsearch, Logstash, Kibana)	Centralized logging and analysis
Secrets Management	Kubernetes Secrets / HashiCorp Vault	Secure credential storage

Service Component Technologies (see component addendums for details):

Communications: OpenText Communications (Exstream)
Notifications: OpenText Notifications + Email/SMS Gateways

2. Platform Component Overview

2.1. Kubernetes Cluster Architecture

The platform runs on a dedicated Kubernetes cluster with the following characteristics:

Control Plane (High Availability)

3x Master Nodes: Redundant control plane for fault tolerance
etcd Cluster: Distributed key-value store for cluster state (3+ nodes)
API Server: RESTful API for cluster management
Scheduler: Pod placement and resource allocation
Controller Manager: Cluster-level functions (replication, endpoints, service accounts)
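
The three-node control plane and the "3+ nodes" etcd guidance follow from majority quorum: a cluster of n members needs floor(n/2) + 1 members available to accept writes. The sketch below works through that arithmetic; the function names are illustrative and not part of any platform tooling.

```python
def quorum_size(members: int) -> int:
    """Smallest number of etcd members that forms a majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can fail while the cluster still has quorum."""
    return members - quorum_size(members)


for n in (1, 3, 4, 5):
    print(f"{n} members: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

A four-member cluster tolerates no more failures than a three-member one, which is why odd member counts are the usual choice.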

Worker Nodes

Minimum 6 Worker Nodes: Distributed across multiple physical hosts
Auto-scaling: Dynamic node provisioning based on workload
Taints and Tolerations: Dedicated nodes for sensitive workloads
Node Affinity: Pod placement rules for tenant isolation

Pod Architecture

Each application component runs as a microservice in one or more pods:

Platform Services:

Pod Type	Replicas	Resources	Purpose
Tenant Management	2+	2 CPU, 4 GB RAM	Multi-tenant orchestration
API Gateway	3+	2 CPU, 4 GB RAM	API routing and rate limiting
Auth Service	3+	2 CPU, 4 GB RAM	Authentication and SSO
Integration Broker	2+	2 CPU, 8 GB RAM	External system integration

Service Component Pods (see component addendums for detailed specifications):

Service Component	Document	Pod Types
Communications	SW-SAAS-ARCH-COMP-001	Exstream API, Designer
Notifications	SW-SAAS-ARCH-COMP-002	Notification Engine, Queue Workers

2.2. Database Layer

PostgreSQL High-Availability Cluster

┌─────────────────────────────────────────────────────────┐
│         PostgreSQL HA Cluster (Patroni)                 │
│                                                         │
│  ┌─────────────┐     ┌─────────────┐     ┌──────────┐ │
│  │   Primary   │────▶│  Replica 1  │────▶│Replica 2 │ │
│  │   (Write)   │     │   (Read)    │     │  (Read)  │ │
│  └─────────────┘     └─────────────┘     └──────────┘ │
│         │                                               │
│         ▼                                               │
│  ┌─────────────────────────────────────────────────┐   │
│  │  WAL Archiving & Point-in-Time Recovery        │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Key Features:

Automatic Failover: Patroni manages leader election (< 30 seconds RTO)
Streaming Replication: Synchronous replication from the primary to the first replica (Replica 1)
Read Replicas: Scale read operations across multiple replicas
Connection Pooling: PgBouncer for efficient connection management

Database Separation:

Platform Database: Tenant metadata, user accounts, subscriptions
Tenant Databases: Per-tenant data isolation (dedicated schemas or databases)
Audit Database: Security events, access logs, change tracking

Object Storage (S3-Compatible)

Storage Classes:

Hot Storage: Frequently accessed documents (generated output, active templates)
Warm Storage: Archived documents (30-90 days)
Cold Storage: Long-term archival (compliance retention)

Storage Structure per Tenant:

/tenants/{tenant-id}/
  ├── templates/          # Document templates
  ├── assets/             # Images, fonts, branding
  ├── output/             # Generated documents
  │   ├── active/         # Last 30 days
  │   └── archive/        # Older documents
  └── uploads/            # Customer-uploaded content
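
As an illustration of how the per-tenant layout above could be enforced in code, the following sketch builds object keys and rejects anything that would escape a tenant's prefix. The tenant ID pattern, the `tenant_object_key` helper, and the set of allowed areas are assumptions derived from the tree above, not the platform's actual implementation.

```python
import re
from pathlib import PurePosixPath

# Areas taken from the storage tree above.
ALLOWED_AREAS = {"templates", "assets", "output/active", "output/archive", "uploads"}

# Assumed tenant ID format: lowercase letters, digits, and hyphens.
TENANT_ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def tenant_object_key(tenant_id: str, area: str, filename: str) -> str:
    """Build an object key under /tenants/{tenant-id}/ or raise ValueError."""
    if not TENANT_ID_PATTERN.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    if area not in ALLOWED_AREAS:
        raise ValueError(f"unknown storage area: {area!r}")
    # Reject separators and traversal so a file name cannot leave the tenant prefix.
    if filename in ("", ".", "..") or "/" in filename or "\\" in filename:
        raise ValueError(f"invalid file name: {filename!r}")
    return str(PurePosixPath("/tenants") / tenant_id / area / filename)


print(tenant_object_key("tenant-abc-123", "output/active", "invoice-0001.pdf"))
```

Validating the tenant ID and file name before building the key keeps the path check in one place, so callers cannot construct keys by string concatenation.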

2.3. Platform Services

The Swedwise SaaS Platform provides foundational multi-tenant capabilities:

Core Platform Services

Tenant Provisioning: Automated tenant creation and configuration
Identity Management: Centralized authentication with SSO/SAML support
API Management: Rate limiting, throttling, API versioning
Usage Metering: Transaction tracking for billing
Analytics Engine: Usage analytics and reporting

Integration Framework

REST API: Standard RESTful APIs for all services
Webhooks: Event-driven integrations
File Transfer: SFTP/FTPS for batch processing
Message Queue: Asynchronous job processing (Kafka/RabbitMQ)

Service Component Integration

Each service component integrates with the platform through:

Shared authentication and authorization
Common API gateway routing
Unified monitoring and logging
Centralized configuration management

For component-specific integration details, see the respective architecture addendums.

3. Multi-Tenant Architecture

3.1. Tenant Isolation Model

The platform implements a hybrid multi-tenant architecture with multiple layers of isolation:

┌─────────────────────────────────────────────────────────────┐
│                    ISOLATION LAYERS                         │
│                                                             │
│  Layer 1: Application Logic (Shared Pods)                  │
│  ├── Shared application code                               │
│  ├── Per-tenant configuration injection                    │
│  └── Context-based data filtering                          │
│                                                             │
│  Layer 2: Database Isolation                               │
│  ├── Separate database schemas per tenant                  │
│  ├── Row-level security policies                           │
│  └── Encrypted tenant keys                                 │
│                                                             │
│  Layer 3: Storage Isolation                                │
│  ├── Tenant-specific object storage paths                  │
│  ├── Access control policies (IAM)                         │
│  └── Encryption with tenant-specific keys                  │
│                                                             │
│  Layer 4: Network Isolation (Optional for Sensitive)       │
│  ├── Dedicated namespaces                                  │
│  ├── Network policies                                      │
│  └── Private networking                                    │
└─────────────────────────────────────────────────────────────┘

3.2. Tenant Configuration

Each tenant has a dedicated configuration profile:

tenant:
  id: "tenant-abc-123"
  name: "Acme Corporation"
  status: active
  tier: enterprise

  resources:
    database_schema: "tenant_abc_123"
    storage_bucket: "tenants/tenant-abc-123/"
    namespace: "default"  # or dedicated namespace

  quotas:
    max_users: 100
    max_storage_gb: 500
    max_monthly_documents: 100000
    max_monthly_notifications: 500000
    api_rate_limit: "1000/min"  # requests per minute

  features:
    sso_enabled: true
    api_access: true
    custom_branding: true
    advanced_analytics: false

  security:
    encryption_key_id: "key-abc-123"
    data_classification: "confidential"
    ip_whitelist: ["203.0.113.0/24"]
    mfa_required: true
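
The `api_rate_limit` quota in the profile above could be enforced in several ways; this document does not specify the algorithm. The sketch below assumes a token bucket, a common choice, sized so that a tenant may burst up to 1000 requests and then continue at 1000 requests per minute. The `TokenBucket` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    capacity: float            # maximum burst size, in requests
    refill_per_second: float   # sustained request rate
    last_refill: float = 0.0   # timestamp of the previous call, in seconds
    tokens: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start with a full bucket so a new tenant can burst immediately.
        self.tokens = self.capacity

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# 1000 requests per minute, as in the sample tenant profile.
bucket = TokenBucket(capacity=1000, refill_per_second=1000 / 60)
burst = sum(bucket.allow(now=0.0) for _ in range(1001))
print(f"Allowed {burst} of 1001 requests in the initial burst")
print(f"Request one second later allowed: {bucket.allow(now=1.0)}")
```

Passing the clock in as `now` keeps the limiter deterministic and easy to test; a production gateway would typically read a monotonic clock instead.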

3.3. Data Isolation Strategy

Database-Level Isolation

Option 1: Schema-per-Tenant (Current Implementation)

Each tenant has a dedicated PostgreSQL schema
Shared database instance for operational efficiency
Schema-level access control
Suitable for standard tier customers

Option 2: Database-per-Tenant (Enterprise Tier)

Dedicated PostgreSQL database for enterprise customers
Complete logical separation
Independent backup/restore capabilities
Higher isolation for regulatory requirements

Application-Level Isolation

Tenant Context Injection: Every request carries tenant ID
Query Filtering: All database queries filtered by tenant ID
Data Validation: Cross-tenant access attempts blocked at application layer
Audit Logging: All data access logged with tenant context
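
A minimal sketch of tenant context injection and query filtering follows. It uses SQLite from the Python standard library as a stand-in for PostgreSQL so that it runs anywhere; the `documents` table, the `TenantScopedStore` class, and the tenant IDs are hypothetical.

```python
import sqlite3


class TenantScopedStore:
    """Data access object that is bound to exactly one tenant."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required for every request")
        self._conn = conn
        self._tenant_id = tenant_id

    def list_documents(self) -> list:
        # Every query carries the tenant filter as a bound parameter.
        rows = self._conn.execute(
            "SELECT name FROM documents WHERE tenant_id = ? ORDER BY name",
            (self._tenant_id,),
        ).fetchall()
        return [row[0] for row in rows]

    def get_document(self, doc_id: int) -> str:
        row = self._conn.execute(
            "SELECT name FROM documents WHERE id = ? AND tenant_id = ?",
            (doc_id, self._tenant_id),
        ).fetchone()
        if row is None:
            # Covers both "does not exist" and "belongs to another tenant".
            raise PermissionError("document not available for this tenant")
        return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO documents (id, tenant_id, name) VALUES (?, ?, ?)",
    [(1, "tenant-a", "invoice.pdf"), (2, "tenant-b", "contract.pdf")],
)

store_a = TenantScopedStore(conn, "tenant-a")
print(store_a.list_documents())
```

Binding the tenant ID once, when the store is created, means individual queries cannot forget the filter; a cross-tenant lookup such as `store_a.get_document(2)` raises `PermissionError`.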

3.4. Resource Allocation

Pod Resource Limits (Per Tenant Workload)

resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

Storage Quotas

Per-Tenant Storage Quota: Enforced at object storage level
Database Size Monitoring: Alerts when tenant exceeds 80% of quota
Automatic Scaling: Option to automatically increase quota (with billing)
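
The 80% alert rule can be expressed as a small classification function. This is a sketch under stated assumptions: the status names and the `quota_status` helper are illustrative, and integer arithmetic is used to avoid floating-point edge cases at the threshold.

```python
def quota_status(used_gb: int, quota_gb: int, alert_percent: int = 80) -> str:
    """Classify tenant storage usage as 'ok', 'alert', or 'exceeded'."""
    if quota_gb <= 0:
        raise ValueError("quota must be positive")
    if used_gb >= quota_gb:
        return "exceeded"
    # "Exceeds 80% of quota" is read as strictly greater than the threshold.
    if used_gb * 100 > quota_gb * alert_percent:
        return "alert"
    return "ok"


# Sample values against the 500 GB quota from the tenant profile in Section 3.2.
for used in (350, 400, 401, 500):
    print(f"{used} GB of 500 GB: {quota_status(used, 500)}")
```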

4. Network Architecture and Security

4.1. Network Topology

┌─────────────────────────────────────────────────────────────┐
│                       INTERNET                              │
└────────────────────────┬────────────────────────────────────┘
                         │
              ┌──────────▼───────────┐
              │         DNS          │
              │ (Cloudflare/Route53) │
              └──────────┬───────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│                    DMZ ZONE                                 │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Fortinet Next-Gen Firewall (Active/Passive HA)    │   │
│  │  - Public IP: External interface                   │   │
│  │  - Private IP: Internal interface                  │   │
│  │  - Management IP: Admin interface                  │   │
│  └─────────────────────┬───────────────────────────────┘   │
└────────────────────────┼─────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│                  APPLICATION ZONE                            │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Kubernetes Cluster Network (Calico CNI)            │   │
│  │                                                      │   │
│  │  Pod Network: 10.244.0.0/16                         │   │
│  │  Service Network: 10.96.0.0/12                      │   │
│  │  Node Network: 192.168.1.0/24                       │   │
│  │                                                      │   │
│  │  ┌────────────────────────────────────────────────┐ │   │
│  │  │  Network Policies (Microsegmentation)         │ │   │
│  │  │  - Pod-to-Pod rules                           │ │   │
│  │  │  - Namespace isolation                        │ │   │
│  │  │  - Egress filtering                           │ │   │
│  │  └────────────────────────────────────────────────┘ │   │
│  └──────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│                    DATA ZONE                                 │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────┐   │
│  │ PostgreSQL   │  │  Object      │  │  Backup         │   │
│  │ Private IP   │  │  Storage     │  │  Storage        │   │
│  │ 192.168.2.x  │  │  Private Net │  │  Air-gapped     │   │
│  └──────────────┘  └──────────────┘  └─────────────────┘   │
└──────────────────────────────────────────────────────────────┘
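
Overlapping address ranges are a common source of routing problems in Kubernetes clusters, so it can be worth checking the ranges in the topology above. The sketch below uses Python's standard `ipaddress` module; the `/24` prefix for the data zone is an assumption, since the diagram only gives `192.168.2.x`.

```python
import ipaddress
from itertools import combinations

NETWORKS = {
    "pod": ipaddress.ip_network("10.244.0.0/16"),
    "service": ipaddress.ip_network("10.96.0.0/12"),
    "node": ipaddress.ip_network("192.168.1.0/24"),
    "data": ipaddress.ip_network("192.168.2.0/24"),  # assumed prefix length
}


def find_overlaps(networks: dict) -> list:
    """Return the name pairs of any networks whose address ranges overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


for name, net in NETWORKS.items():
    print(f"{name:8} {str(net):16} {net.num_addresses} addresses")
print("Overlapping ranges:", find_overlaps(NETWORKS) or "none")
```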

4.2. Security Layers

Layer 1: Perimeter Security

DDoS Protection: Cloudflare or equivalent CDN with DDoS mitigation
Web Application Firewall (WAF): OWASP Top 10 protection
Rate Limiting: API and HTTP rate limiting at edge
GeoIP Filtering: Optional geographic access restrictions

Layer 2: Network Security

Next-Gen Firewall: Fortinet FortiGate (or equivalent)
- Intrusion Detection System (IDS)
- Intrusion Prevention System (IPS)
- SSL/TLS Inspection
- Application-layer filtering
- Threat intelligence feeds
VPN Access: Secure administrative access
- IPsec VPN for site-to-site connectivity
- SSL VPN for remote administration
- Multi-factor authentication required

Layer 3: Kubernetes Network Security

Calico Network Policies: Pod-level microsegmentation

# Example: Restrict database access
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-access-policy
spec:
  podSelector:
    matchLabels:
      app: postgresql
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: application
    ports:
    - protocol: TCP
      port: 5432

Service Mesh (Optional): Istio/Linkerd for mTLS between services
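Pod Security Policies: Restrict privileged containers and host networking (on Kubernetes 1.25 and later, where PodSecurityPolicy has been removed, enforced through Pod Security Admission)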
Secrets Management: Kubernetes Secrets or HashiCorp Vault

Layer 4: Application Security

Authentication: OpenID Connect (OIDC) / SAML 2.0
Authorization: Role-Based Access Control (RBAC)
API Security:
- OAuth 2.0 for API access
- API keys with rotation policy
- JWTs with short expiration
Input Validation: OWASP validation at API gateway
CORS Policies: Strict cross-origin resource sharing

Layer 5: Data Security

Encryption at Rest:
- AES-256 encryption for databases
- S3 server-side encryption (SSE)
- Tenant-specific encryption keys (optional)
Encryption in Transit:
- TLS 1.3 for all external connections
- mTLS for internal service communication
- Certificate rotation (Let's Encrypt or enterprise CA)
Data Loss Prevention (DLP):
- Sensitive data detection in documents
- PII/GDPR compliance scanning
- Automated data classification

4.3. Security Monitoring

┌─────────────────────────────────────────────────────────────┐
│               SECURITY MONITORING STACK                     │
│                                                             │
│  ┌────────────────┐  ┌────────────────┐  ┌──────────────┐ │
│  │  Firewall      │  │  Kubernetes    │  │ Application  │ │
│  │  Logs          │──│  Audit Logs    │──│ Logs         │ │
│  └────────┬───────┘  └────────┬───────┘  └──────┬───────┘ │
│           │                   │                  │         │
│           └───────────────────┴──────────────────┘         │
│                               │                            │
│                     ┌─────────▼─────────┐                  │
│                     │  Log Aggregation  │                  │
│                     │ (Logstash/Fluentd)│                  │
│                     └─────────┬─────────┘                  │
│                               │                            │
│                     ┌─────────▼─────────┐                  │
│                     │  SIEM Platform    │                  │
│                     │  (ELK/Splunk)     │                  │
│                     │                   │                  │
│                     │  - Correlation    │                  │
│                     │  - Alerting       │                  │
│                     │  - Dashboards     │                  │
│                     └─────────┬─────────┘                  │
│                               │                            │
│                     ┌─────────▼─────────┐                  │
│                     │  Incident Response│                  │
│                     │  (PagerDuty/Ops)  │                  │
│                     └───────────────────┘                  │
└─────────────────────────────────────────────────────────────┘

Security Events Monitored:

Failed authentication attempts
Privilege escalation attempts
Unusual API access patterns
Cross-tenant access attempts
Database query anomalies
Network traffic anomalies
Configuration changes
Certificate expiration warnings

5. Scalability and High Availability

5.1. Horizontal Scaling

Automatic Pod Scaling (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: opentext-comms-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: opentext-comms-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: document_generation_queue_depth
      target:
        type: AverageValue
        averageValue: "100"

Scaling Triggers:

CPU utilization > 70%
Memory utilization > 80%
Request queue depth > 100 jobs per pod (averaged across pods)
Response time > 2 seconds (p95)
Custom metrics: Documents/minute, notifications/minute
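
To show how these thresholds translate into replica counts, the sketch below applies the scaling formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentValue / targetValue), with the default 10% tolerance and the 3 to 20 replica bounds from the manifest above. The function names and sample metric values are illustrative; the real controller also applies stabilization windows and readiness handling that are not modeled here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=3, max_replicas=20, tolerance=0.1):
    """Replica count suggested by a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def desired_for_all_metrics(current_replicas, metrics):
    """With several metrics, the HPA uses the largest proposed count."""
    return max(desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics)


# Hypothetical readings for 4 running replicas: (current, target)
readings = [
    (90, 70),    # CPU utilization %
    (60, 80),    # memory utilization %
    (250, 100),  # average queue depth per pod
]
print("Proposed replicas:", desired_for_all_metrics(4, readings))
```

In this example the queue depth metric dominates, so the autoscaler would move from 4 to 10 replicas even though memory usage alone would suggest scaling down.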

Cluster Autoscaling

Kubernetes Cluster Autoscaler: Adds worker nodes when pods can't be scheduled
Node Pools: Different node types for different workloads
- Compute-optimized: Document generation
- Memory-optimized: Template caching
- General-purpose: Application services

5.2. High Availability Design

Service-Level HA

Component	HA Configuration	RPO	RTO
Control Plane	3 master nodes, etcd quorum	N/A	< 1 min
Application Pods	Min 3 replicas, anti-affinity	N/A	< 30 sec
PostgreSQL	Primary + 2 replicas, Patroni	< 1 min	< 30 sec
Object Storage	3x replication	0	Immediate
Load Balancers	Active/Active	N/A	< 5 sec
Firewall	Active/Passive HA	N/A	< 10 sec

Availability Zones

Multi-AZ Deployment: Worker nodes distributed across 3 availability zones

Pod Anti-Affinity: Replicas scheduled on different physical hosts

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - opentext-comms
      topologyKey: "kubernetes.io/hostname"

Health Checks

# Liveness probe: Restart unhealthy containers
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

# Readiness probe: Remove unhealthy pods from load balancer
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2
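
The probe settings above determine how quickly a failing pod is noticed. As a rough estimate, assuming a pod fails immediately after a successful probe and every later probe runs to its full timeout, detection takes about failureThreshold × periodSeconds + timeoutSeconds. The `Probe` class below is a hypothetical helper for that arithmetic and ignores container restart time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int

    def worst_case_detection_seconds(self) -> int:
        """Approximate upper bound on the time until the probe is marked failed."""
        return self.failure_threshold * self.period_seconds + self.timeout_seconds


liveness = Probe(period_seconds=10, timeout_seconds=5, failure_threshold=3)
readiness = Probe(period_seconds=5, timeout_seconds=3, failure_threshold=2)

print(f"Liveness: restart triggered within ~{liveness.worst_case_detection_seconds()} s")
print(f"Readiness: pod removed from endpoints within ~{readiness.worst_case_detection_seconds()} s")
```

Under these assumptions the readiness probe takes a failing pod out of rotation in roughly 13 seconds, well before the liveness probe triggers a restart at roughly 35 seconds.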

5.3. Load Balancing Strategy

External Load Balancing

Layer 4 (TCP/UDP): Firewall load balancing to Kubernetes ingress
Layer 7 (HTTP/HTTPS): Kubernetes Ingress Controllers
- Session affinity (sticky sessions) for stateful operations
- Weighted routing for blue/green deployments
- Geographic routing (future: multi-region)

Internal Load Balancing

Kubernetes Services: ClusterIP services for internal communication
Service Mesh: Istio for advanced traffic management (optional)
- Circuit breaking
- Retry policies
- Timeout configuration
- A/B testing

6. Disaster Recovery Architecture

6.1. DR Strategy

DR Objectives:

RTO (Recovery Time Objective): 4 hours
RPO (Recovery Point Objective): 6 hours (backup frequency)
SLA Impact: DR events excluded from availability SLA calculation
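
For context on the SLA figure, the 99.9% availability target stated in the Executive Summary corresponds to a fixed downtime budget. The sketch below works it out; the 30-day and 365-day measurement windows are assumptions, since this document does not define the SLA measurement period.

```python
def downtime_budget_minutes(availability_pct: float, period_hours: float) -> float:
    """Maximum downtime, in minutes, permitted by an availability target."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_hours * 60.0


monthly = downtime_budget_minutes(99.9, 30 * 24)   # assumed 30-day month
yearly = downtime_budget_minutes(99.9, 365 * 24)   # assumed 365-day year

print(f"99.9% over 30 days:  {monthly:.1f} minutes")
print(f"99.9% over 365 days: {yearly / 60:.2f} hours")
```

A single DR event at the 4-hour RTO would consume several months of this budget, which is consistent with excluding DR events from the SLA calculation.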

6.2. Backup Architecture

┌─────────────────────────────────────────────────────────────┐
│                  PRIMARY DATA CENTER                        │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────┐  │
│  │  PostgreSQL  │  │    Object    │  │  Kubernetes     │  │
│  │  Continuous  │  │    Storage   │  │  Config         │  │
│  │  Archiving   │  │  Replication │  │  Backups        │  │
│  └──────┬───────┘  └──────┬───────┘  └────────┬────────┘  │
│         │                 │                    │           │
│         └─────────────────┴────────────────────┘           │
│                           │                                │
│                  Every 6 hours                             │
│                           │                                │
└───────────────────────────┼────────────────────────────────┘
                            │
                  ┌─────────▼─────────┐
                  │  Secure Transfer  │
                  │  (TLS/VPN)        │
                  └─────────┬─────────┘
                            │
┌───────────────────────────▼────────────────────────────────┐
│              SECONDARY DATA CENTER (DR SITE)               │
│                                                            │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────┐ │
│  │  PostgreSQL  │  │    Object    │  │  Kubernetes     │ │
│  │  Standby     │  │    Storage   │  │  Standby        │ │
│  │  (Read-only) │  │  Replica     │  │  Cluster        │ │
│  └──────────────┘  └──────────────┘  └─────────────────┘ │
│                                                            │
│  - 7-day backup retention                                 │
│  - Point-in-time recovery capability                      │
│  - Quarterly DR testing                                   │
└────────────────────────────────────────────────────────────┘

6.3. Backup Components

Database Backups

Continuous WAL Archiving: PostgreSQL Write-Ahead Logs streamed to DR site
Daily Full Backups: Automated via pg_basebackup
Incremental Backups: Every 6 hours using WAL archiving
Point-in-Time Recovery: Restore to any point within 7-day window
Backup Encryption: AES-256 encryption of backup files
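
Combining the 6-hour backup interval with the 7-day retention window gives the number of backup sets held at any time. The sketch below enumerates them; the timestamp is hypothetical, and treating the retention cutoff as exclusive is an assumption.

```python
from datetime import datetime, timedelta

BACKUP_INTERVAL = timedelta(hours=6)
RETENTION = timedelta(days=7)


def restore_points(latest, interval=BACKUP_INTERVAL, retention=RETENTION):
    """List backup timestamps newer than the retention cutoff, newest first."""
    cutoff = latest - retention
    points = []
    current = latest
    while current > cutoff:
        points.append(current)
        current -= interval
    return points


latest_backup = datetime(2025, 1, 15, 18, 0)  # hypothetical timestamp
points = restore_points(latest_backup)
print(f"{len(points)} backup sets retained")
print(f"Oldest: {points[-1]:%Y-%m-%d %H:%M}, newest: {points[0]:%Y-%m-%d %H:%M}")
```

Between these backup sets, the archived WAL segments are what allow recovery to an arbitrary point in time.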

Object Storage Backups

Cross-Region Replication: Async replication to DR site (near real-time)
Versioning: Last 30 versions of each object retained
Lifecycle Policies:
- Active: 30 days
- Archive: 90 days
- Compliance: 7 years (if required)

Configuration Backups

Kubernetes Manifests: Git repository with all configurations
Secrets: Encrypted backup of secrets (separate from configs)
Infrastructure as Code: Terraform/Ansible scripts for cluster rebuild

6.4. DR Procedures

Failover Scenarios

Scenario 1: Single Component Failure

Detection: Automatic via health checks
Action: Kubernetes automatically restarts failed pods
Impact: No expected customer impact (recovery in < 30 seconds)
Escalation: None (automatic recovery)

Scenario 2: Database Failure

Detection: Patroni detects primary failure
Action: Automatic promotion of replica to primary
Impact: 30-60 seconds of database unavailability
Escalation: Operations team notified

Scenario 3: Availability Zone Failure

Detection: Multiple pod/node failures
Action: Pods rescheduled to healthy zones
Impact: 2-5 minutes (pod startup time)
Escalation: Incident declared, management notified

Scenario 4: Complete Data Center Failure

Detection: All health checks fail; operations team declares a disaster
Action: Manual DR failover procedure
Steps:
1. Activate DR site (T+0)
2. Promote standby database to primary (T+15 min)
3. Update DNS to point to DR site (T+30 min)
4. Verify all services operational (T+60 min)
5. Customer notification (T+90 min)
Impact: Up to 4 hours RTO
Escalation: Executive team, all customers notified
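
The Scenario 4 timeline can be checked against the 4-hour RTO from Section 6.1 with simple arithmetic. The sketch below encodes the steps as listed and reports the remaining margin; the `timeline_slack` helper is illustrative only.

```python
RTO_MINUTES = 4 * 60  # RTO from Section 6.1

# (step, minutes after the disaster is declared), as listed in Scenario 4
FAILOVER_STEPS = [
    ("Activate DR site", 0),
    ("Promote standby database to primary", 15),
    ("Update DNS to point to DR site", 30),
    ("Verify all services operational", 60),
    ("Customer notification", 90),
]


def timeline_slack(steps, rto_minutes):
    """Minutes left between the last planned step and the RTO."""
    offsets = [offset for _, offset in steps]
    if offsets != sorted(offsets):
        raise ValueError("steps must be listed in chronological order")
    return rto_minutes - offsets[-1]


slack = timeline_slack(FAILOVER_STEPS, RTO_MINUTES)
print(f"Planned steps end at T+{FAILOVER_STEPS[-1][1]} min, leaving {slack} min before the RTO")
```

The planned steps finish at T+90 minutes, so the 4-hour RTO leaves 150 minutes of margin for delays such as DNS propagation or extended verification.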

6.5. DR Testing

Test Type	Frequency	Scope
Component Failover	Monthly	Single pod/database replica failover
Backup Restore	Monthly	Restore sample tenant from backup
Partial Failover	Quarterly	Failover non-critical services to DR
Full DR Exercise	Annually	Complete failover, customer notification simulation

7. Monitoring and Observability Stack

7.1. Monitoring Architecture

┌─────────────────────────────────────────────────────────────┐
│                    DATA COLLECTION LAYER                    │
│                                                             │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐           │
│  │ Prometheus │  │  Node      │  │  cAdvisor  │           │
│  │ Exporters  │  │  Exporter  │  │ (Container)│           │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘           │
│        │               │               │                   │
│        └───────────────┴───────────────┘                   │
│                        │                                   │
│              ┌─────────▼─────────┐                         │
│              │   Prometheus      │                         │
│              │   (HA Pair)       │                         │
│              │   - Time-series   │                         │
│              │   - Alerting      │                         │
│              │ - 30-day retention│                         │
│              └─────────┬─────────┘                         │
└────────────────────────┼─────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│                  VISUALIZATION LAYER                         │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Grafana Dashboards                     │    │
│  │                                                     │    │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────┐ │    │
│  │  │ Infrastructure│  │ Application │  │  Tenant  │ │    │
│  │  │   Dashboard  │  │  Dashboard  │  │Dashboard │ │    │
│  │  └──────────────┘  └──────────────┘  └──────────┘ │    │
│  └─────────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│                    LOGGING LAYER                             │
│                                                              │
│  ┌────────────┐  ┌────────────┐  ┌─────────────┐           │
│  │  Fluentd/  │  │ Logstash   │  │Elasticsearch│           │
│  │  Fluent Bit│──│ (Parse)    │──│   Cluster   │           │
│  │ (Collect)  │  │            │  │  (7-day)    │           │
│  └────────────┘  └────────────┘  └──────┬──────┘           │
│                                          │                   │
│                                  ┌───────▼──────┐            │
│                                  │    Kibana    │            │
│                                  │  (Visualize) │            │
│                                  └──────────────┘            │
└──────────────────────────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│                   TRACING LAYER                              │
│                                                              │
│  ┌────────────┐  ┌────────────┐  ┌───────────────┐         │
│  │  Jaeger    │  │  Zipkin    │  │ OpenTelemetry │         │
│  │  Collector │──│  (Storage) │  │  (Optional)   │         │
│  └────────────┘  └────────────┘  └───────────────┘         │
└──────────────────────────────────────────────────────────────┘
                         │
┌────────────────────────▼─────────────────────────────────────┐
│                 ALERTING & INCIDENT LAYER                    │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │           Alert Manager (Prometheus)                │    │
│  │  - Alert aggregation and deduplication             │    │
│  │  - Routing rules (team, severity, time)            │    │
│  │  - Silencing and inhibition                        │    │
│  └────────────────────┬────────────────────────────────┘    │
│                       │                                     │
│            ┌──────────┴──────────┐                          │
│            │                     │                          │
│   ┌────────▼────────┐   ┌────────▼────────┐                │
│   │   PagerDuty     │   │   Email/Slack   │                │
│   │  (On-call)      │   │  (Notifications)│                │
│   └─────────────────┘   └─────────────────┘                │
└──────────────────────────────────────────────────────────────┘

7.2. Key Metrics

Infrastructure Metrics

Metric	Threshold	Alert
Node CPU utilization	> 80% for 5 min	Warning
Node memory utilization	> 85% for 5 min	Warning
Disk utilization	> 80%	Warning
Disk utilization	> 90%	Critical
Network errors	> 0.1% packet loss	Warning
Pod restart count	> 3 in 10 min	Critical

Application Metrics

Metric	Threshold	Alert
API response time (p95)	> 2 seconds	Warning
API response time (p99)	> 5 seconds	Critical
Error rate	> 1%	Warning
Error rate	> 5%	Critical
Queue depth	> 1000 jobs	Warning
Authentication failures	> 10/min	Warning
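
The two-level thresholds in the tables above (for example, error rate: warning above 1%, critical above 5%) can be expressed as a small classification function. This is a minimal Python sketch, not the platform's alerting rules: `THRESHOLDS` and `classify` are hypothetical names, the "for 5 min" duration conditions are deliberately ignored, and in the documented stack such conditions would normally be written as Prometheus alerting rules rather than application code.

```python
from typing import Optional

# Hypothetical threshold table: metric name -> (warning, critical) limits.
# Values mirror the tables above; None means that level is not defined.
THRESHOLDS: dict[str, tuple[Optional[float], Optional[float]]] = {
    "disk_utilization_pct": (80.0, 90.0),
    "error_rate_pct": (1.0, 5.0),
    "api_p95_seconds": (2.0, None),
    "api_p99_seconds": (None, 5.0),
}


def classify(metric: str, value: float) -> str:
    """Return 'critical', 'warning', or 'ok' for a single metric sample."""
    warning, critical = THRESHOLDS[metric]
    # Check the more severe level first so it wins when both are exceeded.
    if critical is not None and value > critical:
        return "critical"
    if warning is not None and value > warning:
        return "warning"
    return "ok"
```

For example, `classify("error_rate_pct", 3.0)` yields `"warning"`, while a value of exactly 1.0 is still `"ok"` because the thresholds are strict inequalities.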

Service Component Metrics (see component addendums):

Communications: Document generation success rate, template load time
Notifications: Delivery rate, bounce rate, queue depth

Business Metrics

Metric	Purpose
Documents generated/hour	Capacity planning
Notifications sent/hour	Capacity planning
Active users per tenant	Usage tracking
API calls per tenant	Billing verification
Storage consumed per tenant	Quota management

7.3. Dashboard Structure

Operations Dashboard

Cluster Health: Node status, pod status, resource utilization
Service Health: Service availability, response times, error rates
Capacity: CPU/memory/storage trends, forecasting
Alerts: Active alerts, alert history

Tenant Dashboard (Per Customer)

Usage Metrics: Documents generated, notifications sent
Performance: Response times, success rates
Quota Status: Storage used, API calls, user licenses
SLA Status: Uptime percentage, incident history

Security Dashboard

Authentication Events: Login attempts, failures, MFA usage
API Security: Rate limiting triggers, blocked requests
Network Security: Firewall blocks, IDS/IPS events
Compliance: Audit log entries, policy violations

7.4. Log Management

Log Sources

Application Logs: Structured JSON logs from all services
Access Logs: HTTP access logs (ingress, API gateway)
Audit Logs: Security events, configuration changes
System Logs: OS, Kubernetes, database logs

Log Retention

Log Type	Retention	Storage
Application logs	7 days (hot)	Elasticsearch
Application logs	90 days (warm)	S3/archive
Audit logs	7 years	S3/compliance tier
Access logs	30 days	Elasticsearch
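
The hot and warm tiers for application logs amount to an age-based lookup. The sketch below is illustrative only: `application_log_tier` and the tier labels are hypothetical, and it assumes the 90-day warm window is measured from log creation rather than from the end of the 7-day hot window, which the table does not specify.

```python
from datetime import timedelta

# Retention windows copied from the table above.
# Assumption: both windows are measured from the time the log was written.
HOT = timedelta(days=7)
WARM = timedelta(days=90)


def application_log_tier(age: timedelta) -> str:
    """Map the age of an application log entry to its storage tier."""
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    if age <= HOT:
        return "elasticsearch"  # hot tier, searchable
    if age <= WARM:
        return "s3-archive"     # warm tier
    return "expired"            # past retention, eligible for deletion
```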

Log Analysis Use Cases

Debugging: Trace requests across microservices
Security: Detect suspicious patterns, intrusion attempts
Compliance: Audit trail for data access
Performance: Identify slow queries, bottlenecks

7.5. Distributed Tracing

OpenTelemetry Implementation:

Trace Context Propagation: Trace ID passed through all services
Span Collection: Each service records timing and metadata
Sampling: 100% of errors, 10% of successful requests
Retention: 7 days of trace data
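
The sampling policy (all errors, 10% of successful requests) requires the outcome of a request to be known before the decision is made, so it is sketched here as a decision on a finished trace. This is an assumption-laden Python illustration, not the OpenTelemetry SDK API: `keep_trace` and `SUCCESS_SAMPLE_RATE` are hypothetical names, and hashing the trace ID is one common way to make every service reach the same decision.

```python
import hashlib

SUCCESS_SAMPLE_RATE = 0.10  # 10% of successful requests, per the policy above


def keep_trace(trace_id: str, is_error: bool) -> bool:
    """Decide whether a finished trace is retained."""
    if is_error:
        return True  # 100% of errors are kept
    # Hash the trace ID into [0, 1) so the decision is deterministic per trace.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < SUCCESS_SAMPLE_RATE
```

Because the decision depends only on the trace ID and the error flag, repeated calls for the same trace always agree, and roughly one in ten successful traces is kept over a large population.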

Trace Analysis:

Identify slow services in request chain
Detect cascading failures
Optimize inter-service communication
Troubleshoot timeout issues

8. Performance Optimization

8.1. Caching Strategy

Application-Level Caching

┌─────────────────────────────────────────────────────────────┐
│                      CACHING LAYERS                         │
│                                                             │
│  Layer 1: CDN Cache (Cloudflare)                           │
│  ├── Static assets (images, CSS, JS)                       │
│  ├── TTL: 1 hour to 1 day                                  │
│  └── Purge on deployment                                   │
│                                                             │
│  Layer 2: Application Cache (Redis)                        │
│  ├── Session data (TTL: 24 hours)                          │
│  ├── User profiles (TTL: 1 hour)                           │
│  ├── Tenant configuration (TTL: 5 minutes)                 │
│  └── API responses (TTL: varies by endpoint)               │
│                                                             │
│  Layer 3: Database Buffer Cache and Pooling                │
│  ├── PostgreSQL shared_buffers (4 GB)                      │
│  ├── PgBouncer connection pooling                          │
│  └── Read replicas for read-heavy queries                  │
│                                                             │
│  Layer 4: Template Cache                                   │
│  ├── Compiled document templates                           │
│  ├── TTL: Until template version changes                   │
│  └── Pre-warming on deployment                             │
└─────────────────────────────────────────────────────────────┘
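
The Layer 2 entries follow a cache-aside pattern: read from the cache, and on a miss or expiry load from the source of truth and store the result with a TTL. The following Python sketch uses an in-process dictionary as a stand-in for Redis; `TTLCache`, `get_or_load`, and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-process stand-in for a Redis-style cache with per-key TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (expiry time, cached value)
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_load(self, key: str, ttl_seconds: float,
                    loader: Callable[[], Any]) -> Any:
        """Return the cached value, or call loader() and cache the result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = loader()  # miss or expired: go to the source of truth
        self._entries[key] = (now + ttl_seconds, value)
        return value
```

With the TTLs from the diagram, tenant configuration would be read with `ttl_seconds=300` and user profiles with `ttl_seconds=3600`. A short TTL bounds how long a stale tenant setting can be served, at the cost of more frequent loads.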

Redis Cluster Configuration

Topology: 3-node cluster with replication
Persistence: RDB snapshots every 15 minutes + AOF
Eviction Policy: LRU (Least Recently Used)
Max Memory: 16 GB per node

8.2. Database Optimization

Connection Pooling

Application Pods (50 pods × 10 connections) = 500 connections
                    │
                    ▼
        ┌───────────────────────┐
        │   PgBouncer Pool      │
        │   (Transaction Mode)  │
        │   Max: 100 connections│
        └───────────┬───────────┘
                    │
                    ▼
        ┌───────────────────────┐
        │  PostgreSQL Primary   │
        │  Max: 200 connections │
        └───────────────────────┘
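
The figures in the diagram imply a fixed multiplexing ratio and a fixed amount of headroom on the primary. The arithmetic is sketched below in Python; the variable names are illustrative, and the comment on what the remaining connections are used for is an assumption rather than a documented allocation.

```python
# Figures from the connection pooling diagram.
PODS = 50
CONNECTIONS_PER_POD = 10
PGBOUNCER_MAX_SERVER_CONNECTIONS = 100
POSTGRES_MAX_CONNECTIONS = 200

client_connections = PODS * CONNECTIONS_PER_POD
# How many client connections share each server connection at full load.
multiplexing_ratio = client_connections / PGBOUNCER_MAX_SERVER_CONNECTIONS
# Connections left on the primary (assumed: replication, admin, monitoring).
postgres_headroom = POSTGRES_MAX_CONNECTIONS - PGBOUNCER_MAX_SERVER_CONNECTIONS

print(client_connections, multiplexing_ratio, postgres_headroom)  # 500 5.0 100
```

The 5:1 ratio works because, in transaction mode, a server connection is held only for the duration of a transaction. The trade-off is that session-level state, such as session `SET` values or advisory locks, does not behave as it would on a dedicated connection.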

Query Optimization

Indexes: Covering indexes on frequently queried columns
Partitioning: Time-based partitioning for large tables (job history, audit logs)
Materialized Views: Pre-aggregated data for dashboards
Query Plan Analysis: Regular EXPLAIN ANALYZE on slow queries

8.3. Content Delivery Optimization

Document Generation Pipeline

Request → Queue → Worker Pool → Template Cache → Generate → S3 Upload
   │         │          │              │             │           │
   │         │          │              │             │           └─ Async
   │         │          │              │             └─ Parallel processing
   │         │          │              └─ In-memory cache
   │         │          └─ Auto-scaling (3-20 workers)
   │         └─ RabbitMQ/Kafka (persistent)
   └─ Immediate response with job ID

Optimization Techniques:

Batch Processing: Group similar documents for efficiency
Template Pre-compilation: Cache compiled templates
Parallel Rendering: Multi-threaded document generation
Output Streaming: Stream large documents to storage
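
The pipeline's key property is that the caller receives a job ID immediately while generation happens asynchronously in a worker pool. The Python sketch below illustrates that shape only: the in-process `queue.Queue` stands in for RabbitMQ/Kafka, the `results` dictionary stands in for the S3 upload, and `submit`, `worker`, and `start_workers` are hypothetical names.

```python
import queue
import threading
import uuid

jobs: "queue.Queue[tuple[str, str]]" = queue.Queue()
results: dict[str, str] = {}
results_lock = threading.Lock()


def submit(template: str) -> str:
    """Enqueue a generation request and return a job ID immediately."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, template))
    return job_id


def worker() -> None:
    """Pull jobs until the process exits; rendering is a placeholder."""
    while True:
        job_id, template = jobs.get()
        try:
            rendered = f"rendered:{template}"  # placeholder for generation
            with results_lock:
                results[job_id] = rendered     # placeholder for the upload
        finally:
            jobs.task_done()


def start_workers(count: int = 3) -> None:
    """Start a fixed pool of daemon threads (3 is the documented minimum)."""
    for _ in range(count):
        threading.Thread(target=worker, daemon=True).start()
```

A caller would run `start_workers()` once, call `submit("invoice")` to obtain a job ID, and later look the result up by that ID. Auto-scaling between 3 and 20 workers is outside the scope of this sketch.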

8.4. Network Optimization

HTTP/2: Multiplexing for reduced latency
Compression: Gzip/Brotli compression for API responses
Keep-Alive: Persistent connections to reduce overhead
DNS Caching: Aggressive DNS caching (5 min TTL)

9. Compliance and Audit

9.1. Compliance Requirements

Regulation	Scope	Implementation
GDPR	EU data protection	Swedish data residency, data processing agreements, right to erasure
ISO 27001	Information security	Full ISMS implementation, regular audits
PCI DSS	Payment data (if applicable)	Tokenization, network segmentation (future)
Swedish Data Protection	National regulations	Data residency, DPA compliance

9.2. Audit Logging

Audit Events

Authentication: Login, logout, failed attempts, MFA events
Authorization: Permission changes, role assignments
Data Access: Document views, downloads, exports
Configuration Changes: Tenant settings, user management
Administrative Actions: Database access, system configuration

Audit Log Format

{
  "timestamp": "2025-01-15T10:30:45.123Z",
  "event_type": "document.view",
  "actor": {
    "user_id": "user-123",
    "email": "john.doe@example.com",
    "ip_address": "203.0.113.45",
    "user_agent": "Mozilla/5.0..."
  },
  "tenant_id": "tenant-abc-123",
  "resource": {
    "type": "document",
    "id": "doc-456",
    "path": "/templates/invoice.docx"
  },
  "action": "view",
  "result": "success",
  "metadata": {
    "session_id": "sess-789",
    "request_id": "req-012"
  }
}
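
An event in this layout could be produced and checked as follows. This is a minimal Python sketch: `build_audit_event`, `missing_fields`, and `REQUIRED_FIELDS` are hypothetical names, and treating every top-level field except `metadata` as required is an assumption, since the document does not state which fields are mandatory.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Assumption: all top-level fields except "metadata" are mandatory.
REQUIRED_FIELDS = ("timestamp", "event_type", "actor", "tenant_id",
                   "resource", "action", "result")


def build_audit_event(event_type: str, actor: dict, tenant_id: str,
                      resource: dict, action: str, result: str,
                      metadata: Optional[dict] = None) -> str:
    """Serialize one audit event in the field layout shown above."""
    now = datetime.now(timezone.utc)
    event = {
        # ISO 8601 in UTC with millisecond precision and a trailing "Z".
        "timestamp": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "event_type": event_type,
        "actor": actor,
        "tenant_id": tenant_id,
        "resource": resource,
        "action": action,
        "result": result,
        "metadata": metadata or {},
    }
    return json.dumps(event)


def missing_fields(event: dict) -> list[str]:
    """Return the required fields that are absent from a parsed event."""
    return [name for name in REQUIRED_FIELDS if name not in event]
```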

9.3. Data Residency

Commitment: All customer data stored within Sweden

Primary DC: Sweden (Entiros AB)
DR DC: Sweden (separate facility)
No Cross-Border Transfer: Data never leaves Swedish jurisdiction
Subprocessor Control: All subprocessors bound by DPA

10. Deployment and Release Management

10.1. CI/CD Pipeline

┌─────────────────────────────────────────────────────────────┐
│                    CI/CD PIPELINE                           │
│                                                             │
│  1. Code Commit (Git)                                       │
│     └─▶ 2. CI Build (GitHub Actions/GitLab CI)              │
│         ├─ Unit Tests                                       │
│         ├─ Integration Tests                                │
│         ├─ Security Scanning (SAST)                         │
│         ├─ Dependency Check                                 │
│         └─ Build Docker Image                               │
│             └─▶ 3. Push to Container Registry               │
│                 └─▶ 4. Deploy to Test Cluster               │
│                     ├─ Automated E2E Tests                  │
│                     └─ Performance Tests                    │
│                         └─▶ 5. Manual Approval              │
│                             └─▶ 6. Deploy to Production     │
│                                 ├─ Canary (10%)             │
│                                 ├─ Monitor (15 min)         │
│                                 └─ Full Rollout             │
└─────────────────────────────────────────────────────────────┘

10.2. Deployment Strategies

Blue/Green Deployment

Two Environments: Blue (current), Green (new)
Traffic Switch: Cutover via load balancer (near-instant) or DNS (bounded by record TTL)
Rollback: Switch back to Blue if issues detected
Use Case: Major version upgrades

Canary Deployment

Gradual Rollout: 10% → 25% → 50% → 100%
Monitoring: Watch error rates, performance during rollout
Automated Rollback: If metrics exceed thresholds
Use Case: Standard releases
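
The gradual rollout with automated rollback can be sketched as a loop over the traffic steps. This Python example is illustrative: `run_canary` and its two callbacks are hypothetical, the rollback threshold reuses the 1% error-rate warning level as an assumption, and the observation wait between steps is omitted.

```python
from typing import Callable

CANARY_STEPS = (10, 25, 50, 100)  # traffic percentages from the list above
MAX_ERROR_RATE_PCT = 1.0          # assumption: reuse the 1% warning threshold


def run_canary(set_traffic: Callable[[int], None],
               error_rate_pct: Callable[[], float]) -> bool:
    """Advance through the canary steps; roll back on the first bad reading.

    Returns True if the release reached 100%, False if it was rolled back.
    """
    for percent in CANARY_STEPS:
        set_traffic(percent)
        if error_rate_pct() > MAX_ERROR_RATE_PCT:
            set_traffic(0)  # send all traffic back to the previous version
            return False
    return True
```

In practice `set_traffic` would adjust ingress or load balancer weights and `error_rate_pct` would query the monitoring stack after an observation window.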

Rolling Update

Default Strategy: Kubernetes rolling update
Max Unavailable: 25% of pods
Max Surge: 25% additional pods
Use Case: Minor updates, patches
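
The two 25% settings translate into concrete pod counts that depend on the replica count. The Python sketch below computes them; `rolling_update_bounds` is a hypothetical helper, while the rounding direction follows Kubernetes behavior, which rounds a percentage `maxUnavailable` down and a percentage `maxSurge` up.

```python
def rolling_update_bounds(replicas: int, max_unavailable_pct: int = 25,
                          max_surge_pct: int = 25) -> tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rollout."""
    # Integer arithmetic avoids floating-point rounding surprises.
    max_unavailable = replicas * max_unavailable_pct // 100      # round down
    max_surge = -(-(replicas * max_surge_pct) // 100)            # round up
    return replicas - max_unavailable, replicas + max_surge


print(rolling_update_bounds(10))  # (8, 13)
print(rolling_update_bounds(3))   # (3, 4)
```

For small deployments the rounding matters: with 3 replicas no pod may be unavailable, so the rollout proceeds only by surging one extra pod at a time.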

10.3. Release Cadence

Release Type	Frequency	Scope	Customer Impact
Hotfix	As needed	Critical bug fix	Immediate, minimal downtime
Patch	Monthly	Bug fixes, minor features	Scheduled maintenance window
Minor	Quarterly	New features, improvements	Scheduled, tested in advance
Major	Annually	Breaking changes, major features	Advance notice, migration support

11. Future Architecture Enhancements

11.1. Planned Improvements (6-12 months)

Enhancement	Benefit	Timeline
Service Mesh (Istio)	mTLS, advanced traffic management	Q2 2025
Multi-Region Deployment	Lower latency, geographic redundancy	Q3 2025
AI/ML Integration	Document intelligence, predictive analytics	Q4 2025
GraphQL API	Flexible querying, reduced overfetching	Q3 2025
Event-Driven Architecture	Improved scalability, decoupling	Q2 2025

11.2. Capacity Planning

Current Platform Capacity (Launch):

Tenants: Up to 50 active tenants
Users: Up to 5,000 concurrent users
API Requests: 10,000 requests/minute

12-Month Projection:

Tenants: 100-150 active tenants
Users: 10,000-15,000 concurrent users
API Requests: 50,000 requests/minute
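
The growth implied by the launch and 12-month figures can be made explicit. The following Python arithmetic is a sketch using the upper bounds of the projected ranges; the dictionary names are illustrative.

```python
launch = {"tenants": 50, "concurrent_users": 5_000,
          "requests_per_minute": 10_000}
# Upper bounds of the 12-month projection ranges.
projected = {"tenants": 150, "concurrent_users": 15_000,
             "requests_per_minute": 50_000}

# Growth factor per dimension and the projected rate in requests per second.
growth = {name: projected[name] / launch[name] for name in launch}
projected_rps = projected["requests_per_minute"] / 60

print(growth)
# {'tenants': 3.0, 'concurrent_users': 3.0, 'requests_per_minute': 5.0}
print(round(projected_rps, 1))  # 833.3
```

Request volume is projected to grow faster (5x) than tenants or users (3x), which suggests that per-user API load is expected to rise and that API-tier capacity should be planned against requests rather than user counts.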

Scaling Path:

Compute: Add 3-5 worker nodes per quarter
Database: Upgrade to larger instance, add read replicas
Storage: Linear scaling with object storage (no fixed capacity ceiling)
Network: Upgrade bandwidth as needed (10 Gbps → 40 Gbps)

Service Component Capacity (see component addendums):

Communications: Document generation throughput
Notifications: Delivery throughput by channel

12. Service Component Architecture Addendums

This platform architecture document is supplemented by component-specific technical architecture addendums:

Document ID	Title	Description
SW-SAAS-ARCH-COMP-001	Communications Technical Architecture	OpenText Exstream document generation architecture
SW-SAAS-ARCH-COMP-002	Notifications Technical Architecture	Multi-channel notification delivery architecture

Each addendum provides:

Component-specific pod configurations and resource requirements
Component-specific APIs and integration patterns
Component-specific monitoring, metrics, and alerting
Component-specific performance tuning and optimization
Component-specific backup and recovery procedures

13. Appendices

13.1. Technology Version Matrix

Component	Version	EOL Date
Kubernetes	1.28.x	Oct 2024 (upgrade to 1.29 planned)
Docker	24.0.x	Ongoing support
PostgreSQL	15.x	Nov 2027
Redis	7.2.x	Ongoing support
Prometheus	2.48.x	Ongoing support
Grafana	10.2.x	Ongoing support
OpenText Comms	[Version TBD]	Per OpenText support policy

13.2. Network Ports and Protocols

Port	Protocol	Purpose	Access
443	HTTPS	Web application, API	Public
80	HTTP	Redirect to HTTPS	Public
22	SSH	Server administration	VPN only
5432	PostgreSQL	Database	Internal only
6379	Redis	Cache	Internal only
9090	Prometheus	Metrics	Internal only
3000	Grafana	Monitoring dashboards	VPN only
5601	Kibana	Log visualization	VPN only

13.3. DNS Configuration

Record Type	Name	Value	TTL
A	app.swedwise.com	[Load Balancer IP]	300
CNAME	api.swedwise.com	app.swedwise.com	300
CNAME	www.swedwise.com	app.swedwise.com	300
MX	swedwise.com	[Mail server]	3600
TXT	_dmarc.swedwise.com	[DMARC policy]	3600
TXT	swedwise.com	[SPF record]	3600

13.4. SSL/TLS Configuration

Certificate Authority: Let's Encrypt (automated renewal)
Certificate Type: Wildcard (*.swedwise.com)
TLS Version: TLS 1.2+ (TLS 1.3 preferred)
Cipher Suites: Modern, secure ciphers only (no RC4, 3DES)
HSTS: Enabled with 1-year max-age
OCSP Stapling: Enabled
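
Two of these settings map directly onto values that a client or test harness can enforce. The Python sketch below is client-side and illustrative only; it does not represent the platform's ingress configuration, and `make_client_context` and `hsts_header_value` are hypothetical helper names.

```python
import ssl

HSTS_MAX_AGE_SECONDS = 365 * 24 * 60 * 60  # one year, in seconds


def make_client_context() -> ssl.SSLContext:
    """Client-side context that refuses anything older than TLS 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def hsts_header_value() -> str:
    """Value for the Strict-Transport-Security response header."""
    return f"max-age={HSTS_MAX_AGE_SECONDS}"


print(hsts_header_value())  # max-age=31536000
```

A context built this way still negotiates TLS 1.3 when the server offers it, which matches the stated preference.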

13.5. Contact Information

Role	Responsibility	Contact
Technical Lead	Architecture decisions	tech-lead@swedwise.com
Operations Manager	24/7 operations	ops@swedwise.com
Security Officer	Security incidents	security@swedwise.com
Data Center Partner	Infrastructure	Entiros AB - support@entiros.se

Document Control

Version	Date	Author	Changes
1.0	2025-01-15	Technical Lead	Initial platform architecture document
1.1	2025-01-15	Technical Lead	Refactored to platform-level; Communications and Notifications moved to addendums

Classification: Confidential
Distribution: Internal use and customer NDAs only
Review Date: 2026-01-15

Related Documents:

SW-SAAS-ARCH-COMP-001: Communications Technical Architecture Addendum
SW-SAAS-ARCH-COMP-002: Notifications Technical Architecture Addendum

This document contains confidential technical information about the Swedwise SaaS Platform architecture. Unauthorized distribution or disclosure is prohibited.