
EdgeUp Colleges — AWS Solution Architecture

Cluster: college-qa-final · ap-south-1 (Mumbai) · source: Terraform/Terragrunt + live discovery · 12-Jun-2026
● LIVE: load-tested to 2,000 concurrent
EKS 1.35 · Karpenter + KEDA
Idle fleet: 5 nodes (2 on-demand + 3 spot)
Peak (2,000 concurrent): 20–22 nodes, added automatically, then shrinks back
Aurora MySQL Serverless v2 · 0.5–24 ACU
VPC 10.2.0.0/16 · 2 AZs · 1 NAT
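To put the 0.5–24 ACU range in concrete terms: AWS describes one Aurora Serverless v2 ACU as approximately 2 GiB of memory plus corresponding CPU and networking, and capacity moves in 0.5-ACU steps. The sketch below applies that rule of thumb. The helper name `acu_to_gib` is hypothetical, and the output is an approximation, not a measured figure for college-qa-final-db.

```python
# Rule of thumb from AWS: each ACU is roughly 2 GiB of memory
# with corresponding CPU and networking (approximation only).
GIB_PER_ACU = 2.0
# Aurora Serverless v2 capacity changes in 0.5-ACU increments.
ACU_STEP = 0.5


def acu_to_gib(acu: float) -> float:
    """Return approximate memory in GiB for an ACU setting (hypothetical helper)."""
    if acu < 0 or not (acu / ACU_STEP).is_integer():
        raise ValueError(f"ACU must be a non-negative multiple of {ACU_STEP}, got {acu}")
    return acu * GIB_PER_ACU


# Range from the status bar above (college-qa-final-db).
MIN_ACU, MAX_ACU = 0.5, 24.0

for label, acu in (("idle", MIN_ACU), ("max", MAX_ACU)):
    print(f"{label}: {acu} ACU ≈ {acu_to_gib(acu):.0f} GiB")
# prints "idle: 0.5 ACU ≈ 1 GiB", then "max: 24.0 ACU ≈ 48 GiB"
```

So the database idles at about 1 GiB of memory and can grow to about 48 GiB under load.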
👩‍🎓 Students (web & mobile) → Route 53 (DNS) → CloudFront (CDN / edge) → HTTPS into the VPC
GitHub CI (build & push) → push image → ECR container registry (×7)

VPC 10.2.0.0/16 · ap-south-1 · 2 availability zones (1a / 1b)
Public subnets: AWS WAF (web firewall) · shared ALB (load balancer / ingress) · NAT Gateway ×1 (egress)

Kubernetes cluster: EKS (college-qa-final) · v1.35 · Karpenter (spot-first) + KEDA
Idle fleet: 5 nodes (2 on-demand + 3 spot)
① System node: t3.large · on-demand · Karpenter, KEDA, ArgoCD, LB controller, external-dns, external-secrets, metrics-server · taint: CriticalAddonsOnly
② Infra node: m6i.large · on-demand · MySQL, Redis, Kafka, Qdrant, Neo4j (in-cluster, 150 GB gp3) · stateful, never on spot · taint: dedicated=infra
③ Web node: spot · backend (KEDA 2→20), frontend (2→10)
④ Web node: spot · ai-socket, backend replicas
⑤ AI node: spot · ai-chatbot, ai-cron, ai-service, ai-timetable
Spot instance types: m5 / m6i / m6a / c6i xlarge
⚡ Autoscale: Karpenter adds spot nodes on demand (5 → 20–22 at 2,000 concurrent, load-tested) and consolidates back after the peak
KEDA triggers: ALB requests/sec · CPU · Kafka lag; per-service pod scaling, 2 s polling

Managed database: Aurora MySQL Serverless v2 (college-qa-final-db)
0.5 ACU at idle, auto-scales to 24 ACU max · single writer · storage replicated across 3 AZs
Credentials in Secrets Manager (7-day auto-rotation) · slow-query log · 1-day backups (QA)

Object storage (S3): college-qa-final-assets · versioned · AES-256 · served via CloudFront
Secrets store (Secrets Manager): DB credentials (auto-rotated) and AI service keys, synced into pods by external-secrets
Monitoring: remote-write to the central Prometheus + promtail logs → central Grafana/Loki (no local stack cost)
GitOps: ArgoCD watches Git and syncs 5 apps (api, ai, web, infra, system); zero manual deploys

Legend: on-demand (2), critical, never interrupted · spot (3 → 20), ~60–70% cheaper burst capacity · managed data layer (Aurora) · AWS managed services
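KEDA scales pods by feeding its triggers (ALB requests/sec, CPU, Kafka lag) to a Horizontal Pod Autoscaler that it manages. The sketch below is a simplified version of the HPA replica calculation, assuming per-pod average targets. Each metric proposes ceil(current replicas × current / target). Ratios within the default 0.1 tolerance hold steady, the largest proposal wins, and the result is clamped to the service's bounds (2→20 for backend). It omits HPA stabilization windows and pod-readiness handling. The function name and the target values in the example are hypothetical, not this cluster's ScaledObject settings.

```python
import math


def desired_replicas(current_replicas: int,
                     metrics: list[tuple[float, float]],
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation over several metrics.

    Each metric is (current per-pod value, target per-pod value).
    """
    proposals = []
    for current, target in metrics:
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to target: this metric proposes no change.
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # With multiple metrics the HPA acts on the largest proposal.
    desired = max(proposals)
    # Clamp to the configured bounds, e.g. backend 2→20.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical backend snapshot: 4 pods, made-up per-pod targets.
snapshot = [
    (150, 100),  # ALB requests/sec per pod vs target 100
    (60, 70),    # CPU utilization % vs target 70
    (50, 100),   # Kafka lag per pod vs target 100
]
print(desired_replicas(4, snapshot, min_replicas=2, max_replicas=20))  # → 6
```

Here request rate drives the decision (it proposes 6 pods), while CPU and Kafka lag would have kept or shrunk the deployment. Taking the maximum means whichever trigger is under the most pressure sets the replica count.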
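Karpenter's spot-first behaviour and post-peak consolidation are configured in a NodePool. The following is a minimal sketch, assuming the Karpenter v1 API (`karpenter.sh/v1`) and assuming the pool allows on-demand as a fallback. When both capacity types are permitted, Karpenter prioritizes spot and falls back to on-demand if spot capacity is unavailable. The pool name `web-spot`, the EC2NodeClass `default`, the `1m` consolidation delay, and the CPU limit are hypothetical. The limit is derived from the legend's 20 spot nodes × 4 vCPU per xlarge instance.

```python
import json

# Hypothetical values: not read from the college-qa-final Terraform.
SPOT_INSTANCE_TYPES = ["m5.xlarge", "m6i.xlarge", "m6a.xlarge", "c6i.xlarge"]
VCPU_PER_XLARGE = 4   # all four instance types above have 4 vCPUs
MAX_SPOT_NODES = 20   # burst ceiling from the legend (3 → 20)


def spot_first_nodepool(name: str, node_class: str) -> dict:
    """Build a Karpenter v1 NodePool manifest that prefers spot capacity."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "requirements": [
                        # Both capacity types allowed: Karpenter tries spot first.
                        {"key": "karpenter.sh/capacity-type", "operator": "In",
                         "values": ["spot", "on-demand"]},
                        {"key": "node.kubernetes.io/instance-type", "operator": "In",
                         "values": SPOT_INSTANCE_TYPES},
                    ],
                    "nodeClassRef": {"group": "karpenter.k8s.aws",
                                     "kind": "EC2NodeClass", "name": node_class},
                }
            },
            # Caps the pool at roughly 20 xlarge nodes' worth of CPU (20 × 4 vCPU).
            "limits": {"cpu": str(MAX_SPOT_NODES * VCPU_PER_XLARGE)},
            # Remove empty nodes and repack underused ones once the peak passes.
            "disruption": {"consolidationPolicy": "WhenEmptyOrUnderutilized",
                           "consolidateAfter": "1m"},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(spot_first_nodepool("web-spot", "default"), indent=2))
```

Emitting JSON rather than YAML keeps the sketch dependency-free. The output could be piped to `kubectl apply -f -`.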

Design decisions a reviewer will ask about

Source of truth: Terraform/Terragrunt repo (01-foundation → 02-compute → 03-data → 04-platform) plus live AWS discovery, 12-Jun-2026. This is the AWS reference design; the other providers in the cost matrix map to equivalent services.