AI Story Companion Ecosystem — Functional requirements, technology stack, infrastructure architecture, and AI API cost estimates.
| ID | Requirement | Priority | Component |
|---|---|---|---|
| D-01 | System must detect wake word with <500ms latency in ambient noise up to 60dB | MUS | FW |
| D-02 | ASR must transcribe child speech with >90% accuracy (ages 4–10) | MUS | FW + Cloud |
| D-03 | Device must operate in offline mode for basic storytelling (pre-cached stories) | MUS | FW |
| D-04 | OTA firmware update must complete without user interaction; rollback on failure | MUS | FW + DevOps |
| D-05 | Device must boot to ready state in <15 seconds | SHO | FW |
| D-06 | Audio output must support stereo at ≥48kHz / 16-bit | MUS | FW |
| D-07 | Camera must capture 1080p at 30fps for personalisation features | SHO | FW |
| D-08 | BLE pairing must complete within 30 seconds with mobile app | MUS | FW + Mobile |
| D-09 | Battery must support minimum 4 hours continuous playback | MUS | HW + FW |
| D-10 | LED ring must animate in sync with story events (<100ms latency) | SHO | FW |
| D-11 | All microphone audio must be processed and then discarded; it must never be stored | MUS | FW + Cloud |
| D-12 | Device must support projection sync with DreamDome over Wi-Fi (<200ms) | SHO | FW |
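
Requirement D-04 (unattended OTA update with rollback on failure) is often met with an A/B slot scheme. The following is a minimal sketch under that assumption, not the firmware's actual update mechanism: `SlotState`, `apply_update`, `write_image`, and `health_check` are hypothetical names, and the reboot into the new slot is simulated by calling the health check directly.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SlotState:
    active: str = "A"              # slot the device currently boots from
    pending: Optional[str] = None  # slot written but not yet confirmed healthy


def other_slot(slot: str) -> str:
    """Return the slot that is not `slot` (two-slot layout assumed)."""
    return "B" if slot == "A" else "A"


def apply_update(
    state: SlotState,
    write_image: Callable[[str], None],
    health_check: Callable[[str], bool],
) -> str:
    """Write the new image to the inactive slot and commit only if it is healthy."""
    target = other_slot(state.active)
    write_image(target)             # the active slot is never overwritten
    state.pending = target
    healthy = health_check(target)  # stands in for "reboot, then self-test"
    if healthy:
        state.active = target       # commit: future boots use the new slot
    state.pending = None            # on failure the old slot simply stays active
    return "committed" if healthy else "rolled_back"
```

Because the running image is left untouched until the new one passes its check, a failed update needs no user interaction: the device keeps booting from the previous slot.
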

| ID | Requirement | Priority | Component |
|---|---|---|---|
| C-01 | Story generation API must return first audio chunk in <800ms (p95) | MUS | Story Orchestrator |
| C-02 | System must enforce child-safe content at every LLM output | MUS | Safety Service |
| C-03 | All child data must be stored encrypted at rest (AES-256) | MUS | All services |
| C-04 | System must support horizontal auto-scaling to 100K concurrent devices | MUS | Infra / K8s |
| C-05 | Parent must be able to delete all child data; deletion must propagate to all services within 30 days | MUS | All services |
| C-06 | Story memory must persist across sessions and be retrievable by child ID | MUS | Memory Service |
| C-07 | Music engine must produce contextually appropriate audio within 2s | MUS | Music Service |
| C-08 | Illustration engine must generate scene image within 5 seconds | SHO | Illustration Svc |
| C-09 | System must support multi-language TTS (min: EN, FR, DE, ES at launch) | SHO | TTS Service |
| C-10 | API must return structured error codes; no raw exceptions to client | MUS | API Gateway |
| C-11 | All API endpoints must require authentication; no unauthenticated access | MUS | Auth Service |
| C-12 | System must log all safety filter triggers for audit and review | MUS | Safety Service |
| C-13 | Sleep motion data must be anonymised before analytics processing | MUS | Analytics Svc |
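
Requirement C-10 (structured error codes, no raw exceptions to the client) can be illustrated with a small mapping layer. This is a sketch using only the standard library rather than the API Gateway's real implementation; the exception classes, error codes, and the `to_error_response` helper are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ApiError:
    code: str         # stable, machine-readable identifier
    message: str      # safe, client-facing text
    http_status: int


class StoryTimeout(Exception):
    """Hypothetical: story generation exceeded its deadline."""


class ContentBlocked(Exception):
    """Hypothetical: the safety service rejected the output."""


_ERROR_MAP = {
    StoryTimeout: ApiError("STORY_TIMEOUT", "Story generation took too long.", 504),
    ContentBlocked: ApiError("CONTENT_BLOCKED", "Content was blocked by safety filters.", 422),
}
_FALLBACK = ApiError("INTERNAL_ERROR", "Unexpected server error.", 500)


def to_error_response(exc: Exception) -> Tuple[int, str]:
    """Translate any exception into (HTTP status, JSON body) without leaking details."""
    err = _ERROR_MAP.get(type(exc), _FALLBACK)
    body = json.dumps({"error": {"code": err.code, "message": err.message}})
    return err.http_status, body
```

Unknown exceptions fall through to a generic `INTERNAL_ERROR`, so the exception text (which may contain paths or identifiers) never reaches the client.
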

| ID | Requirement | Priority | Component |
|---|---|---|---|
| M-01 | App must support iOS 16+ and Android 12+ | MUS | Mobile App |
| M-02 | Parent must be able to create and manage up to 5 child profiles | MUS | Mobile App |
| M-03 | App must provide content filtering controls (themes, age level, topics) | MUS | Mobile App |
| M-04 | App must display sleep summary with story and motion timeline | SHO | Mobile App |
| M-05 | App must support bedtime schedule configuration with automatic enforcement | MUS | Mobile App |
| M-06 | App must allow purchase and management of subscriptions | MUS | Mobile App |
| M-07 | App must work offline for settings management (sync when reconnected) | SHO | Mobile App |
| M-08 | App must send push notifications for sleep summary delivery and usage alerts | COU | Mobile App |
| M-09 | App must display story history and allow playback of saved stories | COU | Mobile App |
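
For M-05 (bedtime schedule with automatic enforcement), the core check is whether the current time falls inside a configured window, which usually spans midnight. The sketch below assumes the schedule is a simple start/end pair in local time; `in_bedtime_window` is a hypothetical helper, and what the system does while inside the window is left out.

```python
from datetime import time


def in_bedtime_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` lies in [start, end), allowing windows that cross midnight."""
    if start <= end:
        # Same-day window, e.g. 13:00 to 15:00 for a nap
        return start <= now < end
    # Overnight window, e.g. 20:00 to 07:00
    return now >= start or now < end
```

With a 20:00 to 07:00 schedule, 23:00 and 06:59 are inside the window, while 07:00 and 12:00 are not.
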

| Layer | Technology | Justification |
|---|---|---|
| Embedded OS | Linux (Yocto/Buildroot 2024) | Minimal footprint, full control, wide hardware support |
| Embedded Language | Python 3.11 + asyncio | Rapid prototyping, async I/O for audio/network |
| Wake Word | Porcupine SDK (on-device) | Privacy-first, no cloud dependency, <5mW |
| Local STT | Whisper.cpp (tiny/base) | Offline fallback, acceptable accuracy for simple commands |
| Cloud STT | Google Cloud Speech / AWS Transcribe | High accuracy, multi-language, child voice models |
| LLM | Anthropic Claude 3 Sonnet (primary), GPT-4o (fallback) | Safety features, quality, cost balance |
| Orchestration | LangChain + LangGraph | Narrative state machine, tool use, memory integration |
| Vector DB | pgvector (PostgreSQL) / Pinecone | Story memory, semantic search, low-latency retrieval |
| TTS | ElevenLabs (primary), Azure Cognitive Services (fallback) | Natural child-friendly voices, low latency |
| Music | MusicGen (HuggingFace) + S3 curated library | Dynamic generation + reliable fallback |
| Image Generation | DALL-E 3 / Stable Diffusion XL | Quality illustrations, child-safe content filters |
| Backend Language | Python 3.11 | Async, fast, typed, excellent ecosystem |
| Backend Framework | FastAPI + Pydantic v2 | Auto OpenAPI docs, validation, performance |
| Message Queue | RabbitMQ / AWS SQS | Async task dispatch for generation services |
| Primary DB | PostgreSQL 16 | ACID, pgvector, mature, excellent managed options |
| Cache | Redis 7 | Session tokens, TTS cache, rate limiting |
| Time-Series DB | InfluxDB 2 / TimescaleDB | Sleep/motion sensor data |
| Analytics | ClickHouse | High-volume anonymised event analytics |
| Object Storage | AWS S3 / Cloudflare R2 | Illustrations, audio cache, firmware bins |
| CDN | Cloudflare | Global low-latency media delivery |
| Container Runtime | Docker + Kubernetes (EKS/GKE) | Scalable microservices, managed K8s |
| CI/CD | GitHub Actions + ArgoCD | GitOps, automated deploy, rollback |
| Monitoring | Prometheus + Grafana + Loki | Metrics, dashboards, log aggregation |
| Alerting | PagerDuty + Slack | On-call rotation, incident management |
| Mobile | React Native + Expo | Cross-platform iOS + Android, code sharing |
| Auth | Keycloak + JWT | OIDC/OAuth2, SSO, parental consent flows |
| IaC | Terraform + Helm | Reproducible infra, version-controlled deployments |
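
The stack lists a primary and a fallback provider for the LLM and for TTS. One way to express that pattern is an ordered provider chain, sketched below. The provider callables are stand-ins supplied by the caller, not real SDK calls, and `generate_with_fallback` and `AllProvidersFailed` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]  # (label, function taking a prompt)


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has raised an exception."""


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (provider label, output) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider failure triggers the next in line
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

A caller would pass the primary first, for example `[("claude", call_claude), ("gpt-4o", call_gpt4o)]`, where both functions are wrappers the service would have to provide. Output from a fallback provider would still need to pass the same safety checks as primary output (C-02).
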
The platform is deployed on AWS (primary), with GCP as a failover/multi-cloud option. All services run containerised on Kubernetes. Environments are separated into dev, staging, and production.
| Service | AWS Service | Sizing (initial) | Scaling |
|---|---|---|---|
| Kubernetes Cluster | EKS (Kubernetes 1.30) | 3 × m6i.xlarge nodes | Auto-scale 3–20 nodes |
| Relational DB | RDS PostgreSQL 16 | db.t3.large (Multi-AZ) | Read replicas at 10K DAU |
| Cache | ElastiCache Redis 7 | cache.t3.medium (cluster) | Scale with session volume |
| Message Queue | Amazon SQS | Standard queues | Managed, auto-scales |
| Object Storage | S3 Standard + Intelligent Tiering | Unlimited | Managed |
| CDN | CloudFront + Cloudflare | Global PoPs | Managed |
| Container Registry | ECR | Private repos per service | Managed |
| Secrets | AWS Secrets Manager | Per-service secrets | Managed |
| DNS & Load Balancer | Route53 + ALB | Regional ALB | Managed, auto-scales |
| Monitoring | CloudWatch + Amazon Managed Service for Prometheus workspace | — | Managed |
| Log Aggregation | CloudWatch Logs + Loki (Grafana Cloud) | Retained 30 days | Managed |
| CI/CD Runners | GitHub Actions (managed) | 8-core runners | Managed |
| Analytics DB | ClickHouse Cloud (startup tier) | 2 shards | Scale with data volume |
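
The 3–20 node auto-scaling range can be checked against the 100K concurrent-device target in C-04 with simple arithmetic. In the sketch below, the per-node capacity is a placeholder that would have to come from load testing; it is not a figure from this document, and both function names are hypothetical.

```python
import math


def min_capacity_per_node(devices: int, max_nodes: int = 20) -> int:
    """Concurrent devices each node must handle when the cluster is at its ceiling."""
    return math.ceil(devices / max_nodes)


def nodes_needed(devices: int, per_node: int, min_nodes: int = 3, max_nodes: int = 20) -> int:
    """Node count for a given load, clamped to the auto-scaling range."""
    raw = math.ceil(devices / per_node)
    return max(min_nodes, min(max_nodes, raw))


# At the 20-node ceiling, 100K concurrent devices means 5,000 devices per node.
required = min_capacity_per_node(100_000)
```

If measured capacity per m6i.xlarge node turns out to be lower than that figure, either the node ceiling or the instance size would need to change to meet C-04.
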
Estimates assume 10,000 monthly active devices (MAD), 20 minutes of average daily usage per device, 30 days per month, and ~1,200 LLM tokens per story exchange.
| Service | Volume / month (10K MAD) | Unit Cost | Monthly Total |
|---|---|---|---|
| LLM (Claude Sonnet / GPT-4o) | 360M tokens in + 90M out | $3/$15 per 1M | ~$4,700 |
| STT (Google Cloud Speech) | ~6,000 hours audio | $0.016/min | ~$5,760 |
| TTS (ElevenLabs Pro) | ~12,000 hours audio | $0.18/1K chars | ~$3,200 |
| DALL-E 3 (illustrations) | ~200K images | $0.04/image | ~$8,000 |
| AWS Infra (K8s + DB + S3) | Fixed + variable | — | ~$2,500 |
| CDN + Storage | Media delivery | $0.02/GB out | ~$800 |
| Total estimated | | | ~$25,000/mo |
| Per active device | | | ~$2.50/device/mo |
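
The totals can be reproduced from the table's own line items. The short script below sums the stated monthly figures and recomputes the rows where volume and unit price multiply directly. Variable names are illustrative. One caveat when reading the LLM row: the stated token volumes at the stated list prices come to $2,430, which is lower than the ~$4,700 in the table; the script keeps the table's figure for the total.

```python
MAD = 10_000  # monthly active devices

# Monthly totals exactly as stated in the table (USD)
line_items = {
    "LLM": 4_700,
    "STT": 5_760,
    "TTS": 3_200,
    "Illustrations": 8_000,
    "AWS infra": 2_500,
    "CDN + storage": 800,
}

total = sum(line_items.values())   # 24960, i.e. ~$25,000/mo
per_device = total / MAD           # ~$2.50 per device per month

# Rows that can be recomputed from volume x unit price
stt_cost = round(6_000 * 60 * 0.016)   # 6,000 hours at $0.016/min
image_cost = round(200_000 * 0.04)     # 200K images at $0.04 each
llm_list_price = 360 * 3 + 90 * 15     # 360M in at $3/1M + 90M out at $15/1M

print(f"total=${total:,}  per_device=${per_device:.2f}")
print(f"stt=${stt_cost:,}  images=${image_cost:,}  llm_at_list_price=${llm_list_price:,}")
```
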