Director of Engineering · Platform Architect
Building at the frontier of Scale & Intelligence
15+ years turning complex distributed-systems challenges into elegant, scalable systems. Expert in platforms load-tested to 20M+ concurrent connections, processing 100M+ daily events, and monitoring their own health, from deep IoT infrastructure to production AI agent ecosystems.
I am an engineering leader and systems architect with 15+ years building platforms at the edge of what's technically possible — bridging architectural depth with the leadership discipline to deliver reliably at scale.
As Director of Engineering at CultureMonkey, I drive technical architecture for a global SaaS platform serving 10M+ enterprise employees: leading teams, architecting event-driven microservices, and building AI and observability infrastructure for platform reliability.
Previously, as Chief Architect at Sentienz Solutions, I engineered IoT platforms load-tested to 20M+ concurrent connections, real-time analytics pipelines handling 100M+ daily events, and distributed campaign engines with bidirectional notification loops.
I actively build and deploy AI agents — 20+ in production — and share engineering knowledge on YouTube via Sentienz Solutions.
“The best platforms don't just process data — they create clarity from chaos, connecting millions of signals into outcomes that matter.”
— Siva Samraj S, Director of Engineering
Kafka, Cassandra, Ignite, Elasticsearch, Redis, Aerospike: tuned to 20M+ concurrent connections with system-level TCP and JVM optimization.
Datadog, Prometheus, OpenTelemetry. Mobile alerting for on-call triage. SLO-driven reliability culture across all platforms.
Google Maps & Directions API for geofencing. Bidirectional notification systems for real-time device–server loops at scale.
20+ production AI agents across marketing and engineering. RAG pipelines, LLM chatbots, intelligent automation.
Oracle, MySQL, PostgreSQL for transactional workloads. Cassandra, Aerospike, Redis, Ignite for high-throughput, low-latency distributed data.
B.E. Computer Science, Madras Institute of Technology (MIT), Anna University, Chennai.
Hands-on mastery across distributed platforms, databases, real-time data, AI, and IoT — battle-tested at production scale.
Cluster design, consumer tuning, schema evolution. Spark Structured Streaming + StreamSets at 100M+ events/day.
Production Cassandra for high-throughput writes. Ignite in-memory + Aerospike for ultra-low latency workloads — kernel-level tuning for p99 performance.
Real-time queries across millions of events, with optimised shard strategies and query caching. Redis for sub-millisecond caching, pub/sub, and session management.
Enterprise-grade relational database design, query optimisation, indexing strategies, stored procedures, and schema migrations at scale.
Multi-tenant IoT load-tested to 20M+ connections. System-level TCP, kernel, JVM tuning on AWS and Azure.
Datadog APM + Prometheus + OpenTelemetry. Mobile alerting — on-call engineers get severity-triaged alerts with acknowledgement and escalation.
20+ production AI agents — marketing and engineering. RAG pipelines, LLM chatbots, multi-turn context, notification integration.
Maps and Directions API for geofencing and route-aware targeting. Bidirectional notification system with delivery receipts at scale.
TCP limits, kernel buffer tuning, JVM GC, heap profiling. Benchmarked 20M+ concurrent connections with full p50/p95/p99 profiles on AWS and Azure.
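The p50/p95/p99 profiles mentioned above are latency percentiles computed over benchmark samples. A minimal sketch of how such a profile can be derived, assuming the nearest-rank percentile definition (many load tools interpolate instead, which gives slightly different values); `percentile` and `latency_profile` are illustrative names, not part of any particular benchmarking framework.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    ordered = sorted(samples)
    # 1-based rank; multiplying before dividing keeps integer inputs exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_profile(samples_ms):
    """Summarise raw latency samples (milliseconds) as p50/p95/p99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

For example, `latency_profile(list(range(1, 101)))` yields `{"p50": 50, "p95": 95, "p99": 99}`. Tail percentiles such as p99 are the ones that kernel, TCP, and GC tuning typically move, since averages hide the slowest requests.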
End-to-end pipeline from raw device telemetry to a queryable Elasticsearch index, where support teams diagnose playback failures in real time across millions of indexed events.
App events, playback errors, buffering metrics emitted in real time.
Partitioned by user ID for ordered, fault-tolerant, zero-loss delivery.
Real-time enrichment — session stitching, error classification.
Tuned mappings and shard strategy for support query patterns.
Millions of events queryable by user, device, error, or time — instant root cause.
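Partitioning by user ID works because Kafka guarantees ordering only within a single partition: if every event for a user maps to the same partition, that user's events stay in emission order. A minimal sketch of the key-to-partition mapping, assuming a hypothetical 12-partition topic and using `zlib.crc32` as a stand-in hash (Kafka's default partitioner uses murmur2, but the same-key-same-partition property illustrated here is identical); `partition_for` and `route` are hypothetical helpers, not a Kafka client API.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 12  # hypothetical partition count for the telemetry topic


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a user ID to a partition.

    Stand-in hash: CRC32 instead of Kafka's murmur2. Any stable hash gives
    the key property: the same user always lands on the same partition.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events):
    """Append (user_id, payload) events to per-partition logs in arrival order,
    mimicking how a keyed producer spreads events across partitions."""
    partitions = defaultdict(list)
    for user_id, payload in events:
        partitions[partition_for(user_id)].append((user_id, payload))
    return partitions
```

With `events = [("u1", "play"), ("u2", "play"), ("u1", "buffering"), ("u1", "error")]`, all three `u1` events end up in partition `partition_for("u1")` in the order play, buffering, error. One design consequence: changing the partition count remaps keys, so per-user ordering across a repartition needs care.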
Every system designed to sustain load, recover on its own, and scale without limits
Multi-tenant IoT across healthcare, telematics & energy. Load-benchmarked to 20M+ concurrent connections. System-level TCP/kernel tuning on AWS & Azure. Kafka, Cassandra, Ignite, Aerospike, bidirectional notifications.
Full-stack pipeline: device → Kafka → Spark → Elasticsearch. Support teams query millions of events by user, device, session, or error — diagnosing playback failures instantly.
20+ production AI agents — content generation, lead qualification, code review, incident triage. Enterprise chatbots with RAG, multi-turn context, and bidirectional notification integration.
100+ concurrent location-based campaigns. Google Maps for route-aware targeting. Bidirectional delivery receipts. Ignite in-memory layer over a 20+ node Hadoop cluster. Datadog + mobile alerting.
3-layer observability: Datadog APM, Prometheus SLO tracking, OpenTelemetry tracing. Bespoke mobile alerting with severity triage, escalation chains, and runbook links. Load frameworks benchmarked to 20M+ concurrent connections — full p50/p95/p99 profiles on AWS and Azure.
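The location-based campaigns above ultimately reduce to testing whether a device sits inside a campaign's region. The platform described here uses the Google Maps and Directions APIs for that; the sketch below deliberately swaps in a local haversine distance check against circular fences, a common simplification that is not the Google API and does not cover route-aware targeting. The fence tuple layout and the names `in_geofence` and `matching_campaigns` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_geofence(device_lat, device_lon, fence):
    """fence is a hypothetical (center_lat, center_lon, radius_m) circle."""
    center_lat, center_lon, radius_m = fence
    return haversine_m(device_lat, device_lon, center_lat, center_lon) <= radius_m


def matching_campaigns(device_lat, device_lon, campaigns):
    """Return IDs of campaigns whose circular fence contains the device."""
    return [cid for cid, fence in campaigns.items()
            if in_geofence(device_lat, device_lon, fence)]
```

For example, with `campaigns = {"mall": (13.0, 80.0, 500), "airport": (13.1, 80.2, 500)}`, a device at `(13.001, 80.0)` is roughly 111 m from the first centre and matches only `"mall"`. At 100+ concurrent campaigns, a linear scan like this is cheap; larger fence sets usually add a spatial index in front of the exact distance check.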
Open to high-impact engineering conversations, platform challenges, and collaboration at the frontier of distributed systems and AI infrastructure.
Madras Institute of Technology (MIT Campus, Chrompet)
Anna University, Chennai