20M+ Concurrent Connections Benchmarked
100M+ Events Processed Daily
20+ AI Agents Built & Deployed
60% Latency Reduction Achieved
15+ Years of Engineering Mastery
About & Vision

Engineering Systems
That Define Scale

I am an engineering leader and systems architect with 15+ years building platforms at the edge of what's technically possible — bridging architectural depth with the leadership discipline to deliver reliably at scale.

As Director of Engineering at CultureMonkey, I drive technical architecture for a global SaaS serving 10M+ enterprise employees — leading teams, architecting event-driven microservices, and building AI and observability infrastructure for platform reliability.

Previously at Sentienz Solutions as Chief Architect, I engineered IoT platforms load-tested to 20M+ concurrent connections, real-time analytics pipelines at 100M+ daily events, and distributed campaign engines with bidirectional notification loops.

I actively build and deploy AI agents — 20+ in production — and share engineering knowledge on YouTube via Sentienz Solutions.

“The best platforms don't just process data — they create clarity from chaos, connecting millions of signals into outcomes that matter.”

— Siva Samraj S, Director of Engineering

Distributed Systems at Scale

Kafka, Cassandra, Ignite, Elasticsearch, Redis, and Aerospike, behind platforms load-tested to 20M+ concurrent connections with system-level TCP and JVM optimization.

👁

Observability & Reliability

Datadog, Prometheus, OpenTelemetry. Mobile alerting for on-call triage. SLO-driven reliability culture across all platforms.

🌍

Location & Notifications

Google Maps & Directions API for geofencing. Bidirectional notification systems for real-time device–server loops at scale.

🤖

AI Agents & Automation

20+ production AI agents across marketing and engineering. RAG pipelines, LLM chatbots, intelligent automation.

🗄

Databases — Relational & NoSQL

Oracle, MySQL, PostgreSQL for transactional workloads. Cassandra, Aerospike, Redis, Ignite for high-throughput, low-latency distributed data.

🎓

Education: Madras Institute of Technology (MIT) Alumnus

B.E. Computer Science, Madras Institute of Technology (MIT), Anna University, Chennai.

Technical Depth

What Sets Me Apart

Hands-on mastery across distributed platforms, databases, real-time data, AI, and IoT — battle-tested at production scale.

Stream Processing

Apache Kafka & Spark

Cluster design, consumer tuning, schema evolution. Spark Structured Streaming + StreamSets at 100M+ events/day.

Kafka · Spark · StreamSets · 100M+
Distributed Storage
📊

Cassandra, Ignite & Aerospike

Production Cassandra for high-throughput writes. Ignite in-memory + Aerospike for ultra-low latency workloads — kernel-level tuning for p99 performance.

Cassandra · Apache Ignite · Aerospike · Kernel Tuning
Search & Cache
🔍

Elasticsearch & Redis

Millions of event queries in real time. Redis for sub-millisecond caching, pub/sub, and session management. Optimised shard strategies and query caching.

Elasticsearch · Redis · M+ Queries · Sub-ms Cache
Relational Databases
🗄

Oracle, MySQL & PostgreSQL

Enterprise-grade relational database design, query optimisation, indexing strategies, stored procedures, and schema migrations at scale.

Oracle · MySQL · PostgreSQL · Query Tuning
IoT Infrastructure
📡

IoT — 20M+ Concurrent Connections

Multi-tenant IoT load-tested to 20M+ connections. System-level TCP, kernel, JVM tuning on AWS and Azure.

20M+ Benchmarked · MQTT · AWS & Azure
Observability Stack
👁

Observability & Mobile Alerting

Datadog APM + Prometheus + OpenTelemetry. Mobile alerting — on-call engineers get severity-triaged alerts with acknowledgement and escalation.

Datadog · Prometheus · OTEL · Mobile Alerts
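The acknowledgement-and-escalation behaviour described above can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the production alerting system: the `Alert` type, the `ACK_TIMEOUT_S` policy, the `CHAIN` of responders, and `current_target` are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: seconds each tier has to acknowledge before the
# alert moves to the next responder. Severities not listed never page.
ACK_TIMEOUT_S = {"critical": 300, "warning": 900}

# Hypothetical escalation chain, in paging order.
CHAIN = ["primary-on-call", "secondary-on-call", "engineering-manager"]


@dataclass
class Alert:
    name: str
    severity: str
    fired_at: float                   # epoch seconds
    acked_at: Optional[float] = None  # set when someone acknowledges


def current_target(alert: Alert, now: float) -> Optional[str]:
    """Return who should be paged at `now`, or None if nobody should be."""
    if alert.acked_at is not None and alert.acked_at <= now:
        return None  # acknowledged: stop escalating
    timeout = ACK_TIMEOUT_S.get(alert.severity)
    if timeout is None:
        return None  # non-paging severity (for example "info")
    # One tier per elapsed timeout window, capped at the last responder.
    tier = max(0, int((now - alert.fired_at) // timeout))
    return CHAIN[min(tier, len(CHAIN) - 1)]


if __name__ == "__main__":
    alert = Alert("api-latency", "critical", fired_at=0)
    for t in (0, 300, 600):
        print(t, current_target(alert, t))
```

Keeping the decision a pure function of the alert and the clock makes the triage policy easy to unit-test separately from whatever delivers the push notification.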
AI & Automation
🤖

AI Agents & Intelligent Systems

20+ production AI agents — marketing and engineering. RAG pipelines, LLM chatbots, multi-turn context, notification integration.

20+ Agents · LLMs · RAG · Chatbots
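For readers unfamiliar with RAG, the retrieval step can be sketched in a few lines. This is a toy under stated assumptions: it ranks documents by bag-of-words cosine similarity, whereas a real pipeline would typically use embeddings and a vector index. The function names (`bow`, `cosine`, `retrieve`, `build_prompt`) and the sample documents are hypothetical and do not describe the agents mentioned above.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words term counts (lowercased, whitespace-tokenised)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]


def build_prompt(query: str, docs: list, k: int = 2) -> str:
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "kafka consumer lag is high",
        "redis cache eviction policy",
        "how to reset a password",
    ]
    print(build_prompt("kafka lag", docs, k=1))
```

The prompt string would then be passed to whichever LLM API the agent uses; that call is deliberately left out here.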
Location & Geo
🌍

Google Maps & Bidirectional Notifications

Maps and Directions API for geofencing and route-aware targeting. Bidirectional notification system with delivery receipts at scale.

Google Maps · Geofencing · Bidirectional
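The core test behind a geofence is a distance check. The sketch below assumes a circular fence and uses the haversine great-circle formula. It is plain geometry, not a call into the Google Maps or Directions APIs, and the function names and coordinates are illustrative only.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(point: tuple, center: tuple, radius_m: float) -> bool:
    """True if `point` lies within `radius_m` metres of `center`."""
    return haversine_m(*point, *center) <= radius_m


if __name__ == "__main__":
    store = (13.0827, 80.2707)   # illustrative fence centre
    device = (13.0830, 80.2710)  # illustrative device position
    print(inside_geofence(device, store, radius_m=500))
```

Route-aware targeting would need road distance or travel time rather than straight-line distance, which is the kind of data a directions service supplies.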
Performance Engineering
🔧

System Tuning & Load Engineering

TCP limits, kernel buffer tuning, JVM GC, heap profiling. Benchmarked 20M+ concurrent connections with full p50/p95/p99 profiles on AWS and Azure.

20M+ Tested · JVM Tuning · p99 Profiling
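A p50/p95/p99 profile is computed from raw latency samples. The sketch below uses the nearest-rank definition of a percentile, which is an assumption for this example; load-testing tools often interpolate or use histograms instead, and the function names are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_profile(samples_ms):
    """Summarise latency samples (milliseconds) as p50, p95 and p99."""
    return {name: percentile(samples_ms, p)
            for name, p in (("p50", 50), ("p95", 95), ("p99", 99))}


if __name__ == "__main__":
    # Synthetic samples: 1 ms through 100 ms, one of each.
    print(latency_profile(list(range(1, 101))))
```

Reporting the tail (p95, p99) alongside the median matters because a healthy median can hide a small fraction of very slow requests.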
📺 Featured Architecture

OTTPlay — Full-Stack Analytics Pipeline

End-to-end pipeline from raw device telemetry to a queryable Elasticsearch index, where support teams diagnose playback failures in real time across millions of indexed events.

1

User Device Telemetry

App events, playback errors, buffering metrics emitted in real time.

2

Kafka Ingestion

Partitioned by user ID for ordered, fault-tolerant, zero-loss delivery.

3

Spark Stream Processing

Real-time enrichment — session stitching, error classification.

4

Elasticsearch Indexing

Tuned mappings and shard strategy for support query patterns.

5

Support Query Layer

Millions of events queryable by user, device, error, or time — instant root cause.
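Two of the steps above, key-based partitioning and session stitching, can be sketched without any cluster. This is a simplified illustration under stated assumptions: `zlib.crc32` stands in for the key hash (Kafka's Java client uses murmur2), the 30-minute inactivity gap is a hypothetical default, and the function names are not taken from the OTTPlay codebase.

```python
import zlib
from collections import defaultdict


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    The same user always lands on the same partition, which is what
    preserves per-user event ordering. crc32 is a stand-in hash here.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def stitch_sessions(events, gap_s: int = 1800):
    """Group (user_id, timestamp) events into per-user sessions.

    A new session starts whenever the gap since the user's previous
    event exceeds `gap_s` seconds.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions = {}
    for user_id, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][-1] > gap_s:
                user_sessions.append([ts])    # inactivity gap: open a new session
            else:
                user_sessions[-1].append(ts)  # continue the current session
        sessions[user_id] = user_sessions
    return sessions


if __name__ == "__main__":
    events = [("u1", 0), ("u1", 100), ("u1", 5000), ("u2", 50)]
    print(partition_for("u1", 12))
    print(stitch_sessions(events))
```

In a streaming job the same gap rule would be applied incrementally with state and watermarks rather than over a complete in-memory list.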

Engineered for Unprecedented Scale

Every system is designed to sustain load, self-recover, and scale without limits

20M+ Concurrent Connections
100M+ Daily Events
60% Latency Reduction
0% Faster Releases
Career Journey

15 Years of
Defining Scale

Mar 2025 — Present
CultureMonkey
Chennai, India
Director of Engineering
Kafka · ClickHouse · Redis · Microservices · Elasticsearch · Datadog · AI Agents · PostgreSQL · OTEL
  • Driving complete engineering architecture for a global SaaS serving 10M+ enterprise employees — aligning technical OKRs to revenue, product, and reliability goals.
  • Transformed monolithic architecture into event-driven microservices — 40% velocity increase, independent team ownership.
  • Real-time feedback pipelines (Kafka, Redis, ClickHouse) delivering 10x analytics performance; Elasticsearch at 100K+ batch ingestion scale.
  • AI-driven sentiment clustering across 200K+ feedback responses; omnichannel campaigns for Slack, Teams, WhatsApp, and Email.
  • Full-stack observability (Datadog APM, Prometheus, OTEL) and mobile alerting for on-call incident response.
Oct 2017 — Feb 2025
Sentienz Solutions
Bangalore, India
Chief Architect & Sr. Engineering Manager
Kafka · Spark · Cassandra · Apache Ignite · Aerospike · Redis · Elasticsearch · StreamSets · IoT · AWS & Azure
  • Architected Akiro IoT platform — load-benchmarked at 20M+ concurrent connections; system-level TCP, JVM, kernel tuning on AWS and Azure.
  • Built Jarvis Central Data Platform — Spark and StreamSets pipelines feeding Elasticsearch, BigQuery, and S3 with zero data loss.
  • Designed OTTPlay analytics pipeline — device through Kafka and Spark to Elasticsearch, real-time support queries across millions of events.
  • Built RTRS with Google Maps, bidirectional notifications, and Apache Ignite — 100+ concurrent real-time campaigns.
  • Datadog APM, Prometheus, OpenTelemetry; mobile alerting system with severity triage and escalation chains.
2010 — 2017
Sony
Bangalore / US Onsite
Software Engineer → Technical Lead
Hadoop · Kafka · Oracle · MySQL · BI Modernization
  • Led the Oracle-to-Hadoop data lake migration, reducing ETL processing time by 60%. Delivered Kafka streaming pipelines for BI modernization while serving as US onsite technical liaison.
  • Progressed from Software Engineer to Technical Lead over 7 years through consistent delivery and technical excellence.
Technical Skills

The Full Toolkit

⚙️

Distributed Systems & Data

  • Kafka & Spark Stream Processing
  • Cassandra, Ignite & Aerospike
  • Elasticsearch — Millions of Queries
  • Redis — Sub-ms Cache & Pub/Sub
  • StreamSets Pipeline Orchestration
  • IoT — 20M+ Connections Benchmarked
🗄

Relational & RDBMS

  • Oracle — Enterprise Schema Design
  • MySQL — High-Throughput Transactions
  • PostgreSQL — Complex Query Optimisation
  • Index Strategies & Query Tuning
  • Stored Procedures & Migrations
  • BI Modernisation & Data Lake Migration
🤖

AI & Intelligent Systems

  • AI Agent Engineering (20+ built)
  • LLM Application Development
  • RAG Pipelines & Vector Search
  • Enterprise Chatbot Architecture
  • Marketing & Engineering Automation
👁

Observability & Platform Ops

  • Datadog APM & Infrastructure
  • Prometheus & Alertmanager
  • OpenTelemetry Distributed Tracing
  • Mobile Alerting & On-Call Systems
  • Google Maps & Geofencing APIs
  • AWS & Azure Multi-Cloud
🔧

Performance Engineering

  • Load Generation to 20M+ Connections
  • Benchmarking — p50 / p95 / p99
  • TCP Stack & Kernel Tuning
  • JVM GC & Heap Optimization
  • Network Buffer & Throughput Tuning
🏛

Engineering Leadership

  • Technical Strategy & Roadmapping
  • Team Building, Hiring & Mentorship
  • Architecture Governance & OKRs
  • Cross-Functional Collaboration
  • Technical Debt Management
Technologies & Tools
Java · Scala · Python · Apache Kafka · Apache Spark · Cassandra · Apache Ignite · Redis · Aerospike · Elasticsearch · ClickHouse · Oracle · MySQL · PostgreSQL · Hadoop / HDFS · StreamSets · AWS · Azure · Datadog · Prometheus · OpenTelemetry · Google Maps API · Bidirectional Notifications · Mobile Alert Systems · Load Testing (20M+) · JVM & TCP Tuning · BigQuery · Docker · Spring Boot · Vert.x · LangChain · LLM APIs · RAG · MQTT · Jenkins
Key Projects

Platforms That
Redefined Scale

IoT · 20M+ Benchmarked
Akiro IoT Platform

Multi-tenant IoT across healthcare, telematics & energy. Load-benchmarked to 20M+ concurrent connections. System-level TCP/kernel tuning on AWS & Azure. Kafka, Cassandra, Ignite, Aerospike, bidirectional notifications.

20M+ Connections
100M+ Daily Messages
Multi-Cloud AWS & Azure
Analytics · Real-Time
OTTPlay Analytics Pipeline

Full-stack pipeline: device → Kafka → Spark → Elasticsearch. Support teams query millions of events by user, device, session, or error — diagnosing playback failures instantly.

M+ Events Indexed
Real-Time Queries
Zero Raw Log Access
AI · 20+ Agents
AI Agents Ecosystem

20+ production AI agents — content generation, lead qualification, code review, incident triage. Enterprise chatbots with RAG, multi-turn context, and bidirectional notification integration.

20+ In Production
2 Domains
RAG Powered
Campaign · Geo · Real-Time
Real-Time Response System (RTRS)

100+ concurrent location-based campaigns. Google Maps for route-aware targeting. Bidirectional delivery receipts. Ignite in-memory layer over a 20+ node Hadoop cluster. Datadog + mobile alerting.

100+ Concurrent
Bidirectional Notifications
↓60% Latency
Observability · Mobile Alerting · Load Engineering
Full-Stack Monitoring & Load Benchmarking

3-layer observability: Datadog APM, Prometheus SLO tracking, OpenTelemetry tracing. Bespoke mobile alerting with severity triage, escalation chains, and runbook links. Load frameworks benchmarked to 20M+ concurrent connections — full p50/p95/p99 profiles on AWS and Azure.

3-Layer Observability
Mobile Alert Platform
20M+ Load Tested
p99 Profiled
Let’s Connect

Ready to Build
the Future Together

Open to high-impact engineering conversations, platform challenges, and collaboration at the frontier of distributed systems and AI infrastructure.

Education

Bachelor of Engineering — Computer Science

Madras Institute of Technology (MIT Campus, Chrompet)
Anna University, Chennai