Siva Samraj S

Director of Engineering · Platform Architect
Building at the frontier of Scale & Intelligence

Apache Kafka & Spark | Cassandra & Ignite | Redis · Aerospike | Elasticsearch | 20M+ IoT Benchmarked | Datadog · Prometheus · OTEL | Google Maps API | Bidirectional Notifications | 20+ AI Agents | Oracle · MySQL · Postgres

15+ years turning complex distributed challenges into elegant, scalable systems. Expert in platforms that sustain 20M+ concurrent connections, process 100M+ daily events, and intelligently self-monitor — from deep IoT infrastructure to production AI agent ecosystems.

15+ Yrs Experience

20M+ Connections

20+ AI Agents

100M+ Daily Events

About & Vision

Engineering Systems That Define Scale

I am an engineering leader and systems architect with 15+ years building platforms at the edge of what's technically possible — bridging architectural depth with the leadership discipline to deliver reliably at scale.

As Director of Engineering at CultureMonkey, I drive technical architecture for a global SaaS serving 10M+ enterprise employees — leading teams, architecting event-driven microservices, and building AI and observability infrastructure for platform reliability.

Previously at Sentienz Solutions as Chief Architect, I engineered IoT platforms load-tested to 20M+ concurrent connections, real-time analytics pipelines at 100M+ daily events, and distributed campaign engines with bidirectional notification loops.

I actively build and deploy AI agents — 20+ in production — and share engineering knowledge on YouTube via Sentienz Solutions.

“The best platforms don't just process data — they create clarity from chaos, connecting millions of signals into outcomes that matter.”

— Siva Samraj S, Director of Engineering

⚡

Distributed Systems at Scale

Kafka, Cassandra, Ignite, Elasticsearch, Redis, Aerospike — tuned to 20M+ concurrent connections with system-level TCP and JVM optimization.

👁

Observability & Reliability

Datadog, Prometheus, OpenTelemetry. Mobile alerting for on-call triage. SLO-driven reliability culture across all platforms.

🌍

Location & Notifications

Google Maps & Directions API for geofencing. Bidirectional notification systems for real-time device–server loops at scale.

🤖

AI Agents & Automation

20+ production AI agents across marketing and engineering. RAG pipelines, LLM chatbots, intelligent automation.

🗄

Databases — Relational & NoSQL

Oracle, MySQL, PostgreSQL for transactional workloads. Cassandra, Aerospike, Redis, Ignite for high-throughput, low-latency distributed data.

🎓

Education — MIT Alumni

B.E. Computer Science, Madras Institute of Technology (MIT), Anna University, Chennai.

Technical Depth

What Sets Me Apart

Hands-on mastery across distributed platforms, databases, real-time data, AI, and IoT — battle-tested at production scale.

Stream Processing

⚡

Apache Kafka & Spark

Cluster design, consumer tuning, schema evolution. Spark Structured Streaming + StreamSets at 100M+ events/day.

Kafka · Spark · StreamSets · 100M+

Distributed Storage

📊

Cassandra, Ignite & Aerospike

Production Cassandra for high-throughput writes. Ignite in-memory + Aerospike for ultra-low latency workloads — kernel-level tuning for p99 performance.

Cassandra · Apache Ignite · Aerospike · Kernel Tuning

Search & Cache

🔍

Elasticsearch & Redis

Millions of event queries in real time. Redis for sub-millisecond caching, pub/sub, and session management. Optimised shard strategies and query caching.

Elasticsearch · Redis · M+ Queries · Sub-ms Cache

Relational Databases

🗄

Oracle, MySQL & PostgreSQL

Enterprise-grade relational database design, query optimisation, indexing strategies, stored procedures, and schema migrations at scale.

Oracle · MySQL · PostgreSQL · Query Tuning

IoT Infrastructure

📡

IoT — 20M+ Concurrent Connections

Multi-tenant IoT load-tested to 20M+ connections. System-level TCP, kernel, JVM tuning on AWS and Azure.

20M+ Benchmarked · MQTT · AWS & Azure

Observability Stack

👁

Observability & Mobile Alerting

Datadog APM + Prometheus + OpenTelemetry. Mobile alerting — on-call engineers get severity-triaged alerts with acknowledgement and escalation.

Datadog · Prometheus · OTEL · Mobile Alerts

AI & Automation

🤖

AI Agents & Intelligent Systems

20+ production AI agents — marketing and engineering. RAG pipelines, LLM chatbots, multi-turn context, notification integration.

20+ Agents · LLMs · RAG · Chatbots

Location & Geo

🌍

Google Maps & Bidirectional Notifications

Maps and Directions API for geofencing and route-aware targeting. Bidirectional notification system with delivery receipts at scale.

Google Maps · Geofencing · Bidirectional
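
To make the geofencing idea concrete, here is a minimal Python sketch of a circular geofence check. It is an illustration only: it uses the plain haversine great-circle distance rather than the Google Maps or Directions API, and the function names, coordinates, and 500 m radius are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def inside_geofence(point, centre, radius_m):
    """True when `point` lies within `radius_m` metres of `centre`."""
    return haversine_m(*point, *centre) <= radius_m


# Hypothetical campaign fence: 500 m around an illustrative centre point.
centre = (13.0827, 80.2707)
near_device = (13.0837, 80.2707)   # roughly 111 m north of the centre
far_device = (13.0927, 80.2707)    # roughly 1.1 km north of the centre

near_inside = inside_geofence(near_device, centre, 500)
far_inside = inside_geofence(far_device, centre, 500)
```

A route-aware system would typically replace the straight-line distance with travel distance or travel time from a routing service; the circular check above is only the simplest possible stand-in.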

Performance Engineering

🔧

System Tuning & Load Engineering

TCP limits, kernel buffer tuning, JVM GC, heap profiling. Benchmarked 20M+ concurrent connections with full p50/p95/p99 profiles on AWS and Azure.

20M+ Tested · JVM Tuning · p99 Profiling
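
As a rough illustration of what a p50/p95/p99 profile reports, the sketch below computes nearest-rank percentiles over a list of latency samples. The helper names and the sample data are hypothetical, and real benchmarking tools often use interpolated or histogram-based percentiles instead of this exact method.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. `p` is an integer in the range 1..100."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 gives the 1-based rank.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_profile(samples):
    """Summarise samples as a p50/p95/p99 dictionary."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


# Hypothetical connection-setup latencies in milliseconds (illustrative only).
latencies_ms = list(range(1, 101))
profile = latency_profile(latencies_ms)
```

Reporting p95 and p99 alongside the median matters in connection-heavy tests because a small fraction of slow handshakes can be invisible in the average while still affecting many devices at this scale.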

📺 Featured Architecture

OTTPlay — Full-Stack Analytics Pipeline

End-to-end pipeline — raw device telemetry to queryable Elasticsearch where support teams diagnose playback failures in real time across millions of indexed events.

User Device Telemetry

App events, playback errors, buffering metrics emitted in real time.

Kafka Ingestion

Partitioned by user ID for ordered, fault-tolerant, zero-loss delivery.

Spark Stream Processing

Real-time enrichment — session stitching, error classification.

Elasticsearch Indexing

Tuned mappings and shard strategy for support query patterns.

Support Query Layer

Millions of events queryable by user, device, error, or time — instant root cause.
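
The ordering guarantee in the Kafka ingestion step comes from key-based partitioning: every event with the same key lands on the same partition, and a partition preserves append order. The sketch below simulates that routing in plain Python. It is an assumption-laden illustration, not the pipeline's code: the partition count, event fields, and function names are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's Java client uses for keyed records.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # hypothetical partition count


def partition_for(user_id, num_partitions=NUM_PARTITIONS):
    """Map a user ID to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events, num_partitions=NUM_PARTITIONS):
    """Append each event to its key's partition, preserving arrival order."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(event["user_id"], num_partitions)].append(event)
    return partitions


# Hypothetical telemetry events, interleaved across two users.
events = [
    {"user_id": "u1", "seq": 1, "type": "play"},
    {"user_id": "u2", "seq": 1, "type": "play"},
    {"user_id": "u1", "seq": 2, "type": "buffering"},
    {"user_id": "u1", "seq": 3, "type": "playback_error"},
]
routed = route(events)

# All of u1's events sit in one partition, in the order they were emitted.
u1_partition = partition_for("u1")
u1_sequence = [e["seq"] for e in routed[u1_partition] if e["user_id"] == "u1"]
```

Because a user's events are read back in emission order by a single consumer of that partition, downstream steps such as session stitching can process them without re-sorting. The trade-off is that a very active user cannot be spread across partitions.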

Career Journey

15 Years of Defining Scale

Mar 2025 — Present

CultureMonkey

Chennai, India

Director of Engineering

Kafka · ClickHouse · Redis · Microservices · Elasticsearch · Datadog · AI Agents · PostgreSQL · OTEL

Driving complete engineering architecture for a global SaaS serving 10M+ enterprise employees — aligning technical OKRs to revenue, product, and reliability goals.
Transformed monolithic architecture into event-driven microservices — 40% velocity increase, independent team ownership.
Real-time feedback pipelines (Kafka, Redis, ClickHouse) delivering 10x analytics performance; Elasticsearch at 100K+ batch ingestion scale.
AI-driven sentiment clustering across 200K+ feedback responses; omnichannel campaigns across Slack, Teams, WhatsApp, and email.
Full-stack observability (Datadog APM, Prometheus, OTEL) and mobile alerting for on-call incident response.

Oct 2017 — Feb 2025

Sentienz Solutions

Bangalore, India

Chief Architect & Sr. Engineering Manager

Kafka · Spark · Cassandra · Apache Ignite · Aerospike · Redis · Elasticsearch · StreamSets · IoT · AWS & Azure

Architected Akiro IoT platform — load-benchmarked at 20M+ concurrent connections; system-level TCP, JVM, kernel tuning on AWS and Azure.
Built Jarvis Central Data Platform — Spark and StreamSets pipelines feeding Elasticsearch, BigQuery, and S3 with zero data loss.
Designed OTTPlay analytics pipeline — device through Kafka and Spark to Elasticsearch, real-time support queries across millions of events.
Built RTRS with Google Maps, bidirectional notifications, and Apache Ignite — 100+ concurrent real-time campaigns.
Datadog APM, Prometheus, OpenTelemetry; mobile alerting system with severity triage and escalation chains.

2010 — 2017

Sony

Bangalore / US Onsite

Software Engineer → Technical Lead

Hadoop · Kafka · Oracle · MySQL · BI Modernization

Led the Oracle-to-Hadoop data lake migration, reducing ETL processing time by 60%. Kafka streaming pipelines for BI modernization, delivered as US onsite technical liaison.
Progressed from Software Engineer to Technical Lead over 7 years through consistent delivery and technical excellence.

Technical Skills

The Full Toolkit

⚙️

Distributed Systems & Data

Kafka & Spark Stream Processing
Cassandra, Ignite & Aerospike
Elasticsearch — Millions of Queries
Redis — Sub-ms Cache & Pub/Sub
StreamSets Pipeline Orchestration
IoT — 20M+ Connections Benchmarked

🗄

Relational & RDBMS

Oracle — Enterprise Schema Design
MySQL — High-Throughput Transactions
PostgreSQL — Complex Query Optimisation
Index Strategies & Query Tuning
Stored Procedures & Migrations
BI Modernisation & Data Lake Migration

🤖

AI & Intelligent Systems

AI Agent Engineering (20+ built)
LLM Application Development
RAG Pipelines & Vector Search
Enterprise Chatbot Architecture
Marketing & Engineering Automation

👁

Observability & Platform Ops

Datadog APM & Infrastructure
Prometheus & Alertmanager
OpenTelemetry Distributed Tracing
Mobile Alerting & On-Call Systems
Google Maps & Geofencing APIs
AWS & Azure Multi-Cloud

🔧

Performance Engineering

Load Generation to 20M+ Connections
Benchmarking — p50 / p95 / p99
TCP Stack & Kernel Tuning
JVM GC & Heap Optimization
Network Buffer & Throughput Tuning

🏛

Engineering Leadership

Technical Strategy & Roadmapping
Team Building, Hiring & Mentorship
Architecture Governance & OKRs
Cross-Functional Collaboration
Technical Debt Management

Technologies & Tools

Java · Scala · Python · Apache Kafka · Apache Spark · Cassandra · Apache Ignite · Redis · Aerospike · Elasticsearch · ClickHouse · Oracle · MySQL · PostgreSQL · Hadoop / HDFS · StreamSets · AWS · Azure · Datadog · Prometheus · OpenTelemetry · Google Maps API · Bidirectional Notifications · Mobile Alert Systems · Load Testing (20M+) · JVM & TCP Tuning · BigQuery · Docker · Spring Boot · Vert.x · LangChain · LLM APIs · RAG · MQTT · Jenkins

Key Projects

Platforms That Redefined Scale

IoT · 20M+ Benchmarked

Akiro IoT Platform

Multi-tenant IoT across healthcare, telematics & energy. Load-benchmarked to 20M+ concurrent connections. System-level TCP/kernel tuning on AWS & Azure. Kafka, Cassandra, Ignite, Aerospike, bidirectional notifications.

20M+ Connections

100M+ Daily Messages

Multi-Cloud (AWS & Azure)

Analytics · Real-Time

OTTPlay Analytics Pipeline

Full-stack pipeline: device → Kafka → Spark → Elasticsearch. Support teams query millions of events by user, device, session, or error — diagnosing playback failures instantly.

M+ Events Indexed

Real-Time Queries

Zero Raw Log Access

AI · 20+ Agents

AI Agents Ecosystem

20+ production AI agents — content generation, lead qualification, code review, incident triage. Enterprise chatbots with RAG, multi-turn context, and bidirectional notification integration.

20+ In Production

2 Domains

RAG Powered

Campaign · Geo · Real-Time

Real-Time Response System (RTRS)

100+ concurrent location-based campaigns. Google Maps for route-aware targeting. Bidirectional delivery receipts. Ignite in-memory layer over a 20+ node Hadoop cluster. Datadog + mobile alerting.

100+ Concurrent

Bidir. Notifications

↓60% Latency

Observability · Mobile Alerting · Load Engineering

Full-Stack Monitoring & Load Benchmarking

3-layer observability: Datadog APM, Prometheus SLO tracking, OpenTelemetry tracing. Bespoke mobile alerting with severity triage, escalation chains, and runbook links. Load frameworks benchmarked to 20M+ concurrent connections — full p50/p95/p99 profiles on AWS and Azure.

3-Layer Observability

Mobile Alert Platform

20M+ Load Tested

p99 Profiled
