SURENDIRAN SELVAM
AI Lead Engineer
Building and delivering end-to-end AI platforms with hybrid RAG, scalable AI infrastructure, agentic workflows and multi-agent systems.
Trusted to deliver enterprise-grade AI platforms in live production environments.
10+ Years Engineering: Software & AI
3+ AI Projects: RAG • Agentic AI • Multi Agents • AI Infra
10+ Automation Projects: Infra • Platform • Dev Productivity
5+ Domain Expertise: ERP • BFSI • Health • Logistics • CMS
2+ Cloud Platforms: GCP • Azure
What I Do
AI Platform Architecture
Architecting enterprise-grade AI platforms from the ground up. Designing scalable backends, distributed services, orchestration layers, cloud-native pipelines, and production observability systems that power mission-critical applications at scale.
RAG & LLMOps
Building production RAG systems with hybrid search, semantic indexing, and comprehensive evaluation workflows. Delivering end-to-end LLMOps solutions with monitoring, versioning, and optimization for reliable AI responses in production environments.
Multi-Agent Systems
Designing intelligent agentic systems with reasoning graphs, tool-calling workflows, and autonomous multi-agent orchestration. Creating systems that handle complex decision-making, planning, and coordination for enterprise-scale use cases.
Infrastructure & Automation
Engineering robust CI/CD systems, automation frameworks, and platform tooling that enhance reliability and accelerate deployment. Building infrastructure that enables teams to ship faster with confidence and maintain production stability.
Featured Projects
🚀 Multi-Agent Product Intelligence Platform
Production-ready multi-agent AI platform demonstrating enterprise-grade architecture with intelligent orchestration, hybrid search, and scalable infrastructure. Built with end-to-end observability and type-safe AI systems.
Technical Skills
AI & LLM Engineering
- RAG Pipelines
- Hybrid Search (Semantic + Keyword)
- LangChain / LangGraph
- LangSmith (Observability)
- Embedding Models
- Prompt Engineering & Context Design
- Vector Databases (Qdrant)
- RAG Evaluation (RAGAS)
- Generative AI / Foundation Models
- Agent-to-Agent Protocol (A2A)
- Model Context Protocol (MCP)
Backend & API Engineering
- Python
- Java
- FastAPI
- REST / GraphQL
- Async Architectures & Middleware Systems
- SQL Databases
Cloud & Infrastructure
- GCP Vertex AI
- Microsoft Azure
- Docker
- Kubernetes
- Jenkins
- Prometheus / Grafana
- Terraform
- Linux
- LLM Serving & Inference (vLLM)
- ELK Stack (Elasticsearch, Logstash, Kibana)
Automation & Dev Productivity
- CI/CD Automation
- Selenium
- REST Assured
- Pytest
- JUnit
- GitHub Actions
- AI-Assisted Development (GitHub Copilot, Cursor)
Experience
Senior Engineer | AI Technical Lead
Current • Jan 2024 – Present
- Led Confluence-to-RAG ingestion for a Fortune client; designed extraction and normalization standards that improved knowledge base quality and cut ingestion runtime by 35–45%
- Managed EliteA agent workflows in Python, achieving 90% accuracy and supporting hundreds of users daily with policy-aware responses
- Enhanced Qdrant hybrid retrieval by 25–35%, reducing false positives by 20–30% and improving query precision
- Lowered P95 latency by 20–25% and triage time by 50–60% using trace- and log-driven troubleshooting with LangSmith; guided AI vendor selection, achieving a 40% cost reduction
Lead Automation Engineer
Aug 2021 – Jan 2024
- Developed an automation platform for 8 teams and maintained a 99%+ success rate for daily runs, significantly enhancing operational efficiency
- Decreased manual tasks and re-runs by 50–70% through Azure DevOps pipeline optimization, facilitating continuous validation
- Decreased environment setup time from hours to under an hour using Kubernetes-based tooling via Helm; ingested tens of GB of logs daily, reducing diagnosis time by 40–50%
- Scaled Selenium Grid to 50+ concurrent sessions, reducing end-to-end suite runtime by 50–65%; automated failure-to-triage workflows, decreasing triage time by 40–45%
Software Engineer
Aug 2019 – Aug 2021
- Led 3 engineers, updated stakeholders, and ensured on-time releases on the Mobius View project (Federated Enterprise Content Search & Archive)
- Enabled Docker/Kubernetes-based environments, cutting provisioning time by 60% through containerized infrastructure
- Built Jenkins pipelines for WebLogic/Tomcat deployments, facilitating systematized, zero-downtime deployments
- Developed CI/CD pipelines in Jenkins (Groovy + shell + Docker) for continuous validation, enabling systematized deployments with zero-downtime rollbacks
Software Engineer
Jun 2018 – Jul 2019
- Worked on the Mobius View project (Federated Enterprise Content Search & Archive), which searches, displays, and archives content from multiple federated sources
- Built and maintained test environments across Tomcat/Oracle WebLogic, databases (Oracle/PostgreSQL/SQL Server), and clustered deployments
- Enabled container-based environments using Docker/Docker Compose and Kubernetes deployments via Helm charts
- Developed CI/CD pipelines in Jenkins (Groovy + shell + Docker), built REST validation using Postman/Swagger, and maintained API automation via SoapUI Pro
Software Engineer
Mar 2018 – May 2018
- Worked on TNPDS, the Public Distribution System (Tamil Nadu): a large-scale statewide retail network (~25,000 distribution points) for transparent commodity delivery
- Created test suites and identified automation candidates; maintained RTM and functional coverage mapping
- Developed automation scripts using Java, Selenium, TestNG, Maven; implemented stable locator strategies and page object design
- Performed REST API validation using Postman and executed release cycles with deployment, failure analysis, and defect triaging
Software Engineer
Jun 2015 – Feb 2018
- Worked on TNPDS, the Public Distribution System (Tamil Nadu): a large-scale statewide retail network (~25,000 distribution points) for transparent commodity delivery
- Created test suites and identified automation candidates; maintained RTM and functional coverage mapping, achieving 70% automation coverage
- Developed automation scripts using Java, Selenium, TestNG, Maven; implemented stable locator strategies and page object design, reducing script maintenance effort by 40%
- Executed release cycles: deployed builds to test environments, ran suites, performed failure analysis, and triaged defects
Certifications
About Me
I architect and build enterprise AI platforms that solve complex business challenges at scale. With 10+ years of engineering experience, I specialize in designing production-grade AI systems from the ground up — combining deep technical expertise with strategic thinking to deliver solutions that are both innovative and reliable.
My approach centers on clean architecture, scalable infrastructure, and production-first design. I've led the development of multi-agent AI systems, hybrid RAG platforms, and observability frameworks that power real-world applications serving thousands of users.
Beyond code, I focus on the entire lifecycle: architecture design, implementation, deployment, monitoring, and continuous optimization. I believe great AI systems are built on solid engineering foundations — proper observability, robust error handling, and thoughtful design patterns that enable teams to iterate quickly and deploy with confidence.
Let's Build Something Great Together
Interested in discussing AI architecture, production systems, or technical consulting? Let's explore how we can collaborate on your next AI initiative.