SURENDIRAN SELVAM

AI Lead Engineer

Building and delivering end-to-end AI platforms with hybrid RAG, scalable AI infrastructure, agentic workflows and multi-agent systems.

Trusted in enterprise-grade AI platforms and live production environments.

10+

Years Engineering

Software & AI

AI Projects

RAG • Agentic AI • Multi Agents • AI Infra

10+

Automation Projects

Infra • Platform • Dev Productivity

Domain Expertise

ERP • BFSI • Health • Logistics • CMS

Cloud Platforms

GCP • Azure

What I Do

AI Platform Architecture

Architecting enterprise-grade AI platforms from the ground up. Designing scalable backends, distributed services, orchestration layers, cloud-native pipelines, and production observability systems that power mission-critical applications at scale.

RAG & LLMOps

Building production RAG systems with hybrid search, semantic indexing, and comprehensive evaluation workflows. Delivering end-to-end LLMOps solutions with monitoring, versioning, and optimization for reliable AI responses in production environments.

Multi-Agent Systems

Designing intelligent agentic systems with reasoning graphs, tool-calling workflows, and autonomous multi-agent orchestration. Creating systems that handle complex decision-making, planning, and coordination for enterprise-scale use cases.

Infrastructure & Automation

Engineering robust CI/CD systems, automation frameworks, and platform tooling that enhance reliability and accelerate deployment. Building infrastructure that enables teams to ship faster with confidence and maintain production stability.

Featured Projects

🚀 Multi-Agent Product Intelligence Platform

Production-ready multi-agent AI platform demonstrating enterprise-grade architecture with intelligent orchestration, hybrid search, and scalable infrastructure. Built with end-to-end observability and type-safe AI systems.

OpenAI • FastAPI • LangChain • Qdrant • LangSmith • Ragas • Docker • GCP

Technical Skills

AI & LLM Engineering

RAG Pipelines
Hybrid Search (Semantic + Keyword)
LangChain / LangGraph
LangSmith (Observability)
Embedding Models
Prompt Engineering & Context Design
Vector Databases (Qdrant)
RAG Evaluation (RAGAS)
Generative AI / Foundation Models
Agent-to-Agent Protocol (A2A)
Model Context Protocol (MCP)

Backend & API Engineering

Python
Java
FastAPI
REST / GraphQL
Async Architectures & Middleware Systems
SQL Databases

Cloud & Infrastructure

GCP Vertex AI
Microsoft Azure
Docker
Kubernetes
Jenkins
Prometheus / Grafana
Terraform
Linux
LLM Serving & Inference (vLLM)
ELK Stack (Elasticsearch, Logstash, Kibana)

Automation & Dev Productivity

CI/CD Automation
Selenium
REST Assured
Pytest
JUnit
GitHub Actions
AI-Assisted Development (GitHub Copilot, Cursor)

Experience

Senior Engineer | AI Technical Lead

Current

Jan 2024 – Present

EPAM Systems

Python • LangChain • LangGraph • LangSmith • OpenAI • Gemini • Groq • LiteLLM • Instructor • Pydantic • Qdrant • Hybrid Search (BM25 + Semantic) • RRF • RAGAS • FastAPI • PostgreSQL • EliteA

• Led Confluence-to-RAG ingestion for a Fortune client; designed extraction and normalization standards that improved knowledge base quality and cut ingestion runtime by 35–45%
• Managed EliteA agent workflows in Python, achieving 90% accuracy and supporting hundreds of users daily with policy-aware responses
• Enhanced Qdrant hybrid retrieval by 25–35%, reducing false positives by 20–30% and improving query precision
• Lowered P95 latency by 20–25% and triage time by 50–60% using trace- and log-driven troubleshooting with LangSmith; guided AI vendor selection, achieving a 40% cost reduction

Lead Automation Engineer

Aug 2021 – Jan 2024

Encora Innovation Labs

Azure DevOps • Azure Pipelines • Jenkins • Docker • Docker Compose • Kubernetes • Helm • ELK Stack • Prometheus • Grafana • SonarQube • Selenium Grid • Java 17 • Selenium • REST Assured • Spring Boot • JMeter • Postman

• Maintained a 99%+ success rate for daily runs by developing an automation platform for 8 teams, significantly enhancing operational efficiency
• Decreased manual tasks and re-runs by 50–70% through Azure DevOps pipeline optimization, facilitating continuous validation
• Decreased environment setup time from hours to under an hour using Kubernetes-based tooling via Helm; ingested tens of GB of logs daily, reducing diagnosis time by 40–50%
• Scaled Selenium Grid to 50+ concurrent sessions, reducing end-to-end suite runtime by 50–65%; automated failure-to-triage workflows, decreasing triage time by 40–45%

Software Engineer

Aug 2019 – Aug 2021

ASG Technologies

Java • Selenium WebDriver • JUnit 4 • Maven • Jenkins • Docker • Kubernetes • Helm • Tomcat • Oracle WebLogic • PostgreSQL • SQL Server • Apache httpd • Nginx • SoapUI Pro • Postman • Swagger • JMeter

• Led 3 engineers, updated stakeholders, and ensured on-time releases on the Mobius View project (Federated Enterprise Content Search & Archive)
• Enabled Docker/Kubernetes-based environments, cutting provisioning time by 60% through containerized infrastructure
• Built Jenkins pipelines for WebLogic/Tomcat deployments, facilitating systematized, zero-downtime deployments
• Developed CI/CD pipelines in Jenkins (Groovy + shell + Docker) for continuous validation, enabling systematized deployments with zero-downtime rollbacks

Software Engineer

Jun 2018 – Jul 2019

Wipro

Java • Selenium WebDriver • JUnit 4 • Maven • Jenkins • Docker • Kubernetes • Helm • Tomcat • Oracle WebLogic • PostgreSQL • SQL Server • Apache httpd • Nginx • SoapUI Pro • Postman • Swagger • JMeter

• Worked on the Mobius View project (Federated Enterprise Content Search & Archive): searching, displaying, and archiving content from multiple federated sources
• Built and maintained test environments across Tomcat/Oracle WebLogic, DBs (Oracle/PostgreSQL/SQL Server), and clustered deployments
• Enabled container-based environments using Docker/Docker Compose and Kubernetes deployments via Helm charts
• Developed CI/CD pipelines in Jenkins (Groovy + shell + Docker); built REST validation using Postman/Swagger and maintained API automation via SoapUI Pro

Software Engineer

Mar 2018 – May 2018

OASYS Cybernetics

Java 1.8 • Selenium 3.6.0 • TestNG • Maven • Jenkins • PostgreSQL 9 • Apache Tomcat 8 • Postman • SQL

• Worked on TNPDS, the Public Distribution System of Tamil Nadu: a large-scale statewide retail network (~25,000 distribution points) for transparent commodity delivery
• Created test suites and identified automation candidates; maintained RTM and functional coverage mapping
• Developed automation scripts using Java, Selenium, TestNG, Maven; implemented stable locator strategies and page object design
• Performed REST API validation using Postman and executed release cycles with deployment, failure analysis, and defect triaging

Software Engineer

Jun 2015 – Feb 2018

Finatel Technologies

Java 1.8 • Selenium 3.6.0 • TestNG • Maven • Jenkins • PostgreSQL 9 • Apache Tomcat 8 • Postman • SQL

• Worked on TNPDS, the Public Distribution System of Tamil Nadu: a large-scale statewide retail network (~25,000 distribution points) for transparent commodity delivery
• Created test suites and identified automation candidates; maintained RTM and functional coverage mapping, achieving 70% automation coverage
• Developed automation scripts using Java, Selenium, TestNG, Maven; implemented stable locator strategies and page object design, reducing script maintenance effort by 40%
• Executed release cycles: deployed builds to test environments, ran suites, performed failure analysis, and triaged defects

Certifications

Google Cloud Professional Machine Learning Engineer

Google Cloud

October 2025

Certified Kubernetes Administrator (CKA)

Cloud Native Computing Foundation

September 2025

Microsoft Azure AI Engineer Associate

Microsoft

April 2025

Microsoft Azure Fundamentals

Microsoft

April 2025

About Me

I architect and build enterprise AI platforms that solve complex business challenges at scale. With 10+ years of engineering experience, I specialize in designing production-grade AI systems from the ground up — combining deep technical expertise with strategic thinking to deliver solutions that are both innovative and reliable.

My approach centers on clean architecture, scalable infrastructure, and production-first design. I've led the development of multi-agent AI systems, hybrid RAG platforms, and observability frameworks that power real-world applications serving thousands of users.

Beyond code, I focus on the entire lifecycle: architecture design, implementation, deployment, monitoring, and continuous optimization. I believe great AI systems are built on solid engineering foundations — proper observability, robust error handling, and thoughtful design patterns that enable teams to iterate quickly and deploy with confidence.

Connect

Let's Build Something Great Together

Interested in discussing AI architecture, production systems, or technical consulting? Let's explore how we can collaborate on your next AI initiative.

contact@surendiran.ai

LinkedIn • GitHub • Medium
