ML Engineer & AI Systems Architect

Ayush Solanki

Building production-grade GenAI systems, retrieval stacks, and orchestration engines that make AI products measurable, reliable, and deployable.

Technical Arsenal

Full-stack ML engineering from research to production


LLM & Generative AI

LLMs: Large Language Models
RAG: Retrieval-Augmented Generation
Agents: Agentic AI
Prompt: Prompt Engineering
LangChain: LLM Framework
LangGraph: Agent Graphs
LlamaIndex
Tools: Tool Calling
Memory: Agent Memory
MCP: Model Context Protocol
Deep Research
Routing: LLM Routing

Machine Learning

ML: Machine Learning
NLP: Natural Language Processing
CV: Computer Vision
PyTorch: Deep Learning
TF: TensorFlow
HF: Hugging Face
Finetune: Model Fine-tuning

Backend & Systems

FastAPI: Async APIs
Spring: Spring Boot
Flask: Lightweight APIs
REST: API Design
Gateway: LLM Gateway
Auth: AuthN / AuthZ
Rate: Rate Limiting
Cache: Caching
Async: Async Workers

Cloud & Infrastructure

AWS: Cloud Platform
Azure: Cloud Services
Docker: Containers
K8s: Kubernetes
Obs: Observability
Deploy: Model Deployment

Product & Engineering

MVP: Prototyping
SaaS: AI Products
CI/CD: Automation
Git: Version Control
Eval: Model Evaluation
Cost: Cost Optimization

Data & Databases

Postgres: SQL DB
Mongo: NoSQL
VectorDB: Vector DBs
SQL: Query Language
ETL: Data Pipelines
Embed: Embeddings
Index: Vector Indexing

Programming Languages

Python: Primary Language
C++: Systems
Java: Enterprise
Go: Concurrency

ALWAYS AN ACTIVE LEARNER :)

Technical Expertise


ML & LLM Systems

Transformers, LoRA Fine-tuning, RAG Pipelines, Embeddings, Prompt Engineering

Cloud & Infra

AWS, Docker, Kubernetes, CI/CD Pipelines, Service Orchestration

Backend & Systems

Python, Go, C++, FastAPI, REST APIs, Authentication

Data & Retrieval

PostgreSQL, Weaviate, BM25, HNSW, Hybrid Retrieval

Featured Projects

Production systems that drive real business impact

MiniVec: High-Performance HNSW Vector Search Engine

Built a from-scratch C++17 HNSW engine with pybind11 bindings, reaching 89% recall@10 and 11.7ms P50 latency on a 1M-vector benchmark.

C++17, HNSW, pybind11, Vector Search

StepEngine: Distributed Workflow Orchestration Engine

Designed a Go-based workflow engine with DAG execution, retries, fan-in/fan-out routing, and exactly-once transactional state handling.

Go, PostgreSQL, DAG, Distributed Systems

Unified LLM Tool-Calling Gateway

Implemented a multi-provider gateway for OpenAI, Gemini, Grok, and DeepSeek with schema standardization, dynamic routing, and graceful fallbacks.

OpenAI, Gemini, DeepSeek, FastAPI

Healthcare RAG & Workflow Orchestration Stack

Shipped production healthcare Q&A pipelines with hybrid retrieval, cross-encoder reranking, multi-agent orchestration, and MCP-based context isolation.

RAG, BM25, MCP, LLM Agents

Experience

Building applied AI systems with measurable production impact

AI/ML Engineer

Optimoz Engineering - AI-Driven Healthcare Solutions

Sep 2024 - Present

Fine-tuned and evaluated LLaMA and DeepSeek models for healthcare Q&A, improving answer accuracy by 26% and reducing response inconsistency by 30% through prompt calibration.
Designed a unified LLM tool-calling gateway across OpenAI, Gemini, Grok, and DeepSeek, cutting new model integration time from 2 days to 2-4 hours.
Architected hybrid-search RAG pipelines with BM25, dense embeddings, and cross-encoder reranking, reducing hallucinations by 35-40% and improving retrieval precision@5 by 25%.
Built JSON-schema-driven workflow orchestration with multi-agent DAG execution, conditional routing, and MCP-based user context isolation.

AI/ML Engineering Intern

KenexAI

Jan 2024 - Sep 2024

Developed an end-to-end natural-language-to-SQL system over relational schemas, reaching 85% query correctness on production-style multi-table workloads.
Built an LLM-powered meeting summarization system that generates summaries, action items, and technical notes, reducing post-meeting documentation time by 70%.

Education

Strong engineering fundamentals behind the ML systems work

Government Engineering College, Gandhinagar

Bachelor of Engineering in Information Technology

2021 - 2024

CGPA: 9.07

Research & Publications

Work spanning representation learning, multimodal modeling, and data mining

Class-Conditional Regularization for Cross-Lingual Representation Stability

Under review at PRL

Enhancing Emotion Recognition Using Multimodal Deep Neural Networks

DOI referenced in resume

Efficient Join Operations for Utility List-Based High-Utility Mining

DOI referenced in resume

Certifications & Achievements

Signals of depth across cloud architecture and competitive engineering work

AWS Certified Solutions Architect

Google Solution Challenge - Global Top 100

UNESCO Hackathon Finalist

Smart India Hackathon Winner

Let's Build ML Systems That Hold Up in Production

Open to ML engineering, applied AI, and GenAI platform roles, including remote opportunities.

Contact Me