Available for Hire
Sarath Tharayil
Senior Data Scientist
Senior Data Scientist with ~4 years of experience developing forecasting models and scalable data solutions that support commercial decision-making. Experienced in time-series modelling, demand forecasting, and statistical analysis, with strong expertise in Python, SQL and cloud platforms including AWS and Azure. Proven track record of leading end-to-end analytical initiatives, from data engineering and model development to stakeholder communication and deployment of production forecasting systems.
/ Professional Experience
- Designed and implemented 15+ conceptual and logical data models for ML evaluation frameworks, reducing evaluation cycle time by 35%
- Collaborated with an 8-member cross-functional agile team across bi-weekly sprint planning and retrospectives
- Maintained high-quality data standards through systematic validation and review processes supporting production ML evaluation workflows
- Applied NLP techniques (text classification, sentiment analysis, named entity recognition) to process customer feedback data
- Designed feature engineering pipelines on GCP BigQuery, improving model performance by 18%
- Applied Bayesian inference, hypothesis testing, and experimental design to validate model performance and business impact
- Built MLOps infrastructure using MLflow, Docker, and CI/CD pipelines, achieving a 98% deployment success rate
- Led a data engineering team of 10 designing scalable data transformation and ingestion pipelines on AWS
- Designed and deployed an end-to-end forecasting system for 60+ petrochemical products across daily, weekly, and monthly horizons
- Developed hybrid ML solutions (ARIMA, XGBoost) for time-series forecasting, improving forecast accuracy by 18%
- Built production-grade ML pipelines processing 3M+ daily transactions using Python, PySpark, and Airflow
- Led and mentored a team of 23 data scientists, improving team delivery velocity by 30%
/ Education
/ Selected Projects
End-to-end forecasting system predicting demand and pricing for 60+ petrochemical products across multiple regions at daily, weekly, and monthly horizons. Hybrid approach combining ARIMA, exponential smoothing, and XGBoost with engineered features including seasonality decomposition, lagged variables, and market indicators.
- 18% improvement in forecast accuracy over baseline statistical models
- Automated retraining and monitoring pipelines on AWS SageMaker processing 3M+ daily transactions
- Backtesting framework with rolling-origin cross-validation across multiple forecast horizons
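As an illustration of the rolling-origin backtesting mentioned above, the following is a minimal sketch rather than the project's actual framework. The function names (`rolling_origin_splits`, `naive_forecast`, `backtest_mae`) and the last-value forecaster are assumptions chosen to keep the example dependency-free; in practice the forecaster would be the ARIMA/XGBoost hybrid.

```python
from typing import Iterator, List, Sequence, Tuple


def rolling_origin_splits(
    n_obs: int, initial_train: int, horizon: int, step: int = 1
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges with an expanding training window.

    The forecast origin moves forward by `step` each fold, and the test
    window always covers the next `horizon` observations after the origin.
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical stand-in model: repeat the last observed value."""
    return [history[-1]] * horizon


def backtest_mae(
    series: Sequence[float], initial_train: int, horizon: int, step: int = 1
) -> float:
    """Mean absolute error pooled over all rolling-origin folds."""
    errors: List[float] = []
    for train_idx, test_idx in rolling_origin_splits(
        len(series), initial_train, horizon, step
    ):
        history = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        forecast = naive_forecast(history, horizon)
        errors.extend(abs(a - f) for a, f in zip(actual, forecast))
    if not errors:
        raise ValueError("series too short for the requested split settings")
    return sum(errors) / len(errors)
```

Running `backtest_mae` once per horizon (for example daily, weekly, monthly) gives a comparable error figure for each, since every fold only ever trains on data that precedes its forecast origin.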
Always-on autonomous trading tool with real-time market monitoring, configurable alert criteria, and automated buy/sell execution. Multi-agent architecture using LangGraph for orchestration, with a criteria engine that evaluates technical indicators, sentiment signals, and portfolio constraints before executing trades.
- Multi-agent pipeline: market watcher → signal generator → risk evaluator → execution agent
- Configurable criteria engine supporting custom rule composition and backtesting against historical data
- Real-time alerting and audit trail with full decision provenance logging
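One way custom rule composition in a criteria engine could look is sketched below. This is an assumption-laden illustration, not the tool's real API: the combinators (`all_of`, `any_of`, `above`, `below`), the snapshot keys, and every threshold are hypothetical.

```python
from typing import Callable, Dict

# A market snapshot is modelled as a flat mapping of indicator name -> value.
Snapshot = Dict[str, float]
Rule = Callable[[Snapshot], bool]


def above(key: str, limit: float) -> Rule:
    """Rule that passes when the named indicator is strictly above `limit`."""
    return lambda snapshot: snapshot[key] > limit


def below(key: str, limit: float) -> Rule:
    """Rule that passes when the named indicator is strictly below `limit`."""
    return lambda snapshot: snapshot[key] < limit


def all_of(*rules: Rule) -> Rule:
    """Compose rules with logical AND."""
    return lambda snapshot: all(rule(snapshot) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    """Compose rules with logical OR."""
    return lambda snapshot: any(rule(snapshot) for rule in rules)


# Hypothetical buy criterion: a technical signal, a sentiment signal,
# and a portfolio constraint must all hold before a trade is allowed.
buy_rule = all_of(
    below("rsi", 30.0),              # oversold on a technical indicator
    above("sentiment", 0.2),         # positive sentiment score
    below("position_weight", 0.10),  # portfolio concentration cap
)

print(buy_rule({"rsi": 25.0, "sentiment": 0.5, "position_weight": 0.05}))
```

Because each rule is a plain function of a snapshot, the same composed rule can be replayed over historical snapshots for backtesting without changes.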
Deep learning system for distinguishing AI-generated human faces from real photographs, developed as an MSc dissertation at the University of Sheffield. Investigated model robustness under adversarial conditions and distribution shift between GAN-generated and diffusion-model-generated images.
- Evaluated CNNs, Vision Transformers, and hybrid architectures on curated datasets of GAN and diffusion-generated faces
- Analysed decision boundaries and failure modes using Grad-CAM and SHAP attribution methods
- Achieved >92% classification accuracy with cross-dataset generalisation experiments
Graph-augmented retrieval system that combines a Neo4j knowledge graph with vector embeddings for context-aware document retrieval. Built for enterprise use cases where entity relationships and domain ontology improve retrieval precision beyond standard dense-retrieval RAG.
- Graph-traversal + semantic search hybrid retrieval outperforming naive RAG by 31% on domain-specific QA benchmarks
- Automated entity extraction and relationship mapping pipeline using NER and co-reference resolution
- LangGraph orchestration with memory management, tool use, and multi-hop reasoning across the graph
End-to-end NLP pipeline processing customer support tickets and feedback at scale. Multi-task model simultaneously handles intent classification, sentiment scoring, and named entity extraction to route tickets, surface product insights, and identify churn signals.
- Fine-tuned transformer models (BERT/RoBERTa) on domain-specific labelled data, achieving 89% classification F1
- Feature engineering on GCP BigQuery, yielding an 18% improvement in downstream model performance
- Real-time inference API with <200ms p99 latency, deployed via Docker on Cloud Run
Reusable MLOps infrastructure template built across multiple freelance engagements. Covers the full model lifecycle: experiment tracking, model registry, automated retraining triggers, drift detection, and staged rollout with shadow deployment support.
- 98% deployment success rate across 10+ model rollouts using GitHub Actions CI/CD
- Automated data drift detection with statistical tests (KS, PSI) triggering retraining workflows
- Model registry with lineage tracking, A/B testing support, and automated rollback on metric degradation
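The drift checks named above (KS, PSI) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the template's implementation: the function names are hypothetical, the PSI bins are taken from reference-sample quantiles, the KS check thresholds the raw statistic instead of a p-value, and both thresholds in `should_retrain` are illustrative defaults.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index using quantile bins of the reference sample."""
    ref_sorted = sorted(reference)
    # Interior bin edges at equally spaced quantiles of the reference data.
    edges = [ref_sorted[int(len(ref_sorted) * k / n_bins)] for k in range(1, n_bins)]

    def proportions(sample: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(sample), eps) for c in counts]

    p_ref, p_cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


def should_retrain(reference: Sequence[float], current: Sequence[float],
                   psi_threshold: float = 0.2, ks_threshold: float = 0.1) -> bool:
    """Flag retraining when either drift measure exceeds its (assumed) threshold."""
    return (psi(reference, current) > psi_threshold
            or ks_statistic(reference, current) > ks_threshold)
```

A PSI cut-off of 0.2 is a common rule of thumb for a significant shift; a production setup would more likely use a library KS test with a p-value and tune both thresholds per feature.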