SARATH THARAYIL

Building thoughtful software, writing notes, and shipping experiments across data, AI, and the web.

© 2026 Sarath Tharayil

Available for Hire

/ HIRE-ME
recruiter.txt · available

Sarath Tharayil

Senior Data Scientist

Open to
Data Scientist · ML Engineer · Data Engineer · Data Analyst · AI Engineer

Senior Data Scientist with ~4 years of experience developing forecasting models and scalable data solutions that support commercial decision-making. Experienced in time-series modelling, demand forecasting, and statistical analysis, with strong expertise in Python, SQL and cloud platforms including AWS and Azure. Proven track record of leading end-to-end analytical initiatives, from data engineering and model development to stakeholder communication and deployment of production forecasting systems.

Contact
iamsaraththarayil@gmail.com
linkedin.com/in/SarathTharayil
github.com/saraththarayil
saraththarayil.com
+44 7533 886825 (UK)
+91 7012604990 (IN)

/ Professional Experience

Google · London
Oct 2025 – Jan 2026
Data Analyst (Contract)
  • Designed and implemented 15+ conceptual and logical data models for ML evaluation frameworks, reducing evaluation cycle time by 35%
  • Collaborated with an 8-member cross-functional agile team across bi-weekly sprint planning and retrospectives
  • Maintained high-quality data standards through systematic validation and review processes supporting production ML evaluation workflows
Upwork · Remote
Jan 2024 – Oct 2025
Freelance Data Scientist
  • Applied NLP techniques (text classification, sentiment analysis, named entity recognition) to process customer feedback data
  • Designed feature engineering pipelines on GCP BigQuery, improving model performance by 18%
  • Applied Bayesian inference, hypothesis testing, and experimental design to validate model performance and business impact
  • Built MLOps infrastructure using MLflow, Docker, and CI/CD pipelines with 98% deployment success rate
MuSigma · Bengaluru
Sep 2020 – Sep 2022
Senior Data Scientist
  • Led a data engineering team of 10 designing scalable data transformation and ingestion pipelines on AWS
  • Designed and deployed an end-to-end forecasting system for 60+ petrochemical products across daily, weekly, and monthly horizons
  • Developed hybrid ML solutions (ARIMA, XGBoost) for time-series forecasting, improving forecast accuracy by 18%
  • Built production-grade ML pipelines processing 3M+ daily transactions using Python, PySpark, and Airflow
  • Led and mentored a team of 23 data scientists, improving team delivery velocity by 30%

/ Education

The University of Sheffield
Sep 2022 – Jan 2024
Master of Science, Data Science
Dissertation: Classification of Real and AI-generated human faces
Cochin University of Science and Technology
Aug 2016 – Aug 2020
B.Tech, Computer Science
Dissertation: DDoS attack analysis via IoT botnets using ML classifiers

/ Selected Projects

Petrochemical Demand & Price Forecasting Platform

End-to-end forecasting system predicting demand and pricing for 60+ petrochemical products across multiple regions at daily, weekly, and monthly horizons. Hybrid approach combining ARIMA, exponential smoothing, and XGBoost with engineered features including seasonality decomposition, lagged variables, and market indicators.

  • 18% improvement in forecast accuracy over baseline statistical models
  • Automated retraining and monitoring pipelines on AWS SageMaker processing 3M+ daily transactions
  • Backtesting framework with rolling-origin cross-validation across multiple forecast horizons
Python · XGBoost · ARIMA · PySpark · AWS SageMaker · Airflow · Docker
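
The rolling-origin backtesting mentioned above can be illustrated with a minimal sketch. This is not the platform's code: the function names, the expanding-window choice, and the naive last-value forecaster are all assumptions made for illustration, and a real setup would plug in the ARIMA or XGBoost models instead.

```python
from typing import Iterator, List, Sequence, Tuple


def rolling_origin_splits(
    n_obs: int, initial_train: int, horizon: int, step: int = 1
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) with an expanding training window.

    The forecast origin moves forward by `step` each fold, and the test
    window always covers the `horizon` observations right after the origin.
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


def naive_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Placeholder model (assumption): repeat the last observed value."""
    return [history[-1]] * horizon


def backtest_mae(
    series: Sequence[float], initial_train: int, horizon: int, step: int = 1
) -> float:
    """Mean absolute error of the placeholder model over all folds."""
    errors: List[float] = []
    for train_idx, test_idx in rolling_origin_splits(
        len(series), initial_train, horizon, step
    ):
        history = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        predicted = naive_forecast(history, horizon)
        errors.extend(abs(a - p) for a, p in zip(actual, predicted))
    return sum(errors) / len(errors)
```

Calling `backtest_mae` once per horizon (for example 1, 7, and 30 steps) gives one error figure per horizon, which is the shape of comparison a multi-horizon backtest needs. Every fold trains only on data before its origin, so no future values leak into the model.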
Agentic Trading Intelligence System (Metis)

Always-on autonomous trading tool with real-time market monitoring, configurable alert criteria, and automated buy/sell execution. Multi-agent architecture using LangGraph for orchestration, with a criteria engine that evaluates technical indicators, sentiment signals, and portfolio constraints before executing trades.

  • Multi-agent pipeline: market watcher → signal generator → risk evaluator → execution agent
  • Configurable criteria engine supporting custom rule composition and backtesting against historical data
  • Real-time alerting and audit trail with full decision provenance logging
Python · LangGraph · LangChain · LLM · Neo4j · FastAPI · Streamlit
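
The four-stage flow (market watcher → signal generator → risk evaluator → execution agent) can be sketched in plain Python. This is a simplified illustration only: it does not use LangGraph, the order is simulated rather than executed, and every name, threshold, and rule here is hypothetical rather than taken from Metis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

ORDER_QUANTITY = 10  # hypothetical fixed order size
MAX_POSITION = 100   # hypothetical portfolio constraint


@dataclass
class PipelineState:
    """Shared state passed from stage to stage, including an audit trail."""
    prices: List[float]
    signal: str = "hold"
    approved: bool = False
    order: Dict[str, object] = field(default_factory=dict)
    audit: List[str] = field(default_factory=list)


def market_watcher(state: PipelineState) -> PipelineState:
    state.audit.append(f"watcher: latest price {state.prices[-1]}")
    return state


def signal_generator(state: PipelineState) -> PipelineState:
    # Toy rule (assumption): compare the last price with a 3-point average.
    recent = state.prices[-3:]
    average = sum(recent) / len(recent)
    last = state.prices[-1]
    if last > average:
        state.signal = "buy"
    elif last < average:
        state.signal = "sell"
    else:
        state.signal = "hold"
    state.audit.append(f"signal: {state.signal} (avg {average:.2f})")
    return state


def risk_evaluator(state: PipelineState) -> PipelineState:
    state.approved = state.signal != "hold" and ORDER_QUANTITY <= MAX_POSITION
    state.audit.append(f"risk: approved={state.approved}")
    return state


def execution_agent(state: PipelineState) -> PipelineState:
    if state.approved:
        # Simulated order only; no broker API is called.
        state.order = {"side": state.signal, "quantity": ORDER_QUANTITY,
                       "simulated": True}
        state.audit.append(f"execution: simulated {state.signal}")
    else:
        state.audit.append("execution: skipped")
    return state


STAGES: List[Callable[[PipelineState], PipelineState]] = [
    market_watcher, signal_generator, risk_evaluator, execution_agent,
]


def run_pipeline(state: PipelineState) -> PipelineState:
    """Run each stage in order; every stage records one audit entry."""
    for stage in STAGES:
        state = stage(state)
    return state
```

Because each stage appends to `audit`, the final state carries the reasoning behind the decision as well as the decision itself, which is the idea behind the provenance logging described above.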
AI vs. Real Face Classification (Dissertation)

Deep learning system for distinguishing AI-generated human faces from real photographs, developed as MSc dissertation at the University of Sheffield. Investigated model robustness under adversarial conditions and distribution shift between GAN-generated and diffusion-model-generated images.

  • Evaluated CNNs, Vision Transformers, and hybrid architectures on curated datasets of GAN and diffusion-generated faces
  • Analysed decision boundaries and failure modes using Grad-CAM and SHAP attribution methods
  • Achieved >92% classification accuracy with cross-dataset generalisation experiments
Python · PyTorch · Vision Transformers · Grad-CAM · SHAP · Scikit-learn
Enterprise Knowledge Graph & RAG Pipeline

Graph-augmented retrieval system that combines a Neo4j knowledge graph with vector embeddings for context-aware document retrieval. Built for enterprise use cases where entity relationships and domain ontology improve retrieval precision beyond standard dense-retrieval RAG.

  • Graph-traversal + semantic search hybrid retrieval outperforming naive RAG by 31% on domain-specific QA benchmarks
  • Automated entity extraction and relationship mapping pipeline using NER and co-reference resolution
  • LangGraph orchestration with memory management, tool use, and multi-hop reasoning over the knowledge graph
Python · Neo4j · LangGraph · LangChain · OpenAI · FAISS · spaCy
NLP Customer Intelligence Platform

End-to-end NLP pipeline processing customer support tickets and feedback at scale. Multi-task model simultaneously handles intent classification, sentiment scoring, and named entity extraction to route tickets, surface product insights, and identify churn signals.

  • Fine-tuned transformer models (BERT/RoBERTa) on domain-specific labelled data achieving 89% classification F1
  • Feature engineering on GCP BigQuery with 18% downstream model performance improvement
  • Real-time inference API with <200ms p99 latency, deployed via Docker on Cloud Run
Python · BERT · HuggingFace · GCP BigQuery · Docker · Cloud Run · MLflow
Production MLOps Framework

Reusable MLOps infrastructure template built across multiple freelance engagements. Covers the full model lifecycle: experiment tracking, model registry, automated retraining triggers, drift detection, and staged rollout with shadow deployment support.

  • 98% deployment success rate across 10+ model rollouts using GitHub Actions CI/CD
  • Automated data drift detection with statistical tests (KS, PSI) triggering retraining workflows
  • Model registry with lineage tracking, A/B testing support, and automated rollback on metric degradation
Python · MLflow · Docker · GitHub Actions · AWS ECR/ECS · Evidently AI · FastAPI
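
As an illustration of the PSI-based drift check listed above, here is a minimal standard-library sketch. It is not the framework's implementation: the equal-width binning, the `eps` floor for empty bins, and the 0.25 threshold (a common rule of thumb, not a value from this project) are assumptions.

```python
import math
from typing import List, Sequence


def psi(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between a baseline and a new sample.

    Bins are equal-width over the baseline's range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each fraction at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retraining(
    expected: Sequence[float],
    actual: Sequence[float],
    threshold: float = 0.25,  # assumed rule-of-thumb cut-off
) -> bool:
    """Return True when PSI exceeds the threshold."""
    return psi(expected, actual) > threshold
```

In a pipeline, `needs_retraining` would be evaluated per feature on a schedule, and a `True` result would trigger the retraining workflow. A KS test could be added alongside it as a second, distribution-free signal.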

/ Skills & Tools

Core
Python · SQL · R · JavaScript · Machine Learning · Deep Learning · NLP · Time Series Forecasting · Statistical Analysis · RAG · Agentic AI · Prompt Engineering · Data Modelling · Data Visualisation
Platforms & Tools
AWS (S3, Redshift, Glue, SageMaker, Lambda) · GCP / BigQuery · Azure (ADLS, ADF, Databricks) · Apache Airflow · PySpark · Spark / Hadoop · MLflow · Docker · Kubernetes · dbt · Databricks · Snowflake · Tableau · Power BI · Looker · LangChain · LangGraph · Neo4j · GitHub Actions · Jenkins · Azure DevOps · Kafka · Elasticsearch