sarath tharayil · open to work

HIRE ME

freelance & full-time opportunities · Kerala, India

/ HIRE-ME

RECRUITER.TXT · available

Sarath Tharayil

Senior Data Scientist

Open to

Data Scientist · ML Engineer · Data Engineer · Data Analyst · AI Engineer

Senior Data Scientist with ~4 years of experience developing forecasting models and scalable data solutions that support commercial decision-making. Experienced in time-series modelling, demand forecasting, and statistical analysis, with strong expertise in Python, SQL and cloud platforms including AWS and Azure. Proven track record of leading end-to-end analytical initiatives, from data engineering and model development to stakeholder communication and deployment of production forecasting systems.

Contact

◆iamsaraththarayil@gmail.com
◆linkedin.com/in/SarathTharayil
◆github.com/saraththarayil
◆saraththarayil.com
◆+44 7533 886825 (UK)
◆+91 7012604990 (IN)

/ PROFESSIONAL EXPERIENCE //

Google · London

Oct 2025 – Jan 2026

Data Analyst (Contract)

◆Designed and implemented 15+ conceptual and logical data models for ML evaluation frameworks, reducing evaluation cycle time by 35%
◆Collaborated with an 8-member cross-functional agile team across bi-weekly sprint planning and retrospectives
◆Maintained high-quality data standards through systematic validation and review processes supporting production ML evaluation workflows

Upwork · Remote

Jan 2024 – Oct 2025

Freelance Data Scientist

◆Applied NLP techniques (text classification, sentiment analysis, named entity recognition) to process customer feedback data
◆Designed feature engineering pipelines on GCP BigQuery, improving model performance by 18%
◆Applied Bayesian inference, hypothesis testing, and experimental design to validate model performance and business impact
◆Built MLOps infrastructure using MLflow, Docker, and CI/CD pipelines, with a 98% deployment success rate

MuSigma · Bengaluru

Sep 2020 – Sep 2022

Senior Data Scientist

◆Led a data engineering team of 10 designing scalable data transformation and ingestion pipelines on AWS
◆Designed and deployed an end-to-end forecasting system for 60+ petrochemical products across daily, weekly, and monthly horizons
◆Developed hybrid ML solutions (ARIMA, XGBoost) for time-series forecasting, improving forecast accuracy by 18%
◆Built production-grade ML pipelines processing 3M+ daily transactions using Python, PySpark, and Airflow
◆Led and mentored a team of 23 data scientists, improving team delivery velocity by 30%

/ EDUCATION //

The University of Sheffield · Sep 2022 – Jan 2024

Master of Science, Data Science
Dissertation: Classification of Real and AI-generated human faces

Cochin University of Science and Technology · Aug 2016 – Aug 2020

B.Tech, Computer Science
Dissertation: DDoS attack analysis via IoT botnets using ML classifiers

/ SELECTED PROJECTS //

Petrochemical Demand & Price Forecasting Platform

End-to-end forecasting system predicting demand and pricing for 60+ petrochemical products across multiple regions at daily, weekly, and monthly horizons. Hybrid approach combining ARIMA, exponential smoothing, and XGBoost with engineered features including seasonality decomposition, lagged variables, and market indicators.

◆18% improvement in forecast accuracy over baseline statistical models
◆Automated retraining and monitoring pipelines on AWS SageMaker processing 3M+ daily transactions
◆Backtesting framework with rolling-origin cross-validation across multiple forecast horizons

Python · XGBoost · ARIMA · PySpark · AWS SageMaker · Airflow · Docker

Agentic Trading Intelligence System (Metis)

Always-on autonomous trading tool with real-time market monitoring, configurable alert criteria, and automated buy/sell execution. Multi-agent architecture using LangGraph for orchestration, with a criteria engine that evaluates technical indicators, sentiment signals, and portfolio constraints before executing trades.

◆Multi-agent pipeline: market watcher → signal generator → risk evaluator → execution agent
◆Configurable criteria engine supporting custom rule composition and backtesting against historical data
◆Real-time alerting and audit trail with full decision provenance logging

Python · LangGraph · LangChain · LLM · Neo4j · FastAPI · Streamlit

AI vs. Real Face Classification (Dissertation)

Deep learning system for distinguishing AI-generated human faces from real photographs, developed as an MSc dissertation at the University of Sheffield. Investigated model robustness under adversarial conditions and distribution shift between GAN-generated and diffusion-model-generated images.

◆Evaluated CNNs, Vision Transformers, and hybrid architectures on curated datasets of GAN and diffusion-generated faces
◆Analysed decision boundaries and failure modes using Grad-CAM and SHAP attribution methods
◆Achieved >92% classification accuracy with cross-dataset generalisation experiments

Python · PyTorch · Vision Transformers · Grad-CAM · SHAP · Scikit-learn

Enterprise Knowledge Graph & RAG Pipeline

Graph-augmented retrieval system that combines a Neo4j knowledge graph with vector embeddings for context-aware document retrieval. Built for enterprise use cases where entity relationships and domain ontology improve retrieval precision beyond standard dense-retrieval RAG.

◆Graph-traversal + semantic search hybrid retrieval outperforming naive RAG by 31% on domain-specific QA benchmarks
◆Automated entity extraction and relationship mapping pipeline using NER and co-reference resolution
◆LangGraph orchestration with memory management, tool use, and multi-hop reasoning over the knowledge graph

Python · Neo4j · LangGraph · LangChain · OpenAI · FAISS · spaCy

NLP Customer Intelligence Platform

End-to-end NLP pipeline processing customer support tickets and feedback at scale. Multi-task model simultaneously handles intent classification, sentiment scoring, and named entity extraction to route tickets, surface product insights, and identify churn signals.

◆Fine-tuned transformer models (BERT/RoBERTa) on domain-specific labelled data, achieving an 89% classification F1 score
◆Feature engineering on GCP BigQuery, yielding an 18% improvement in downstream model performance
◆Real-time inference API with <200ms p99 latency, deployed via Docker on Cloud Run

Python · BERT · HuggingFace · GCP BigQuery · Docker · Cloud Run · MLflow

Production MLOps Framework

Reusable MLOps infrastructure template built across multiple freelance engagements. Covers the full model lifecycle: experiment tracking, model registry, automated retraining triggers, drift detection, and staged rollout with shadow deployment support.

◆98% deployment success rate across 10+ model rollouts using GitHub Actions CI/CD
◆Automated data drift detection with statistical tests (KS, PSI) triggering retraining workflows
◆Model registry with lineage tracking, A/B testing support, and automated rollback on metric degradation

Python · MLflow · Docker · GitHub Actions · AWS ECR/ECS · Evidently AI · FastAPI

/ SKILLS & TOOLS //

// CORE

Python · SQL · R · JavaScript · Machine Learning · Deep Learning · NLP · Time Series Forecasting · Statistical Analysis · RAG · Agentic AI · Prompt Engineering · Data Modelling · Data Visualisation

// PLATFORMS & TOOLS

AWS (S3, Redshift, Glue, SageMaker, Lambda) · GCP / BigQuery · Azure (ADLS, ADF, Databricks) · Apache Airflow · PySpark · Spark / Hadoop · MLflow · Docker · Kubernetes · dbt · Databricks · Snowflake · Tableau · Power BI · Looker · LangChain · LangGraph · Neo4j · GitHub Actions · Jenkins · Azure DevOps · Kafka · Elasticsearch
