
AI-Powered Novel Virus Detection from Wastewater

WastHAI – Machine Learning for Novel Virus Identification from Wastewater Samples

SUCCESS STORIES
Thu 21 Aug 2025

Use Case:

Detection of novel or re-emerging viral pathogens by applying unsupervised learning and outlier detection to virus-enriched sequencing datasets derived from wastewater, enabling early health surveillance.

Outcome:

  • Achieved 79% accuracy in identifying novel cold virus strains that were absent from the reference months;
  • Novel use of alignment scores instead of raw sequences improved model robustness;
  • Successful validation of Local Outlier Factor (LOF) as a suitable ML technique for novelty detection;
  • Fully data-driven solution leveraging public NCBI virome datasets and AI-based clustering;
  • TRL 7 prototype demonstrated with clear deployment roadmap for public health utility.
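
As a rough illustration of the alignment-score features mentioned in the list above, the sketch below summarises alignment hits per virus instead of passing raw sequences to a model. The function name `virus_features`, the `(virus_id, read_sequence, alignment_score)` tuple layout, and the reading of "replication count" as the number of identical copies of a read are all assumptions made for this example; the source does not describe the actual feature set in this detail.

```python
from collections import Counter
from statistics import mean, median, pstdev


def virus_features(hits):
    """Summarise alignment hits per virus (hypothetical feature set).

    `hits` is an iterable of (virus_id, read_sequence, alignment_score)
    tuples. This layout is an assumption for illustration only.
    """
    scores = {}   # virus_id -> list of alignment scores
    copies = {}   # virus_id -> Counter of identical read sequences
    for virus_id, read_seq, score in hits:
        scores.setdefault(virus_id, []).append(score)
        copies.setdefault(virus_id, Counter())[read_seq] += 1

    features = {}
    for virus_id, s in scores.items():
        counts = copies[virus_id]
        features[virus_id] = {
            "n_reads": len(s),
            # Simple summary of the alignment score distribution.
            "score_mean": mean(s),
            "score_median": median(s),
            "score_std": pstdev(s),
            "score_min": min(s),
            # Assumed meaning of "replication count": identical read copies.
            "max_replication": max(counts.values()),
            "distinct_reads": len(counts),
        }
    return features


# Toy input, not real data.
hits = [
    ("virus_A", "ACGT", 60),
    ("virus_A", "ACGT", 58),
    ("virus_A", "TTGA", 40),
    ("virus_B", "GGCC", 30),
]
features = virus_features(hits)
```

Each virus then becomes one fixed-length numeric vector, which is the kind of input that clustering and outlier detection methods expect.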

Ecosystem Support:

  • Technical mentorship by AI experts;
  • Stage 2 implementation under the StairwAI program;
  • Domain-specific AI solution development without reliance on HPC resources.

AI Relevance:

WastHAI shows how public datasets, algorithmic design, and minimal-infrastructure AI workflows can create impactful, scalable tools for health and biosecurity. It exemplifies:

  1. Domain-specific AI adoption without deep AI expertise;
  2. Interpretability of outcomes via sequence-level clustering;
  3. Novel use of outlier detection for early-warning pathogen detection systems.

Summary:

Greenseq Oy Ltd developed WastHAI, an AI-based solution for identifying novel viruses in wastewater using unsupervised machine learning. Recognizing that viral sequencing data is typically noisy and diverse, the team designed a robust pipeline leveraging public sequencing data from NCBI’s PRJNA966185 project and the Twist Biosciences Comprehensive Viral Panel.

Instead of using nucleotide sequences as input, which are unreliable because of random alignment points, the team derived virus-specific features such as alignment score distributions and sequence replication counts. They structured the high-dimensional data using dimensionality reduction (t-SNE) and clustering (DBSCAN), then applied the Local Outlier Factor (LOF) algorithm. The resulting model identified novel winter virus strains not present in July samples with 79% accuracy, outperforming alternatives such as Isolation Forest and SVM. This result supports LOF’s efficacy in detecting deviation patterns amid genomic noise, largely because of its local, neighborhood-based approach.

By demonstrating a functional TRL 7 prototype, the project establishes a path for deploying AI in environmental virology, enabling early detection of health threats through low-cost, scalable, and transferable analytics. The pilot received expert technical mentoring and executed its roadmap without deviation; commercial readiness and go-to-market planning were also completed successfully.
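
To make the LOF step in the summary more concrete, here is a minimal standard-library sketch of Local Outlier Factor used for novelty detection: densities are learned from reference samples, and a new sample is then scored against them. It is not the project's implementation. The class name `SimpleLOF`, the choice `k=3`, and the toy 2-D points are assumptions for illustration, and the t-SNE and DBSCAN steps are left out. In practice a library implementation such as scikit-learn's `LocalOutlierFactor` with `novelty=True` would likely be the more sensible choice.

```python
import math


def _knn(point, data, k, skip=None):
    """Return the k nearest (distance, index) pairs of `point` within `data`."""
    dists = [(math.dist(point, other), j)
             for j, other in enumerate(data) if j != skip]
    return sorted(dists)[:k]


class SimpleLOF:
    """Textbook LOF fitted on reference samples (illustrative sketch only).

    Ties beyond the k-th neighbour are ignored, and heavily duplicated
    reference points (more than k identical copies) are not handled.
    """

    def __init__(self, reference, k=3):
        self.ref = [tuple(p) for p in reference]
        self.k = k
        # Neighbours of each reference point, excluding the point itself.
        self.neigh = [_knn(p, self.ref, k, skip=i)
                      for i, p in enumerate(self.ref)]
        # k-distance: distance to the k-th nearest neighbour.
        self.kdist = [n[-1][0] for n in self.neigh]
        # Local reachability density of each reference point.
        self.lrd = [self._lrd(n) for n in self.neigh]

    def _lrd(self, neighbours):
        # reach-dist(p, o) = max(k-distance(o), d(p, o))
        reach = [max(self.kdist[j], d) for d, j in neighbours]
        total = sum(reach)
        return len(reach) / total if total > 0 else math.inf

    def score(self, point):
        """LOF of a new point: about 1 means typical, well above 1 means outlier."""
        neighbours = _knn(tuple(point), self.ref, self.k)
        lrd_p = self._lrd(neighbours)
        return sum(self.lrd[j] for _, j in neighbours) / (len(neighbours) * lrd_p)


# Toy reference set standing in for feature vectors from reference samples.
reference = [(x, y) for x in range(4) for y in range(4)]
model = SimpleLOF(reference, k=3)
typical_score = model.score((1.5, 1.5))   # lies inside the reference cloud
novel_score = model.score((10, 10))       # lies far from every reference point
```

A sample is flagged as potentially novel when its score exceeds a chosen threshold. Because each score compares a point's density with that of its own neighbours, regions of differing density in the reference data are judged on their own terms, which matches the local, neighborhood-based behaviour highlighted in the summary.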

Date modified 26.11.2025
Date Published 21.08.2025