EVENFLOW Scalability Toolkit
Scalable AI infrastructure for high-volume data environments
What is it about?
The Scalability Toolkit is a set of four open-source components that improve the scalability and efficiency of Machine Learning (ML) workflows. Built for distributed, high-volume environments, it streamlines both training and prediction. Its four integrated components are:
Synopses-based Training Optimisation, which accelerates ML training using data summaries and Bayesian optimisation.
Synopses Data Engine as a Service (SDEaaS), which provides efficient stream processing on Apache Flink and Dask for real-time data summarisation.
Advanced Distributed-Parallel Training, which reduces training time and communication lag via data-driven synchronisation.
SuBiTO, which integrates the other three components into one solution for production-grade ML pipelines.
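To make the idea of a data summary (synopsis) concrete, the following is a minimal sketch of one well-known synopsis type, a Count-Min sketch, which estimates item frequencies in a stream using a fixed amount of memory. It is an illustration only and does not show the SDEaaS API or the toolkit's actual implementation; the names CountMinSketch and summarise, and the width and depth defaults, are hypothetical choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary of a stream (illustrative, not the toolkit's API)."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        # depth rows of width counters; memory use does not grow with the stream.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


def summarise(stream, width=256, depth=4):
    """Consume an iterable of items and return its sketch."""
    sketch = CountMinSketch(width, depth)
    for item in stream:
        sketch.add(item)
    return sketch


if __name__ == "__main__":
    events = ["sensor-a"] * 50 + ["sensor-b"] * 5 + ["sensor-c"]
    sketch = summarise(events)
    # The estimate is never below the true count (50 here), but may exceed it.
    print(sketch.estimate("sensor-a"))
```

The sketch trades exactness for a bounded memory footprint: estimates can overshoot but never undershoot, which is the kind of trade-off that lets downstream training work from summaries rather than from the full stream.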
Who is it for?
Data Scientists and ML Engineers who handle large datasets or continuous data streams
Enterprises that deploy AI models at scale in production environments
Research labs that work on AI model efficiency and scalability
DevOps teams who want to reduce infrastructure load during AI training
Why use it?
Reduces training times without sacrificing model accuracy
Improves resource efficiency and CPU utilisation
Compatible with widely used platforms (Apache Flink, Kafka)
Modular design lets users adopt a single component or all of them, depending on their needs
Each component is accompanied by:
Implementation documentation
APIs and usage examples
Instructions for integrating with real-time data systems

