Supercharging machine learning with event sourcing

The evolving world of machine learning is deeply rooted in data. The more high-quality data we have, the better our models perform. But how can we ensure continuous access to vast, relevant datasets? Enter event sourcing—a method that's proving to be a goldmine for machine learning enthusiasts.

Understanding event sourcing

Before diving deep, let's demystify event sourcing. It's a design pattern in which every change to an application's state is stored as an immutable event in an append-only log, and the current state is derived by replaying those events in order. These events, cataloged over time, offer a treasure trove of data. But how does this aid machine learning model training? Let's explore.
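To make the idea concrete, here is a minimal sketch of an event log and a replay function. The `Event` schema, the account example, and the function names are assumptions made purely for illustration; they are not taken from any particular framework.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """An immutable fact about something that happened (hypothetical schema)."""
    account_id: str
    kind: str    # "deposited" or "withdrawn"
    amount: int


def apply_event(balance: int, event: Event) -> int:
    """Fold a single event into the current state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: Iterable[Event]) -> int:
    """Derive the current state by applying every event in order."""
    balance = 0
    for event in events:
        balance = apply_event(balance, event)
    return balance


# The log only ever grows; nothing is updated or deleted in place.
event_log = [
    Event("acc-1", "deposited", 100),
    Event("acc-1", "withdrawn", 30),
    Event("acc-1", "deposited", 5),
]

current_balance = replay(event_log)
print(current_balance)
```

Because the log keeps every intermediate step, replaying only a prefix of it (for example `replay(event_log[:2])`) recovers the state at an earlier point in time, which is exactly the kind of history a training dataset can draw on.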

1. Vast data reservoirs

With event sourcing, data is constantly recorded in an event log. This log documents numerous operational parameters and system interactions. As time progresses, this log burgeons, forming a vast repository of data. For machine learning practitioners, this translates into an ever-growing pool of information to train their models.

2. Ready availability of crucial data

Beyond sheer volume, event sourcing keeps the full history of what happened, not just the latest state. Consider a scenario where a company uses event logs from an ERP system and a work logging system to train a model. With event sourcing, every recorded state change is already at their disposal, which avoids the hefty cost and complexity of gathering that data through other means.
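As a rough sketch of that scenario, the snippet below joins events from two logs into training rows. The event types (`OrderCreated`, `OrderShipped`, `WorkLogged`), their fields, and the chosen features are hypothetical; a real ERP or work logging system will have its own schema.

```python
from collections import defaultdict

# Hypothetical events from an ERP system.
erp_events = [
    {"type": "OrderCreated", "order_id": "o1", "items": 3},
    {"type": "OrderCreated", "order_id": "o2", "items": 10},
    {"type": "OrderShipped", "order_id": "o1"},
]

# Hypothetical events from a work logging system.
work_log_events = [
    {"type": "WorkLogged", "order_id": "o1", "hours": 2.0},
    {"type": "WorkLogged", "order_id": "o1", "hours": 1.5},
    {"type": "WorkLogged", "order_id": "o2", "hours": 6.0},
]


def build_training_rows(erp, work):
    """Join both logs on order_id: one row per order, with a feature and a label."""
    hours = defaultdict(float)
    for event in work:
        if event["type"] == "WorkLogged":
            hours[event["order_id"]] += event["hours"]

    rows = []
    for event in erp:
        if event["type"] == "OrderCreated":
            rows.append({
                "order_id": event["order_id"],
                "items": event["items"],                   # feature
                "total_hours": hours[event["order_id"]],   # label to predict
            })
    return rows


rows = build_training_rows(erp_events, work_log_events)
for row in rows:
    print(row)
```

Since the rows are derived from the logs rather than stored separately, rerunning the function after new events arrive yields an updated dataset without any extra data collection.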

3. Generation of quality training datasets

A continually expanding event log provides a basis for growing training datasets. As the log swells, the data it houses becomes more comprehensive and diverse, which can drive further improvements in model accuracy.

4. Real-time model refinement

An exciting dimension of event sourcing is the possibility of real-time feedback. As the models churn out predictions, these can be compared with the actual outcomes recorded in the event log. This comparison creates a feedback loop, allowing models to be refined continuously based on fresh data and real-world feedback.
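One way such a feedback loop could look is sketched below. It assumes two hypothetical event types, `PredictionRequested` and `OutcomeObserved`, matched by an `id`, and uses a deliberately tiny one-weight linear model updated by gradient descent; a production setup would use a real model and a real event store.

```python
class OnlineLinearModel:
    """Toy model y ≈ w * x, updated one example at a time (illustrative only)."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x

    def update(self, x, actual):
        # One gradient descent step on the squared error for this example.
        error = self.predict(x) - actual
        self.w -= self.learning_rate * error * x


def run_feedback_loop(model, events):
    """Match each prediction with its later outcome and learn from the gap."""
    pending = {}   # prediction id -> (input, predicted value)
    errors = []
    for event in events:
        if event["type"] == "PredictionRequested":
            pending[event["id"]] = (event["x"], model.predict(event["x"]))
        elif event["type"] == "OutcomeObserved":
            match = pending.pop(event["id"], None)
            if match is not None:
                x, predicted = match
                errors.append(abs(predicted - event["actual"]))
                model.update(x, event["actual"])
    return errors


# Synthetic log in which the true relationship is actual = 2 * x.
events = []
for i in range(60):
    x = float(i % 3 + 1)
    events.append({"type": "PredictionRequested", "id": i, "x": x})
    events.append({"type": "OutcomeObserved", "id": i, "actual": 2.0 * x})

model = OnlineLinearModel()
errors = run_feedback_loop(model, events)
print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.6f}")
```

On this synthetic log the prediction error shrinks as outcomes arrive, which is the behavior the feedback loop is meant to produce; with real data the improvement depends on the model and on how quickly outcomes are recorded.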

However, it's not all rosy

While event sourcing presents a promising picture, there are some hiccups to consider:

  • Data duplication: Storing every event, often alongside the read models derived from them, can increase disk space usage.
  • Scalability concerns: With a high number of concurrent users, scalability might become an issue.
  • Bootup delays: Startup can take longer when state has to be rebuilt by replaying a large number of events.
  • Memory usage: High memory consumption can be a challenge.

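However, solutions like Axon Framework and Axon Server are designed to mitigate many of these issues. Axon offers features like event upcasting, horizontal scalability, snapshotting, and efficient caching. Snapshotting shortens the replay needed to rebuild state, horizontal scaling helps with large numbers of concurrent users, caching avoids repeating the same replays, and upcasting lets older events be read after their structure has changed. This makes it more feasible to adopt event sourcing, as challenges are addressed with modern, scalable solutions.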
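To show why snapshotting helps with startup time, here is a generic sketch of the technique. It is not Axon's API; the `load` function, the `(position, state)` snapshot tuple, and the integer events are assumptions chosen to keep the example small.

```python
def apply_event(state, event):
    """State is a running total; each event is an integer delta."""
    return state + event


def load(events, snapshot=None):
    """Rebuild state from the log, starting from a snapshot if one exists.

    snapshot is a (position, state) pair: the state after the first
    `position` events. Returns (state, number_of_events_replayed).
    """
    position, state = snapshot if snapshot is not None else (0, 0)
    replayed = 0
    for event in events[position:]:
        state = apply_event(state, event)
        replayed += 1
    return state, replayed


events = list(range(1, 1001))   # 1000 events

# Without a snapshot, every event is replayed.
full_state, full_replayed = load(events)

# Take a snapshot after 900 events, then load again using it.
state_at_900, _ = load(events[:900])
snapshot = (900, state_at_900)
fast_state, fast_replayed = load(events, snapshot)

print(full_state, full_replayed)
print(fast_state, fast_replayed)
```

Both loads arrive at the same state, but the second one replays only the events recorded after the snapshot. The trade-off is the extra storage for snapshots and the need to decide how often to take them.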

In a nutshell

Event sourcing has emerged as a formidable ally for machine learning model training. By continuously capturing and cataloging vast amounts of high-quality data, and allowing for on-the-fly refinements, it promises a brighter, more accurate future for machine learning. And while there are challenges to grapple with, the overwhelming benefits make event sourcing worth the effort.

Stefan Dragisic
Senior Software Engineer. Stefan has years of experience in, and a passion for, software architecture, reactive programming, and JVM technologies.