Event sourcing is not scary

Introduction

Using the wrong tools can bite you. I remember trying to drill holes in the ceiling for a light fixture using a simple cordless drill. It always took too long, and I sometimes needed the help of a hammer to finish the job. Everything changed when I got a hammer drill, making it much more manageable. But I’ll never forget the time I needed to drill a hole in a wall and didn’t check what kind of wall it was. Suddenly, I had an enormous hole in the wall from using the hammer drill. Using the proper tool for the job is vital if you want good results.

Before I joined AxonIQ, I mainly worked with traditional CRUD apps, sometimes partly using an event-driven architecture. An application can become more event-driven by slowly replacing how specific persistent data structures are changed. Instead of changing data depending on a REST call by another application, it can update the data by listening to an event stream. While event streaming seems to have a lot of traction, many developers still hesitate to use event sourcing.

Moving towards event sourcing might seem overly complicated. Starting from an application where the database is the source of truth, moving to event sourcing requires some changes. I struggled trying to do event sourcing using Kafka. It didn’t seem to fit. Once I learned about and started using the Axon Framework, it became much easier to build an event-sourcing application. Many of the concerns I had with my earlier attempts no longer applied. It was like using the hammer drill for the first time. Using the proper tools can make a lot of difference.

Axon Framework can make it much easier to implement event sourcing. The framework, created over ten years ago, is written in Java. Spring Boot support was added later to make the framework even easier to use. Many extensions are also available, for example, to use MongoDB to create and query projections or to use Kafka as a distributed event bus. I will explain some of the characteristics of the Axon Framework and how it solves the challenges related to event sourcing.

Different message types

One of the most helpful things Axon Framework does is to make a distinction between different message types. The most important ones for event sourcing are the event and command messages. You likely are already familiar with event messages. They contain the description of an event that happened in the past. Since this will never change, we can store them forever. Because we keep events forever, it’s essential to store them in a format we can still deserialize later. With event streaming, events are usually the only kind of message.

There needs to be something that triggers the creation of those events. One of the ways to do this is through database changes. This allows a CRUD application to become the source of events. In this case, we don’t use event sourcing since the source of truth is still the state of the database, not the events themselves. The quality of these events can also be questionable. They tell you exactly what changed but not why it changed.

Even with event sourcing, it’s still possible to publish events without considering the existing events, for example, when a deadline is triggered. Most of the time, however, you want to ensure some level of consistency. In the typical example of a bank transaction, you need to avoid over-drafting. With multiple applications running at the same time, this is not trivial. We could have multiple ‘TransactionStarted’ events, leading to an incorrect state.

To break down this complexity, we make use of a command message. A command message is a message that intends to create one or more events. The command message could be a ‘StartTransaction’ command. Compared to the event message, command messages are different in several ways. For one, you might want to know if the command succeeded. A command message should also be sent to only one application instance, while event messages might need to be sent to multiple instances.
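The difference between the two message types can be sketched in a few lines of plain Java. This is not Axon Framework code: the SimpleBus class, its method names, and the fields of the two records are all made up for illustration. The sketch only shows that a command has exactly one handler and reports a result, while an event is delivered to every subscriber.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical message types; names and fields are assumptions for this sketch.
record StartTransaction(String accountId, long amountInCents) {}   // intent, may be rejected
record TransactionStarted(String accountId, long amountInCents) {} // fact, already happened

// A toy in-memory bus, only meant to contrast the two delivery styles.
final class SimpleBus {
    private Function<StartTransaction, Boolean> commandHandler;
    private final List<Consumer<TransactionStarted>> eventHandlers = new ArrayList<>();

    // A command has exactly one handler.
    void registerCommandHandler(Function<StartTransaction, Boolean> handler) {
        if (commandHandler != null) {
            throw new IllegalStateException("A command handler is already registered");
        }
        commandHandler = handler;
    }

    // An event can have any number of subscribers.
    void subscribe(Consumer<TransactionStarted> handler) {
        eventHandlers.add(handler);
    }

    // The sender learns whether the command succeeded.
    boolean send(StartTransaction command) {
        if (commandHandler == null) {
            throw new IllegalStateException("No command handler registered");
        }
        return commandHandler.apply(command);
    }

    // Every subscriber receives the event; there is nothing to report back.
    void publish(TransactionStarted event) {
        eventHandlers.forEach(handler -> handler.accept(event));
    }
}
```

A command handler registered on this bus would validate the command, publish a TransactionStarted event when it is valid, and return whether it succeeded.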

The third message type is queries, although they don’t matter for the event sourcing part we are focusing on now. Query messages can retrieve information from a projection. You don’t need to know which application can give you the result. Axon Server can efficiently route all these message types. The routing also brings location transparency, as each application only needs to connect to an Axon Server instance to send and receive messages.

Aggregate

Before diving deeper into how Axon Framework will solve the problems associated with event sourcing, I need to explain the concept of an aggregate. It’s neither feasible nor scalable to replay all events whenever we want to add another event. Therefore we need to split the events into groups to ensure consistency for each group of events.

A group of events can be considered an aggregate. The group could be all the events related to one account in the financial domain. This group could contain events such as an account was created, a transaction was started, or a transaction was rolled back. But this would only pertain to the events directly related to one account.

In Axon Framework, a Java class can represent an aggregate. It’s possible to replay all the events belonging to an instance of an aggregate using this class to construct the current state. The state is used to validate command messages. If the command is valid, one or more events can be added.
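To make the idea concrete, here is a minimal sketch of an event-sourced aggregate in plain Java, without any Axon Framework annotations. The Account class, the event records, and the insufficient-funds rule are assumptions chosen to match the banking example. The sketch replays the events of one account to rebuild its state and then uses that state to validate a command.

```java
import java.util.List;

// Hypothetical events for one account; names and fields are assumptions.
interface AccountEvent {}
record AccountCreated(String accountId) implements AccountEvent {}
record MoneyDeposited(String accountId, long amountInCents) implements AccountEvent {}
record TransactionStarted(String accountId, long amountInCents) implements AccountEvent {}

final class Account {
    private String accountId;
    private long balanceInCents;

    // Rebuild the current state by applying every past event in order.
    static Account replay(List<AccountEvent> history) {
        Account account = new Account();
        history.forEach(account::apply);
        return account;
    }

    // State changes happen only here, so replaying and handling stay consistent.
    private void apply(AccountEvent event) {
        if (event instanceof AccountCreated created) {
            accountId = created.accountId();
        } else if (event instanceof MoneyDeposited deposited) {
            balanceInCents += deposited.amountInCents();
        } else if (event instanceof TransactionStarted started) {
            balanceInCents -= started.amountInCents();
        }
    }

    // Command handling: validate against the current state, then emit new events.
    List<AccountEvent> startTransaction(long amountInCents) {
        if (amountInCents <= 0) {
            throw new IllegalArgumentException("Amount must be positive");
        }
        if (amountInCents > balanceInCents) {
            throw new IllegalStateException("Insufficient funds");
        }
        AccountEvent event = new TransactionStarted(accountId, amountInCents);
        apply(event);
        return List.of(event);
    }

    long balanceInCents() {
        return balanceInCents;
    }
}
```

The command handler never changes the state directly. It only produces events, and the same apply method is used both when replaying old events and when applying new ones.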

How to tackle event sourcing challenges?

With some of the building blocks explained, we can now look at different challenges with event sourcing and how to tackle them. Although Axon Framework has implementations for these, it’s possible to implement the same principles yourself.

Being able to read old events

For both the write and the read side, it’s crucial to be able to read old events. For the write side, it might be necessary to change the aggregate structure. This means we can’t rely upon snapshots and must recreate the aggregate state from all the events. We might need a new projection for the read side, which needs old events.

Ideally, updates to the event structure are backward compatible and only add properties. When combined with a serializer that can handle missing fields, such as Jackson, we can easily change the format of the events. Sometimes backward-incompatible changes might be needed. In that case, Axon Framework provides Upcasters to transform the raw form of the old events and make them compatible with the new format.
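The principle behind an upcaster can be sketched without the framework. The example below does not use Axon Framework’s Upcaster API. The RawEvent record, the revision numbers, and the rename from ‘owner’ to ‘accountHolder’ are assumptions made up for illustration. The sketch transforms the stored, raw form of an old event into the shape the current code expects before it is deserialized.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical raw representation of a stored event, before deserialization.
record RawEvent(String type, int revision, Map<String, Object> payload) {}

final class AccountCreatedUpcaster {

    // Assumed change: revision 1 stored the field "owner", revision 2 calls it "accountHolder".
    static RawEvent upcast(RawEvent raw) {
        boolean applies = raw.type().equals("AccountCreated") && raw.revision() == 1;
        if (!applies) {
            return raw; // leave every other event untouched
        }
        Map<String, Object> payload = new HashMap<>(raw.payload());
        payload.put("accountHolder", payload.remove("owner"));
        return new RawEvent(raw.type(), 2, Map.copyOf(payload));
    }
}
```

Because the upcaster works on the raw form, the event class in the code base only needs to know the newest revision.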

As an application runs for several years, the list of upcasters could grow. Changing the format of the stored events can be done with event transformation. Once the events are changed, you no longer need the upcasters.

Consistency control

Since the events are the source of truth, we need mechanisms in place to keep the events consistent. We can’t do this afterward because deleting events would mean some instances would already have read the event and others would not, leading to inconsistencies. So we need to know each event is valid when published, while at the same time, we want to be able to publish events from multiple instances.

All the events of an aggregate should have an increasing sequence number. Optimistic locking based on these will prevent concurrent changes to the same aggregate from multiple instances. Optimistic locking will work for all event store implementations of the framework. A distributed command bus will route command messages for the same aggregate to the same instance, making consistency errors less likely. One distributed command bus implementation relies on Axon Server, but extensions offer alternative implementations.
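Optimistic locking on sequence numbers can be sketched with a small in-memory event store. This is an illustration of the principle, not how Axon Framework or Axon Server implements it. The class names and the choice to start sequence numbers at zero are assumptions. A writer states which sequence number it expects to write next, and the append is rejected if another writer got there first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Thrown when another writer appended to the same aggregate in the meantime.
final class ConcurrentAppendException extends RuntimeException {
    ConcurrentAppendException(String message) {
        super(message);
    }
}

// Hypothetical in-memory store; events are plain strings to keep the sketch short.
final class InMemoryEventStore {
    private final Map<String, List<String>> streams = new HashMap<>();

    // The caller passes the sequence number it expects to write (assumed to start at 0).
    synchronized void append(String aggregateId, long expectedSequence, String event) {
        List<String> stream = streams.computeIfAbsent(aggregateId, id -> new ArrayList<>());
        if (stream.size() != expectedSequence) {
            throw new ConcurrentAppendException(
                    "Expected sequence " + expectedSequence + " but next free is " + stream.size());
        }
        stream.add(event);
    }

    synchronized List<String> read(String aggregateId) {
        return List.copyOf(streams.getOrDefault(aggregateId, List.of()));
    }
}
```

A writer that loses the race would reload the aggregate, validate the command again against the new state, and retry or reject it.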

It’s possible to use optimistic locking directly on the database to achieve a similar solution. However, this can be tricky and hard to test. Kafka supports transactions, but we can’t use them to implement optimistic locking.

Quickly rebuilding the aggregate state

We need to build the aggregate state quickly to keep the application performant. One of the best ways to do this is by having small aggregates. For example, in the banking domain, if all transactions belong to the same aggregate, it will grow quickly. We would need to rebuild the state for all accounts for any command message related to accounts.

The first way the framework helps to quickly rebuild the aggregate state is by providing abstractions to get only the events belonging to the aggregate instance. Another way is to configure a cache. If the aggregate state is still in the cache, it only needs to get the events published after it was cached. There are likely no new events when using Axon Server because of the smart routing.

As the cache is typically only local in memory, the first time an instance receives a command message for a certain aggregate, it still needs to read all the events belonging to it. A common way in event sourcing to solve this issue is to ‘close the books’ at specific intervals. If you close the books, you no longer look at all the events but start from the last time you closed the books. Closing the books creates a type of summary that is enough to validate commands. In Axon Framework, closing the books is supported via snapshots.

A snapshot contains the serialized state of an aggregate. Multiple options exist to set a trigger to create a snapshot. One such trigger is the time it takes to build the aggregate state.

Implementing snapshots yourself involves several steps, such as first checking whether a snapshot already exists before getting all the events. The Axon Framework supports storing snapshots in Axon Server, relational databases, and MongoDB. If you use Spring Boot, you only need to add an annotation and ensure the aggregate state is serializable to start using snapshots.
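The loading step can be sketched in plain Java. The BalanceChanged and Snapshot records and the SnapshotLoader class are assumptions made up for this example, and the aggregate state is reduced to a single balance. The sketch starts from the snapshot when there is one and applies only the events that came after it.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical event and snapshot types; the state is just a balance here.
record BalanceChanged(long sequenceNumber, long deltaInCents) {}
record Snapshot(long lastSequenceNumber, long balanceInCents) {}

final class SnapshotLoader {

    // Start from the snapshot if present, then apply only the newer events.
    static long loadBalance(Optional<Snapshot> snapshot, List<BalanceChanged> events) {
        long balance = snapshot.map(Snapshot::balanceInCents).orElse(0L);
        long lastApplied = snapshot.map(Snapshot::lastSequenceNumber).orElse(-1L);
        for (BalanceChanged event : events) {
            // A real store would only read events after the snapshot; this filter simulates that.
            if (event.sequenceNumber() > lastApplied) {
                balance += event.deltaInCents();
            }
        }
        return balance;
    }

    // "Closing the books": summarize everything up to the last event in the list.
    static Snapshot createSnapshot(List<BalanceChanged> events) {
        if (events.isEmpty()) {
            throw new IllegalArgumentException("Nothing to snapshot");
        }
        long last = events.get(events.size() - 1).sequenceNumber();
        return new Snapshot(last, loadBalance(Optional.empty(), events));
    }
}
```

Loading with or without the snapshot should give the same state. The snapshot only shortens the list of events that needs to be read.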

Reacting to the passing of time

Sometimes, you want to respond to the passing of time. For example, once someone starts a checkout at an online store you want to reserve the items in the basket. But if the checkout takes too long, the items should be released. Getting the correct state again is a challenge compared to a CRUD app.

Axon Framework solves this problem with a deadline manager. A deadline manager implementation typically uses a database for durability. Several implementations are available, such as db-scheduler and JobRunr. Together with specific handlers for when the deadline is triggered, the framework provides the aggregate state without needing additional code. It’s also easy to cancel deadlines, for example, when the checkout succeeded and the items are ready to be shipped.
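The schedule, trigger, and cancel cycle can be sketched with a toy in-memory class. This is not Axon Framework’s deadline manager API: the InMemoryDeadlines class and its methods are made up for illustration, and time is advanced by hand to keep the example deterministic. A real implementation would persist the deadlines, as described above, and would load the aggregate state before calling the handler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory deadline bookkeeping with a manually advanced clock.
final class InMemoryDeadlines {
    private record Deadline(long dueAtMillis, Runnable action) {}

    private final Map<String, Deadline> pending = new HashMap<>();

    // Register an action to run once the given time has been reached.
    void schedule(String deadlineId, long dueAtMillis, Runnable action) {
        pending.put(deadlineId, new Deadline(dueAtMillis, action));
    }

    // Returns true if the deadline was still pending and is now cancelled.
    boolean cancel(String deadlineId) {
        return pending.remove(deadlineId) != null;
    }

    // Run and remove every deadline that is due at the given time.
    void advanceTo(long nowMillis) {
        List<Runnable> due = new ArrayList<>();
        pending.entrySet().removeIf(entry -> {
            boolean isDue = entry.getValue().dueAtMillis() <= nowMillis;
            if (isDue) {
                due.add(entry.getValue().action());
            }
            return isDue;
        });
        due.forEach(Runnable::run);
    }
}
```

In the checkout example, starting a checkout would schedule a deadline that releases the reserved items, and a successful checkout would cancel it.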

The need for unlimited storage

Since the key concept of event sourcing is to have only the events as the source of truth, we need to keep every event from the first time an application was deployed to production. This can be especially scary when you are responsible for running the applications. It’s also the case that most databases tend to get slower when they increase in size.

This challenge is one of the main reasons why Axon Server exists. Taking advantage of how the data grows, it stays performant, no matter how many events are stored. Axon Server makes it possible to have active or passive backup nodes. This way, you can have an up-to-date backup in case of failure. One of the things recently introduced to Axon Server is tiered storage. This allows you to use cheaper storage for old events or even remove old events. Removing events must be done cautiously, but if the aggregates don’t live long, it can be done easily and safely.

If you don’t use Axon Server, things become a bit harder. You could still resort to deleting part of the events. If you notice degrading performance, migrating the events to Axon Server is easy with the migration tool.

Testing aggregate logic

Writing good tests can be complicated for stateful applications. You need to have a way to either run an in-memory version of the thing you use to keep the state, or the actual thing, for example, using Docker.

Because of the isolation level of the aggregate, we can use test fixtures that help test the aggregates' validity. With these, we can load an aggregate, add existing events, and test assumptions when handling a command. Since it’s all in memory these tests will run very fast. In case of a production issue, the first step might be to create a test that replays the bug.

Building test fixtures yourself takes effort, and you might wonder if the test or the test fixture is correct when there are failures. The test fixtures in the Axon Framework are battle-tested. They integrate with the other solutions, so they also make it easy to test deadlines.
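To give an idea of what such a fixture does, here is a minimal home-made sketch in plain Java. It is not Axon Framework’s test fixture: the AggregateFixture class, its type parameters, and its method names are assumptions made up for this example. The fixture rebuilds an aggregate from the given events, handles one command, and checks either the resulting events or the failure.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical fixture: A is the aggregate state, E the event type, C the command type.
final class AggregateFixture<A, E, C> {
    private final Function<List<E>, A> replay;
    private final BiFunction<A, C, List<E>> handle;
    private List<E> history = List.of();

    AggregateFixture(Function<List<E>, A> replay, BiFunction<A, C, List<E>> handle) {
        this.replay = replay;
        this.handle = handle;
    }

    // Given: the events that already happened.
    AggregateFixture<A, E, C> given(List<E> events) {
        history = List.copyOf(events);
        return this;
    }

    // When: rebuild the state in memory and handle a single command.
    Outcome<E> when(C command) {
        try {
            return new Outcome<>(handle.apply(replay.apply(history), command), null);
        } catch (RuntimeException failure) {
            return new Outcome<>(List.of(), failure);
        }
    }

    // Then: compare what happened with what the test expects.
    record Outcome<T>(List<T> events, RuntimeException failure) {
        void expectEvents(List<T> expected) {
            if (failure != null || !events.equals(expected)) {
                throw new AssertionError(
                        "Expected " + expected + " but got " + events + ", failure: " + failure);
            }
        }

        void expectFailure(Class<? extends RuntimeException> type) {
            if (!type.isInstance(failure)) {
                throw new AssertionError("Expected " + type.getSimpleName() + " but got " + failure);
            }
        }
    }
}
```

Even this small version has to decide how to replay events, how to capture failures, and how to report mismatches, which is the kind of effort a ready-made fixture saves.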

Conclusion

There is more to Axon Framework than just the tools for event sourcing. Another big part is building projections using event handlers and querying those using query messages. It can be daunting, but you can focus on the event sourcing parts when starting.

If you want to add an Axon Framework application to an existing architecture, there are many ways to get started. You don’t have to change all of your applications to Axon applications. There are several extensions, like the Kafka extension, that make it easy to share events with other applications. You can build an event sourcing application for new functionality, and some events in the application can be shared with the rest of the company if needed. This way you get the best of event sourcing and event streaming. One of the best ways to learn more about Axon Framework is with the two free courses available on AxonIQ Academy. Since it’s open source you can also dive into the code on GitHub.

Gerard Klijs
Software Engineer. With over 10 years of experience as a backend engineer, Gerard Klijs is a contributor to several GraphQL libraries, and also the creator and maintainer of a Rust library to use Confluent Schema Registry. He has an interest in event sourcing and CQRS and likes sharing knowledge via blogs, talks, and demo projects.