Reliable Event Scheduling in Distributed Axoniq Systems with JobRunr
How JobRunr integrates with the Axon Framework as an event scheduler, tackling the hard parts of distributed scheduling: persistence, observability, and guaranteed execution.
This guest blog was submitted by our partners at JobRunr. Learn more about their offerings here.
If you are running Axon Framework in production, you have likely hit a point where you need to schedule something to happen in the future. Maybe a payment confirmation needs to arrive within five minutes. Maybe an order should be cancelled if it is not picked up within 24 hours. Maybe a compliance report needs to run at the end of every month.
These are event scheduling problems, and in a distributed system they are harder than they look.
The Scheduling Problem in Distributed Systems
Axon Framework provides the EventScheduler interface for exactly this purpose: schedule an event for future publication. The concept is simple. The implementation challenges are not.
In a single-server setup, you could keep scheduled events in memory. But the moment you move to a distributed architecture with multiple nodes, things get complicated:
Persistence: If a node goes down, in-memory schedules are lost. That payment timeout? Gone. That compliance report? Never fires.
Duplicate execution: With multiple nodes, how do you make sure a scheduled event fires exactly once, not once per node?
Observability: When you have hundreds or thousands of scheduled events across your system, how do you know what is pending, what has fired, and what failed?
These are not hypothetical concerns. They are the exact problems teams run into when they move from development to production with distributed Axon applications.
How JobRunr Solves This
JobRunr is a Java library for background job processing that persists jobs in your existing database. It handles distributed locking and automatic retries, and it comes with a built-in dashboard. It also implements Axon's EventScheduler interface through the JobRunrEventScheduler, which means you can drop it into any Axon application.
When you schedule an event through JobRunr's EventScheduler, here is what happens behind the scenes:
JobRunr creates a persisted job in your database with the event payload and the scheduled time
Only one worker across your entire cluster will pick up and execute that job (no duplicates)
If a node goes down before the event fires, another node picks it up automatically
If the event handler fails, JobRunr retries it with an exponential backoff strategy
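The "only one worker" guarantee in the steps above comes down to an atomic claim on the job record. The following is a minimal sketch of that idea in plain Java, not JobRunr's actual implementation: an in-memory map stands in for the database table, and the class name, job id, and worker names are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ClaimSketch {
    // jobId -> worker that claimed it (stand-in for an owner column on a job row)
    private final ConcurrentHashMap<String, String> owners = new ConcurrentHashMap<>();

    /** Returns true only for the first worker to claim the job; putIfAbsent is atomic. */
    public boolean tryClaim(String jobId, String workerId) {
        return owners.putIfAbsent(jobId, workerId) == null;
    }

    /** Lets `workers` threads race for one job and returns how many of them executed it. */
    public static int raceForJob(int workers) {
        ClaimSketch store = new ClaimSketch();
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            String workerId = "node-" + i;
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (store.tryClaim("transfer-42-deadline", workerId)) {
                    executions.incrementAndGet(); // only the claim winner "fires" the event
                }
            });
            threads[i].start();
        }
        start.countDown();
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return executions.get();
    }

    public static void main(String[] args) {
        System.out.println("executions: " + raceForJob(8)); // prints "executions: 1"
    }
}
```

In a real cluster the same effect needs a shared store, which is why the job is persisted in the database rather than kept in each node's memory.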
Here is a practical example. Say you want to publish a TransferDeadlineExpiredEvent if a bank transfer has not completed within five minutes:
That single line gives you a persisted, distributed, observable scheduled event. If the transfer completes before the deadline, you cancel it. If it does not, the event fires and your saga or event handler takes the appropriate compensating action.
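The schedule-then-cancel lifecycle described here can be modelled without any framework on the classpath. In the sketch below the method names `schedule(Duration, Object)` and `cancelSchedule(token)` mirror Axon's EventScheduler, but the class itself is a hypothetical in-memory stand-in with a manual clock, and the `transferId` field on the event is an assumption. It shows the call shape only; the persistence and distribution that JobRunr adds are exactly what this stand-in lacks.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScheduleCancelSketch {
    /** Event named in the text; the transferId field is an assumption. */
    public static final class TransferDeadlineExpiredEvent {
        public final String transferId;
        public TransferDeadlineExpiredEvent(String transferId) {
            this.transferId = transferId;
        }
    }

    private static final class Pending {
        final Instant dueAt;
        final Object event;
        Pending(Instant dueAt, Object event) {
            this.dueAt = dueAt;
            this.event = event;
        }
    }

    private final Map<String, Pending> pending = new LinkedHashMap<>();
    private final List<Object> published = new ArrayList<>();
    private Instant now = Instant.EPOCH; // manual clock keeps the sketch deterministic
    private int nextToken = 0;

    /** Registers the event for publication after `delay` and returns a cancellation token. */
    public String schedule(Duration delay, Object event) {
        String token = "token-" + (nextToken++);
        pending.put(token, new Pending(now.plus(delay), event));
        return token;
    }

    /** Removes the scheduled event if it has not fired yet. */
    public void cancelSchedule(String token) {
        pending.remove(token);
    }

    /** Moves the clock forward and publishes every event that is now due. */
    public void advance(Duration elapsed) {
        now = now.plus(elapsed);
        Iterator<Map.Entry<String, Pending>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Pending p = it.next().getValue();
            if (!p.dueAt.isAfter(now)) {
                published.add(p.event);
                it.remove();
            }
        }
    }

    public List<Object> published() {
        return published;
    }

    public static void main(String[] args) {
        ScheduleCancelSketch scheduler = new ScheduleCancelSketch();
        String completed = scheduler.schedule(
                Duration.ofMinutes(5), new TransferDeadlineExpiredEvent("transfer-1"));
        scheduler.schedule(
                Duration.ofMinutes(5), new TransferDeadlineExpiredEvent("transfer-2"));
        scheduler.cancelSchedule(completed);       // transfer-1 finished in time
        scheduler.advance(Duration.ofMinutes(5));  // transfer-2 hits its deadline
        System.out.println(scheduler.published().size()); // prints 1
    }
}
```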
JobRunr also ships with a dashboard out of the box, which gives you a concrete overview of what is happening in your system at any given time. Once you schedule the event above, you will see it in the “Scheduled” section of the dashboard.

If you click on the job, you can see what it will do when it triggers, which in this case is publishing the TransferDeadlineExpiredEvent.

What You Get Out of the Box
Because JobRunr was built for background job processing at scale, you get capabilities that purpose-built schedulers often lack:
Persistence in your existing database. JobRunr stores jobs in the same database your application already uses. PostgreSQL, MySQL, MariaDB, Oracle, MongoDB, and more are all supported. No separate infrastructure to maintain. If you are already running Axon with an RDBMS event store, JobRunr can share that same database cluster, keeping your operational footprint small.
A real-time dashboard. Every scheduled event is visible in JobRunr's web dashboard. You can see what is scheduled, what is processing, what succeeded, and what failed. For each job you get the full details: when it was created, when it will execute, and what event it will publish. This is not a nice-to-have. It is essential for operating a production system where you need to answer "what happened?" at 2 a.m.
Distributed execution with single-job guarantees. JobRunr ensures that each job is executed by exactly one worker, regardless of how many application nodes you are running. No double-firing, no missed events.
Automatic retries with exponential backoff. If an event handler throws an exception, JobRunr catches it, backs off, and tries again. You configure how many retries you want. Failed jobs are visible on the dashboard with their full stack trace so you can diagnose what went wrong.
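To make "exponential backoff" concrete, here is a generic sketch of how retry delays grow. It is not JobRunr's exact formula or its defaults: the doubling factor, the two-second base, and the one-minute cap are all assumptions chosen for illustration.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class BackoffSketch {
    /** Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped at `max`. */
    public static Duration delayFor(int attempt, Duration base, Duration max) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        int shift = Math.min(attempt - 1, 30);
        long baseMs = base.toMillis();
        long maxMs = max.toMillis();
        // Compare before multiplying so the doubling can never overflow past the cap.
        if (baseMs > (maxMs >> shift)) {
            return max;
        }
        return Duration.ofMillis(baseMs << shift);
    }

    /** The full list of delays for `maxRetries` consecutive failures. */
    public static List<Duration> schedule(int maxRetries, Duration base, Duration max) {
        List<Duration> delays = new ArrayList<>();
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            delays.add(delayFor(attempt, base, max));
        }
        return delays;
    }

    public static void main(String[] args) {
        // prints [PT2S, PT4S, PT8S, PT16S, PT32S]
        System.out.println(schedule(5, Duration.ofSeconds(2), Duration.ofMinutes(1)));
    }
}
```

The growing gaps give a struggling downstream service time to recover instead of hammering it with immediate retries.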
Micrometer integration. If you are already using Micrometer for application metrics, JobRunr publishes job-related metrics out of the box: queue depths, processing rates, failure counts. Plug them into Grafana, Datadog, or whatever you already use.
Beyond Event Scheduling: the DeadlineManager
For teams that need tighter integration with the Saga pattern, JobRunr Pro extends this with a full DeadlineManager implementation. Where the EventScheduler publishes events globally to all matching handlers, the DeadlineManager targets a specific saga or aggregate instance, making it the right choice for orchestrating timeouts in long-running business processes.
In case you are not familiar with the Saga pattern, here is a brief overview. A saga breaks a complex business transaction into a sequence of smaller local transactions, each of which publishes an event that triggers the next step. If a step fails, the saga executes compensating actions to undo what was already done.
The key advantage of JobRunr Pro's DeadlineManager over alternatives like Quartz or db-scheduler is how it handles cancellation. In a saga, every step typically schedules a deadline and then cancels it when the expected response arrives. At scale, this means thousands of cancel operations per minute. Quartz handles this by scanning the entire job store. db-scheduler serializes and loops through all tasks. JobRunr Pro uses label-based lookups, making cancellation a direct, indexed operation instead of a full scan.
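The difference between a full scan and an indexed label lookup can be illustrated with two in-memory maps. This is a simplified model, not the storage layer of JobRunr Pro, Quartz, or db-scheduler: the class, the label format, and the "rows visited" counter are hypothetical and exist only to make the cost visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LabelLookupSketch {
    private final Map<String, String> labelByJobId = new HashMap<>();       // the "job table"
    private final Map<String, Set<String>> jobIdsByLabel = new HashMap<>(); // the "index" on label

    public void schedule(String jobId, String label) {
        labelByJobId.put(jobId, label);
        jobIdsByLabel.computeIfAbsent(label, k -> new HashSet<>()).add(jobId);
    }

    /** Full scan: inspects every pending job to find matches. Returns rows visited. */
    public int cancelAllByScan(String label) {
        int visited = 0;
        List<String> toCancel = new ArrayList<>();
        for (Map.Entry<String, String> e : labelByJobId.entrySet()) {
            visited++;
            if (e.getValue().equals(label)) {
                toCancel.add(e.getKey());
            }
        }
        toCancel.forEach(this::remove);
        return visited;
    }

    /** Indexed lookup: touches only the jobs carrying the label. Returns rows visited. */
    public int cancelAllByLabel(String label) {
        // Copy first so removal does not modify the set while it is being iterated.
        List<String> toCancel =
                new ArrayList<>(jobIdsByLabel.getOrDefault(label, Collections.emptySet()));
        toCancel.forEach(this::remove);
        return toCancel.size();
    }

    private void remove(String jobId) {
        String label = labelByJobId.remove(jobId);
        if (label != null) {
            Set<String> ids = jobIdsByLabel.get(label);
            ids.remove(jobId);
            if (ids.isEmpty()) {
                jobIdsByLabel.remove(label);
            }
        }
    }

    public int pending() {
        return labelByJobId.size();
    }

    public static void main(String[] args) {
        LabelLookupSketch store = new LabelLookupSketch();
        for (int i = 0; i < 10_000; i++) {
            store.schedule("job-" + i, "saga-" + i);
        }
        System.out.println("scan visited:   " + store.cancelAllByScan("saga-42"));  // 10000
        System.out.println("lookup visited: " + store.cancelAllByLabel("saga-43")); // 1
    }
}
```

With one deadline per saga, the scan cost grows with the total number of pending jobs, while the lookup cost stays proportional to the jobs actually being cancelled.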
A comparison of the available DeadlineManager implementations in Axon:
| Implementation | Distributed | cancelAll Strategy | Monitoring |
|---|---|---|---|
| SimpleDeadlineManager | No (in-memory) | N/A | None |
| QuartzDeadlineManager | Possible, not default | Scans all jobs | None built-in |
| DbSchedulerDeadlineManager | Yes | Serializes and loops all tasks | Micrometer only |
| JobRunrProDeadlineManager | Yes | Direct label lookup | Dashboard + Micrometer + SSO |
Setting it Up
Getting started with JobRunr's EventScheduler in an Axon application takes minimal configuration. Add the dependencies:
With Spring Boot, auto-configuration handles the wiring. The extension picks up the JobScheduler bean and makes the JobRunrEventScheduler (and optionally the JobRunrProDeadlineManager) available for injection. Make sure jobrunr.background-job-server.enabled is set to true in your properties so scheduled events actually get executed.
If you are not using Spring Boot, the builder pattern works too:
A Complete Example: Payment Transfer Saga
To bring everything together, here is a payment transfer saga that uses deadlines at every critical step. Each saga step schedules a deadline. If the expected event arrives in time, the deadline is cancelled and the next step begins. If it does not, the deadline fires and a compensating action rolls back what was already done.
In a system processing 1,000 transfers per minute, this saga creates up to 3,000 deadlines and 3,000 cancellations per minute. The difference between scanning the entire job store for each cancellation versus doing a direct label lookup is the difference between a system that scales and one that does not.
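The schedule, cancel, and compensate flow, and the arithmetic behind the 3,000 figure, can be modelled in plain Java. The sketch below is not the demo's saga and uses neither Axon nor JobRunr: the three step names are hypothetical (the text only implies three deadline-guarded steps), and the `Deadlines` class is a counting stand-in for a deadline manager.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TransferSagaSketch {
    /** Hypothetical step names; the text only implies three deadline-guarded steps. */
    public enum Step { DEBIT_SOURCE, CREDIT_DESTINATION, CONFIRM }

    /** Stand-in for a deadline manager: tracks open deadlines and counts operations. */
    public static class Deadlines {
        public final Map<String, Step> open = new HashMap<>();
        public int scheduled = 0;
        public int cancelled = 0;

        void schedule(String name, Step step) {
            open.put(name, step);
            scheduled++;
        }

        void cancel(String name) {
            if (open.remove(name) != null) {
                cancelled++;
            }
        }
    }

    public final List<String> log = new ArrayList<>();
    private final String transferId;
    private final Deadlines deadlines;
    private int current = 0;
    private boolean finished = false;

    public TransferSagaSketch(String transferId, Deadlines deadlines) {
        this.transferId = transferId;
        this.deadlines = deadlines;
        startStep();
    }

    private String deadlineName() {
        return transferId + ":" + Step.values()[current];
    }

    private void startStep() {
        deadlines.schedule(deadlineName(), Step.values()[current]);
        log.add("started " + Step.values()[current]);
    }

    /** The expected event for the current step arrived in time. */
    public void onStepCompleted() {
        if (finished) return;
        deadlines.cancel(deadlineName());
        current++;
        if (current == Step.values().length) {
            finished = true;
            log.add("transfer completed");
        } else {
            startStep();
        }
    }

    /** The deadline fired first: undo the already completed steps in reverse order. */
    public void onDeadlineExpired() {
        if (finished) return;
        deadlines.open.remove(deadlineName()); // a fired deadline is consumed, not cancelled
        for (int i = current - 1; i >= 0; i--) {
            log.add("compensated " + Step.values()[i]);
        }
        finished = true;
        log.add("transfer rolled back");
    }

    /** Runs `transfers` sagas down the happy path; returns {scheduled, cancelled}. */
    public static int[] happyPathCounts(int transfers) {
        Deadlines deadlines = new Deadlines();
        for (int i = 0; i < transfers; i++) {
            TransferSagaSketch saga = new TransferSagaSketch("transfer-" + i, deadlines);
            for (int s = 0; s < Step.values().length; s++) {
                saga.onStepCompleted();
            }
        }
        return new int[] { deadlines.scheduled, deadlines.cancelled };
    }

    public static void main(String[] args) {
        int[] counts = happyPathCounts(1000);
        // prints "3000 scheduled, 3000 cancelled"
        System.out.println(counts[0] + " scheduled, " + counts[1] + " cancelled");
    }
}
```

On the happy path every scheduled deadline is matched by a cancellation, which is why cancellation throughput matters as much as scheduling throughput.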
When you try this yourself, keep an eye on the dashboard. If everything goes well, you will see the jobs move from scheduled to deleted. If a timeout hits before the previous step completes, you will see them execute instead.
Why Does This Matter?
When you are giving a demo, the happy path is easy. Everything happens as expected: services respond, events arrive, deadlines get cancelled. But the real world is not as clean as a demo environment. It is messy: services go down, things do not get cancelled, and then what? The interesting architecture decisions are made for the failure cases, and in heavily regulated industries those failures are not just theoretical. They are what the auditors ask about first.
Combining JobRunr for deadline management with Axon Framework’s event sourcing gives you an architecture in which, no matter what happens, everything is traceable and happens as expected, when expected.
Try it Yourself
Clone the demo repository to see this in action. The README walks you through running the example and watching jobs flow through the dashboard in real time.
For more detail on the integration, you can also read our companion article Axon Framework + JobRunr Pro: Saga deadlines done right on the JobRunr blog, which includes a full video walkthrough.
Resources:
Axon Framework deadline managers reference
Event schedulers in Axon Framework
JobRunr Pro Axon extension documentation

