Reliable Event Scheduling in Distributed Axoniq Systems with JobRunr

How JobRunr integrates with the Axon Framework as an event scheduler, tackling the hard parts of distributed scheduling: persistence, observability, and guaranteed execution.


This guest blog post was submitted by our partners at JobRunr. Learn more about their offerings here.


If you are running Axon Framework in production, you have likely hit a point where you need to schedule something to happen in the future. Maybe a payment confirmation needs to arrive within five minutes. Maybe an order should be cancelled if it is not picked up within 24 hours. Maybe a compliance report needs to run at the end of every month.

These are event scheduling problems, and in a distributed system they are harder than they look.


The Scheduling Problem in Distributed Systems

Axon Framework provides the EventScheduler interface for exactly this purpose: schedule an event for future publication. The concept is simple. The implementation challenges are not.

In a single-server setup, you could keep scheduled events in memory. But the moment you move to a distributed architecture with multiple nodes, things get complicated:

  • Persistence: If a node goes down, in-memory schedules are lost. That payment timeout? Gone. That compliance report? Never fires.

  • Duplicate execution: With multiple nodes, how do you make sure a scheduled event fires exactly once, not once per node?

  • Observability: When you have hundreds or thousands of scheduled events across your system, how do you know what is pending, what has fired, and what failed?

These are not hypothetical concerns. They are the exact problems teams run into when they move from development to production with distributed Axon applications.


How JobRunr Solves This

JobRunr is a Java library for background job processing that persists jobs in your existing database. It handles distributed locking and automatic retries, and it comes with a built-in dashboard. It also implements Axon's EventScheduler interface through the JobRunrEventScheduler, which means you can drop it into any Axon application.

When you schedule an event through JobRunr's EventScheduler, here is what happens behind the scenes:

  • JobRunr creates a persisted job in your database with the event payload and the scheduled time.

  • Only one worker across your entire cluster will pick up and execute that job (no duplicates).

  • If a node goes down before the event fires, another node picks it up automatically.

  • If the event handler fails, JobRunr retries it with an exponential backoff strategy.
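To make the "only one worker" step more concrete, here is a minimal sketch of the general claim-once idea. It is not JobRunr's implementation: SharedJobStore is a hypothetical in-memory stand-in for the shared database table, and the atomic putIfAbsent plays the role that a database-level lock or conditional update would play in a real cluster.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a shared job table; not JobRunr's storage API. */
class SharedJobStore {
    // jobId -> name of the worker that claimed the job
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    /** Returns true only for the first worker that claims the job. */
    boolean tryClaim(String jobId, String workerName) {
        return owners.putIfAbsent(jobId, workerName) == null;
    }
}

class ClaimOnceDemo {
    /** Lets several worker threads race for one job and counts how many executed it. */
    static int raceForJob(int workers) {
        SharedJobStore store = new SharedJobStore();
        AtomicInteger executions = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            String name = "node-" + i;
            pool.submit(() -> {
                try {
                    start.await(); // hold every worker until the race starts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // Only the worker that wins the claim publishes the event.
                if (store.tryClaim("transfer-deadline-42", name)) {
                    executions.incrementAndGet();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return executions.get();
    }

    public static void main(String[] args) {
        System.out.println("executions: " + raceForJob(8));
    }
}
```

However many workers race for the job, only one claim succeeds, so the event is published once rather than once per node.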

Here is a practical example. Say you want to publish a TransferDeadlineExpiredEvent if a bank transfer has not completed within five minutes:

eventScheduler.schedule(
    Duration.ofMinutes(5),
    new TransferDeadlineExpiredEvent(transferId)
);

That single call gives you a persisted, distributed, observable scheduled event. If the transfer completes before the deadline, you cancel the scheduled event. If it does not, the event fires and your saga or event handler takes the appropriate compensating action.
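The cancel step deserves a closer look. In Axon, schedule returns a ScheduleToken that you keep and later hand to cancelSchedule. The sketch below illustrates that token-based flow with a hypothetical in-memory stand-in (SketchScheduler, with a manually advanced clock) so it runs without any framework on the classpath. It mirrors the shape of the calls only, not their persistence or distribution.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory stand-in that mirrors the schedule/cancel shape of an event scheduler. */
class SketchScheduler {
    private record Pending(Instant dueAt, Object event) {}

    private final Map<String, Pending> pending = new LinkedHashMap<>();
    private Instant now;

    SketchScheduler(Instant start) {
        this.now = start;
    }

    /** Registers the event and returns a token the caller keeps for cancellation. */
    String schedule(Duration delay, Object event) {
        String token = UUID.randomUUID().toString();
        pending.put(token, new Pending(now.plus(delay), event));
        return token;
    }

    /** Removes the pending event; returns false if it already fired or was cancelled. */
    boolean cancelSchedule(String token) {
        return pending.remove(token) != null;
    }

    /** Moves the clock forward and returns every event that became due. */
    List<Object> advance(Duration step) {
        now = now.plus(step);
        List<Object> fired = new ArrayList<>();
        Iterator<Map.Entry<String, Pending>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Pending p = it.next().getValue();
            if (!p.dueAt().isAfter(now)) {
                fired.add(p.event());
                it.remove();
            }
        }
        return fired;
    }
}

class CancelDemo {
    record TransferDeadlineExpiredEvent(String transferId) {}

    /** Schedules two deadlines, cancels one, and returns what fires after six minutes. */
    static List<Object> run() {
        SketchScheduler scheduler = new SketchScheduler(Instant.parse("2024-01-01T00:00:00Z"));
        String token = scheduler.schedule(Duration.ofMinutes(5), new TransferDeadlineExpiredEvent("t-1"));
        scheduler.schedule(Duration.ofMinutes(5), new TransferDeadlineExpiredEvent("t-2"));
        scheduler.cancelSchedule(token); // transfer t-1 completed in time
        return scheduler.advance(Duration.ofMinutes(6));
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Only the deadline for transfer t-2 fires, because the token for t-1 was cancelled before the five minutes elapsed.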

JobRunr also ships with a dashboard out of the box, which gives you a concrete overview of what is happening in your system at any moment. Once you schedule the event above, you’ll be able to see it in the “Scheduled” section of the dashboard.

If you click on the job, you can also get more insight as to what it will do when it triggers, which in this case is publishing the TransferDeadlineExpiredEvent.


What You Get Out of the Box

Because JobRunr was built for background job processing at scale, you get capabilities that purpose-built schedulers often lack:

Persistence in your existing database. JobRunr stores jobs in the same database your application already uses. PostgreSQL, MySQL, MariaDB, Oracle, MongoDB, and more are all supported. No separate infrastructure to maintain. If you are already running Axon with an RDBMS event store, JobRunr can share that same database cluster, keeping your operational footprint small.

A real-time dashboard. Every scheduled event is visible in JobRunr's web dashboard. You can see what is scheduled, what is processing, what succeeded, and what failed. For each job you get the full details: when it was created, when it will execute, and what event it will publish. This is not a nice-to-have; it is essential for operating a production system where you need to answer "what happened?" at 2 a.m.

Distributed execution with single-job guarantees. JobRunr ensures that each job is executed by exactly one worker, regardless of how many application nodes you are running. No double-firing, no missed events.

Automatic retries with exponential backoff. If an event handler throws an exception, JobRunr catches it, backs off, and tries again. You configure how many retries you want. Failed jobs are visible on the dashboard with their full stack trace so you can diagnose what went wrong.
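As a rough illustration of what exponential backoff means in practice, the sketch below computes a retry schedule where each delay doubles up to a cap. The doubling factor, the base delay, and the cap are assumptions chosen for the example, not JobRunr's actual retry formula or defaults.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class BackoffSketch {
    /** Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped at max. */
    static Duration delayFor(int attempt, Duration base, Duration max) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        // Cap the exponent so the multiplier stays small even for many retries.
        long factor = 1L << Math.min(attempt - 1, 30);
        Duration delay = base.multipliedBy(factor);
        return delay.compareTo(max) > 0 ? max : delay;
    }

    /** Returns the delays for retries 1..retries. */
    static List<Duration> schedule(int retries, Duration base, Duration max) {
        List<Duration> delays = new ArrayList<>();
        for (int attempt = 1; attempt <= retries; attempt++) {
            delays.add(delayFor(attempt, base, max));
        }
        return delays;
    }

    public static void main(String[] args) {
        // Five retries, starting at 2 seconds and never waiting longer than 20 seconds.
        System.out.println(schedule(5, Duration.ofSeconds(2), Duration.ofSeconds(20)));
    }
}
```

With these example values the waits are 2, 4, 8, and 16 seconds, after which the cap holds every further retry at 20 seconds.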

Micrometer integration. If you are already using Micrometer for application metrics, JobRunr publishes job-related metrics out of the box: queue depths, processing rates, failure counts. Plug them into Grafana, Datadog, or whatever you already use.


Beyond Event Scheduling: the DeadlineManager

For teams that need tighter integration with the Saga pattern, JobRunr Pro extends this with a full DeadlineManager implementation. Where the EventScheduler publishes events globally to all matching handlers, the DeadlineManager targets a specific saga or aggregate instance, making it the right choice for orchestrating timeouts in long-running business processes.

In case you aren’t familiar with the Saga pattern, here is a brief overview. A saga breaks a complex business transaction into a sequence of smaller local transactions, each of which publishes an event that triggers the next step. If a step fails, the saga executes compensating actions to undo what was already done.
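The compensation idea can be sketched in a few lines of plain Java. This is a framework-free illustration, not how Axon implements sagas: Step and its succeeds flag are hypothetical and simply simulate whether each local transaction worked.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SagaSketch {
    /** One local transaction; `succeeds` simulates its outcome. */
    record Step(String name, boolean succeeds) {}

    /** Runs steps in order; on the first failure, compensates completed steps in reverse order. */
    static List<String> run(List<Step> steps) {
        List<String> log = new ArrayList<>();
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            if (!step.succeeds()) {
                log.add("failed: " + step.name());
                // Undo the most recently completed step first.
                while (!completed.isEmpty()) {
                    log.add("compensate: " + completed.pop().name());
                }
                return log;
            }
            log.add("done: " + step.name());
            completed.push(step);
        }
        return log;
    }

    public static void main(String[] args) {
        List<Step> steps = List.of(
            new Step("reserve funds", true),
            new Step("fraud check", true),
            new Step("execute transfer", false));
        run(steps).forEach(System.out::println);
    }
}
```

When the third step fails, the fraud check and the funds reservation are compensated in that order, which is the reverse of how they were completed.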

The key advantage of JobRunr Pro's DeadlineManager over alternatives like Quartz or db-scheduler is how it handles cancellation. In a saga, every step typically schedules a deadline and then cancels it when the expected response arrives. At scale, this means thousands of cancel operations per minute. Quartz handles this by scanning the entire job store. db-scheduler serializes and loops through all tasks. JobRunr Pro uses label-based lookups, making cancellation a direct, indexed operation instead of a full scan.
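To see why an indexed lookup matters, the following sketch contrasts the two cancellation strategies on an in-memory store and counts how many jobs each one has to inspect. It is a simplified model under stated assumptions: LabelIndexSketch, the Job record, and the label format are hypothetical, and a real job store would use a database index where this example uses a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LabelIndexSketch {
    record Job(String id, String label) {}

    private final Map<String, Job> jobsById = new HashMap<>();
    private final Map<String, Set<String>> idsByLabel = new HashMap<>();
    int jobsInspected = 0;

    void add(Job job) {
        jobsById.put(job.id(), job);
        idsByLabel.computeIfAbsent(job.label(), k -> new HashSet<>()).add(job.id());
    }

    /** Full scan: every stored job is inspected to find the matching ones. */
    int cancelByScan(String label) {
        List<String> matches = new ArrayList<>();
        for (Job job : jobsById.values()) {
            jobsInspected++;
            if (job.label().equals(label)) {
                matches.add(job.id());
            }
        }
        matches.forEach(this::remove);
        return matches.size();
    }

    /** Indexed lookup: only the jobs carrying the label are touched. */
    int cancelByLabel(String label) {
        List<String> matches = new ArrayList<>(idsByLabel.getOrDefault(label, Set.of()));
        jobsInspected += matches.size();
        matches.forEach(this::remove);
        return matches.size();
    }

    private void remove(String id) {
        Job job = jobsById.remove(id);
        if (job == null) {
            return;
        }
        Set<String> ids = idsByLabel.get(job.label());
        ids.remove(id);
        if (ids.isEmpty()) {
            idsByLabel.remove(job.label());
        }
    }

    /** Stores `total` jobs with distinct labels, cancels one label, and reports jobs inspected. */
    static int inspectedFor(int total, boolean useIndex) {
        LabelIndexSketch store = new LabelIndexSketch();
        for (int i = 0; i < total; i++) {
            store.add(new Job("job-" + i, "saga-" + i + ":settlement-deadline"));
        }
        String label = "saga-7:settlement-deadline";
        int cancelled = useIndex ? store.cancelByLabel(label) : store.cancelByScan(label);
        if (cancelled != 1) {
            throw new IllegalStateException("expected exactly one match");
        }
        return store.jobsInspected;
    }

    public static void main(String[] args) {
        System.out.println("scan inspected:  " + inspectedFor(10_000, false));
        System.out.println("index inspected: " + inspectedFor(10_000, true));
    }
}
```

With 10,000 pending jobs, the scan inspects all of them to cancel a single deadline, while the label lookup touches only the one job it needs. That gap grows linearly with the number of pending deadlines.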

A comparison of the available DeadlineManager implementations in Axon:


Implementation             | Distributed           | cancelAll Strategy             | Monitoring
---------------------------|-----------------------|--------------------------------|-----------------------------
SimpleDeadlineManager      | No (in-memory)        | N/A                            | None
QuartzDeadlineManager      | Possible, not default | Scans all jobs                 | None built-in
DbSchedulerDeadlineManager | Yes                   | Serializes and loops all tasks | Micrometer only
JobRunrProDeadlineManager  | Yes                   | Direct label lookup            | Dashboard + Micrometer + SSO


Setting it Up

Getting started with JobRunr's EventScheduler in an Axon application takes minimal configuration. Add the dependencies:

<dependency>
    <groupId>org.axonframework.extensions.jobrunrpro</groupId>
    <artifactId>axon-jobrunrpro-spring-boot-starter</artifactId>
    <version>${axon-jobrunrpro.version}</version>
</dependency>

<dependency>
    <groupId>org.jobrunr</groupId>
    <artifactId>jobrunr-pro-spring-boot-3-starter</artifactId>
    <version>${jobrunr-pro.version}</version>
</dependency>


With Spring Boot, auto-configuration handles the wiring. The extension picks up the JobScheduler bean and makes the JobRunrEventScheduler (and optionally the JobRunrProDeadlineManager) available for injection. Make sure jobrunr.background-job-server.enabled is set to true in your properties so scheduled events actually get executed.

If you are not using Spring Boot, the builder pattern works too:

DeadlineManager deadlineManager =
    JobRunrProDeadlineManager.proBuilder()
        .jobScheduler(jobScheduler)
        .storageProvider(storageProvider)
        .scopeAwareProvider(scopeAwareProvider)
        .serializer(serializer)
        .transactionManager(transactionManager)
        .spanFactory(spanFactory)
        .build();


A Complete Example: Payment Transfer Saga

To bring everything together, here is a payment transfer saga that uses deadlines at every critical step. Each saga step schedules a deadline. If the expected event arrives in time, the deadline is cancelled and the next step begins. If it does not, the deadline fires and a compensating action rolls back what was already done.

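import java.math.BigDecimal;
import java.time.Duration;

import com.fasterxml.jackson.annotation.JsonAutoDetect;
import org.axonframework.commandhandling.gateway.CommandGateway;
import org.axonframework.deadline.DeadlineManager;
import org.axonframework.deadline.annotation.DeadlineHandler;
import org.axonframework.modelling.saga.SagaEventHandler;
import org.axonframework.modelling.saga.SagaLifecycle;
import org.axonframework.modelling.saga.StartSaga;
import org.axonframework.spring.stereotype.Saga;
import org.springframework.beans.factory.annotation.Autowired;

@Saga
@JsonAutoDetect(fieldVisibility = JsonAutoDetect.Visibility.ANY)
public class PaymentTransferSaga {
    // Injected resources are transient so they are not serialized with the saga state.
    @Autowired
    private transient CommandGateway commandGateway;
    @Autowired
    private transient DeadlineManager deadlineManager;

    private String transferId;
    private BigDecimal amount;
    private String sourceAccount;

    // Step 1: reserve funds and give the reservation 30 seconds to complete.
    @StartSaga
    @SagaEventHandler(associationProperty = "transferId")
    public void on(TransferInitiatedEvent event) {
        this.transferId = event.transferId();
        this.amount = event.amount();
        this.sourceAccount = event.sourceAccount();

        commandGateway.send(new ReserveFundsCommand(sourceAccount, amount));
        deadlineManager.schedule(Duration.ofSeconds(30), "funds-reservation-deadline");
    }

    // Step 2: funds reserved in time, so cancel that deadline and start the fraud check.
    // cancelAllWithinScope limits the cancellation to this saga instance;
    // cancelAll(name) would cancel the named deadline for every saga.
    @SagaEventHandler(associationProperty = "transferId")
    public void on(FundsReservedEvent event) {
        deadlineManager.cancelAllWithinScope("funds-reservation-deadline");
        commandGateway.send(new ScreenTransactionCommand(transferId, amount));
        deadlineManager.schedule(Duration.ofMinutes(2), "fraud-check-deadline");
    }

    // Step 3: fraud check passed, so execute the transfer and wait for settlement.
    @SagaEventHandler(associationProperty = "transferId")
    public void on(FraudCheckPassedEvent event) {
        deadlineManager.cancelAllWithinScope("fraud-check-deadline");
        commandGateway.send(new ExecuteTransferCommand(transferId));
        deadlineManager.schedule(Duration.ofMinutes(5), "settlement-deadline");
    }

    // Happy path ends here.
    @SagaEventHandler(associationProperty = "transferId")
    public void on(TransferCompletedEvent event) {
        deadlineManager.cancelAllWithinScope("settlement-deadline");
        SagaLifecycle.end();
    }

    // Timeout handlers: each one compensates for the steps completed so far.
    @DeadlineHandler(deadlineName = "funds-reservation-deadline")
    public void onFundsTimeout() {
        commandGateway.send(new FailTransferCommand(transferId, "Funds reservation timed out"));
        SagaLifecycle.end();
    }

    @DeadlineHandler(deadlineName = "fraud-check-deadline")
    public void onFraudCheckTimeout() {
        commandGateway.send(new ReleaseFundsCommand(sourceAccount, amount));
        commandGateway.send(new FailTransferCommand(transferId, "Fraud check timed out"));
        SagaLifecycle.end();
    }

    @DeadlineHandler(deadlineName = "settlement-deadline")
    public void onSettlementTimeout() {
        commandGateway.send(new ReleaseFundsCommand(sourceAccount, amount));
        commandGateway.send(new FailTransferCommand(transferId, "Settlement timed out"));
        SagaLifecycle.end();
    }
}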


In a system processing 1,000 transfers per minute, this saga creates up to 3,000 deadlines and 3,000 cancellations per minute (three of each per transfer). The difference between scanning the entire job store for each cancellation and doing a direct label lookup is the difference between a system that scales and one that does not.

When you try this yourself, keep an eye on the dashboard. If everything goes well, you will see the jobs move from scheduled to deleted. If a timeout hits before the previous step completes, you will see them execute instead.


Why Does This Matter?

When you are giving a demo, the happy path is easy. Everything happens as expected: services respond, events arrive, deadlines get cancelled. But the real world is messier than a demo environment. Services go down, things don’t get cancelled, and then what? The interesting architecture decisions are made for the failure cases, and in highly regulated industries those failures aren’t just theoretical; they’re what the auditors ask about first.

Combining JobRunr for deadline management with Axon Framework’s event sourcing gives you an architecture in which, no matter what happens, everything is traceable and happens as expected, when expected.


Try it Yourself

Clone the demo repository to see this in action. The README walks you through running the example and watching jobs flow through the dashboard in real time.

For more detail on the integration, you can also read our companion article Axon Framework + JobRunr Pro: Saga deadlines done right on the JobRunr blog, which includes a full video walkthrough.


Resources:

Axon Framework deadline managers reference

Event schedulers in Axon Framework

JobRunr Pro Axon extension documentation

JobRunr Pro documentation


Join the Thousands of Developers

Already Building with Axon in Open Source
