Managing large data sets in reactive way in Java and Vert.x

Take a real-life example: ride-hauling application, like Uber, Yandex.Taxi or Lyft. In such kind of project, we need to store rides’ data, as a current status or location, and this information must be updated constantly. But how? In this article we explore how to simplify a data-management in reactive applications with DaoFactory and Vertx framework.

Do you want to learn more about Vert.x framework and Java microservices? Take a look on the complete list of my Vertx tutorials and case studies.

Problem

Assume for start, that we store all our rides as rows in some relational database. When something is changed (e.g. new driver is assigned, ride is finished, location is updated), we update affected rows. When we have just a few rides in our database, the task is trivial, but when we have hundreds or thousands rides that are online in a same time? How can we update them in a fast way?

Divide and conquer

A logical decision is to divide all rides into two sets: all rides and online rides. Obviously, not all rides should be updated fast. Some rides are in progress, and updating them quickly is crucial. But most rides are archived, or finished and we don’t need them in the working dataset. Pay your attention on a graph:

Graph 1. See rides as two datasets – online rides are a subset of all rides

As you can see, all rides constitute one big set of rides objects, and online rides are a smaller part of it. That means we can create a database to store all ride objects’ data and exclude online rides in a separate dataset in order to manage operations. We can use a classical relational database to store all rides and a much faster Redis for online rides. Consider an another graph below:

Graph 2

This representation shows that we have two datasets, that both expose same Entity – Ride as a domain object, so we can call it and treat, despite of which dataset it belongs to, in a same way. But how to switch between datasets? Use DAOFactory pattern, as I show you below.

Do it in Java

Suppose, we need to update a ride – change its status (or mark it finished) or assign a driver. Let see how to it with Vert.x (by the way, if you are not familiar with Vert.x microservices, check my free tutorial on it). Take a look on the code snippet below:

private void updateRide (Message<Object> message){
        JsonObject request = JsonObject.mapFrom(message.body());
        String rideId = request.getString("ride_id");
        String status = request.getString("status");
        //..
        if (status.equalsIgnoreCase("finished")){
            vertx.executeBlocking(future->{
                try{
                    Ride ride = daoFactory.get(TEMP_DAO_ID).findRideById(rideId);
                    daoFactory.get(TEMP_DAO_ID).remove(rideId);
                    daoFactory.get(PERM_DAO_ID).add(ride);
                    future.complete();
                } catch (Exception ex){
                    future.fail(ex);
                } 
            }, result->{
                if (result.failed()){
                    //..
                }
            });
        } else if(status.equalsIgnoreCase("assigned")){
            String driverId  = request.getString("driver_id");
            vertx.executeBlocking(future->{
                try{
                    daoFactory.get(TEMP_DAO_ID).assignDriver(rideId, driverId);
                    future.complete();
                } catch (Exception ex){
                    future.fail(ex);
                } 
            }, result->{
                if (result.failed()){
                    //..
                }
            });
        } else {
            vertx.executeBlocking(future->{
                try{
                    daoFactory.get(TEMP_DAO_ID).updateStatus(rideId, status);
                    future.complete();
                } catch (Exception ex){
                    future.fail(ex);
                } 
            }, result->{
                if (result.failed()){
                    //..
                }
            });
        }
    }

This code represents an updating mechanism. Let see what it does:

  1. Check if the ride’s status is finished. In this case we don’t need it in “fast” storage – we delete it and then save it to permanent storage (an archive).
  2. Check if the ride’s status is assigned. In this case we assign a driver to the ride
  3. In all other cases we assume that ride is not finished, therefore is online and we update temporal storage.

DAOFactory pattern in a glance

The code above is an example of the DaoFactory. Let deconstruct it – take a look on a graph:

Graph 3. A sample representation of DaoFactory pattern

As you can see we have two actors: DAO itself and a DAOFactory. DAO represents an data access object, so it has all necessary methods we wait from it to work with data. We build one additional level of abstraction and encapsulate a lower level logic: depend on a caller’s logic we may need one type of DAO or another, e.g. MySQL or Redis. We use a get(type) method, as you can see from the code:

daoFactory.get("type").<here goes DAO's method>

Of course, we should provide interfaces for both classes, so we can easily mock them during a test stage and interchange quickly.

Never, I repeat, NEVER trust a lower level to decide which type of storage to choose. This is an idea of encapsulation! From higher to lower! This decision can be done ONLY by caller. The whole idea of this pattern is to encapsulate an implementation logic, but it SHOULD NOT know anything about a decision logic!

Conslusion

Ok, we observed how to deal with a problem of large datasets in fast reactive manner – just DIVIDE and CONQUER. Use DaoFactory pattern to simplify your life. We also observed how to implement it in Vertx applications.

Do you want to learn more about Vert.x framework and Java microservices? Take a look on the complete list of my Vertx tutorials and case studies.

References

  • James Carman. Write once, persist anywhere (2002). JavaWorld Read
  • Nicolas Gerard. Abstract Factory Pattern Component Code (2017). Read