Batch processing in Hibernate – a Java ORM

Developers often face the problem of inserting large amounts of data, so this article shows how to do it properly and which common mistakes to avoid.

This is how batch processing should not be done:

public <T> void insertBatch(List<T> entities){
    EntityManager em = DatabaseProvider.getEntityManager();
    EntityTransaction transaction = em.getTransaction();

    try {
        transaction.begin();
        // every entity stays managed in the persistence context
        // until the single commit at the end
        for (int i = 0; i < entities.size(); i++){
            em.persist(entities.get(i));
        }
        transaction.commit();
    } catch (Exception ex){
        ex.printStackTrace();
        // begin() itself may have failed, so check before rolling back
        if (transaction.isActive()){
            transaction.rollback();
        }
    }
}

 

There are several reasons why we should not use this approach:

  1. With a large collection of entities – say 1 million – the persistence context keeps every managed entity in memory, which can cause an OutOfMemoryError before anything has been persisted to the database.
  2. It is one long-running operation that holds a single connection for its entire duration.
  3. It is all-or-nothing: if we need at least some of the records to be stored, this does not meet our expectations, because on failure none of the records are inserted.

 

So, a better approach would be:

public <T> void batchInsertOneTransaction(List<T> entities){
    EntityManager entityManager = DatabaseProvider.getEntityManager();
    EntityTransaction transaction = entityManager.getTransaction();
    try {
        transaction.begin();
        for (int i = 0; i < entities.size(); i++){
            entityManager.persist(entities.get(i));
            if (i > 0 && i % 50 == 0){
                // push the pending inserts to the database and detach
                // the entities so the persistence context stays small
                entityManager.flush();
                entityManager.clear();
            }
        }
        transaction.commit();
    } catch (Exception ex){
        ex.printStackTrace();
        if (transaction.isActive()){
            transaction.rollback();
        }
    }
}

 

The reasons why this is a better approach are:

  1. An OutOfMemoryError is no longer a concern, because we clear the persistence context after each flush of the data to the database.
  2. It is still a single transaction: either all the data is saved or, in case of a failure, none of it is.
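One detail worth noting: flush() and clear() keep the persistence context small, but Hibernate only groups the generated inserts into real JDBC batches when batching is enabled explicitly. A minimal configuration sketch, assuming a persistence.xml setup (the property names are standard Hibernate settings; the persistence-unit name is made up):

```xml
<persistence-unit name="simple_task_batch">
    <properties>
        <!-- let Hibernate send inserts in JDBC batches of 50,
             matching the flush/clear interval in the code above -->
        <property name="hibernate.jdbc.batch_size" value="50"/>
        <!-- group inserts of the same entity type together -->
        <property name="hibernate.order_inserts" value="true"/>
    </properties>
</persistence-unit>
```

Also keep in mind that Hibernate silently disables insert batching for entities whose id uses the IDENTITY generator.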

 

Another solution uses multiple transactions:

public <T> void batchInsertMultiTransaction(List<T> entities){
    EntityManager entityManager = DatabaseProvider.getEntityManager();
    EntityTransaction transaction = entityManager.getTransaction();
    transaction.begin();
    for (int i = 0; i < entities.size(); i++){
        try {
            entityManager.persist(entities.get(i));
            if (i > 0 && i % 50 == 0){
                entityManager.flush();
                entityManager.clear();
                // commit this batch and start a transaction for the next one
                transaction.commit();
                transaction.begin();
            }
        } catch (Exception ex){
            ex.printStackTrace();
            if (transaction.isActive()){
                transaction.rollback();
            }
            // discard the failed batch and continue with a fresh transaction
            entityManager.clear();
            transaction.begin();
        }
    }
    // commit the last, possibly smaller, batch
    // (the number of entities may not be divisible by 50)
    try {
        entityManager.flush();
        entityManager.clear();
        transaction.commit();
    } catch (Exception ex){
        ex.printStackTrace();
        if (transaction.isActive()){
            transaction.rollback();
        }
    }
}

 

Advantages:

  1. In case of a failure, this approach stores some of the data, which should be used with awareness: if the exact order of the records matters, if all of the data must be stored, or if the application depends on an accumulated value, use the previous algorithm instead.
  2. Besides that, with this approach you can mark the failed batches and, after cleaning them up, insert them again.
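The "mark and retry" idea from point 2 can be sketched outside of JPA as plain chunk handling: split the list into fixed-size chunks, run each chunk through its own transactional callback, and collect the chunks that fail so they can be cleaned and re-inserted later. Everything here (the ChunkedInsert class and the ChunkConsumer callback) is a hypothetical helper, not code from the article; the callback stands in for the per-chunk transaction of batchInsertMultiTransaction:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedInsert {

    // stand-in for "persist this chunk in one transaction";
    // allowed to throw so a failed transaction surfaces as an exception
    @FunctionalInterface
    public interface ChunkConsumer<T> {
        void accept(List<T> chunk) throws Exception;
    }

    public static <T> List<List<T>> insertInChunks(List<T> entities, int chunkSize,
                                                   ChunkConsumer<T> persistChunk) {
        List<List<T>> failedChunks = new ArrayList<>();
        for (int i = 0; i < entities.size(); i += chunkSize) {
            List<T> chunk = entities.subList(i, Math.min(i + chunkSize, entities.size()));
            try {
                persistChunk.accept(chunk);   // one transaction per chunk
            } catch (Exception ex) {
                failedChunks.add(chunk);      // remember the chunk for a later retry
            }
        }
        return failedChunks;
    }
}
```

The returned list of failed chunks is exactly what you would log, clean, and feed back into the same method.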

 

Further:

If performance is still too low, writing native SQL over plain JDBC is a good option, but for MySQL there is one requirement to fulfill:
add the parameter "rewriteBatchedStatements=true" to the connection URL, which will then look like
jdbc:mysql://localhost:3306/simple_task_batch?rewriteBatchedStatements=true

Let's assume that we have a specific entity Person. With this parameter, the driver rewrites the batched inserts into a single multi-row statement:

INSERT INTO person (first_name, last_name) VALUES ('FIRST NAME 1', 'LAST NAME 1'), ('FIRST NAME 2', 'LAST NAME 2'), ('FIRST NAME 3', 'LAST NAME 3')...

Instead of:

INSERT INTO person (first_name, last_name) VALUES ('FIRST NAME 1', 'LAST NAME 1');
INSERT INTO person (first_name, last_name) VALUES ('FIRST NAME 2', 'LAST NAME 2');
INSERT INTO person (first_name, last_name) VALUES ('FIRST NAME 3', 'LAST NAME 3');

 

And the method will look like this (with a batch size of e.g. 100,000):

public void batchInsertNative(List<Person> entities){
    final int batchSize = 100_000;
    // try-with-resources closes the statement and the connection even on failure
    try (Connection connection = DatabaseProvider.getConnection();
         PreparedStatement preparedStatement = connection.prepareStatement(
                 "INSERT INTO person (first_name, last_name) VALUES (?, ?)")) {
        connection.setAutoCommit(false);
        for (int i = 0; i < entities.size(); i++){
            try {
                preparedStatement.setString(1, entities.get(i).getFirstName());
                preparedStatement.setString(2, entities.get(i).getLastName());
                preparedStatement.addBatch();
                if (i > 0 && i % batchSize == 0){
                    preparedStatement.executeBatch();
                    connection.commit();
                }
            } catch (Exception ex){
                ex.printStackTrace();
                connection.rollback();
            }
        }

        // commit the last batch (the number of entities
        // may not be divisible by the batch size)
        preparedStatement.executeBatch();
        connection.commit();
    } catch (Exception e){
        e.printStackTrace();
    }
}

 

To conclude, this approach follows the same idea as the batchInsertMultiTransaction example, but this time written in native SQL,
so it will save part of the data in case of a failure. Which variant to choose depends on what you want to achieve – saving all of the data, part of it, or none.

  • Robii

    I would use solution number 2, because when I insert a batch I want all or nothing, so solution number 3 is not right (maybe it is if you explicitly want to save whatever part of the data you can). Also, the last solution sounds fast, but it looks dangerous to use plain JDBC when the whole project uses Hibernate.

    • Robii

      I would like to see some performance tests for these solutions.

  • Strahinja Petrovic

    Nice article and a useful tip. Thanks, Simple