
Understanding Hibernate @BatchSize

Greetings!

@BatchSize is one of the most misunderstood concepts in Hibernate. I used to think it applied to the elements of a single collection, but I was wrong. Not convinced yet? Let's find out.

(Simple example repo can be found here https://github.com/slmanju/hibernate-batchsize)
Let's assume a one-to-many relationship between Foo and Bar, where Foo is the parent. Whether the association is lazy or eager does not matter here, but I'll use lazy loading.
@Entity
@Table(name = "foo")
public class Foo {

  // id and text fields, plus the getInstance/addBar helpers, are omitted for brevity
  @OneToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
  @JoinColumn(name="foo_id")
  private List<Bar> bars = new ArrayList<>();

}
I will have 5 Foos each having 10 Bars.
for (int i = 0; i < 5; i++) {
  Foo foo = Foo.getInstance("Foo-" + i);
  for (int j = 0; j < 10; j++) {
    foo.addBar(Bar.getInstance("Bar-" + i + "-" + j));
  }
  entityManager.persist(foo);
}

Single Entity

What happens when we access the Bar collection of a single Foo? Hibernate executes one select query for its Bars. That is easy.
foo.getBars().size();

Hibernate: select bars0_.foo_id as foo_id3_0_1_, bars0_.id as id1_0_1_, bars0_.id as id1_0_0_, bars0_.text as text2_0_0_ from bar bars0_ where bars0_.foo_id=?
select .. from bar where foo_id=?

Multiple Entities

Now what if we have a list of Foos and want to access the Bars of each Foo? After the one query that loads the Foos, there will be N more queries, one per Foo, to fetch its Bars. That is expected, as each Foo is a separate object with its own collection. This is the well-known N + 1 problem in Hibernate.
List<Foo> foos = entityManager.createQuery("from Foo", Foo.class).getResultList();
for (Foo foo : foos) {
  foo.getBars().size();
}

Hibernate: select foo0_.id as id1_1_, foo0_.text as text2_1_ from foo foo0_
Hibernate: select bars0_.foo_id as foo_id3_0_0_, bars0_.id as id1_0_0_, bars0_.id as id1_0_1_, bars0_.text as text2_0_1_ from bar bars0_ where bars0_.foo_id=?
Hibernate: select bars0_.foo_id as foo_id3_0_0_, bars0_.id as id1_0_0_, bars0_.id as id1_0_1_, bars0_.text as text2_0_1_ from bar bars0_ where bars0_.foo_id=?
Hibernate: select bars0_.foo_id as foo_id3_0_0_, bars0_.id as id1_0_0_, bars0_.id as id1_0_1_, bars0_.text as text2_0_1_ from bar bars0_ where bars0_.foo_id=?
Hibernate: select bars0_.foo_id as foo_id3_0_0_, bars0_.id as id1_0_0_, bars0_.id as id1_0_1_, bars0_.text as text2_0_1_ from bar bars0_ where bars0_.foo_id=?
Hibernate: select bars0_.foo_id as foo_id3_0_0_, bars0_.id as id1_0_0_, bars0_.id as id1_0_1_, bars0_.text as text2_0_1_ from bar bars0_ where bars0_.foo_id=?
select .. from bar where foo_id=? (5 times)
Among other solutions, Hibernate provides @BatchSize to address this issue.
Instead of (N + 1) queries, it executes roughly (N/M + 1), where N is the number of Foos and M is the batch size.
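To make the arithmetic concrete, here is a small sketch in plain Java with no Hibernate involved. The class and method names are my own, and it assumes the simplest case where N/M is rounded up and every batch is filled before the next one starts.

```java
// Illustrative only: BatchQueryCount and queryCount are hypothetical names.
class BatchQueryCount {

    // 1 query for the parents plus one query per batch of parent ids.
    // A batchSize of 1 (or less) models "no @BatchSize": one query per parent.
    static int queryCount(int parents, int batchSize) {
        int effectiveBatch = Math.max(batchSize, 1);
        // Integer form of ceil(parents / effectiveBatch)
        int collectionQueries = (parents + effectiveBatch - 1) / effectiveBatch;
        return 1 + collectionQueries;
    }

    public static void main(String[] args) {
        System.out.println(queryCount(5, 1));  // 6 -> the plain N + 1 case
        System.out.println(queryCount(5, 2));  // 4 -> 1 + (2 + 2 + 1 split into 3 batches)
        System.out.println(queryCount(5, 10)); // 2 -> 1 + a single IN query
    }
}
```

With our 5 Foos, any batch size of 5 or more brings the total down to 2 queries.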

@BatchSize

Hibernate offers the @BatchSize annotation, which can be used when fetching uninitialized entity proxies. (Hibernate doc)
Using batch fetching, Hibernate can load several uninitialized proxies if one proxy is accessed. Batch fetching is an optimization of the lazy select fetching strategy. (StackOverflow)
@BatchSize(size = 10)
@OneToMany(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
@JoinColumn(name="foo_id")
private List<Bar> bars = new ArrayList<>();

List<Foo> foos = entityManager.createQuery("from Foo", Foo.class).getResultList();
for (Foo foo : foos) {
  foo.getBars().size();
}

Hibernate: select bars0_.foo_id as foo_id3_0_1_, bars0_.id as id1_0_1_, bars0_.id as id1_0_0_, bars0_.text as text2_0_0_ from bar bars0_ where bars0_.foo_id in (?, ?, ?, ?, ?)
select .. from bar where foo_id in (?, ?, ?, ?, ?)
Boom! Hibernate uses an IN clause to fetch the collections of the other Foos as well, so the five separate Bar queries collapse into one.
What will happen if we access only the first Foo in the list?
foos.get(0).getBars().isEmpty();
It is still the same. Hibernate executes the IN query for the other Foos too.

It is not on the Collection

The confusing part is that the annotation looks as if it applies to the elements of the collection, Bar in our example. It does not. It applies across the owning entities in the session, Foo in our example, whose bars collections are still uninitialized. I repeat: when we access the collection of a single Foo, Hibernate also fetches the collections of other Foos. Confusing, isn't it?
Let's add it to our example and rerun. I access only the first Foo, but Hibernate executes an IN query that covers the other Foos too. This is clever magic.
Now change the batch size and run it again. You will see multiple IN queries. With a batch size of 2 there are 3 queries, covering 2 + 2 + 1 = 5 Foos.
@BatchSize(size = 2)
private List<Bar> bars = new ArrayList<>();

Hibernate: select bars0_.foo_id as foo_id3_0_1_, bars0_.id as id1_0_1_, bars0_.id as id1_0_0_, bars0_.text as text2_0_0_ from bar bars0_ where bars0_.foo_id in (?, ?)
Hibernate: select bars0_.foo_id as foo_id3_0_1_, bars0_.id as id1_0_1_, bars0_.id as id1_0_0_, bars0_.text as text2_0_0_ from bar bars0_ where bars0_.foo_id in (?, ?)
Hibernate: select bars0_.foo_id as foo_id3_0_1_, bars0_.id as id1_0_1_, bars0_.id as id1_0_0_, bars0_.text as text2_0_0_ from bar bars0_ where bars0_.foo_id=?
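The behaviour in these logs can be modelled with a toy session. This is a simplified sketch, not Hibernate's actual implementation: ToyBatchSession and its access method are hypothetical names, and real Hibernate may pick the other ids for a batch differently. It only shows the idea that touching one owner's collection pulls in up to M - 1 other pending owners.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: mimics the batching seen in the logs, not Hibernate internals.
class ToyBatchSession {

    private final int batchSize;
    // Owner (Foo) ids whose bars collection has not been loaded yet.
    private final Set<Long> uninitialized = new LinkedHashSet<>();
    // WHERE clause of every simulated SELECT, in execution order.
    final List<String> queries = new ArrayList<>();

    ToyBatchSession(int batchSize, List<Long> ownerIds) {
        this.batchSize = Math.max(batchSize, 1);
        this.uninitialized.addAll(ownerIds);
    }

    // Simulates touching one owner's collection, e.g. foo.getBars().size().
    void access(long ownerId) {
        if (!uninitialized.contains(ownerId)) {
            return; // already loaded by an earlier batch, so no query
        }
        List<Long> batch = new ArrayList<>();
        batch.add(ownerId);
        // Fill the rest of the batch with other owners that are still pending.
        for (Long other : uninitialized) {
            if (batch.size() == batchSize) {
                break;
            }
            if (other != ownerId) {
                batch.add(other);
            }
        }
        uninitialized.removeAll(batch);
        String placeholders = String.join(", ", Collections.nCopies(batch.size(), "?"));
        queries.add(batch.size() == 1 ? "foo_id=?" : "foo_id in (" + placeholders + ")");
    }

    public static void main(String[] args) {
        List<Long> fooIds = Arrays.asList(1L, 2L, 3L, 4L, 5L);

        // Batch size 10, only the first Foo is touched.
        ToyBatchSession firstOnly = new ToyBatchSession(10, fooIds);
        firstOnly.access(1L);
        System.out.println(firstOnly.queries); // [foo_id in (?, ?, ?, ?, ?)]

        // Batch size 2, every Foo is touched in turn.
        ToyBatchSession all = new ToyBatchSession(2, fooIds);
        for (long id : fooIds) {
            all.access(id);
        }
        System.out.println(all.queries); // [foo_id in (?, ?), foo_id in (?, ?), foo_id=?]
    }
}
```

Notice that touching the second and fourth Foo produces no query at all in the second run, because an earlier batch already loaded their collections.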

Performance

Hibernate @BatchSize can greatly improve performance by reducing the (N + 1) problem. However, note that it still executes multiple queries if the number of parent entities is greater than the batch size. This is why the documentation itself recommends using a direct JOIN or a projection (or even a custom IN query).
Also note that Hibernate executes these queries even if you only check the size of a collection or whether it is empty. Be careful, as this may not be what you intended.
One more thing to keep in mind is that your database may limit the number of values allowed in an IN clause. Oracle, for example, rejects lists with more than 1000 expressions.

Summary

Hibernate @BatchSize is a great way to improve performance. However, be aware that it still fires multiple queries if the number of parent entities exceeds the batch size. It also loads collections you may not need (say you only want to check the first item). This may or may not fit your use case, so use it with a clear understanding of what it does.

References

https://docs.jboss.org/hibernate/orm/5.3/userguide/html_single/Hibernate_User_Guide.html#fetching-batch
https://stackoverflow.com/questions/25210949/understanding-batchsize-in-hibernate

Happy coding ☺
