Hibernate and the n+1 selections problem

Hibernate and the n+1 selections problemJoao MunhozBlockedUnblockFollowFollowingMay 27Photo by Foo Bar on UnsplashOne of the greatest challenges in software design is the difference of paradigm between the object-oriented world (in which many applications rely on) and the relational world of databases.

Converting objects to tables and vice-versa is what we call object-relational mapping (ORM).

Building your own ORM solution is no trivial task, requires a great effort and is very error-prone.

Therefore, it is not the recommended approach most of the times, especially considering that there are some reliable and well-established alternatives.

One of the most important and used in the Java world is Hibernate.

Although Hibernate abstracts most of the persistence logic and attempts to achieve low latency when making read/write operations in the database, it is no silver bullet.

In particular, we have to be very careful with the way we write our queries, especially for objects that have collections as an attribute because this can lead to severe performance problems.

Basically, we can map our collection associations with two different fetching types: the first one is eager, which means that whenever the top-level object is loaded, the associated collection will be completely loaded as well, regardless if it will be used or not.

The other fetch type is lazy; in this case, the collection is not loaded when we retrieve the top-level object and the application will be responsible for handling it when needed (if needed at all).

It is common (and recommended) to map most associations for lazy initialization.

This way, the entire object tree is not retrieved right away for every query.

But not loading the related objects together with the top-level ones can lead to the n+1 selections problem, one of the main performance issues related to persistence.

Let’s take a deeper look at this.

Suppose we are building an application to control enterprise buildings, in which each building can contain a number of companies.

To model this in software, we could have a Building class that contains a collection of instances of a Company class.

Following the guidelines for this kind of association, we would map this relation using lazy initialization, as explained before.

But what happens if I need, for example, to calculate the sum of the rent the companies pay for all the buildings to get the total profit?We could do it like that:@Transactionalpublic int getTotalProfit() { List<Building> buildings = buildingRepository.

findAll(); int totalProfit = 0; for (Building building : buildings) { for (Company company : building.

getCompanies()) { totalProfit+=company.

getRent(); } } return totalProfit;}Considering that we mapped our Building-Company association with lazy initialization, the Company collection for each building will not be loaded when we reach the repository to get all the buildings.

But it must be loaded when we call building.

getCompanies(), and Hibernate will reach the DB again to fetch this data at each iteration.

So, although we explicitly call the repository to get the entities only one time, we, in fact, are doing a much greater number of reading operations.

If we have n buildings, we will have n+1 selections (the general and one extra for each building) and that is from where the name of the problem comes.

You could be thinking now “Ok, but couldn’t we do a better logic than that and simply avoid this in the first place?”.

The answer is yes, but this problem may appear in cases that are not as trivial as this one, being hidden in more complex logic.

Now that we have explained the problem, let us take a look at some approaches to mitigate and even solve it.

For a simple case like this one, we could just move the operation done in memory to an aggregate function that could be written directly in HQL (Hibernate Query Language).

This would be more performative and is recommended for the cases where our code just applies operations that are available directly in the database.

But what to do in the general case?The first way to mitigate this problem is by enabling querying objects in batches.

In this approach, Hibernate will load in advance the next x elements of the collection when the first one is accessed.

This would reduce the problem from n+1 to n/x + 1 selections, being x the batch size defined by us.

The listing below shows how this could be done for our Building-Company example.

@Entitypublic class Building { @OneToMany(fetch = FetchType.

LAZY) @BatchSize(size = 10) private List<Company> companies;}By adding the BatchSize annotation to our mapping with size equals to 10, we are telling Hibernate to load 10 Company instances every time we try to access one that was not loaded yet.

Tunning the batch size can solve the latency for your application but has the downside that maybe these objects will be retrieved unnecessarily.

Another solution would be defining the fetch type of the association as eager in the class-level mapping (as in the listing below).

This would solve the problem, but there is a huge downside that all queries that load this object will load the full collection too.

This would spend a lot of resources unnecessarily and would increase lock time in the database, impacting directly in the concurrency capacity of the application.

Therefore, eager mapping at class level is not advised.

@Entitypublic class Building { @OneToMany(fetch = FetchType.

EAGER) private List<Company> companies;}The recommended solution is taking advantage of the support that Hibernate gives to instructions at execution time.

So, instead of defining the association as eager fetching in the class level, we define it in the query level just when necessary.

This can be done by using setFetchMode in a Criteria query or using a join fetch in HQL.

So, for our Building-Company example, we could create a new method findAllBuildingsWithCompany, implemented like this://Using Criteria interfacepublic List<Building> findAllBuildingsWithCompany() { Criteria criteria = sessionFactory.

getCurrentSession().

createCriteria(Building.

class); criteria.

setFetchMode("Company", FetchMode.

EAGER); return criteria.

list();}//Using HQL@Query("SELECT b FROM Building b JOIN FETCH b.

companies")List<Building> findAllBuildingsWithCompany();So, instead of obtaining just the top-level objects in the initial query and then fetching the necessary associations while the application navigates the object tree, we follow this process:Fetch all necessary objects in the initial query, specifying exactly which associations will be accessed during the transactionNavigate the object tree, that now consists only of objects that were already retrieved in memoryWith this solution, we guarantee that we will have only one read operation in the DB and we don’t force all fetches for this entity to load all the data, which would spend unnecessary resources many times.

Finally, there is one extra solution to a particular example of this problem.

For collections in which the number of instances is sufficiently small, it is possible to maintain all these instances in Hibernate second-level cache.

An example of how to enable cache to a class is shown in the listing below.

@Entity@Cacheable@Cache(usage = READ_WRITE)public class Company { private int rent;}Using this approach would avoid the need to reach the database to fetch this info every time it is needed.

It is a good solution when applicable (which is not the case many times, unfortunately).

So to conclude, although Hibernate does a very good job in abstracting the persistence layer and helps with performance as much as possible, we cannot trust it blindly and our code must be designed to search objects efficiently.

Whenever writing queries that use methods as list(), iterate() or load(), be aware that accessing collections may lead to the n+1 problem and to latency issues that may reduce drastically the performance of the application.

.

. More details

Leave a Reply