This article introduces Hibernate and gives a brief overview of how to tune its components, along with several strategies for tuning an application that uses Hibernate.
Hibernate is an object/relational mapping tool for Java environments. The term object/relational mapping (ORM) refers to the technique of mapping a data representation from an object model to a relational data model with a SQL-based schema.
Hibernate not only takes care of the mapping from Java classes to database tables (and from Java data types to SQL data types), but also provides data query and retrieval facilities and can significantly reduce development time otherwise spent with manual data handling in SQL and JDBC.
Hibernate offers several fetching strategies and several ways of tuning the Java collection objects it maps. It also exposes metrics and statistics that can be used for performance tuning.
The following sections describe a few strategies for tuning Hibernate to match the application's needs.
- N+1 query problem
- Fetching Strategies
- Use of Criteria Query
- Use of Servlet Context Cache
Solving the n+1 selects problem
The biggest performance killer in applications that persist objects to SQL databases is the n+1 selects problem. When you tune the performance of a Hibernate application, this problem is the first thing you’ll usually need to address. It’s normal (and recommended) to map almost all associations for lazy initialization.
This means you generally set all collections to lazy="true" and even change some of the one-to-one and many-to-one associations to not use outer joins by default.
This is the only way to avoid retrieving all objects in the database in every transaction. Unfortunately, this decision exposes you to the n+1 selects problem.
It’s easy to understand this problem by considering a simple query that retrieves all Items for a particular user:
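// Retrieve all items sold by the given user.
// Restrictions replaces the deprecated Expression class; the property path
// is relative to Item, so it is "seller" rather than "item.seller".
Iterator items = session.createCriteria(Item.class)
.add( Restrictions.eq("seller", user) )
.list()
.iterator();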
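This query returns a list of items, but any lazily mapped association or collection on each Item is still uninitialized. If the application then touches that association for each of the n items, Hibernate issues one additional select per item: one query for the items plus n more, hence n+1 selects.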
Join fetching is meant to solve the n+1 problem. If you have 10 parents, each with 10 children, join fetching requires one query and select fetching requires 11 (one for the parents and one for the children of each parent). This may not be a big deal if the database is on the same server as the application or if the network is really fast, but if there is latency in each database call, it can add up. Join fetching is a little less efficient on the initial query because the parent columns are duplicated in every row, but it makes only one round trip to the database.
Generally, if we need the children of all the parents, then go with join. If we need only the children of a few parents, then use select.
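The arithmetic above can be sketched in a few lines of plain Java. This is only an illustrative model, not Hibernate code: the class FetchRoundTrips, its method names, and the 5 ms per-query latency are assumptions made up for the example.

```java
// Illustrative model only: it counts round trips and does not talk to a database.
class FetchRoundTrips {

    /** Select fetching: one query for the parents plus one per parent for its children. */
    static int selectFetchQueries(int parents) {
        return 1 + parents;
    }

    /** Join fetching: parents and children come back from a single outer-join query. */
    static int joinFetchQueries() {
        return 1;
    }

    /** Rows in the join result; the parent columns are repeated on every child row. */
    static int joinResultRows(int parents, int childrenPerParent) {
        // An outer join still returns one row for a parent without children.
        return parents * Math.max(1, childrenPerParent);
    }

    /** Time spent purely on network latency, given a fixed cost per round trip. */
    static long latencyMillis(int queries, long perQueryLatencyMillis) {
        return queries * perQueryLatencyMillis;
    }

    public static void main(String[] args) {
        int parents = 10;
        int childrenPerParent = 10;
        long perQueryLatencyMillis = 5; // hypothetical network latency

        int selectQueries = selectFetchQueries(parents);
        int joinQueries = joinFetchQueries();

        System.out.println("select fetch: " + selectQueries + " queries, "
                + latencyMillis(selectQueries, perQueryLatencyMillis) + " ms of latency");
        System.out.println("join fetch:   " + joinQueries + " query, "
                + latencyMillis(joinQueries, perQueryLatencyMillis) + " ms of latency, "
                + joinResultRows(parents, childrenPerParent) + " rows");
    }
}
```

The model makes the trade-off visible: select fetching pays latency once per parent, while join fetching pays it once but transfers a wider, more repetitive result set.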
Fetching Strategies
A fetching strategy is the strategy Hibernate uses to retrieve associated objects when the application needs to navigate an association. Fetch strategies may be declared in the O/R mapping metadata, or overridden by a particular HQL or Criteria query.
Hibernate defines the following fetching strategies:
- Batch fetching - an optimization strategy for select fetching - Hibernate retrieves a batch of entity instances or collections in a single SELECT, by specifying a list of primary keys or foreign keys.
- Subselect fetching - a second SELECT is used to retrieve the associated collections for all entities retrieved in a previous query or fetch. Unless you explicitly disable lazy fetching by specifying lazy="false", this second select will only be executed when you actually access the association.
- Join fetching - Hibernate retrieves the associated instance or collection in the same SELECT, using an OUTER JOIN.
- Select fetching - a second SELECT is used to retrieve the associated entity or collection. Unless you explicitly disable lazy fetching by specifying lazy="false", this second select will only be executed when you actually access the association.
Batch fetching
With batch fetching enabled and a batch size of 10, Hibernate pre-fetches the next 10 collections when the first collection is accessed. This reduces the problem from n+1 selects to n/10 + 1 selects. For many applications, this may be sufficient to achieve acceptable latency. On the other hand, it also means that in some other transactions, collections are fetched unnecessarily. It isn’t the best we can do in terms of reducing the number of round trips to the database.
For example, Hibernate with lazy fetching (in its default form) will run n+1 selects to give us all of the pets and their owners, where n is the number of pets. So, assuming we have 3 pets and 3 owners:
The selects that would be eventually fired by Hibernate would look like this:
-- get all of the pets first
select * from pet
-- get the owner for each pet returned, using that pet's owner_id
select * from owner where id=1
select * from owner where id=2
select * from owner where id=3
This is 4 (3+1, n=3) select statements. This is certainly not optimal. The biggest problem is that this application isn't going to scale. Before you know it, you'll have fifty registered pets, and you're executing fifty-one select statements, taking up a very noticeable amount of time. Wouldn't it be nice if we could do something more like this:
-- get all of the pets first
select * from pet
-- get all owners in a single select
select * from owner where id in (1, 2, 3)
Now we only have two selects, and the second one can scale much better than linearly. This is great; but how can we achieve this through Hibernate? Cases like this are often the scenarios that people attack O/R mappers over, saying they aren't smart enough and flexible enough to meet the performance demands. It turns out Hibernate provides all kinds of options in this case.
The way to tell Hibernate to use the latter solution is to tell it that a certain class is batch-able. You do this by adding the batch-size attribute to either a.) the entity definition for the association being fetched (e.g. the definition for the Owner class) or b.) the collection definition on a class with a collection mapping. Here is the mapping declaration for the example above:
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
"-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping package="com.javalobby.tnt.hibernate">
<class name="Pet">
<id name="id"><generator class="native"/></id>
<property name="name"/>
<many-to-one name="owner" column="owner_id" class="Owner"/>
</class>
<class name="Owner" batch-size="50">
<id name="id"><generator class="native"/></id>
<property name="name"/>
</class>
</hibernate-mapping>
Note the batch size is set to fifty. The batch size is the number of sub-elements that will be loaded at one time (the number of parameters to the 'in' clause of the SQL). If you set this number to 10, for instance, and you had 34 records to load the association for, it would load ten, ten, ten, and then four: four association selects, or 5 select statements in total counting the initial query.
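The batch arithmetic can be checked with a small plain-Java sketch. It is a simplified model, not Hibernate code: the class BatchPlan and its methods are hypothetical names, and the exact way a given Hibernate version splits batches may differ from this even split.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: how many "in (...)" selects a given batch size produces.
class BatchPlan {

    /** Sizes of the successive batches needed to load the given number of records. */
    static List<Integer> batches(int records, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = records; remaining > 0; remaining -= batchSize) {
            sizes.add(Math.min(batchSize, remaining));
        }
        return sizes;
    }

    /** The initial query plus one select per batch. */
    static int totalSelects(int records, int batchSize) {
        return 1 + batches(records, batchSize).size();
    }

    public static void main(String[] args) {
        System.out.println("batches: " + batches(34, 10));
        System.out.println("total selects: " + totalSelects(34, 10));
    }
}
```

With a batch size of 50 and only three owners to load, the same model gives a single batch, which matches the two statements in the log that follows.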
Here is the finished SQL emitted by Hibernate (sprinkled with my log statements so you can see when they were triggered again):
Hibernate: select pet0_.id as id, pet0_.name as name0_, pet0_.owner_id as owner3_0_ from Pet pet0_
Pet: Snoopy
Hibernate: select owner0_.id as id0_, owner0_.name as name1_0_ from Owner owner0_ where owner0_.id in (?, ?, ?)
Owner: Rick
Pet: Garfield
Owner: Matt
Pet: Satchel
Owner: R.J.
Let's say now that this example gets turned on its head, and we want to look at owners rather than pets. Owners (as our diagram above implies) are allowed to have multiple pets. We want to be able to select all owners, and then iterate over each of their pets. Let's see what Hibernate does in this scenario. Here is our new class:
package org.javalobby.tnt.hibernate.lazy;

import java.util.List;
import org.hibernate.*;
import com.javalobby.tnt.hibernate.*;

public class LazyTest
{
    public static void main(String[] args)
    {
        Session s = HibernateSupport.currentSession();
        try
        {
            // Load every owner, then walk each owner's lazy pets collection.
            Query q = s.createQuery("from Owner");
            List<Owner> l = q.list();
            for(Owner owner : l)
            {
                System.out.println("Owner: " + owner.getName());
                // The first access to getPets() triggers the collection load.
                for(Pet pet : owner.getPets())
                {
                    System.out.println("\tPet: " + pet.getName());
                }
            }
        }
        finally
        {
            HibernateSupport.closeSession(s);
        }
    }
}
Here is the new mapping declaration:
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
"-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping package="com.javalobby.tnt.hibernate">
<class name="Pet">
<id name="id"><generator class="native"/></id>
<property name="name"/>
<many-to-one name="owner" column="owner_id" class="Owner"/>
</class>
<class name="Owner" batch-size="50">
<id name="id"><generator class="native"/></id>
<property name="name"/>
<set name="pets">
<key column="owner_id" />
<one-to-many class="Pet"/>
</set>
</class>
</hibernate-mapping>
and here is the output:
Hibernate: select owner0_.id as id, owner0_.name as name1_ from Owner owner0_
Owner: R.J.
Hibernate:
select
pets0_.owner_id as owner3___,
pets0_.id as id__,
pets0_.id as id0_,
pets0_.name as name0_0_,
pets0_.owner_id as owner3_0_0_
from Pet pets0_
where pets0_.owner_id=?
Pet: Satchel
Pet: Bucky
Owner: Rick
Hibernate:
select
pets0_.owner_id as owner3___,
pets0_.id as id__,
pets0_.id as id0_,
pets0_.name as name0_0_,
pets0_.owner_id as owner3_0_0_
from Pet pets0_
where pets0_.owner_id=?
Pet: Snoopy
Owner: Matt
Hibernate:
select
pets0_.owner_id as owner3___,
pets0_.id as id__,
pets0_.id as id0_,
pets0_.name as name0_0_,
pets0_.owner_id as owner3_0_0_
from Pet pets0_
where pets0_.owner_id=?
Pet: Garfield
Pet: Odie
As we can see, we are back to a slow linear situation - it is running a select for each owner it gets back; that's really not optimal. Thankfully, collections can be batched as well - here is our new mapping declaration:
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
"-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping package="com.javalobby.tnt.hibernate">
<class name="Pet">
<id name="id"><generator class="native"/></id>
<property name="name"/>
<many-to-one name="owner" column="owner_id" class="Owner"/>
</class>
<class name="Owner" batch-size="50">
<id name="id"><generator class="native"/></id>
<property name="name"/>
<set name="pets" batch-size="50">
<key column="owner_id" />
<one-to-many class="Pet"/>
</set>
</class>
</hibernate-mapping>
and here is our new output:
Hibernate: select owner0_.id as id, owner0_.name as name1_ from Owner owner0_
Owner: R.J.
Hibernate: select pets0_.owner_id as owner3___, pets0_.id as id__, pets0_.id as id0_, pets0_.name as name0_0_, pets0_.owner_id as owner3_0_0_ from Pet pets0_ where pets0_.owner_id in (?, ?, ?)
Pet: Bucky
Pet: Satchel
Owner: Rick
Pet: Snoopy
Owner: Matt
Pet: Garfield
Pet: Odie
Much better! Keep in mind that the 'batch-size' parameter has *no* bearing on how many elements inside the collection are loaded. Instead, it defines how many collections should be loaded in a single select. No matter what setting you provide, it will always retrieve 'Bucky and Satchel' in a single select statement as defined above, because they are part of the same collection. I repeat: batch size in collections defines *how many collections* will be retrieved at once.
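To make the distinction concrete, here is a small plain-Java model of collection batching. It is only a sketch under assumptions: the class CollectionBatching and its methods are hypothetical, and the owner and pet names are taken from the log output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Model only: batch-size groups whole collections (owners), never single pets.
class CollectionBatching {

    /** Splits the owners into groups; each group stands for one "owner_id in (...)" select. */
    static List<List<String>> collectionBatches(List<String> owners, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < owners.size(); start += batchSize) {
            int end = Math.min(start + batchSize, owners.size());
            batches.add(new ArrayList<>(owners.subList(start, end)));
        }
        return batches;
    }

    /** Pets brought back by one batch: every pet of every owner in that batch. */
    static int petsLoaded(List<String> batch, Map<String, List<String>> petsByOwner) {
        int count = 0;
        for (String owner : batch) {
            count += petsByOwner.getOrDefault(owner, Collections.emptyList()).size();
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, List<String>> petsByOwner = new LinkedHashMap<>();
        petsByOwner.put("R.J.", Arrays.asList("Satchel", "Bucky"));
        petsByOwner.put("Rick", Arrays.asList("Snoopy"));
        petsByOwner.put("Matt", Arrays.asList("Garfield", "Odie"));
        List<String> owners = new ArrayList<>(petsByOwner.keySet());

        for (int batchSize : new int[] {1, 2, 50}) {
            List<List<String>> batches = collectionBatches(owners, batchSize);
            System.out.println("batch-size " + batchSize + ": " + batches.size()
                    + " collection select(s), first select loads "
                    + petsLoaded(batches.get(0), petsByOwner) + " pet(s)");
        }
    }
}
```

Even with a batch size of 1, the first select in this model brings back both of R.J.'s pets, because they belong to the same collection.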
Subselect fetching
Subselect fetching is very similar to batch-size-controlled fetching, but it is a distinct fetching strategy that applies to collection-style associations. Unlike join fetching, subselect fetching is still compatible with lazy associations. It uses a subselect to pass the ID set of the main entity set into the select against the association table:
select * from owner
select * from pet where owner_id in (select id from owner)
This is very similar to the previous examples, but all of the burden is now put on the database; and the batch size is effectively infinity.
Here is the new mapping declaration:
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
"-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping package="com.javalobby.tnt.hibernate">
<class name="Pet">
<id name="id"><generator class="native"/></id>
<property name="name"/>
<many-to-one name="owner" column="owner_id" class="Owner"/>
</class>
<class name="Owner" batch-size="50">
<id name="id"><generator class="native"/></id>
<property name="name"/>
<set name="pets" fetch="subselect">
<key column="owner_id" />
<one-to-many class="Pet"/>
</set>
</class>
</hibernate-mapping>
and here is the output:
Hibernate: select owner0_.id as id, owner0_.name as name1_ from Owner owner0_
Owner: R.J.
Hibernate:
select
pets0_.owner_id as owner3_1_,
pets0_.id as id1_,
pets0_.id as id0_,
pets0_.name as name0_0_,
pets0_.owner_id as owner3_0_0_
from Pet pets0_
where pets0_.owner_id in
(select owner0_.id from Owner owner0_)
Pet: Satchel
Pet: Bucky
Owner: Rick
Pet: Snoopy
Owner: Matt
Pet: Garfield
Pet: Odie
Lazy fetching
By default, Hibernate uses lazy select fetching for collections and lazy proxy fetching for single-valued associations. These defaults make sense for almost all associations in almost all applications. However, lazy fetching poses one problem that you must be aware of: accessing an uninitialized lazy association outside the context of an open Hibernate session results in an exception (a LazyInitializationException).
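The failure mode can be illustrated without Hibernate on the classpath. The following is a plain-Java model, not Hibernate's implementation: FakeSession and LazyRef are hypothetical stand-ins for a session and a lazy association, and the model throws IllegalStateException where Hibernate itself would throw org.hibernate.LazyInitializationException.

```java
import java.util.function.Supplier;

// Plain-Java model of lazy loading; none of these classes belong to Hibernate.
class LazySketch {

    /** Stand-in for a Hibernate session: loading is only possible while it is open. */
    static class FakeSession {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        void close() {
            open = false;
        }
    }

    /** Stand-in for a lazy association: loaded on first access, then kept. */
    static class LazyRef<T> {
        private final FakeSession session;
        private final Supplier<T> loader;
        private T value;
        private boolean initialized;

        LazyRef(FakeSession session, Supplier<T> loader) {
            this.session = session;
            this.loader = loader;
        }

        T get() {
            if (!initialized) {
                if (!session.isOpen()) {
                    throw new IllegalStateException("could not initialize: session is closed");
                }
                value = loader.get(); // this is where the extra select would run
                initialized = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        LazyRef<String> touched = new LazyRef<>(session, () -> "Acme");
        LazyRef<String> untouched = new LazyRef<>(session, () -> "Globex");

        touched.get(); // initialized while the session is open
        session.close();

        System.out.println(touched.get()); // already loaded, still usable
        try {
            untouched.get(); // never loaded, and the session is gone
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The practical consequence is the same as in Hibernate: anything the application needs after the session closes has to be initialized while the session is still open.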
Consider a table named EMPLOYEE.
The Employee.hbm.xml file will be:
<class name="com.net.web.Employee" table="EMPLOYEE">
<id name="empId" column="EMP_ID" type="integer">
<generator class="increment"/>
</id>
<property name="name" column="NAME" type="string"/>
</class>
Now suppose this table is associated with the COMPANY table. The two mapping files then look like this.
Company.hbm.xml:
<class name="com.net.web.Company" table="COMPANY">
<id name="compId" column="COMP_ID" type="integer">
<generator class="increment"/>
</id>
<property name="name" column="NAME" type="string"/>
<set name="employees">
<key column="COMP_ID"/>
<one-to-many class="com.net.web.Employee"/>
</set>
</class>
Employee.hbm.xml:
<class name="com.net.web.Employee" table="EMPLOYEE">
<id name="empId" column="EMP_ID" type="integer">
<generator class="increment"/>
</id>
<many-to-one name="company" column="COMP_ID" class="com.net.web.Company"/>
<property name="name" column="NAME" type="string"/>
</class>
Suppose we load all employees, the equivalent of select * from employee.
For the above mapping, the query generated by Hibernate will be:
Hibernate: select this_.emp_id, this_.comp_id, this_.name from employee this_
Queries differ corresponding to the attributes given in the mapping file.
Case 1:
<many-to-one name="company" column="COMP_ID" class="com.net.web.Company" lazy="false" fetch="select"/>
The generated query will be,
Hibernate: select this_.emp_id, this_.comp_id, this_.name from Employee this_
Hibernate: select company0_.comp_id, company0_.name from Company company0_
Hibernate: select company0_.comp_id, company0_.name from Company company0_
Hibernate: select company0_.comp_id, company0_.name from Company company0_
If we specify the lazy attribute as "false", Hibernate will preload the Company objects. With "select" fetch, each Company is loaded by its own separate query, so the number of extra DB hits can be as high as the number of records in the Employee table (a company already loaded in the session is not queried again).
The main purpose of specifying lazy="false" is to fetch the objects at load time rather than on first access.
If we want the company object, we need to get it from the Employee object by calling the corresponding getter method. The Company object will be available throughout the application, even after the session is closed.
Case 2:
<many-to-one name="company" column="COMP_ID" class="com.net.web.Company" lazy="false" fetch="join"/>
The generated query will be,
Hibernate: select this_.emp_id, this_.comp_id, this_.name, company0_.comp_id, company0_.name from Employee this_,Company company0_ where this_.comp_id = company0_.comp_id(+)
If we specify the lazy attribute as "false", Hibernate will preload the Company objects. With "join" fetch, a single DB hit fetches the employees and their Company objects in one query.
The main purpose of specifying lazy="false" is to fetch the objects at load time rather than on first access.
If we want the company object, we need to get it from the Employee object by calling the corresponding getter method. The Company object will be available throughout the application, even after the session is closed.
Case 3:
<many-to-one name="company" column="COMP_ID" class="com.net.web.Company" fetch="select"/>
The generated query will be,
Hibernate: select this_.emp_id, this_.comp_id, this_.name from Employee this_
If we don’t specify the lazy attribute, Hibernate will not preload the Company objects. In this case, specifying fetch="select" changes nothing: it is already the default, and the Company objects are not fetched by the query. A separate select runs only when the association is first accessed.
If we want the company object, we need to get it from the Employee object by calling the corresponding getter method. The Company object will be available until the Hibernate session is closed.
Case 4:
<many-to-one name="company" column="COMP_ID" class="com.net.web.Company" fetch="join"/>
The generated query will be,
Hibernate: select this_.emp_id, this_.comp_id, this_.name, company0_.comp_id, company0_.name from Employee this_,Company company0_ where this_.comp_id = company0_.comp_id(+)
If we don’t specify the lazy attribute, the default lazy setting applies. With "join" fetch, however, a single DB hit fetches the employees and their Company objects in one query, as the generated SQL above shows.
If we want the company object, we need to get it from the Employee object by calling the corresponding getter method. The Company object will be available until the Hibernate session is closed.
Consolidated Points:
Attributes of many-to-one mapping:
lazy | fetch | When to use
false | select | If we want the object throughout the application. The object will be fetched in the query itself.
false | join | If we want the object throughout the application. The object will be fetched in the single query itself.
(not set) | select | The object will not be fetched in the query. We can get the object until the Hibernate session is closed.
(not set) | join | The object will be fetched in the single query itself. We can get the object until the Hibernate session is closed.
Use of Criteria Query
1. Dynamic association fetching: you can specify association fetching semantics at runtime using setFetchMode().
2. Usually, the mapping document is not used to customize fetching. Instead, we keep the default behavior, and override it for a particular transaction, using left join fetch in HQL. This tells Hibernate to fetch the association eagerly in the first select, using an outer join. In the Criteria query API, you would use setFetchMode(FetchMode.JOIN).
3. If you want to change the fetching strategy used by get() or load(), you can use a Criteria query.
For example:
User user = (User) session.createCriteria(User.class)
.setFetchMode("permissions", FetchMode.JOIN)
.add( Restrictions.idEq(userId) )
.uniqueResult();
Use of Servlet Context Cache
If an object retrieved from the database is used frequently in the application, then instead of hitting the DB on every DAO call, you can load such objects once and put them in the servlet context cache. Later requests then read the objects from the cache instead of hitting the DB.
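The idea can be sketched in plain Java. This is a minimal model under assumptions, not a servlet implementation: ContextCacheSketch is a hypothetical class, the map stands in for the attributes of a ServletContext (which a real web application would read and write with getAttribute and setAttribute), and daoLookup stands in for whatever DAO method runs the Hibernate query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Model of an application-scope cache placed in front of a DAO.
class ContextCacheSketch {

    // Stand-in for the attribute map of a ServletContext (application scope).
    private final Map<String, Object> attributes = new ConcurrentHashMap<>();
    // Hypothetical DAO lookup, for example a method that runs a Hibernate query.
    private final Function<String, Object> daoLookup;
    // Counts how often the DAO was really called, to show the saving.
    private final AtomicInteger databaseHits = new AtomicInteger();

    ContextCacheSketch(Function<String, Object> daoLookup) {
        this.daoLookup = daoLookup;
    }

    /** Returns the cached object, loading it through the DAO on the first request only. */
    Object get(String key) {
        return attributes.computeIfAbsent(key, k -> {
            databaseHits.incrementAndGet();
            return daoLookup.apply(k);
        });
    }

    int databaseHits() {
        return databaseHits.get();
    }

    public static void main(String[] args) {
        ContextCacheSketch cache = new ContextCacheSketch(key -> "value loaded for " + key);
        cache.get("countries");
        cache.get("countries");
        System.out.println("database hits: " + cache.databaseHits());
    }
}
```

A cache like this suits data that rarely changes, since the cached objects are not refreshed when the underlying rows are updated.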
Collection performance
- Indexed collections
- sets
- bags
All indexed collections (maps, lists, and arrays) have a primary key consisting of the <key> and <index> columns. In this case, collection updates are extremely efficient. The primary key can be efficiently indexed and a particular row can be efficiently located when Hibernate tries to update or delete it.
Sets have a primary key consisting of <key> and element columns. This can be less efficient for some types of collection element, particularly composite elements or large text or binary fields, as the database may not be able to index a complex primary key as efficiently. However, for one-to-many or many-to-many associations, particularly in the case of synthetic identifiers, it is likely to be just as efficient. If you want SchemaExport to actually create the primary key of a <set>, you must declare all columns as not-null="true".
<idbag> mappings define a surrogate key, so they are efficient to update. In fact, they are the best case.
Bags are the worst case since they permit duplicate element values and, as they have no index column, no primary key can be defined. Hibernate has no way of distinguishing between duplicate rows. Hibernate resolves this problem by completely removing the collection in a single DELETE and recreating it whenever it changes. This can be inefficient.
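The cost difference can be put into numbers with a rough plain-Java model. It is a sketch under stated assumptions, not a measurement: CollectionUpdateCost and its methods are hypothetical, the model covers removing one element from a non-inverse collection, and it assumes the recreated bag is written back with one INSERT per remaining element.

```java
// Rough statement counts for removing a single element from a collection.
class CollectionUpdateCost {

    /** Bag: one DELETE for the whole collection, then one INSERT per remaining element. */
    static int bagStatementsToRemoveOne(int sizeBeforeRemoval) {
        requireNonEmpty(sizeBeforeRemoval);
        return 1 + (sizeBeforeRemoval - 1);
    }

    /** Set or idbag: the row is located by its key and removed with a single DELETE. */
    static int keyedStatementsToRemoveOne(int sizeBeforeRemoval) {
        requireNonEmpty(sizeBeforeRemoval);
        return 1;
    }

    private static void requireNonEmpty(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("collection must contain at least one element");
        }
    }

    public static void main(String[] args) {
        int size = 100;
        System.out.println("bag:          " + bagStatementsToRemoveOne(size) + " statements");
        System.out.println("set or idbag: " + keyedStatementsToRemoveOne(size) + " statement");
    }
}
```

In this model the bag's cost grows with the size of the collection, while the keyed collections stay at a single statement.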
Conclusion
The above are only some of the various ways that the application can be tuned. Performance must always be a key consideration throughout the stages of the application development life cycle - from design to deployment. It happens too often that performance takes a back seat to functionality, and problems are found later that are difficult to fix.