Large enterprise systems face a growing need to improve their business processes. Business needs translate into IT requirements: build cost-effective systems that scale with the growing business and perform the required functions efficiently. These systems must also be reliable, that is, able to handle failure situations. Clustering addresses this by connecting several servers so that another server can take over when one fails unexpectedly. In a competitive environment, response time and performance are also critical factors, and caching is one of the mechanisms used to achieve them. This article describes, at a high level, various approaches to implementing caching in a clustered environment.
The caching approaches described here are based on the properties of J2EE application servers. The basics of clustering and caching are outside the scope of this article, and the reader is assumed to be familiar with them. A complete implementation of these approaches is not provided in the article.
Caching is an essential part of J2EE application development, and many J2EE applications and businesses require a caching mechanism. When J2EE application servers began to support scalability, load balancing and clustering, building caches in a clustered environment and keeping the data in those caches consistent was a real challenge. Growing business needs demanded that the caches in the clustered servers be updated instantaneously and without errors.
This article presents different approaches to caching in clustered environments using J2EE technologies, along with their benefits and limitations.
Caching is a technique for improving the efficiency of an application in an enterprise environment. It means temporarily storing frequently referenced data in the application server to avoid the overhead of repeated database access. In a J2EE environment, the server can passively cache entity beans if entity beans are used to access the data. However, in an enterprise environment this alone is not enough to bring the application to the required level of efficiency. Caching should be implemented to eliminate a remote call, or to eliminate the need for one tier to call an underlying tier. The following points should be considered when deciding whether to implement a cache:
- Reduction of network traffic.
- Management of the amount of data to be cached.
- Status of the data in the cache (e.g. read/write or static).
- Complexity of implementing the cache in the application.
In a J2EE web application, frequently accessed data needs to be kept in memory, but stale data must also be cleared out and refreshed with new data. There are many open source libraries that provide object-level caching and share data across requests and users. They also manage the cached objects, since repeatedly creating and loading objects is expensive.
Data that does not change and takes a significant amount of time to return from the data source is a good candidate for caching. Data that is secured, personal (such as an SSN) or business-related (e.g. stock market prices) is not recommended for caching.
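The idea of clearing out stale data and refreshing it can be illustrated with a minimal time-to-live cache. This is only a sketch under stated assumptions: the class name `TtlCache`, the `loader` function (standing in for a database query) and the injectable `clock` are hypothetical and do not come from any particular caching library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical in-memory cache whose entries expire after a fixed time-to-live. */
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<K, V> loader;  // assumption: stands in for the data source query
    private final LongSupplier clock;     // assumption: injectable so expiry can be tested

    TtlCache(long ttlMillis, Function<K, V> loader, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it when it is missing or older than the TTL. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old == null || now - old.loadedAtMillis >= ttlMillis)
                        ? new Entry<>(loader.apply(k), now)
                        : old);
        return entry.value;
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`. A fixed time-to-live is only one possible staleness policy; the clustered approaches discussed later in the article refresh the cache in other ways.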
Clustering is a mechanism provided by the J2EE application server to meet the demands of scalability and high availability for mission-critical web-based applications. The challenges of software-only clustering have been met by a combination of careful state management and highly optimized protocols based on new commodity technologies such as IP multicast.
The implementation of caching has considerable benefits in applications/businesses, some of which are listed below.
- Significant improvement of application performance.
- Reduction in the number of accesses over the network (e.g. to the database).
- Avoids the cost of creating objects.
- Shares objects within a process.
- Avoids the cost of acquiring and releasing objects.
Caching also has limitations:
a) The cache may consume a large amount of heap space in the application server, resulting in a very large JVM memory footprint if objects are not released from the cache at regular intervals.
b) Synchronization complexity is also a concern. The data in the cache and the data in the data source need to be kept in sync; otherwise the cached data will lead to inaccuracies.
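One common way to keep heap usage under control is to bound the number of cached entries and evict the least recently used one when the bound is exceeded. The sketch below uses the standard `java.util.LinkedHashMap` access-order mode for this; the class name `BoundedLruCache` and the choice of an LRU policy are assumptions made for illustration, not something the approaches in this article prescribe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-bounded cache that evicts the least recently used entry when full. */
class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        // accessOrder = true: the map keeps entries ordered from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true drops the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

This class is not thread-safe on its own; in a server it would need to be wrapped, for example with `Collections.synchronizedMap`, or guarded by the application's own locking.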
Caching in clustering
The application caches are built when the server is started. Every application is responsible for building its cache within the JVM space. In a clustered environment, when each individual server starts its application, the application maintains its own cache on that server.
There are two ways of implementing caching in clustered environments. The requirements of the business drive the choice between them.
- Pull Mechanism
- Push Mechanism
The mechanism in which every participating server in the clustered environment regularly updates its application cache with any changes in the data source is referred to as the pull mechanism. In this case, the servers regularly validate their cached data against the data in the data source. If the data in the data source differs from that in the cache, the servers update their caches by extracting the data from the data source.
Consider a scenario where you have updated some personal details (e.g. address or phone number) and are shown the message “It will take 2 hours for your changes to get reflected.” This is a typical sign that a pull mechanism has been implemented. In this case, the request is processed on one of the servers (say, server1), which updates the ADDRESS data in its USER ACL cache. The question is how the caches in the other participating servers get updated. This is done as follows:
Every 2 hours the participating servers query the database/data source for any updated data. If any data has been modified in the database/data source, the servers update their caches accordingly. The figure below pictorially represents the steps.
1. The user triggers a cache update on the server “Weblogic 1”.
2. At the specified interval, the application on each server updates its cache.
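The pull cycle described above can be sketched with a scheduled task that periodically re-reads the data source. This is a minimal illustration, not the article's implementation: `PullRefreshedCache` and its `dataSource` supplier (standing in for a database query that returns the current data) are hypothetical names, and the data source is assumed never to return null keys or values.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical per-server cache that periodically re-reads the data source (pull). */
class PullRefreshedCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Supplier<Map<K, V>> dataSource;  // assumption: stands in for a database query
    private ScheduledExecutorService scheduler;

    PullRefreshedCache(Supplier<Map<K, V>> dataSource) {
        this.dataSource = dataSource;
        refresh();  // build the cache when the application starts
    }

    /** One pull cycle: drop entries deleted at the source, then copy the current values. */
    final void refresh() {
        Map<K, V> latest = dataSource.get();
        cache.keySet().retainAll(latest.keySet());
        cache.putAll(latest);
    }

    /** Starts the periodic pull, for example start(2, TimeUnit.HOURS). */
    synchronized void start(long interval, TimeUnit unit) {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refresh, interval, interval, unit);
    }

    /** Stops the periodic pull, for example when the application is undeployed. */
    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }

    V get(K key) {
        return cache.get(key);
    }
}
```

Because each server starts its own scheduler when its application starts, the refresh times on different servers are not aligned. Inside an application server, a container-managed timer or scheduler would usually be preferred over creating threads directly; the plain executor is used here only to keep the sketch self-contained.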
Limitations of this approach
• Delay in updating the caches on the different application servers.
• There is no guarantee that all the servers in the cluster will update their caches at the same time (the scheduler in each application may not have started at the same time).
In the push mechanism, a change in the cache of one application server is reflected instantaneously in the caches of all the other participating servers of the clustered environment. Any change in the cache of any server triggers a cache update on the other servers.
For example, assume that the ‘rate of interest’ parameter is maintained in the cache of each of the servers. If the interest rate is changed in any one of the servers, then the interest rate will be immediately updated in the cache of every server for further calculation.
There are many ways to implement the push mechanism.
The most common approaches are:
- Point-to-Point connection
- Listener Mechanism
In the point-to-point approach, the application server that receives the change establishes a connection with each of the other application servers participating in the cluster and updates their caches accordingly.
The steps 1 to 6 in the above figure are explained below:
1. The user triggers a cache update on the Weblogic 1 server.
2. The application updates the cache within the Weblogic 1 server.
3. The application on the Weblogic 1 server establishes a connection with the Weblogic 2 server and triggers the caching module of the application on Weblogic 2.
4. The application on the Weblogic 1 server establishes a connection with the Weblogic 3 server and triggers the caching module of the application on Weblogic 3.
5. The application on the Weblogic 3 server updates its cache.
6. The application on the Weblogic 2 server updates its cache.
Limitations of this approach
Opening a connection, authenticating and closing the connection for every cache update is very expensive and time consuming.
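The point-to-point flow in steps 1 to 6 can be sketched as follows. To keep the example runnable, the servers are modelled as objects in a single JVM; the interface `CachePeer` and the class `PointToPointNode` are hypothetical, and in a real cluster each call to `applyRemoteUpdate` would be a remote invocation with the connection and authentication cost described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical remote view of the caching module on another server. */
interface CachePeer {
    void applyRemoteUpdate(String key, String value);
}

/** Hypothetical cluster node that pushes each cache change to every peer in turn. */
class PointToPointNode implements CachePeer {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final List<CachePeer> peers = new ArrayList<>();

    void addPeer(CachePeer peer) {
        peers.add(peer);
    }

    /** Steps 2 to 4: update the local cache, then contact each peer directly. */
    void update(String key, String value) {
        cache.put(key, value);
        for (CachePeer peer : peers) {
            // In a real cluster: open a connection, authenticate, send the update, close.
            peer.applyRemoteUpdate(key, value);
        }
    }

    /** Steps 5 and 6: a peer applies the change locally and does not forward it again. */
    @Override
    public void applyRemoteUpdate(String key, String value) {
        cache.put(key, value);
    }

    String get(String key) {
        return cache.get(key);
    }
}
```

Note that the originating server performs one call per peer, so the cost of an update grows with the number of servers in the cluster.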
In the listener approach, the cache on every server in the cluster is updated through a listener mechanism. Each listener in the application listens to one of the queues in the JMS server. The application server is responsible for replicating the messages across the servers in the cluster.
1. The user triggers a cache update on the server “Weblogic 1”.
2. The application on the server “Weblogic 1” sends a message to its JMS server using JMS. The server “Weblogic 1” then sends the message to all the other Weblogic servers participating in the cluster, in this example “Weblogic 2” and “Weblogic 3”.
3. The listener on each Weblogic server reads the message from that server's messaging server.
4. The application updates the cache.
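The listener steps above can be sketched with an in-memory stand-in for the messaging server, so that the example runs without a JMS provider. Everything here is a hypothetical illustration: `InMemoryBroker` plays the role of the JMS server and its replication across the cluster, `CacheUpdateListener` plays the role of a JMS message listener, and `ListeningNode` represents the application on one server. In a real deployment the listener would typically implement `javax.jms.MessageListener` instead.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical cache-change message. */
final class CacheUpdate {
    final String key;
    final String value;

    CacheUpdate(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical listener contract, playing the role of a JMS message listener. */
interface CacheUpdateListener {
    void onMessage(CacheUpdate update);
}

/** In-memory stand-in for the messaging server: delivers each message to every listener. */
class InMemoryBroker {
    private final List<CacheUpdateListener> listeners = new CopyOnWriteArrayList<>();

    void subscribe(CacheUpdateListener listener) {
        listeners.add(listener);
    }

    void publish(CacheUpdate update) {
        for (CacheUpdateListener listener : listeners) {
            listener.onMessage(update);
        }
    }
}

/** Hypothetical cluster node: sends changes as messages and updates its cache in the listener. */
class ListeningNode implements CacheUpdateListener {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final InMemoryBroker broker;

    private ListeningNode(InMemoryBroker broker) {
        this.broker = broker;
    }

    /** Creates a node and registers its listener with the broker. */
    static ListeningNode join(InMemoryBroker broker) {
        ListeningNode node = new ListeningNode(broker);
        broker.subscribe(node);
        return node;
    }

    /** Step 2: the application only sends a message; it opens no connections to its peers. */
    void update(String key, String value) {
        broker.publish(new CacheUpdate(key, value));
    }

    /** Steps 3 and 4: the listener reads the message and the application updates its cache. */
    @Override
    public void onMessage(CacheUpdate update) {
        cache.put(update.key, update.value);
    }

    String get(String key) {
        return cache.get(key);
    }
}
```

In this sketch the sending node also receives its own message and updates its cache through the same listener path as every other node, which keeps the update logic in one place.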
Benefits of this approach
- Cache updates happen in near real time.
- There is no need to establish explicit connections.
- Since all the servers are in the cluster, messages can be replicated automatically to the rest of the servers.
- Higher scalability