Concerns and issues relating to all versions of WebSphere Application Server
Customers often have questions around initialization, latency and the possibility of race conditions when putting data in a Dynacache DistributedMap. I have tried to collect all these questions and answers here in one place ...
Q Why is my cache instance data not replicating across cluster members?
A In order for a cache instance to be replicated, it has to be associated with a replication domain. This association (cache instance to replication domain) is made through the cache instance configuration, whether the cache instance is defined through 1. a properties file, 2. the admin console, or 3. programmatically. All the application servers in the cluster need to be added to the replication domain. Please note that replication ONLY occurs after the cache instances have bootstrapped with one another via the Data Replication Service (DRS). This initial bootstrap of cache instances takes some time, which leads to situations where objects placed in the DistributedMap are not replicated. Bootstrap only occurs when a cache instance is created, and a cache instance is ONLY created when it is first accessed, i.e. on demand. It is a best practice to access the cache instance as early as possible in the application life cycle to trigger bootstrap. See the next question's answer for details.
Q What is the best practice in initializing the Dynacache DistributedMap cache instance for replication in a cluster environment?
A It is best to create and access the DistributedMap in a ServletContextListener. The ServletContext is created by the container when the web application is deployed, and from then on it is available to every servlet in the web application; there is ONE ServletContext for the entire web application. ServletContextListener is an interface that contains two methods:
-- public void contextInitialized(ServletContextEvent event) ... do the context.lookup("cache/myDmapCache"); here
-- public void contextDestroyed(ServletContextEvent event) ... null out the map here
Doing this very early in the application lifecycle helps bootstrap all the cache instances quickly.
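The two listener methods above can be sketched as follows. This is a minimal sketch, assuming the cache instance is bound in JNDI at "cache/myDmapCache" (the name used in this post) and that the listener class name CacheBootstrapListener is registered via a standard <listener> element in web.xml; both are illustrative, not prescribed.

```java
// Sketch: trigger DistributedMap bootstrap early via a ServletContextListener.
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import com.ibm.websphere.cache.DistributedMap;

public class CacheBootstrapListener implements ServletContextListener {

    private DistributedMap dmap;

    public void contextInitialized(ServletContextEvent event) {
        try {
            // The first lookup creates the cache instance, which kicks off
            // the DRS bootstrap with the other cluster members.
            InitialContext ctx = new InitialContext();
            dmap = (DistributedMap) ctx.lookup("cache/myDmapCache");
            // Stash the map in the ServletContext so servlets can reuse it
            // (attribute name "myDmapCache" is an assumption for this sketch).
            event.getServletContext().setAttribute("myDmapCache", dmap);
        } catch (NamingException e) {
            event.getServletContext().log("Cache instance lookup failed", e);
        }
    }

    public void contextDestroyed(ServletContextEvent event) {
        // Null out the map reference; the container owns the cache itself.
        dmap = null;
    }
}
```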
Another thing you can do is hook up a com.ibm.websphere.cache.ChangeListener to the DistributedMap interface to listen for the CRUD operations in the cache. Its void cacheEntryChanged(ChangeEvent e) method is invoked whenever there is a change to a cache entry. With respect to logging, you can turn on trace for com.ibm.ws.cache.drs.DRSMessageListener and watch for the "updateEntryProp" trace string.
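A ChangeListener hookup might look like the sketch below. It assumes the map was obtained as described in the previous answer, and that listeners must first be switched on via DistributedMap.enableListener(true) (per my reading of the DistributedMap API; verify against the Javadoc for your release).

```java
// Sketch: observe CRUD activity on a DistributedMap with a ChangeListener.
import com.ibm.websphere.cache.ChangeEvent;
import com.ibm.websphere.cache.ChangeListener;
import com.ibm.websphere.cache.DistributedMap;

public class CacheAuditListener implements ChangeListener {

    public void cacheEntryChanged(ChangeEvent e) {
        // Invoked whenever a cache entry is inserted, updated, or removed.
        System.out.println("Cache entry changed: " + e);
    }

    // Helper (hypothetical name) showing the registration order.
    public static void register(DistributedMap dmap) {
        dmap.enableListener(true); // listener support is off by default
        dmap.addChangeListener(new CacheAuditListener());
    }
}
```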
Q Is there any way to synchronize access to the map across members? i.e. is there a way to guarantee that once a put completes, any subsequent get for the same ID on any cluster member will see the value added by the put?
A Unfortunately there is no way to synchronize access across members. Dynacache/DRS does not provide distributed locking across JVMs the way Terracotta or some other grid products do.
* In WAS7 Dynacache has a JVM-wide custom property called "com.ibm.ws.cache.CacheConfig.createCacheAtServerStartup" which, when set to "true", will create cache instances automatically at server startup instead of the default on-demand behavior.
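For illustration, the property would be set as a JVM custom property on each application server (in the admin console, under the server's Java Virtual Machine > Custom properties panel; the exact navigation path varies by release):

```text
Name:  com.ibm.ws.cache.CacheConfig.createCacheAtServerStartup
Value: true
```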
This is a technote'ish type of blog post ..
If the HAManager.thread.pool thread that handles DRS agent callbacks gets stuck, memory can build up pretty quickly. The agent callback can be left waiting on the thread pool:
3XMTHREADINFO "HAManager.thread.pool : 0" (TID:0x714D58A0, sys_thread_t:0x40C35A28, state:CW, native ID:0x737F) prio=5
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java(Compiled Code))
com.ibm.ws.util.BoundedBuffer.put(BoundedBuffer.java(Compiled Code))
com.ibm.ws.util.ThreadPool.execute(ThreadPool.java(Compiled Code))
com.ibm.ws.util.ThreadPool.execute(ThreadPool.java(Compiled Code))
com.ibm.ws.drs.ha.DRSAgentClassEvents.agentMessageReceived(DRSAgentClassEvents.java)
com.ibm.ws.hamanager.agent.AgentClassImpl.doDataStackMessageReceived(AgentClassImpl.java)
com.ibm.ws.hamanager.agent.AgentClassImpl$ACCallback.doCallback(AgentClassImpl.java)
com.ibm.ws.hamanager.impl.Worker.run(UserCallbacks.java(Compiled Code))
com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java(Compiled Code))
Explanation
DRS launches a thread from the "Default" thread pool to process incoming messages, so that the work gets off the HAManager thread. When ThreadPool.execute() does not return, it means the thread pool has been exhausted. It is likely the customer has not increased the size of the "Default" thread pool to at least 40 (from the out-of-box default of 20) as documented in the InfoCenter. In a large topology even 40 may not be sufficient.
There is such a plethora of tools for WebSphere and JVM problem determination that it becomes difficult to know the correct tool to use for a given problem. The presentation below provides guidance on the exact tools and techniques to use for specific issues with your JEE application. It was presented to the Southern California WebSphere Users Group in April 2010.