
Thursday, April 15, 2010

HAManager hung threads leading to an OOM

This is a technote-style blog post.

If the HAManager.thread.pool thread that handles DRS agent callbacks gets stuck waiting on another thread pool, memory can build up pretty quickly. In the javacore, the agent callback thread shows up in a waiting state (state:CW) inside the thread pool code:

3XMTHREADINFO "HAManager.thread.pool : 0" (TID:0x714D58A0, sys_thread_t:0x40C35A28, state:CW, native ID:0x737F) prio=5
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java(Compiled Code))
com.ibm.ws.util.BoundedBuffer.put(BoundedBuffer.java(Compiled Code))
com.ibm.ws.util.ThreadPool.execute(ThreadPool.java(Compiled Code))
com.ibm.ws.util.ThreadPool.execute(ThreadPool.java(Compiled Code))
com.ibm.ws.drs.ha.DRSAgentClassEvents.agentMessageReceived(DRSAgentClassEvents.java
com.ibm.ws.hamanager.agent.AgentClassImpl.doDataStackMessageReceived(AgentClassImpl.java
com.ibm.ws.hamanager.agent.AgentClassImpl$ACCallback.doCallback(AgentClassImpl.java
com.ibm.ws.hamanager.impl.Worker.run(UserCallbacks.java(Compiled Code))
com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java(Compiled Code))

Explanation
DRS hands incoming messages to a thread from the "Default" thread pool so that message processing gets off the HAManager thread. When ThreadPool.execute() does not return, the "Default" thread pool has been exhausted: the caller is left waiting in BoundedBuffer.put(), as the stack above shows. It is likely the customer has not increased the size of the "Default" thread pool to at least 40 (from the out-of-box default of 20), as documented in the InfoCenter. In a large topology, even 40 may not be sufficient.
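
To make the blocking behaviour concrete, here is a minimal sketch in plain Java. It is not the WebSphere implementation: ToyPool, submitterBlocks and the thread names are hypothetical, and java.util.concurrent.ArrayBlockingQueue stands in for com.ibm.ws.util.BoundedBuffer. The assumption is simply that execute() does a blocking put into a bounded buffer, which is what the stack trace suggests. Once every worker is busy and the buffer is full, the submitting thread (the callback thread in the trace) waits inside execute().

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BlockingSubmitDemo {

    /** Hypothetical toy pool: fixed workers drain a bounded buffer. */
    static final class ToyPool {
        private final BlockingQueue<Runnable> buffer;

        ToyPool(int workers, int bufferSize) {
            buffer = new ArrayBlockingQueue<>(bufferSize);
            for (int i = 0; i < workers; i++) {
                Thread t = new Thread(() -> {
                    try {
                        while (true) {
                            buffer.take().run();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, "toy-worker-" + i);
                t.setDaemon(true);
                t.start();
            }
        }

        /** Blocks the caller while the buffer is full (assumed analogue of BoundedBuffer.put). */
        void execute(Runnable task) throws InterruptedException {
            buffer.put(task);
        }
    }

    /** Returns true if the submitting thread is still stuck in execute() after 500 ms. */
    static boolean submitterBlocks(int workers, int bufferSize, int tasks) {
        ToyPool pool = new ToyPool(workers, bufferSize);
        CountDownLatch release = new CountDownLatch(1);       // keeps every task "busy"
        CountDownLatch submittedAll = new CountDownLatch(1);  // set once all tasks are handed over

        Thread submitter = new Thread(() -> {
            try {
                for (int i = 0; i < tasks; i++) {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate slow message processing
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                }
                submittedAll.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "toy-callback-thread");
        submitter.setDaemon(true);
        submitter.start();

        try {
            boolean finished = submittedAll.await(500, TimeUnit.MILLISECONDS);
            return !finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the stuck tasks complete
        }
    }

    public static void main(String[] args) {
        // 2 workers + 2 buffer slots can absorb 4 tasks; the 5th put blocks the submitter.
        System.out.println("10 tasks blocks submitter: " + submitterBlocks(2, 2, 10));
        // 3 tasks fit, so the submitter returns promptly.
        System.out.println("3 tasks blocks submitter: " + submitterBlocks(2, 2, 3));
    }
}
```

In this toy model the submitter can hand over at most workers + bufferSize slow tasks before it stalls, which is why raising the pool size only moves the threshold. A topology that produces more concurrent slow messages than that will still block the callback thread.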
