Today's post is an answer from WebSphere Application Server SWAT Team member Kevin Grigorenko in response to the question:
Given the problem of a slow HTTP interaction (as measured by a load generation client for example), how would you go about gathering more detail on where the time is being spent?
1. Leverage the Trace and Request Analyzer plugin from IBM Support Assistant (ISA) with the detection gap set to 1 ms. Trace and Request Analyzer for WebSphere Application Server lets you find delays and possible hangs in WebSphere trace files and HTTP plug-in traces by parsing the call trees of methods and traces and calculating the delay in each method and trace.
2. Take three or four javacores spread 20 seconds apart to see whether threads are hanging in particular Java operations or code paths. Javacores are the poor man's profiler for debugging performance issues.
3. Use Request Metrics (you can start at the Hops trace level and then, if necessary, move to Performance_debug). This prints a line to SystemOut.log for each request with information such as the URL and user IP and how long the request took (and, if you trace at the component level, how long each component took, e.g. a servlet forward; I think Hops will show database times since that is a boundary). This functionality is essentially what monitoring tools use (just with an agent instead of SystemOut.log). Writing ARM data to SystemOut.log often has a huge overhead, so you can use a filter if you have some idea of what might be causing the slow request. http://publib.boulder.ibm.com/infocenter/wasinfo/v7r0/topic/com.ibm.websphere.nd.doc/info/ae/ae/uprf_rrequestmetrics.html
4. If IHS, an ODR, or some other web server fronts WAS, you can use something like LogFormat %D (microseconds) or %T (whole seconds) to print the time each request takes to access.log and then dive into the slow URLs from there. You can also use this to filter ARM in #3. http://publib.boulder.ibm.com/httpserv/manual70//mod/mod_log_config.html
5. Use a custom Request Metrics agent that can, for example, trigger a javacore when a request takes longer than a threshold such as 1 second. This can essentially be extended to do what a monitoring product does while avoiding the overhead of SystemOut.log. Let me know if you need the code.
6. Use a monitoring product like ITCAM, Wily Introscope, or HP OpenView.
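As a rough illustration of method 4, the sketch below ranks URLs by their worst response time from an access log. It assumes a hypothetical format such as `LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed`, where %D is the last field on each line; the function name `slowest_urls` and the sample lines are made up for this example, so adjust the pattern to whatever format you actually configure.

```python
import re
from collections import defaultdict

# Assumption: %D (microseconds) is the LAST field on every log line, e.g.
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
LINE_RE = re.compile(
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+ (?P<micros>\d+)$'
)


def slowest_urls(lines, top=5):
    """Return (url, worst_ms, average_ms, count) tuples, slowest URL first."""
    times_by_url = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        # Group requests by path only, ignoring the query string.
        url = match.group("url").split("?", 1)[0]
        times_by_url[url].append(int(match.group("micros")) / 1000.0)

    rows = [
        (url, max(times), sum(times) / len(times), len(times))
        for url, times in times_by_url.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        '10.0.0.1 - - [10/Oct/2011:13:55:36 -0700] "GET /app/slow?id=1 HTTP/1.1" 200 2326 1850000',
        '10.0.0.2 - - [10/Oct/2011:13:55:37 -0700] "GET /app/fast HTTP/1.1" 200 512 12000',
        '10.0.0.3 - - [10/Oct/2011:13:55:38 -0700] "GET /app/fast HTTP/1.1" 200 - 8000',
    ]
    for url, worst, average, count in slowest_urls(sample):
        print(f"{url}: worst={worst:.1f} ms, avg={average:.1f} ms, n={count}")
```

In practice you would pass an open file instead of the sample list, for example `slowest_urls(open("access.log"))`, and use the slowest URLs as candidates for a Request Metrics filter.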
There are multiple approaches to tackle this problem, all of which involve a certain level of monitoring or logging. These approaches are detailed here
I suggest working through the following, in order, to understand why requests may be taking longer:
1. Review existing PMI data
2. Take three or four javacores spread 20 seconds apart during the slow request to see whether threads are hanging in particular Java operations or code paths. Javacores are the poor man's profiler for debugging performance issues.
3. Change the log format at both the IHS and WAS tiers to log response times for individual requests, i.e. methods 0 and 4 in Kevin's blog post
4. IBM -Xtrace
5. Request metrics
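The javacore step can be scripted. The sketch below assumes a Unix-like system and an IBM JVM with default dump settings, where sending SIGQUIT (the equivalent of `kill -3 <pid>`) makes the JVM write a javacore. The function name `take_javacores` and its parameters are hypothetical; the `send_signal` and `sleep` arguments exist only so the timing logic can be exercised without a real JVM.

```python
import os
import signal
import time


def take_javacores(pid, count=3, interval_seconds=20,
                   send_signal=os.kill, sleep=time.sleep):
    """Send SIGQUIT to process `pid` `count` times, pausing between signals.

    Assumption: the target is an IBM JVM with default dump agents, so each
    SIGQUIT produces one javacore. Returns the number of signals sent.
    """
    sent = 0
    for attempt in range(count):
        send_signal(pid, signal.SIGQUIT)  # same effect as `kill -3 <pid>`
        sent += 1
        if attempt < count - 1:
            # Space the dumps apart so threads stuck in one place stand out.
            sleep(interval_seconds)
    return sent


# Example (hypothetical PID of the application server's Java process):
#   take_javacores(12345, count=4, interval_seconds=20)
```

Comparing the resulting javacores side by side shows which threads stay in the same stack frames across dumps, which is usually where the slow request is spending its time.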