"bstansberry(a)jboss.com" wrote : So are you thinking about a service that
generates thread dumps (or tweaking the existing one), with configuration options to
control where the dumps go?
Something along those lines, but I was thinking more of a service that lives in the
management console, responds to specific JMX events, and then requests generation of
thread dumps. It could be a standard service in AS, but it seems to fit better in the
monitoring area.
This service should be able to instruct not only the local node but also other cluster
members to generate thread dumps. For example, if the TE happened during a sync repl or a
sync JGroups RPC call, the service would ask the node where the sync repl/RPC failed to
generate a thread dump.
Information that would be needed:
- timestamp (kill -3 does not provide timestamp of the thread dump!)
- thread dump
- a unique id shared by all thread dumps, on all nodes, that were generated for a
specific failure.
- some information to match the thread dump(s) to the failure in the logs.
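
To make the list above concrete, here is a minimal sketch of what one such record could look like. Everything in it is an assumption for illustration: the class name CorrelatedThreadDump, its field names, and the header layout are hypothetical and not part of JBoss AS. The dump text itself is passed in, so the sketch is independent of how the dump is produced.

```java
import java.time.Instant;
import java.util.UUID;

/** Hypothetical record bundling the items listed above; not an existing JBoss AS class. */
final class CorrelatedThreadDump {
    final String failureId;  // shared by every dump taken for the same failure
    final String nodeName;   // node that produced this dump
    final Instant takenAt;   // timestamp, which kill -3 output lacks
    final String logHint;    // text that helps find the failure in the logs
    final String dumpText;   // the thread dump itself

    CorrelatedThreadDump(String failureId, String nodeName, Instant takenAt,
                         String logHint, String dumpText) {
        this.failureId = failureId;
        this.nodeName = nodeName;
        this.takenAt = takenAt;
        this.logHint = logHint;
        this.dumpText = dumpText;
    }

    /** Generated once by whoever detects the failure, then sent with each dump request. */
    static String newFailureId() {
        return UUID.randomUUID().toString();
    }

    /** Renders a small header followed by the dump, ready to be written to a file. */
    String render() {
        return "failure-id=" + failureId + "\n"
             + "node=" + nodeName + "\n"
             + "taken-at=" + takenAt + "\n"
             + "log-hint=" + logHint + "\n\n"
             + dumpText + "\n";
    }
}
```

The idea would be that the node reporting the failure calls newFailureId() once and passes the same id to every node it asks for a dump, so the files from Machines B, C and D can be grouped afterwards.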
The aim is for someone to be able to say something like this: "Machine A reported a
TE (with suspected=false) and these are the thread dumps taken immediately afterwards on
Machines B, C and D in the cluster that are associated with this TE. I already have the GC
logs in case the TE was due to a long garbage collection."
"bstansberry(a)jboss.com" wrote : With hooks to inject the service into other
interested services, e.g. HAPartition? HAPartition would then decide whether an event
(e.g. a timeout on an RPC) justifies calling into the thread dump service.
|
| In the case of an RPC timeout, only the caller knows it happened.
That could work as well. My comment above on JMX notifications amounts to pretty much
this: HAPartition generates a JMX notification upon an RPC timeout, and a service in the
monitoring tool does the job.
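
A rough sketch of that notification flow, using only the standard javax.management classes. The names here are assumptions: TimeoutSource is a stand-in for HAPartition (which does not necessarily emit such a notification today), the notification type string is made up, and DumpRequester only records which node it would ask for a dump rather than contacting it.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;

final class RpcTimeoutNotifications {
    /** Hypothetical notification type; not defined by JBoss AS. */
    static final String RPC_TIMEOUT_TYPE = "example.cluster.rpc.timeout";

    /** Stand-in for the service that notices the timeout (e.g. HAPartition). */
    static final class TimeoutSource extends NotificationBroadcasterSupport {
        private long sequence;

        /** Emits a notification naming the node that did not answer in time. */
        public void reportTimeout(String failedNode) {
            Notification n = new Notification(
                    RPC_TIMEOUT_TYPE, this, ++sequence, System.currentTimeMillis(),
                    "RPC timed out waiting for " + failedNode);
            n.setUserData(failedNode);
            sendNotification(n);
        }
    }

    /** Stand-in for the monitoring-side service that would request the dumps. */
    static final class DumpRequester implements NotificationListener {
        public final List<String> requestedNodes = new ArrayList<String>();

        @Override
        public void handleNotification(Notification n, Object handback) {
            if (RPC_TIMEOUT_TYPE.equals(n.getType())) {
                // A real service would ask this node for a thread dump here.
                requestedNodes.add((String) n.getUserData());
            }
        }
    }
}
```

Wiring it up is a matter of source.addNotificationListener(requester, null, null); after that, each reportTimeout("nodeB") call reaches the listener, which in this sketch simply records "nodeB".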
Where do you think such a service would fit best?
One thing I need to get to the bottom of regarding thread dumps is that
Thread.getAllStackTraces() does not provide the same information as kill -3. Some lock
information seems to be missing from Thread.getAllStackTraces(), which is why I recommend
against using the JMX method to generate stack traces.
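
One avenue that might be worth checking, assuming Java 6 or later is available: the java.lang.management API has ThreadMXBean.dumpAllThreads(lockedMonitors, lockedSynchronizers), which returns ThreadInfo objects carrying locked monitors and ownable synchronizers. The sketch below is only an illustration of that API; the class name LockAwareThreadDump and the output layout are made up, and I have not verified that the result covers everything kill -3 prints.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper; formats a dump including whatever lock data the JVM exposes. */
final class LockAwareThreadDump {
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only ask for lock details the running JVM says it supports.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            if (info == null) {
                continue; // defensive: skip threads that are no longer alive
            }
            sb.append('"').append(info.getThreadName()).append("\" id=")
              .append(info.getThreadId()).append(' ')
              .append(info.getThreadState()).append('\n');

            LockInfo waitingOn = info.getLockInfo();
            if (waitingOn != null) {
                sb.append("    waiting on ").append(waitingOn);
                if (info.getLockOwnerName() != null) {
                    sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
                }
                sb.append('\n');
            }

            // Print every frame; ThreadInfo.toString() would truncate the stack.
            StackTraceElement[] frames = info.getStackTrace();
            MonitorInfo[] locked = info.getLockedMonitors();
            for (int depth = 0; depth < frames.length; depth++) {
                sb.append("    at ").append(frames[depth]).append('\n');
                for (MonitorInfo m : locked) {
                    if (m.getLockedStackDepth() == depth) {
                        sb.append("    - locked ").append(m).append('\n');
                    }
                }
            }
            for (LockInfo s : info.getLockedSynchronizers()) {
                sb.append("    - holds synchronizer ").append(s).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Comparing the output of LockAwareThreadDump.dump() side by side with a kill -3 dump from the same JVM would show whether the lock information that is missing from Thread.getAllStackTraces() is present here.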
View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4120230#...