Paul Robinson [
https://community.jboss.org/people/paul.robinson] modified the document:
"TransactionMonitoringAndVisualization"
To view the document, visit:
https://community.jboss.org/docs/DOC-48255
--------------------------------------------------------------
h1. Transaction Profiling and Visualization
h3. Overview of the Tool
The purpose of this project is to create a tool for profiling the status of in-flight and
complete transactions. The main aims of the tool are to help debug transaction related
issues and also to provide insight into the running of the system. Essentially, the tool
profiles a running system (or a log from a running system) and produces a detailed list of
all transactions. A user may then select a particular transaction to see in more detail,
including the participants involved in the transaction, the outcome of the transaction and
the participants that influenced the outcome. The tool could be visual, showing a
diagrammatic view of a transaction or may be command line based, producing a textual
output (or maybe both).
h3. What kind of problems could this solve?
* Transaction Timeout detection. We frequently get support requests (via the forums or
elsewhere) from users who are experiencing intermittent rollbacks of transactions due to
timeout. It is not a simple matter for the user to figure out that this is what's
happening.
* Transaction Profiling. It may be relatively easy to detect transactions that are taking
longer than desired to complete. However, diagnosing which party in the transaction is to
blame is non-trivial.
* Loops and Diamonds detection. In a distributed transaction it is possible to introduce a
loop or a diamond without realizing. JTS may tolerate this, but distributed JTA and
bridged WS-AT to JTA does not. Identifying this scenario is non-trivial and requires
detailed internal knowledge of the TM.
* Performance Profiling. Diagnosing which transactions are taking a long time to run and
discovering where the time is being spent.
* Reasoning about the System Architecture. Producing an architectural diagram of a
distributed transaction with many participants and servers is not an easy task.
Furthermore it can be error prone and could change based on the application inputs. The
difficulty of this task is further compounded if multiple transaction types are involved
(WS-AT, WS-BA, REST-AT, etc). This diagram is essential for reasoning about the current
architecture and for considering improvements or changes. Also this diagram alone may not
be enough without detailed profiling overlayed.
* Analyzing Disk Syncs. One approach to improving the performance of a transaction is to
reduce the number of disk syncs. The problem is that it is often difficult to calculate
exactly how many you are performing and what delay each adds.
h3. Who is the target audience?
* End users, to diagnose their own issues
* GSS, to diagnose customer issues
* Architects, to analyze their system architecture
* Those new to the field of transactions, learning what's going on.
h3. GSS Requirements
We are currently targetting GSS as our early adoptors for this tool. We will also make
early release available to the community and work with any early adopters when they
apear.
* JTA-only at the moment then JTS. XTS and REST not a requirement for the forseable
future.
* Display the following information* List of transactions with outcome (success or
failure)* Detailed information on a per transaction basis* List of resources* JNDI name of
resource
* Indication as to the type and vendor
* Outcome of its participation in the protocol
* Event log of the activities that occured in the transaction (with timings).
* If timeout, what caused it.
* Recovery information [VERIFY]
* List of subordinate transactions (in JTS case)
* Query transactions* By outcome.
* By failure reason
* Inflight* Stuck (inflight for a long time)
* Recovered transactions [VERIFY]
* Parse existing logs submitted by customers
* Target EAP 6.1 [VERIFY]
* Web and command-line interfaces [VERIFY]
* Single-server (JTA-only) requirement higher than distributed (JTS) requirement at the
momment.
* Need a non-byteman solution. However, Byteman could be used for optional features, where
no other alternative is available.* Many users will be unwilling to enable Byteman in
production.
* The logs must not leave the business domain [VERIFY]* Many users will not be willing to
have their logs submitted to some third-party service for processing.
h3. How does it solve these problems?
The tool analyses the server log and gathers data on every transaction ran. The tool could
also support live updates for a server that is still running. The tool assumes that it can
gather all its information from log statements. If it can't, we take the view that
some logging is missing and then add it. We may also want the tool to support different
log levels. For example, it may need a log level of TRACE to acquire full knowledge of the
system. This may introduce too much of a performance overhead, so we may want to allow the
tool to tollerate the reduced information provided by the less verbose log levels. In this
case only basic information would be provided.
It is unlikely that this tool be used routinely in production. I think the sensible thing
to do is to gather aggregate information in such a way that has minimal impact on the
running system. This is mostly covered at the moment, as Narayana can obtain, rollback,
commit and in-flight transaction counters. This information is great for identifying the
presence of a problem, but in many cases the user would then need to turn up the logging
to establish the root cause of the problem. This is similar to how developers improve the
performance of a system. Aggregate data (i.e. average throughput, average response times)
is used to identify that a problem exists. The developer then attaches a profiler to
identify the root cause of the performance problem.
I see this tool more as a profiling tool to be used when the aggregate data has identified
an issue. In which case, the overhead of gathering this data is less of an issue, assuming
it doesn't impact the system to the point where it skews the data being gathered.
h6. Querying feature
The data can be queried to find out specific information. For example, show me all the
transactions that rolled back. Given a rolledback transaction, the data should be
available to diagnose exactly why it rolledback. Other things you may want to query by:
* *Outcome*. Committed, Rolledback, heuristic, etc
* *Type*. JTA, JTS, WS-AT, REST-AT, etc
* *Duration*. Find all fast or slow transactions
* *Inflight*. Show transactions currently inflight
* *Stuck*. Inflight transactions that have been running for longer than Xms.
* *Recovery*. All transactions that needed to be recovered.
h6. Diagnostics
The tool could also allow 'profiles' to be plugged in. A 'profile' is
responsible for identifying common problems that we see users having. Hopefully these
would allow issues to be identified earlier before they are passed further up the support
chain. For example, we may create a 'profile' that searches over the data looking
for loops or diamonds. Another could be responsible for identifying timeouts and maybe
hinting at their cause. These 'profiles' would be created based on community
demand.
h6. Visualization
In order to help architects reason about the system architecture, the tool could produce a
visualization of a transaction of interest. A few diagrams should give you an idea of what
the tooling could produce:
https://community.jboss.org/servlet/JiveServlet/showImage/102-48255-11-20...
https://community.jboss.org/servlet/JiveServlet/downloadImage/102-48255-1...
Here we can see that a transaction was begun on Server1. It enlisted a DB and JMS queue
locally and invoked a remote service on Server2. A second resource (DB2) was enlisted on
Server2.
https://community.jboss.org/servlet/JiveServlet/showImage/102-48255-11-20...
https://community.jboss.org/servlet/JiveServlet/downloadImage/102-48255-1...
In this example we can see that the transaction rolled back because DB2 voted rollback.
Other possible features:
* *Participant types*. Display if the participant is one or two phase aware for JTA/JTS.
Display the completion type (Participant/Coordinator) for WS-BA.
* * Resource types*. Whether it's a database or message queue and maybe include the
name and version.
* *Multiple Transaction Types*. Somehow depict the type of transaction (JTS, WS-AT,
REST-AT, etc).
* *Display Bridges*. Display when a transaction is bridged from one type to another
* *In flight Transactions*. More useful for long running transactions; display the current
status of each participant and update the display in real-time as the status changes.
* *Recovery Status*. Show which resources crashed and the status of those resources
already committed or recovered.
h3. Reducing the amount of data
It's likely that enabling all the required TRACE logging will generate too large a
log, and slow the system down. Therefore to optimize this we could do a number of things:
1. Use Byteman to do the TRACE logging anyway (despite being turned off), for just the
items we require
2. Use Byteman to somehow buffer up detailed logging in memory, for all transactions.
This is discarded for committed transactions and dumped to the log for failed
transactions.
3. Update Narayana to dump a lot of data when the transaction fails. Same as the Byteman
approach, but will be useful for more users.
We should also reduce the amount of data held for 'identical' transactions. For
example, we may have 1,000's of transactions that follow the same paths, interacting
with the same resources. We should simply store this data once along with aggregate data
for all the associated transactions.
h3. Research
Things to look into:
*
http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/index.jsp?topic=%2Fcom.ibm...
http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/index.jsp?topic=%2Fcom.ibm...
IBM seem to have an API for being notified of all completed transactions.
*
https://blog.heroku.com/archives/2013/3/19/log2viz
https://blog.heroku.com/archives/2013/3/19/log2viz
*
http://logstash.net/ http://logstash.net/
h3. Example - log grepping
Edit the log levels in standalone.xml setting the "com.arjuna" level to TRACE
and the CONSOLE logger to TRACE. You'll then see a lot of logging produced by the
transaction manager. Here's some interesting messages for different scenarios:
h5. Commit
09:20:38,554 TRACE [com.arjuna.ats.arjuna] (pool-1-thread-1) BasicAction::Begin() for
action-id 0:ffffac118223:61d0e901:515016c7:13
[0m09:20:38,558 TRACE [com.arjuna.ats.jta] (pool-1-thread-1)
TransactionImple.enlistResource ( org.jboss.narayana.txvis.simple.DummyXAResource@47b24153
)
[0m09:20:38,563 TRACE [com.arjuna.ats.jta] (pool-1-thread-1)
TransactionImple.enlistResource ( org.jboss.narayana.txvis.simple.DummyXAResource@1bb557c8
)
09:20:38,565 TRACE [com.arjuna.ats.arjuna] (pool-1-thread-1) BasicAction::prepare () for
action-id 0:ffffac118223:61d0e901:515016c7:13
09:20:38,566 TRACE [com.arjuna.ats.jta] (pool-1-thread-1) XAResourceRecord.topLevelPrepare
for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@47b24153, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:61d0e901:515016c7:13, node_name=1,
branch_uid=0:ffffac118223:61d0e901:515016c7:14, subordinatenodename=null, eis_name=unknown
eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6e09388c >
09:20:38,567 TRACE [com.arjuna.ats.jta] (pool-1-thread-1) XAResourceRecord.topLevelPrepare
for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@1bb557c8, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:61d0e901:515016c7:13, node_name=1,
branch_uid=0:ffffac118223:61d0e901:515016c7:18, subordinatenodename=null, eis_name=unknown
eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@1f5fa7c4 >
09:20:38,573 TRACE [com.arjuna.ats.arjuna] (pool-1-thread-1) BasicAction::doCommit
(XAResourceRecord < resource:org.jboss.narayana.txvis.simple.DummyXAResource@47b24153,
txid:< formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:61d0e901:515016c7:13, node_name=1,
branch_uid=0:ffffac118223:61d0e901:515016c7:14, subordinatenodename=null, eis_name=unknown
eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6e09388c >)
09:20:38,574 TRACE [com.arjuna.ats.jta] (pool-1-thread-1) XAResourceRecord.topLevelCommit
for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@47b24153, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:61d0e901:515016c7:13, node_name=1,
branch_uid=0:ffffac118223:61d0e901:515016c7:14, subordinatenodename=null, eis_name=unknown
eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6e09388c >
09:20:38,574 TRACE [com.arjuna.ats.arjuna] (pool-1-thread-1) BasicAction::doCommit
(XAResourceRecord < resource:org.jboss.narayana.txvis.simple.DummyXAResource@1bb557c8,
txid:< formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:61d0e901:515016c7:13, node_name=1,
branch_uid=0:ffffac118223:61d0e901:515016c7:18, subordinatenodename=null, eis_name=unknown
eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@1f5fa7c4 >)
09:20:38,574 TRACE [com.arjuna.ats.jta] (pool-1-thread-1) XAResourceRecord.topLevelCommit
for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@1bb557c8, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:61d0e901:515016c7:13, node_name=1,
branch_uid=0:ffffac118223:61d0e901:515016c7:18, subordinatenodename=null, eis_name=unknown
eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@1f5fa7c4 >
09:20:38,575 TRACE [com.arjuna.ats.arjuna] (pool-1-thread-1)
FileSystemStore.remove_committed(0:ffffac118223:61d0e901:515016c7:13,
/StateManager/BasicAction/TwoPhaseCoordinator/AtomicAction)
h5. Client Driven Rollback
[0m10:03:41,737 TRACE [com.arjuna.ats.arjuna] (pool-2-thread-1) BasicAction::Begin() for
action-id 0:ffffac118223:-507ee916:515020eb:13
[0m10:03:41,741 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
TransactionImple.enlistResource ( org.jboss.narayana.txvis.simple.DummyXAResource@5cbf424a
)
[0m10:03:41,747 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
TransactionImple.enlistResource ( org.jboss.narayana.txvis.simple.DummyXAResource@53adb4e5
)
[0m10:03:41,749 TRACE [com.arjuna.ats.arjuna] (pool-2-thread-1) BasicAction::Abort() for
action-id 0:ffffac118223:-507ee916:515020eb:13
[0m10:03:41,750 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
XAResourceRecord.topLevelAbort for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@5cbf424a, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:-507ee916:515020eb:13, node_name=1,
branch_uid=0:ffffac118223:-507ee916:515020eb:14, subordinatenodename=null,
eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@6c28e401 >
[0m10:03:41,751 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
XAResourceRecord.topLevelAbort for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@53adb4e5, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:-507ee916:515020eb:13, node_name=1,
branch_uid=0:ffffac118223:-507ee916:515020eb:18, subordinatenodename=null,
eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@70913520 >
h5. Participant Driven Rollback
[0m10:15:52,446 TRACE [com.arjuna.ats.arjuna] (pool-2-thread-1) BasicAction::Begin() for
action-id 0:ffffac118223:-6fc6eada:515023cb:13
[0m10:15:52,450 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
TransactionImple.enlistResource ( org.jboss.narayana.txvis.simple.DummyXAResource@72e0b50
)
[0m10:15:52,455 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
TransactionImple.enlistResource ( org.jboss.narayana.txvis.simple.DummyXAResource@7aa354c9
)
[0m10:15:52,458 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
XAResourceRecord.topLevelPrepare for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@72e0b50, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:-6fc6eada:515023cb:13, node_name=1,
branch_uid=0:ffffac118223:-6fc6eada:515023cb:14, subordinatenodename=null,
eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@50150c84 >
[0m10:15:52,459 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
XAResourceRecord.topLevelPrepare for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@7aa354c9, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:-6fc6eada:515023cb:13, node_name=1,
branch_uid=0:ffffac118223:-6fc6eada:515023cb:18, subordinatenodename=null,
eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@15e8d72f >
[0m [33m10:15:52,460 WARN [com.arjuna.ats.jta] (pool-2-thread-1) ARJUNA016041: prepare
on < formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:-6fc6eada:515023cb:13, node_name=1,
branch_uid=0:ffffac118223:-6fc6eada:515023cb:18, subordinatenodename=null,
eis_name=unknown eis name > (org.jboss.narayana.txvis.simple.DummyXAResource@7aa354c9)
failed with exception XAException.XAER_RMERR: javax.transaction.xa.XAException
[0m [33m10:15:52,467 WARN [com.arjuna.ats.arjuna] (pool-2-thread-1) ARJUNA012073:
BasicAction.End() - prepare phase of action-id 0:ffffac118223:-6fc6eada:515023cb:13
failed.
[0m [33m10:15:52,467 WARN [com.arjuna.ats.arjuna] (pool-2-thread-1) ARJUNA012075: Action
Aborting
[0m10:15:52,467 TRACE [com.arjuna.ats.arjuna] (pool-2-thread-1)
BasicAction::phase2Abort() for action-id 0:ffffac118223:-6fc6eada:515023cb:13
[0m10:15:52,468 TRACE [com.arjuna.ats.arjuna] (pool-2-thread-1) BasicAction::doAbort
(XAResourceRecord < resource:org.jboss.narayana.txvis.simple.DummyXAResource@72e0b50,
txid:< formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:-6fc6eada:515023cb:13, node_name=1,
branch_uid=0:ffffac118223:-6fc6eada:515023cb:14, subordinatenodename=null,
eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@50150c84 >)
[0m10:15:52,468 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
XAResourceRecord.topLevelAbort for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@72e0b50, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:-6fc6eada:515023cb:13, node_name=1,
branch_uid=0:ffffac118223:-6fc6eada:515023cb:14, subordinatenodename=null,
eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@50150c84 >
[0m10:15:52,468 TRACE [com.arjuna.ats.arjuna] (pool-2-thread-1) BasicAction::doAbort
(XAResourceRecord < resource:org.jboss.narayana.txvis.simple.DummyXAResource@7aa354c9,
txid:< formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:-6fc6eada:515023cb:13, node_name=1,
branch_uid=0:ffffac118223:-6fc6eada:515023cb:18, subordinatenodename=null,
eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@15e8d72f >)
[0m10:15:52,469 TRACE [com.arjuna.ats.jta] (pool-2-thread-1)
XAResourceRecord.topLevelAbort for XAResourceRecord <
resource:org.jboss.narayana.txvis.simple.DummyXAResource@7aa354c9, txid:<
formatId=131077, gtrid_length=29, bqual_length=36,
tx_uid=0:ffffac118223:-6fc6eada:515023cb:13, node_name=1,
branch_uid=0:ffffac118223:-6fc6eada:515023cb:18, subordinatenodename=null,
eis_name=unknown eis name >, heuristic: TwoPhaseOutcome.FINISH_OK
com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord@15e8d72f >
h3.
h3. Data Model
The basic data model currently used in early iterations of the tool is described in the
following ER diagram
https://community.jboss.org/servlet/JiveServlet/showImage/102-48255-11-20...
https://community.jboss.org/servlet/JiveServlet/downloadImage/102-48255-1...
--------------------------------------------------------------
Comment by going to Community
[
https://community.jboss.org/docs/DOC-48255]
Create a new document in JBoss Transactions Development at Community
[
https://community.jboss.org/choose-container!input.jspa?contentType=102&a...]