[JBoss JIRA] (ISPN-5179) Add distributed execution and map/reduce job statistics
by Vladimir Blagojevic (JIRA)
[ https://issues.jboss.org/browse/ISPN-5179?page=com.atlassian.jira.plugin.... ]
Vladimir Blagojevic commented on ISPN-5179:
-------------------------------------------
For the first iteration I propose the following levels of statistics information:
- System wide level
- MapReduceTask level
System wide level includes MapReduceTask ids of the running tasks and completed tasks. We could keep a limited history of completed tasks. A sample output for system wide statistics could be:
{code:title=Sample response|borderStyle=solid}
{
"masterNode":"IP address",
"inprogressCount":"2",
"completedCount":"2",
"inprogress":[
{"id":"actualID1"},
{"id":"actualID2"}
],
"completed":[
{"id":"actualID3"},
{"id":"actualID4"}
]
}
{code}
We could include any other system wide statistic related to map reduce - as the need arise and as we see fit.
MapReduceTask level - given an id of a task in progress or a completed task would return statistics related to that particular task. A sample output could be:
{code:title=Sample response|borderStyle=solid}
{
"runningReduceAttempts" : 1,
"reduceProgress" : 72.104515,
"failedReduceAttempts" : 0,
"mapsRunning" : 0,
"state" : "RUNNING",
"reducesRunning" : 1,
"reducesPending" : 0,
"mapsCompleted" : 1,
"startTime" : 1326860720902,
"id" : "actualID3",
"runningMapAttempts" : 0,
"mapsPending" : 0,
"elapsedTime" : 64432,
"reducesCompleted" : 0,
"mapProgress" : 100,
"failedMapAttempts" : 0,
"finishTime" : 0
}
{code}
> Add distributed execution and map/reduce job statistics
> --------------------------------------------------------
>
> Key: ISPN-5179
> URL: https://issues.jboss.org/browse/ISPN-5179
> Project: Infinispan
> Issue Type: Feature Request
> Components: JMX, reporting and management
> Reporter: Vladimir Blagojevic
> Assignee: Vladimir Blagojevic
> Fix For: 7.2.0.Final
>
>
> We should add DMR/JMX statistics for the running distributed execution jobs as well as map/reduce jobs. The statistics will also include overview/total system statistics of previously executed jobs; we might store statistics of individual executed jobs in some internal cache. However, the primary objective is to calculate and maintain dist.exec and map/reduce job statistics for Infinispan admin console.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
9 years, 10 months
[JBoss JIRA] (ISPN-5137) Transaction from new topology rolled back (but succeeds on originator)
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-5137?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-5137:
-----------------------------------------------
Sebastian Łaskawiec <slaskawi(a)redhat.com> changed the Status of [bug 1180680|https://bugzilla.redhat.com/show_bug.cgi?id=1180680] from ASSIGNED to MODIFIED
> Transaction from new topology rolled back (but succeeds on originator)
> ----------------------------------------------------------------------
>
> Key: ISPN-5137
> URL: https://issues.jboss.org/browse/ISPN-5137
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.1.0.Alpha1, 7.0.3.Final
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 7.1.0.CR2
>
>
> During a merge of (edg-perf01, edg-perf02, edg-perf03) + (edg-perf04), transaction executed on edg-perf04 in latest topology is silently rolled back and commit command is ignored since the topology has not arrived yet.
> Topologies on edg-perf04
> {code}
> 09:06:38,233 DEBUG [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread-23) Updating cluster-wide current topology for cache testCache, topology = CacheTopology{id=51, rebalanceId=12, currentCH
> =DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198: 170+171]}, pendingCH=null, unionCH=null, actualMembers=[edg-perf03-18534, edg-perf02-60481, e
> dg-perf01-19198]}, availability mode = AVAILABLE
> 09:06:38,248 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread-21) Attempting to execute command on self: CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf04
> -59142, joinInfo=null, topologyId=51, rebalanceId=12, currentCH=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198: 170+171]}, pendingCH=null, ava
> ilabilityMode=AVAILABLE, actualMembers=[edg-perf03-18534, edg-perf02-60481, edg-perf01-19198], throwable=null, viewId=19}
> 09:06:38,413 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (transport-thread-23) Replication task sending CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-per
> f04-59142, joinInfo=null, topologyId=52, rebalanceId=12, currentCH=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198: 170+171]}, pendingCH=null,
> availabilityMode=AVAILABLE, actualMembers=[edg-perf03-18534, edg-perf02-60481, edg-perf01-19198], throwable=null, viewId=19} to addresses null with response mode GET_NONE
> 09:06:38,415 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread-10) Attempting to execute command on self: CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf04
> -59142, joinInfo=null, topologyId=52, rebalanceId=12, currentCH=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198: 170+171]}, pendingCH=null, ava
> ilabilityMode=AVAILABLE, actualMembers=[edg-perf03-18534, edg-perf02-60481, edg-perf01-19198], throwable=null, viewId=19}
> 09:06:38,451 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread-8) Attempting to execute command on self: CacheTopologyControlCommand{cache=testCache, type=REBALANCE_START, sender=edg-p
> erf04-59142, joinInfo=null, topologyId=53, rebalanceId=13, currentCH=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198: 170+171]}, pendingCH=Defa
> ultConsistentHash{ns = 512, owners = (4)[edg-perf04-59142: 128+128, edg-perf03-18534: 128+128, edg-perf02-60481: 128+128, edg-perf01-19198: 128+128]}, availabilityMode=null, actualMembers=[edg-perf04-59142, edg-
> perf03-18534, edg-perf02-60481, edg-perf01-19198], throwable=null, viewId=19}
> {code}
> Transaction on edg-perf04 can be executed, since all owners are in current topology, though this node itself is degraded yet:
> {code}
> 09:06:38,545 TRACE [org.infinispan.partionhandling.impl.PartitionHandlingManagerImpl] (DefaultStressor-5) Checking availability for key=key_00000000000022CF, status=DEGRADED_MODE
> 09:06:38,545 TRACE [org.infinispan.partionhandling.impl.PartitionHandlingManagerImpl] (DefaultStressor-5) Key key_00000000000022CF is available.
> 09:06:38,549 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-5) edg-perf04-59142 invoking PrepareCommand {modifications=[PutKeyValueCommand{key=key_00000000000022CF, value=[35 #4: 197, 314, 331, 1101, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<edg-perf04-59142>:23898:local, cacheName='testCache', topologyId=51} to recipient list [edg-perf01-19198, edg-perf02-60481] with options RpcOptions{timeout=60000, unit=MILLISECONDS, fifoOrder=true, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS, skipReplicationQueue=false}
> 09:06:38,780 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (DefaultStressor-5) Responses: [sender=edg-perf01-19198, received=true, suspected=false]
> [sender=edg-perf02-60481, received=true, suspected=false]
> 09:06:38,788 TRACE [org.infinispan.remoting.rpc.RpcManagerImpl] (DefaultStressor-5) edg-perf04-59142 invoking CommitCommand {gtx=GlobalTransaction:<edg-perf04-59142>:23898:local, cacheName='testCache', topologyId=51} to recipient list [edg-perf01-19198, edg-perf02-60481] with options RpcOptions{timeout=60000, unit=MILLISECONDS, fifoOrder=false, totalOrder=false, responseFilter=null, responseMode=SYNCHRONOUS_IGNORE_LEAVERS, skipReplicationQueue=false}
> 09:06:39,080 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (DefaultStressor-5) Responses: [sender=edg-perf01-19198, retval=SuccessfulResponse{responseValue=null} , received=true, suspected=false]
> [sender=edg-perf02-60481, retval=SuccessfulResponse{responseValue=null} , received=true, suspected=false]
> {code}
> These are topologies on edg-perf02:
> {code}
> 09:06:38,417 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread-55) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf04-59142, joinInfo=null, topologyId=51, rebalanceId=12, currentCH=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198: 170+171]}, pendingCH=null, availabilityMode=AVAILABLE, actualMembers=[edg-perf03-18534, edg-perf02-60481, edg-perf01-19198], throwable=null, viewId=19} [sender=edg-perf04-59142]
> 09:06:38,467 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread-56) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=testCache, type=CH_UPDATE, sender=edg-perf04-59142, joinInfo=null, topologyId=52, rebalanceId=12, currentCH=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198: 170+171]}, pendingCH=null, availabilityMode=AVAILABLE, actualMembers=[edg-perf03-18534, edg-perf02-60481, edg-perf01-19198], throwable=null, viewId=19} [sender=edg-perf04-59142]
> 09:06:38,577 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread-52) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=testCache, type=REBALANCE_START, sender=edg-perf04-59142, joinInfo=null, topologyId=53, rebalanceId=13, currentCH=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198: 170+171]}, pendingCH=DefaultConsistentHash{ns = 512, owners = (4)[edg-perf04-59142: 128+128, edg-perf03-18534: 128+128, edg-perf02-60481: 128+128, edg-perf01-19198: 128+128]}, availabilityMode=null, actualMembers=[edg-perf04-59142, edg-perf03-18534, edg-perf02-60481, edg-perf01-19198], throwable=null, viewId=19} [sender=edg-perf04-59142]
> {code}
> The transaction is successfully prepared on edg-perf02 (even twice as the prepare command is resent from edg-perf01)
> {code}
> 09:06:38,613 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-64) Calling perform() on PrepareCommand {modifications=[PutKeyValueCommand{key=key_00000000000022CF, value=[35 #4: 197, 314, 331, 1101, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<edg-perf04-59142>:23898:remote, cacheName='testCache', topologyId=51}
> 09:06:38,620 TRACE [org.infinispan.util.concurrent.locks.LockManagerImpl] (remote-thread-64) Successfully acquired lock key_00000000000022CF!
> 09:06:38,626 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-64) About to send back response null for command PrepareCommand {modifications=[PutKeyValueCommand{key=key_00000000000022CF, value=[35 #4: 197, 314, 331, 1101, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<edg-perf04-59142>:23898:remote, cacheName='testCache', topologyId=51}
> 09:06:38,670 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-64) Calling perform() on PrepareCommand {modifications=[PutKeyValueCommand{key=key_00000000000022CF, value=[35 #4: 197, 314, 331, 1101, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false,
> gtx=GlobalTransaction:<edg-perf04-59142>:23898:remote, cacheName='testCache', topologyId=52} // resent from edg-perf01
> 09:06:38,672 TRACE [org.infinispan.util.concurrent.locks.LockManagerImpl] (remote-thread-64) Already own lock for entry
> 09:06:38,675 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-64) About to send back response null for command PrepareCommand {modifications=[PutKeyValueCommand{key=key_00000000000022CF, value=[35 #4: 197, 314, 331, 1101, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<edg-perf04-59142>:23898:remote, cacheName='testCache', topologyId=52}
> {code}
> However, as topology 53 is installed, this transaction is rolled back:
> {code}
> 09:06:38,577 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (remote-thread-52) Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=testCache, type=RE
> BALANCE_START, sender=edg-perf04-59142, joinInfo=null, topologyId=53, rebalanceId=13, currentCH=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198
> : 170+171]}, pendingCH=DefaultConsistentHash{ns = 512, owners = (4)[edg-perf04-59142: 128+128, edg-perf03-18534: 128+128, edg-perf02-60481: 128+128, edg-perf01-19198: 128+128]}, availabilityMode=null, actualMemb
> ers=[edg-perf04-59142, edg-perf03-18534, edg-perf02-60481, edg-perf01-19198], throwable=null, viewId=19} [sender=edg-perf04-59142]
> 09:06:38,658 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (remote-thread-52) Installing fake cache topology CacheTopology{id=52, rebalanceId=12, currentCH=DefaultConsistentHash{ns = 512, owners = (3)[edg-perf03-18534: 171+171, edg-perf02-60481: 171+170, edg-perf01-19198: 170+171]}, pendingCH=null, unionCH=null, actualMembers=null} for cache testCache
> 09:06:38,670 DEBUG [org.infinispan.transaction.TransactionTable] (remote-thread-52) The originating node left the cluster for 6 remote transactions
> 09:06:38,675 DEBUG [org.infinispan.transaction.TransactionTable] (remote-thread-52) Rolling back transaction GlobalTransaction:<edg-perf04-59142>:23898:remote because originator edg-perf04-59142 left the cluster
> 09:06:38,675 TRACE [org.infinispan.commands.tx.AbstractTransactionBoundaryCommand] (remote-thread-52) About to execute tx command RollbackCommand {gtx=GlobalTransaction:<edg-perf04-59142>:23898:remote, cacheName
> ='testCache', topologyId=-1}
> {code}
> But even after that, the CommitCommand still suceeds:
> {code}
> 09:06:38,811 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-56) Calling perform() on CommitCommand {gtx=GlobalTransaction:<edg-perf04-59142>:23898:remote, cacheName='testCache', topologyId=51}
> 09:06:38,812 TRACE [org.infinispan.commands.tx.AbstractTransactionBoundaryCommand] (remote-thread-56) Did not find a RemoteTransaction for GlobalTransaction:<edg-perf04-59142>:23898:remote
> 09:06:38,812 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-56) About to send back response SuccessfulResponse{responseValue=null} for command CommitCommand {gtx=GlobalTransaction:<edg-perf04-59142>:23898:remote, cacheName='testCache', topologyId=51}
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
9 years, 10 months
[JBoss JIRA] (ISPN-5158) Transaction rolled back but returns successful response
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-5158?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-5158:
-----------------------------------------------
Sebastian Łaskawiec <slaskawi(a)redhat.com> changed the Status of [bug 1180680|https://bugzilla.redhat.com/show_bug.cgi?id=1180680] from ASSIGNED to MODIFIED
> Transaction rolled back but returns successful response
> -------------------------------------------------------
>
> Key: ISPN-5158
> URL: https://issues.jboss.org/browse/ISPN-5158
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.1.0.Beta1
> Reporter: Radim Vansa
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: tx.txt, views.txt
>
>
> When the cluster is merging, it is possible that a node is removed from the view although it is still responsive. Eventually the cluster is merged correctly, but since the node is reported as missing from the view, transaction originating from this node is rolled back.
> {code}
> 10:01:36,116 TRACE [org.infinispan.interceptors.TxInterceptor] (remote-thread-151) Rolling back remote transaction GlobalTransaction:<edg-perf02-39415>:28106:remote because either already completed(false) or originator no longer in the cluster(true).
> {code}
> However, even after this a successful response is sent to the originator:
> {code}
> 10:01:36,119 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (remote-thread-151) About to send back response null for command PrepareCommand {modifications=[PutKeyValueCommand{key=key_0000000000001318, value=[19 #1: 1195, ], flags=[SKIP_CACHE_LOAD, SKIP_REMOTE_LOOKUP], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<edg-perf02-39415>:28106:remote, cacheName='testCache', topologyId=47}
> {code}
> Originator then expects that the transaction was successfully prepared:
> {code}
> 10:01:36,124 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (DefaultStressor-9) Responses: [sender=edg-perf01-36235, received=true, suspected=false]
> [sender=edg-perf03-24110, received=true, suspected=false]
> 10:01:36,135 TRACE [org.infinispan.transaction.TransactionCoordinator] (DefaultStressor-9) Committing transaction GlobalTransaction:<edg-perf02-39415>:28106:local
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
9 years, 10 months