[
https://issues.jboss.org/browse/ISPN-1883?page=com.atlassian.jira.plugin....
]
Dan Berindei edited comment on ISPN-1883 at 3/1/12 3:26 AM:
------------------------------------------------------------
The error appears when the coordinator leaves the cluster, a second coordinator starts
installing a new cache view, and that coordinator also dies before the installation finishes.
The 3rd coordinator then picks the same view id that the 2nd coordinator was trying to
install, which results in the IllegalStateException. After the exception, the 3rd coordinator
retries the cache view installation and succeeds (because it is using a higher view
id):
{noformat}
v1|A = [A, B, C, D] - initial view
A dies
v2|B = [B, C, D]
B dies - causing v2|B to fail
v2|C = [C, D] - fails because it has the same view id
v3|C = [C, D] - succeeds
{noformat}
In this particular case the coordinators were shut down 30 seconds apart, but because of
JGRP-1428 the first state transfer didn't finish until the 2nd view change (when the
2nd coordinator had already left the cluster).
Since the 2nd attempt to install the cache view is successful and this issue only happens
in rare circumstances, I'm postponing it to 5.2.0.
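The view id collision described above can be reproduced in a toy model. This is only a sketch under stated assumptions: `ViewIdSketch`, `Node`, `naiveNextViewId` and `safeNextViewId` are hypothetical names, not Infinispan classes, and the "safe" variant is one possible way to avoid reusing an id, not necessarily the fix that will go into 5.2.0.

```java
// Toy model only; class, field and method names are hypothetical, not Infinispan's.
class ViewIdSketch {

    /** One cache member: the last committed view id plus any prepared-but-uncommitted view id. */
    static final class Node {
        static final int NONE = -1;
        int committedViewId;
        int pendingViewId = NONE;

        Node(int committedViewId) {
            this.committedViewId = committedViewId;
        }

        /** Handles a prepare request; rejects a view id that is already being prepared. */
        void prepare(int viewId) {
            if (viewId == pendingViewId) {
                throw new IllegalStateException("already preparing view " + viewId);
            }
            pendingViewId = viewId;
        }

        /** Commits the prepared view and clears the pending marker. */
        void commit(int viewId) {
            committedViewId = viewId;
            pendingViewId = NONE;
        }
    }

    /** Looks only at the committed view, so it can reuse a dead coordinator's id. */
    static int naiveNextViewId(Node coordinator) {
        return coordinator.committedViewId + 1;
    }

    /** Also skips past any view id the node has already seen in a prepare. */
    static int safeNextViewId(Node coordinator) {
        return Math.max(coordinator.committedViewId, coordinator.pendingViewId) + 1;
    }

    public static void main(String[] args) {
        Node c = new Node(1);   // C has committed v1|A
        c.prepare(2);           // B's prepare for v2|B arrives, then B dies

        try {
            c.prepare(naiveNextViewId(c));   // C, now coordinator, reuses id 2
        } catch (IllegalStateException e) {
            System.out.println("naive id rejected: " + e.getMessage());
        }

        int next = safeNextViewId(c);        // skips the abandoned id
        c.prepare(next);
        c.commit(next);
        System.out.println("committed view " + c.committedViewId);
    }
}
```

In this model the naive choice fails exactly once, mirroring the v2|C failure in the sequence above, while a choice that accounts for the pending id goes straight to v3|C.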
Relevant log messages:
{noformat}
2012-02-22 13:13:34,352 INFO [JGroupsTransport] (Incoming-2,edg-perf03-57571) ISPN000094:
Received new cluster view: [edg-perf01-39846|3] [edg-perf01-39846, edg-perf02-36519,
edg-perf03-57571, edg-perf04-36539]
2012-02-22 13:14:11,897 DEBUG [CacheViewsManagerImpl]
(CacheViewInstaller-3,edg-perf02-36519) Installing new view CacheView{viewId=7,
members=[edg-perf04-36539, edg-perf02-36519, edg-perf03-57571]} for cache testCache
2012-02-22 13:14:41,827 INFO [JGroupsTransport] (Incoming-12,edg-perf03-57571)
ISPN000094: Received new cluster view: [edg-perf03-57571|5] [edg-perf03-57571,
edg-perf04-36539]
2012-02-22 13:14:42,266 WARN [CacheViewControlCommand] (OOB-18,edg-perf04-36539)
ISPN000071: Caught exception when handling command
CacheViewControlCommand{cache=testCache, type=PREPARE_VIEW, sender=edg-perf02-36519,
newViewId=7, newMembers=[edg-perf04-36539, edg-perf02-36519, edg-perf03-57571],
oldViewId=4, oldMembers=[edg-perf01-39846, edg-perf02-36519, edg-perf03-57571,
edg-perf04-36539]}
org.infinispan.statetransfer.StateTransferCancelledException
2012-02-22 13:14:42,273 DEBUG [CacheViewsManagerImpl]
(CacheViewInstaller-3,edg-perf03-57571) Installing new view CacheView{viewId=7,
members=[edg-perf04-36539, edg-perf03-57571]} for cache testCache
2012-02-22 13:14:42,296 ERROR [CacheViewsManagerImpl]
(CacheViewInstaller-3,edg-perf03-57571) ISPN000172: Failed to prepare view
CacheView{viewId=7, members=[edg-perf04-36539, edg-perf03-57571]} for cache testCache,
rolling back to view CacheView{viewId=6, members=[edg-perf01-39846, edg-perf02-36519,
edg-perf03-57571, edg-perf04-36539]}
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Trying to block
write commands but they are already blocked for view 7
2012-02-22 13:14:43,274 DEBUG [CacheViewsManagerImpl]
(CacheViewInstaller-3,edg-perf03-57571) Installing new view CacheView{viewId=9,
members=[edg-perf03-57571, edg-perf04-36539]} for cache testCache
2012-02-22 13:14:46,409 DEBUG [CacheViewsManagerImpl]
(CacheViewInstaller-3,edg-perf03-57571) testCache: Committing cache view
CacheView{viewId=9, members=[edg-perf03-57571, edg-perf04-36539]}
{noformat}
was (Author: dan.berindei):
The error appears when the coordinator leaves the cluster, another coordinator starts
to install a new cache view, and then it also dies before finishing the view installation.
The 3rd coordinator will pick the same view id that the 2nd coordinator was trying to
install, resulting in the IllegalStateException. After the exception, the 3rd coordinator
will retry to install the cache view and succeed (because it's using a higher view
id):
v1|A = [A, B, C, D] - initial view
A dies
v2|B = [B, C, D] - fails because B died
B dies
v2|C = [C, D] - fails because it has the same view id
v3|C = [C, D] - succeeds
In this particular case the coordinators were shut down 30 seconds apart, but because of
JGRP-1428 the first state transfer didn't finish until the 2nd view change (when the
2nd coordinator had already left the cluster).
Since the 2nd attempt to install the cache view is successful and this issue only happens
in rare circumstances, I'm postponing it to 5.2.0.
IllegalStateException during view installation
----------------------------------------------
Key: ISPN-1883
URL: https://issues.jboss.org/browse/ISPN-1883
Project: Infinispan
Issue Type: Bug
Affects Versions: 5.1.1.FINAL
Reporter: Michal Linhard
Assignee: Dan Berindei
Fix For: 5.2.0.FINAL
Problem found in this special build from a branch:
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-QE/job/edg-60-build-...
branch:
https://github.com/maniksurtani/infinispan/tree/t_st_perf_1
http://hudson.qa.jboss.com/hudson/view/EDG6/view/EDG-REPORTS-RESILIENCE/j...
Installation of view 7 starts on node02 and then on node03; on node03 it fails with:
{code}
2012-02-22 13:14:42,296 ERROR [CacheViewsManagerImpl]
(CacheViewInstaller-3,edg-perf03-57571) ISPN000172: Failed to prepare view
CacheView{viewId=7, members=[edg-perf04-36539, edg-perf03-57571]} for cache testCache,
rolling back to view CacheView{viewId=6, members=[edg-perf01-39846, edg-perf02-36519,
edg-perf03-57571, edg-perf04-36539]}
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Trying to block
write commands but they are already blocked for view 7
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterPrepareView(CacheViewsManagerImpl.java:319)
at org.infinispan.cacheviews.CacheViewsManagerImpl.clusterInstallView(CacheViewsManagerImpl.java:250)
at org.infinispan.cacheviews.CacheViewsManagerImpl$ViewInstallationTask.call(CacheViewsManagerImpl.java:876)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalStateException: Trying to block write commands but they are
already blocked for view 7
at org.infinispan.statetransfer.StateTransferLockImpl.blockNewTransactions(StateTransferLockImpl.java:233)
at org.infinispan.statetransfer.DistributedStateTransferTask.doPerformStateTransfer(DistributedStateTransferTask.java:102)
at org.infinispan.statetransfer.BaseStateTransferTask.performStateTransfer(BaseStateTransferTask.java:93)
at org.infinispan.statetransfer.BaseStateTransferManagerImpl.prepareView(BaseStateTransferManagerImpl.java:294)
at org.infinispan.cacheviews.CacheViewsManagerImpl.handlePrepareView(CacheViewsManagerImpl.java:486)
at org.infinispan.commands.control.CacheViewControlCommand.perform(CacheViewControlCommand.java:125)
at org.infinispan.remoting.InboundInvocationHandlerImpl.handle(InboundInvocationHandlerImpl.java:95)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommand(CommandAwareRpcDispatcher.java:161)
at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:141)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:447)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:354)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:230)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:543)
at org.jgroups.JChannel.up(JChannel.java:716)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1026)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:881)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:332)
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:697)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:559)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:167)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:282)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:205)
at org.jgroups.protocols.Discovery.up(Discovery.java:355)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1174)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1722)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1704)
... 3 more
{code}