[jboss-jira] [JBoss JIRA] Created: (JBMESSAGING-1854) ConcurrentModificationException in org.jboss.messaging.core.impl.postoffice.MessagingPostOffice in Clustered JMS deployment
Ryan Hochstetler (JIRA)
jira-events at lists.jboss.org
Thu Mar 24 14:59:45 EDT 2011
ConcurrentModificationException in org.jboss.messaging.core.impl.postoffice.MessagingPostOffice in Clustered JMS deployment
----------------------------------------------------------------------------------------------------------------------------
Key: JBMESSAGING-1854
URL: https://issues.jboss.org/browse/JBMESSAGING-1854
Project: JBoss Messaging
Issue Type: Bug
Components: JMS Clustering
Affects Versions: 1.4.7.GA
Environment: Dell PowerEdge M1000e Chassis with 16 PowerEdge M610 blades. Each blade has 2 Intel 2.40 GHz processors and 32GB of memory
Windows Server 2008 64-bit.
JRE 1.6.0_22
JBM 1.4.7 deployed in JBoss AS 5.1.0.GA
oracle-persistence-service.xml, on 3-blade Oracle RAC 11.2.0.2
Reporter: Ryan Hochstetler
We recently changed how we start JBoss, and it has uncovered a concurrency problem in JBoss Messaging's clustering.
Previously, we started each of the 32 JBoss instances serially, which, as you can imagine, takes forever. Recently, one of our integration engineers got RHQ working, so we created two startup groups. The first group contains just the first server, which boots fully and becomes HASingleton and JGroups coordinator on all channels. The second group contains the other 31 nodes, which now start mostly in parallel.
And that's when the ConcurrentModificationExceptions began.
[22 Mar 2011 21:01:38,982] [ERROR] [org.jboss.messaging.core.impl.postoffice.GroupMember] - Caught Exception in RequestHandler
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
at java.util.HashMap$EntryIterator.next(Unknown Source)
at java.util.HashMap$EntryIterator.next(Unknown Source)
at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.findNodeIDForAddress(MessagingPostOffice.java:2289)
at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.calculateFailoverMap(MessagingPostOffice.java:2225)
at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.handleNodeJoined(MessagingPostOffice.java:1337)
at org.jboss.messaging.core.impl.postoffice.JoinClusterRequest.execute(JoinClusterRequest.java:68)
at org.jboss.messaging.core.impl.postoffice.GroupMember$ControlRequestHandler.handle(GroupMember.java:648)
at org.jgroups.blocks.MessageDispatcher.handle(MessageDispatcher.java:616)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:637)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:545)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:368)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:775)
at org.jgroups.JChannel.up(JChannel.java:1336)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:454)
at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:486)
at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:153)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
at org.jgroups.protocols.FC.up(FC.java:473)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:820)
at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:192)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:328)
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:895)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:708)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
at org.jgroups.protocols.FD.up(FD.java:284)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:307)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
at org.jgroups.protocols.Discovery.up(Discovery.java:264)
at org.jgroups.protocols.PING.up(PING.java:273)
at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2315)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1249)
at org.jgroups.protocols.TP.access$100(TP.java:49)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1826)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1805)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
[22 Mar 2011 21:01:38,983] [ERROR] [org.jgroups.blocks.RequestCorrelator] - error invoking method
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
at java.util.HashMap$EntryIterator.next(Unknown Source)
at java.util.HashMap$EntryIterator.next(Unknown Source)
at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.findNodeIDForAddress(MessagingPostOffice.java:2289)
at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.calculateFailoverMap(MessagingPostOffice.java:2225)
at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.handleNodeJoined(MessagingPostOffice.java:1337)
at org.jboss.messaging.core.impl.postoffice.JoinClusterRequest.execute(JoinClusterRequest.java:68)
at org.jboss.messaging.core.impl.postoffice.GroupMember$ControlRequestHandler.handle(GroupMember.java:648)
at org.jgroups.blocks.MessageDispatcher.handle(MessageDispatcher.java:616)
at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:637)
at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:545)
at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:368)
at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:775)
at org.jgroups.JChannel.up(JChannel.java:1336)
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:454)
at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:486)
at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:153)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:188)
at org.jgroups.protocols.FC.up(FC.java:473)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:820)
at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:192)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:328)
at org.jgroups.protocols.pbcast.NAKACK.handleMessage(NAKACK.java:895)
at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:708)
at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
at org.jgroups.protocols.FD.up(FD.java:284)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:307)
at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
at org.jgroups.protocols.Discovery.up(Discovery.java:264)
at org.jgroups.protocols.PING.up(PING.java:273)
at org.jgroups.protocols.TP$ProtocolAdapter.up(TP.java:2315)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1249)
at org.jgroups.protocols.TP.access$100(TP.java:49)
at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1826)
at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1805)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
The CME was logged at roughly the same time on several cluster nodes. Two nodes attempted to join the cluster within 9 seconds of one another. These two messages are from the same server:
[22 Mar 2011 21:01:27,142] [ INFO] [org.jboss.messaging.core.impl.postoffice.GroupMember] - New Members : 1 ([10.15.20.168:52397])
[22 Mar 2011 21:01:36,460] [ INFO] [org.jboss.messaging.core.impl.postoffice.GroupMember] - New Members : 1 ([10.15.20.166:58958])
MessagingPostOffice seems to be a singleton, per my heap dump. It appears that MPO.handleNodeJoined() executes a put() and then iterates over nodeIDAddressMap (by means of calculateFailoverMap(), which calls findNodeIDForAddress()). If handleNodeJoined() were invoked by two JGroups threads concurrently, I can see how the CME would result: one thread's put() modifies the map while the other thread is still iterating over it. nodeIDAddressMap is not a thread-safe collection, and does not appear to be guarded by anything.
I'm going to try to make this class thread-safe myself, since I have no illusions that you're interested in fixing this bug for me. I assume/see that most of your attention is on HornetQ, but perhaps someone else can benefit from me documenting the problem. I'll upload what I'm permitted by my company to disclose when I find a solid solution. I'm hoping it's as simple as synchronizing handleNodeJoined().
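To illustrate the fix hoped for above, here is a minimal sketch of the put-then-iterate pattern guarded by a synchronized method. It is not the JBoss Messaging source: the class PostOfficeSketch, the body of calculateFailoverMap(), the helper joinConcurrently(), and the "node-N" addresses are all assumptions made up for the example. Only the names handleNodeJoined, calculateFailoverMap, and nodeIDAddressMap come from the report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for MessagingPostOffice; not the JBoss Messaging source.
public class PostOfficeSketch {
    // A plain HashMap, as described in the report. Every access in this
    // sketch happens while holding this object's lock.
    private final Map<Integer, String> nodeIDAddressMap = new HashMap<>();

    // synchronized turns the put() and the iteration that follows into one
    // atomic step, so no other joiner can modify the map mid-iteration.
    public synchronized int handleNodeJoined(int nodeID, String address) {
        nodeIDAddressMap.put(nodeID, address);
        return calculateFailoverMap();
    }

    // Placeholder for the real calculation: it only walks the map and counts
    // the entries. Callers must already hold the lock.
    private int calculateFailoverMap() {
        int nodes = 0;
        for (Map.Entry<Integer, String> entry : nodeIDAddressMap.entrySet()) {
            if (entry.getValue() != null) {
                nodes++;
            }
        }
        return nodes;
    }

    // Simulates several nodes joining in parallel; returns the final map size.
    public static int joinConcurrently(int threads, int joinsPerThread) {
        final PostOfficeSketch postOffice = new PostOfficeSketch();
        Thread[] joiners = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int firstID = t * joinsPerThread; // unique ID range per thread
            joiners[t] = new Thread(() -> {
                for (int i = 0; i < joinsPerThread; i++) {
                    postOffice.handleNodeJoined(firstID + i, "node-" + (firstID + i));
                }
            });
            joiners[t].start();
        }
        try {
            for (Thread joiner : joiners) {
                joiner.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for joiners", e);
        }
        synchronized (postOffice) {
            return postOffice.nodeIDAddressMap.size();
        }
    }

    public static void main(String[] args) {
        // 31 parallel joiners, mirroring the second startup group.
        System.out.println(joinConcurrently(31, 100));
    }
}
```

Without the synchronized keyword on handleNodeJoined(), the same program can intermittently fail with ConcurrentModificationException, because HashMap iterators are fail-fast. Synchronizing this one method only helps if every other code path that reads or writes the map takes the same lock. Whether that holds in the real class is something the sketch cannot show. A ConcurrentHashMap would also avoid the exception, since its iterators are weakly consistent, but the failover calculation could then see a map that changes while it runs.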
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira