[jboss-user] [JBoss Messaging] - Re: JBM exception

Mon Sep 15 14:58:40 EDT 2008

I am having the exact same issue when I try to cluster the application.  It is hard to reduce to a simple test case because it involves MDBs, datasources, databases, clusters, JBoss Messaging, etc.

I think it comes down to a failure of the cluster.  I have 2 boxes, A and B.  Both have JBoss Messaging installed according to the documentation, and both use a shared Oracle data store.  I am using a shared (configured) queue for requests and temporary queues for responses (a basic pattern, I think).  Things seem to work well under light load, but when I use FunkLoad to drive a sample app at high load, the cluster seems to fail.

I am using the system like this:
* I have a simple one-page, one-form JSF app that fires a JMS request on my common request queue.  The app creates a temporary response queue, listens on it with a java "Future" object, and puts a message on the global configured queue with the appropriate reply-to.  An MDB echoes back a hard-coded string on the reply-to, and the app echoes the response to the screen.
* I have my app configured such that if the request goes to A, then we use A's 1099 port for JMS lookups, and similarly on B.
* I am only hitting box A in this test.
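
For readers who want to reason about the flow described in the list above, here is a minimal in-memory sketch of it.  It is not JMS code: plain BlockingQueues stand in for the shared request queue and for the temporary reply queues, and a background thread stands in for the MDB.  The names (RequestReplySketch, send, startEchoConsumer, roundTrip) and the 5 second timeout are assumptions made up for illustration, not anything from the actual app.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

/** In-memory model of request/reply with one reply queue per request (no JMS involved). */
final class RequestReplySketch {

    /** A request: its payload plus the name of the queue to reply on. */
    static final class Request {
        final String payload;
        final String replyTo;
        Request(String payload, String replyTo) {
            this.payload = payload;
            this.replyTo = replyTo;
        }
    }

    // Stands in for the shared, configured request queue.
    private final BlockingQueue<Request> requestQueue = new LinkedBlockingQueue<>();
    // Stands in for the temporary reply queues, keyed by a generated name.
    private final Map<String, BlockingQueue<String>> replyQueues = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();

    /** Starts the stand-in for the MDB: answers every request with a hard-coded string. */
    void startEchoConsumer() {
        pool.execute(() -> {
            try {
                while (true) {
                    Request r = requestQueue.take();
                    BlockingQueue<String> q = replyQueues.get(r.replyTo);
                    if (q != null) {
                        q.offer("hard-coded reply");
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // pool shut down, stop consuming
            }
        });
    }

    /** Sends a request and returns a Future that completes when the reply arrives. */
    Future<String> send(String payload) {
        String replyTo = "temp-" + UUID.randomUUID();
        BlockingQueue<String> replyQueue = new LinkedBlockingQueue<>();
        replyQueues.put(replyTo, replyQueue);          // "create the temporary queue"
        requestQueue.add(new Request(payload, replyTo));
        return pool.submit(() -> {
            try {
                String reply = replyQueue.poll(5, TimeUnit.SECONDS);
                if (reply == null) {
                    throw new TimeoutException("no reply on " + replyTo);
                }
                return reply;
            } finally {
                replyQueues.remove(replyTo);           // "delete the temporary queue"
            }
        });
    }

    /** Runs one full round trip and shuts the worker threads down afterwards. */
    static String roundTrip(String payload) throws Exception {
        RequestReplySketch app = new RequestReplySketch();
        app.startEchoConsumer();
        try {
            return app.send(payload).get();
        } finally {
            app.pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));
    }
}
```

The point of the model is that every single request creates and later removes its own reply queue, so the number of temporary queues created tracks the request rate one to one.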

What I see
The application seems to work OK for the first few seconds, but then fails 5 to 20 seconds into the load test.  As I said above, I am only hitting A.
I see a lot of the following on A:
2008-09-15 14:21:36,192 ERROR [org.jboss.messaging.util.ExceptionUtil] SessionEndpoint[1hn-dln9f5lf-1-b8q6e5lf-np8q9k-t1ce1a]
  |  addTemporaryDestination [pko-ctaaf5lf-1-b8q6e5lf-np8q9k-t1ce1a]
  | java.lang.IllegalStateException: org.jboss.messaging.core.impl.postoffice.GroupMember at 1ad11ec response not received from 10.50.12.136:56010 - there may be others
  |         at org.jboss.messaging.core.impl.postoffice.GroupMember.multicastControl(GroupMember.java:253)
  |         at org.jboss.messaging.core.impl.postoffice.MessagingPostOffice.internalAddBinding(MessagingPostOffice.java:1886)
  | 

The more informative output (IMHO) actually comes from B:

  | 2008-09-15 14:22:51,660 DEBUG [org.jboss.messaging.core.impl.postoffice.GroupMember] org.jboss.messaging.core.impl.postoffice.GroupMember$ControlMembershipListener at 1aec0d1 got new view [10.50.12.136:56010|2] [10.50.12.136:56010], old view is [10.50.12.136:56010|1] [10.50.12.136:56010, 10.50.12.137:35992]
  | 2008-09-15 14:22:51,660 DEBUG [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice] Updated failover map:
  | 
  |              1->1
  | 
  | 2008-09-15 14:22:51,660 DEBUG [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice] org.jboss.messaging.core.impl.postoffice.MessagingPostOffice at fbf107: 10.50.12.137:35992 left
  | 2008-09-15 14:22:51,660 DEBUG [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice] org.jboss.messaging.core.impl.postoffice.MessagingPostOffice at fbf107: node 2 has crashed
  | 2008-09-15 14:22:51,661 DEBUG [org.jboss.messaging.core.impl.postoffice.MessagingPostOffice] org.jboss.messaging.core.impl.postoffice.MessagingPostOffice at fbf107 the failover node for the crashed node is 1
  | 2008-09-15 14:22:58,558 INFO  [org.jboss.cache.TreeCache] viewAccepted(): [10.50.12.136:55867|32] [10.50.12.136:55867]
  | 2008-09-15 14:22:58,566 DEBUG [org.jboss.cache.buddyreplication.BuddyManager] Instance 10.50.12.136:55867 broadcasting membership in buddy pool default to recipients []
  | 2008-09-15 14:22:58,566 DEBUG [org.jboss.cache.buddyreplication.BuddyManager] Data owner address 10.50.12.136:55867
  | 2008-09-15 14:22:58,566 DEBUG [org.jboss.cache.buddyreplication.BuddyManager] Entering updateGroup.  Current group: BuddyGroup: (dataOwner: 10.50.12.136:55867, groupName: 10.50.12.136_55867, buddies: [10.50.12.137:35817]).  Current View membership: [10.50.12.136:55867]
  | 2008-09-15 14:22:58,566 INFO  [org.jboss.cache.buddyreplication.NextMemberBuddyLocator] Expected to look for 1 buddies but could only find 0 suitable candidates - trying with colocated buddies as well.
  | 2008-09-15 14:22:58,566 INFO  [org.jboss.cache.buddyreplication.NextMemberBuddyLocator] Expected to look for 1 buddies but could only find 0 suitable candidates - trying again, ignoring buddy pool hints.
  | 2008-09-15 14:22:58,566 INFO  [org.jboss.cache.buddyreplication.NextMemberBuddyLocator] Expected to look for 1 buddies but could only find 0 suitable candidates - trying with colocated buddies as well.
  | 2008-09-15 14:22:58,566 INFO  [org.jboss.cache.buddyreplication.NextMemberBuddyLocator] Expected to look for 1 buddies but could only find 0 suitable candidates!
  | 2008-09-15 14:22:58,572 INFO  [org.jboss.cache.buddyreplication.BuddyManager] Removing obsolete buddies from buddy group [10.50.12.136_55867].  Obsolete buddies are [10.50.12.137:35817]
  | 2008-09-15 14:22:58,572 INFO  [org.jboss.cache.buddyreplication.BuddyManager] New buddy group: BuddyGroup: (dataOwner: 10.50.12.136:55867, groupName: 10.50.12.136_55867, buddies: [])
  | 
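
To make the first DEBUG line above easier to read: the "got new view ... old view is ..." strings appear to follow the "[creator|id] [member, member, ...]" layout, so comparing the two member lists shows who dropped out.  The following is a small sketch of that comparison under that assumption about the layout; ViewDiff, members and departed are hypothetical helper names, not part of JBoss Messaging or JGroups.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Compares two view strings of the form "[creator|id] [member, member, ...]". */
final class ViewDiff {

    /** Extracts the member list, i.e. the contents of the last bracket pair. */
    static Set<String> members(String view) {
        int start = view.lastIndexOf('[');
        int end = view.lastIndexOf(']');
        Set<String> result = new LinkedHashSet<>();
        for (String m : view.substring(start + 1, end).split(",")) {
            String trimmed = m.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Returns the members present in the old view but missing from the new one. */
    static Set<String> departed(String oldView, String newView) {
        Set<String> gone = members(oldView);
        gone.removeAll(members(newView));
        return gone;
    }

    public static void main(String[] args) {
        String oldView = "[10.50.12.136:56010|1] [10.50.12.136:56010, 10.50.12.137:35992]";
        String newView = "[10.50.12.136:56010|2] [10.50.12.136:56010]";
        System.out.println("left the group: " + departed(oldView, newView));
    }
}
```

Applied to the two views in the log, the only departed member is 10.50.12.137:35992, which matches the "10.50.12.137:35992 left" line that follows.  The member that remains, 10.50.12.136:56010, is the same address that A's error reports as not responding.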

Numbers
I am currently running a load test of 256 concurrent users, each requesting continuously with a 0.1 second delay between requests.  Note: this is the same configuration that ran fairly well on my laptop in a non-clustered setup (using Hypersonic).
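
As a rough back-of-the-envelope check on what that load means: since each request creates one temporary queue, and the stack trace on A shows addTemporaryDestination going through internalAddBinding and multicastControl, the request rate is also the rate of cluster-wide control calls.  The sketch below assumes a closed-loop test (each user waits for the response, then waits 0.1 seconds, then sends again).  The response times plugged in are hypothetical values for illustration, not measurements, and LoadEstimate is a made-up name.

```java
/** Rough upper bound on the request rate (and so temporary queue creations) per second. */
final class LoadEstimate {

    /** Closed-loop estimate: each user issues one request per (delay + response time). */
    static double requestsPerSecond(int users, double delaySeconds, double responseSeconds) {
        return users / (delaySeconds + responseSeconds);
    }

    public static void main(String[] args) {
        int users = 256;        // from the load test described above
        double delay = 0.1;     // seconds between requests, from the load test
        // Hypothetical response times in seconds, for illustration only.
        for (double responseTime : new double[] {0.0, 0.1, 0.4}) {
            double rate = requestsPerSecond(users, delay, responseTime);
            System.out.printf("response time %.1f s -> about %.0f temporary queues/s%n",
                    responseTime, rate);
        }
    }
}
```

Even with a response time of several hundred milliseconds, this works out to hundreds of temporary queue creations per second, each of which has to be acknowledged by the other node.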

Versions
JBoss: 4.2.1.GA
JBoss Messaging
Implementation-Version: 1.4.0.SP3 (build: CVSTag=JBossMessaging_1_4_0_SP3 date=200712131418)
Remoting
Implementation-Version: 2.2.2.SP4

Any help or guidance would be appreciated.

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4176599#4176599
