[jboss-jira] [JBoss JIRA] Commented: (JBMESSAGING-1110) Client incorrectly initiates client-side fail over on high concurrency

Ovidiu Feodorov (JIRA) jira-events at lists.jboss.org
Sat Oct 13 22:56:03 EDT 2007


    [ http://jira.jboss.com/jira/browse/JBMESSAGING-1110?page=comments#action_12382315 ] 
            
Ovidiu Feodorov commented on JBMESSAGING-1110:
----------------------------------------------

Also, messages seem to be vanished from the database. 

I started with 50,000 10K persistent text messages stored in the queue "ovidiu" on the node with the ID 39 (looked up a clustered but not load-balanced connection factory from that node and got confirmation by calling getServerID()). The queue was deployed as "clustered"

State of the database after killing the client (which became unresponsive):

mysql> SELECT COUNT(MESSAGE_ID) FROM JBM_MSG_REF;
+-------------------+
| COUNT(MESSAGE_ID) |
+-------------------+
|             49288 | 
+-------------------+

mysql> SELECT * FROM JBM_MSG_REF WHERE MESSAGE_ID = 35528; (this is MIN(MESSAGE_ID))
+------------+------------+----------------+-------+-------------------+----------+----------------+----------------+
| CHANNEL_ID | MESSAGE_ID | TRANSACTION_ID | STATE | ORD               | PAGE_ORD | DELIVERY_COUNT | SCHED_DELIVERY |
+------------+------------+----------------+-------+-------------------+----------+----------------+----------------+
|          0 |      35528 |           NULL | C     | 39070061203453943 |     NULL |              0 |              0 | 
+------------+------------+----------------+-------+-------------------+----------+----------------+----------------+

mysql> SELECT * FROM JBM_MSG_REF WHERE MESSAGE_ID = 84912; (this is MAX(MESSAGE_ID))
+------------+------------+----------------+-------+-------------------+----------+----------------+----------------+
| CHANNEL_ID | MESSAGE_ID | TRANSACTION_ID | STATE | ORD               | PAGE_ORD | DELIVERY_COUNT | SCHED_DELIVERY |
+------------+------------+----------------+-------+-------------------+----------+----------------+----------------+
|          0 |      84912 |           NULL | C     | 39070075545073791 |     NULL |              0 |              0 | 
+------------+------------+----------------+-------+-------------------+----------+----------------+----------------+

mysql> SELECT * FROM JBM_POSTOFFICE; (irrelevant destination names edited for privacy reasons)
+-----------------+---------+--------------------------+--------------------------------+----------+------------+-----------+-----------+
| POSTOFFICE_NAME | NODE_ID | QUEUE_NAME               | COND                           | SELECTOR | CHANNEL_ID | CLUSTERED | ALL_NODES |
+-----------------+---------+--------------------------+--------------------------------+----------+------------+-----------+-----------+
| JMS post office |      38 | *** | queue.***              | NULL     |         27 | Y         | N         | 
| JMS post office |      38 | DLQ                      | queue.DLQ                      | NULL     |         21 | N         | N         | 
| JMS post office |      38 | ExpiryQueue              | queue.ExpiryQueue              | NULL     |         22 | N         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         29 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         30 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         31 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         32 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         36 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         34 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         35 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         33 | Y         | N         | 
| JMS post office |      38 | ovidiu                   | queue.ovidiu                   | NULL     |         20 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         23 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         25 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         26 | Y         | N         | 
| JMS post office |      38 | *** | queue.*** | NULL     |         24 | Y         | N         | 
| JMS post office |      38 | testDistributedQueue     | queue.testDistributedQueue     | NULL     |         28 | Y         | N         | 
| JMS post office |      39 | DLQ                      | queue.DLQ                      | NULL     |          1 | N         | N         | 
| JMS post office |      39 | ExpiryQueue              | queue.ExpiryQueue              | NULL     |          2 | N         | N         | 
| JMS post office |      39 | *** | queue.*** | NULL     |         12 | Y         | N         | 
| JMS post office |      39 | *** | queue.*** | NULL     |         15 | Y         | N         | 
| JMS post office |      39 | *** | queue.*** | NULL     |         13 | Y         | N         | 
| JMS post office |      39 | *** | queue.*** | NULL     |          3 | Y         | N         | 
| JMS post office |      39 | *** | queue.*** | NULL     |          5 | Y         | N         | 
| JMS post office |      39 | *** | queue.*** | NULL     |          4 | Y         | N         | 
| JMS post office |      39 | testDistributedQueue     | queue.testDistributedQueue     | NULL     |          8 | Y         | N         | 
+-----------------+---------+--------------------------+--------------------------------+----------+------------+-----------+-----------+
26 rows in set (0.00 sec)

Please note that the channel corresponding to the "ovidiu" queue on node 39 vanished, and it seems there's no "ovidiu" queue on 39.

The JMX console on node 39 became unresponsive as well (couldn't check whether the queue is visible via the JMX interface)
Any attempt to connect to node 39 (even just to create a Connection and then close it) was unsuccessful .

This is the client-side log on a connection attempt:

ofeodorov at yamato$ mess ping
@Timer-1 19:45:51,445 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51137],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:46:31,458 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51142],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:46:56,467 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51143],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:47:31,479 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51146],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:48:21,497 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51147],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:48:46,506 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51148],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:49:26,521 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51150],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:50:36,548 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51154],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:51:31,567 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51159],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:52:21,586 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51160],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:53:06,603 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51162],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:53:46,619 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51164],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection
@Timer-1 19:54:26,633 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 74b957ea: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=51177],5,main] (a4812d-p364ba-f7qyex29-1-f7qyfgeu-5: requesting new control connection






> Client incorrectly initiates client-side fail over on high concurrency
> ----------------------------------------------------------------------
>
>                 Key: JBMESSAGING-1110
>                 URL: http://jira.jboss.com/jira/browse/JBMESSAGING-1110
>             Project: JBoss Messaging
>          Issue Type: Bug
>          Components: JMS Facade
>    Affects Versions: 1.4.0.GA
>            Reporter: Ovidiu Feodorov
>         Assigned To: Tim Fox
>
> Dump of a faulty test run:
> ofeodorov at yamato$ mess -sessions 50 receive all
> ConnectionFactory: JBossConnectionFactory->ClusteredConnectionFactoryDelegate[SIDs={38,39}]
> Destination: JBossQueue[ovidiu]
> Consumer21 starting to consume ...
> Consumer43 starting to consume ...
> Consumer23 starting to consume ...
> Consumer44 starting to consume ...
> Consumer11 starting to consume ...
> Consumer29 starting to consume ...
> Consumer26 starting to consume ...
> Consumer39 starting to consume ...
> Consumer40 starting to consume ...
> Consumer01 starting to consume ...
> Consumer17 starting to consume ...
> Consumer14 starting to consume ...
> Consumer04 starting to consume ...
> Consumer16 starting to consume ...
> Consumer49 starting to consume ...
> Consumer42 starting to consume ...
> Consumer41 starting to consume ...
> Consumer34 starting to consume ...
> Consumer18 starting to consume ...
> Consumer46 starting to consume ...
> Consumer20 starting to consume ...
> Consumer02 starting to consume ...
> Consumer06 starting to consume ...
> Consumer32 starting to consume ...
> Consumer07 starting to consume ...
> Consumer19 starting to consume ...
> Consumer35 starting to consume ...
> Consumer48 starting to consume ...
> Consumer24 starting to consume ...
> Consumer30 starting to consume ...
> Consumer08 starting to consume ...
> Consumer12 starting to consume ...
> Consumer36 starting to consume ...
> Consumer09 starting to consume ...
> Consumer47 starting to consume ...
> Consumer00 starting to consume ...
> Consumer13 starting to consume ...
> Consumer31 starting to consume ...
> Consumer27 starting to consume ...
> Consumer33 starting to consume ...
> Consumer10 starting to consume ...
> Consumer05 starting to consume ...
> Consumer45 starting to consume ...
> Consumer15 starting to consume ...
> Consumer03 starting to consume ...
> Consumer37 starting to consume ...
> Consumer28 starting to consume ...
> Consumer38 starting to consume ...
> Consumer22 starting to consume ...
> Consumer30 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer33 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer22 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer32 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer31 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer10 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer09 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer34 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer12 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer38 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer24 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer15 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer13 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer29 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer36 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer16 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer41 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer14 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer26 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer19 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer08 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer07 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer23 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer11 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer18 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer00 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer03 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer02 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer27 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer05 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer20 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer04 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer45 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer48 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer06 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer47 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer43 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer46 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer44 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer42 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer17 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer39 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> Consumer49 doesn't get any more messages, exiting after a timeout of 3000 ms ...
> @Consumer25 18:11:31,474 WARN  [ClusteringAspect] Client attempted failover, but no failover attempt has been detected on the server side. We will now try again on the original server in case there was a temporary glitch on the client--server network
> closing connection JBossConnection->ConnectionDelegate[1572973638, ID=yc-2wk3vq7f-1-6ii5qq7f-onm5l5-31184a, SID=39] on exit ...
> @Timer-1 18:16:58,969 WARN  [BisocketServerInvoker] org.jboss.remoting.transport.bisocket.BisocketServerInvoker$ControlMonitorTimerTask at 11dba9f9: detected failure on control connection Thread[control: Socket[addr=/10.152.1.39,port=3309,localport=50826],5,main] (a4812d-tafka6-f7qv26rw-1-f7qv3kvq-o: requesting new control connection

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://jira.jboss.com/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        



More information about the jboss-jira mailing list