[jboss-dev-forums] [Design of Messaging on JBoss (Messaging/JBoss)] - Ping issues under stress

jmesnil do-not-reply at jboss.com
Thu Aug 13 09:27:34 EDT 2009


I managed to get ping issues and clients stuck doing a soak test on our messaging Labs.

Setup:
1 server (messaging-01) with 1 JMS topic
10 producers + 17 non-durable subscribers (splitted on 7 machines)

server config:

  |   Netty:
  |          <param key="jbm.remoting.netty.tcpnodelay" value="false" type="Boolean"/>
  |          <param key="jbm.remoting.netty.tcpsendbuffersize" value="1048576" type="Integer"/>
  |          <param key="jbm.remoting.netty.tcpreceivebuffersize" value="1048576" type="Integer"/>
  | 
  |    <security-enabled>false</security-enabled>
  |    <persistence-enabled>false</persistence-enabled>
  | 


JMS config:

   <connection-factory name="ConnectionFactory">
      <connector-ref connector-name="netty-connector"/>
      
         
      
      <retry-interval>1000</retry-interval>
      <retry-interval-multiplier>1.0</retry-interval-multiplier>
      <reconnect-attempts>-1</reconnect-attempts>
      <failover-on-server-shutdown>true</failover-on-server-shutdown>
      <call-timeout>5000</call-timeout>
      <block-on-non-persistent-send>false</block-on-non-persistent-send>
      <block-on-persistent-send>true</block-on-persistent-send>
      <pre-acknowledge>true</pre-acknowledge>
   </connection-factory>

   
      
   



soak.properties:

  | duration-in-minutes=120
  | num-warmup-messages=100
  | message-size=1024
  | durable=false
  | transacted=false
  | batch-size=1000
  | drain-queue=false
  | destination-lookup=soakTopic
  | connection-factory-lookup=/ConnectionFactory
  | throttle-rate=-1
  | dups-ok-acknowledge=false
  | disable-message-id=true
  | disable-message-timestamp=true
  | 

The clients (both producers and consumers) are connected to reconnect to the server
when they are notified by a JMSExceptionListener. Their connection factories will indefinitely try 
to reconnect to the server.

1. I start the server. No activity:


  |      [java] **** Server Dump ****
  |      [java] date:            Thu Aug 13 08:40:56 EDT 2009
  |      [java] heap memory:     used=60.09 MB, max=1.92 GB
  |      [java] non-heap memory: used=60.09 MB, max=1.92 GB
  |      [java] # of thread:     16
  |      [java] # of conns:      0
  |      [java] JMS topics:
  |      [java] 	soakTopic: 0 subscription (0 non-durable) receiving 0 message
  |      [java] ********************
  | 

2. I then start 10 producers and 17 non-durable subscribers
At the beginning, all producers are producing and all subscribers are subscribing

3. But after a few seconds, the subscribers starts to warn about missing ping servers and failures to failover


  |      [java] Aug 13, 2009 8:51:30 AM org.jboss.messaging.core.logging.Logger warn
  |      [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection at 128ae45a[local= /172.16.8.12:41708, remote=/172.16.8.10:5445] [code=3]
  |      [java] Aug 13, 2009 8:51:30 AM org.jboss.messaging.core.logging.Logger warn
  |      [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection at 128ae45a[local= /172.16.8.12:41706, remote=/172.16.8.10:5445] [code=3]
  |      [java] Aug 13, 2009 8:51:36 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage
  |      [java] INFO: received 10000 messages in 11.60s (total: 117s)
  |      [java] Aug 13, 2009 8:51:53 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage
  |      [java] INFO: received 10000 messages in 16.80s (total: 133s)
  |      [java] Aug 13, 2009 8:52:09 AM org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$3 onMessage
  |      [java] INFO: received 10000 messages in 16.07s (total: 150s)
  |      [java] Aug 13, 2009 8:52:15 AM org.jboss.messaging.core.logging.Logger warn
  |      [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection at 5f7779e3[local= /172.16.8.12:41735, remote=/172.16.8.10:5445] [code=3]
  |      [java] Aug 13, 2009 8:52:15 AM org.jboss.messaging.core.logging.Logger warn
  |      [java] WARNING: Connection failure has been detected: Did not receive ping from server for org.jboss.messaging.integration.transports.netty.NettyConnection at 1ce3570c[local= /172.16.8.12:41737, remote=/172.16.8.10:5445] [code=3]
  |      [java] Aug 13, 2009 8:52:20 AM org.jboss.messaging.core.logging.Logger error
  |      [java] SEVERE: Failed to handle failover
  |      [java] MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 32]
  |      [java] 	at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:312)
  |      [java] 	at org.jboss.messaging.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:751)
  |      [java] 	at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.reattachSessions(ConnectionManagerImpl.java:794)
  |      [java] 	at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.failoverOrReconnect(ConnectionManagerImpl.java:647)
  |      [java] 	at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.handleConnectionFailure(ConnectionManagerImpl.java:518)
  |      [java] 	at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.access$600(ConnectionManagerImpl.java:81)
  |      [java] 	at org.jboss.messaging.core.client.impl.ConnectionManagerImpl$DelegatingFailureListener.connectionFailed(ConnectionManagerImpl.java:1218)
  |      [java] 	at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:393)
  |      [java] 	at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:244)
  |      [java] 	at org.jboss.messaging.core.client.impl.ConnectionManagerImpl$FailedConnectionAction$1.run(ConnectionManagerImpl.java:1174)
  |      [java] 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  |      [java] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  |      [java] 	at java.lang.Thread.run(Thread.java:619)
  |      [java] javax.jms.JMSException: Connection is destroyed
  |      [java] 	at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:240)
  |      [java] 	at org.jboss.messaging.core.client.impl.ClientConsumerImpl.doCleanUp(ClientConsumerImpl.java:745)
  |      [java] 	at org.jboss.messaging.core.client.impl.ClientConsumerImpl.close(ClientConsumerImpl.java:315)
  |      [java] 	at org.jboss.messaging.jms.client.JBossMessageConsumer.close(JBossMessageConsumer.java:153)
  |      [java] 	at org.jboss.messaging.jms.client.JBossSession.close(JBossSession.java:275)
  |      [java] 	at org.jboss.messaging.jms.client.JBossConnection.close(JBossConnection.java:259)
  |      [java] 	at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.disconnect(SoakReconnectableReceiver.java:186)
  |      [java] 	at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.access$000(SoakReconnectableReceiver.java:41)
  |      [java] 	at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$2.onException(SoakReconnectableReceiver.java:99)
  |      [java] 	at org.jboss.messaging.jms.client.JBossConnection$JMSFailureListener$1.run(JBossConnection.java:550)
  |      [java] 	at java.lang.Thread.run(Thread.java:619)
  |      [java] Caused by: MessagingException[errorCode=2 message=Connection is destroyed]
  |      [java] 	... 11 more
  |      [java] javax.jms.JMSException: Timed out waiting for response when sending packet 30
  |      [java] 	at org.jboss.messaging.core.remoting.impl.ChannelImpl.sendBlocking(ChannelImpl.java:312)
  |      [java] 	at org.jboss.messaging.core.client.impl.ConnectionManagerImpl.createSession(ConnectionManagerImpl.java:326)
  |      [java] 	at org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl.createSessionInternal(ClientSessionFactoryImpl.java:977)
  |      [java] 	at org.jboss.messaging.core.client.impl.ClientSessionFactoryImpl.createSession(ClientSessionFactoryImpl.java:721)
  |      [java] 	at org.jboss.messaging.jms.client.JBossConnection.createSessionInternal(JBossConnection.java:463)
  |      [java] 	at org.jboss.messaging.jms.client.JBossConnection.createSession(JBossConnection.java:160)
  |      [java] 	at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.connect(SoakReconnectableReceiver.java:213)
  |      [java] 	at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver.access$100(SoakReconnectableReceiver.java:41)
  |      [java] 	at org.jboss.jms.soak.example.reconnect.SoakReconnectableReceiver$2.onException(SoakReconnectableReceiver.java:100)
  |      [java] 	at org.jboss.messaging.jms.client.JBossConnection$JMSFailureListener$1.run(JBossConnection.java:550)
  |      [java] 	at java.lang.Thread.run(Thread.java:619)
  |      [java] Caused by: MessagingException[errorCode=3 message=Timed out waiting for response when sending packet 30]
  |      [java] 	... 11 more
  | 

4. on the server, the number of remoting connection (and the memory) explodes:


  |     [java] **** Server Dump ****
  |      [java] date:            Thu Aug 13 08:41:57 EDT 2009
  |      [java] heap memory:     used=1.46 GB, max=1.88 GB
  |      [java] non-heap memory: used=1.46 GB, max=1.88 GB
  |      [java] # of thread:     62
  |      [java] # of conns:      112
  |      [java] JMS topics:
  |      [java] 	soakTopic: 17 subscription (17 non-durable) receiving 8337499 message
  |      [java] ********************
  | 

5. The exception continues on the subscribers
6. Things are messy on the server-side with more subscriptions than there are clients:

  |      [java] **** Server Dump ****
  |      [java] date:            Thu Aug 13 08:43:57 EDT 2009
  |      [java] heap memory:     used=1.70 GB, max=1.88 GB
  |      [java] non-heap memory: used=1.70 GB, max=1.88 GB
  |      [java] # of thread:     45
  |      [java] # of conns:      260
  |      [java] JMS topics:
  |      [java] 	soakTopic: 23 subscription (23 non-durable) receiving 10414915 message
  |      [java] ********************
  | 

6. the server start to clean up remoting connections and related resources

  |    [java] [Thread-0 (group:JBM-scheduled-threads-2030193044)] 08:44:25,717 WARNING [org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl]  Did not receive initial ping from /172.16.8.15:38164, connection will be closed
  |      [java] [Thread-0 (group:JBM-scheduled-threads-2030193044)] 08:44:25,717 WARNING [org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl]  Did not receive initial ping from /172.16.8.12:49942, connection will be closed
  | 

7. I also have some netty exception on the server:

  |     [java] [New I/O server worker #1-2] 08:44:31,281 WARNING [org.jboss.netty.channel.socket.nio.NioWorker]  Unexpected exception in the selector loop.
  |      [java] org.jboss.netty.channel.ChannelException: Failed to register a socket to the selector.
  |      [java] 	at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:754)
  |      [java] 	at org.jboss.netty.channel.socket.nio.NioWorker.processRegisterTaskQueue(NioWorker.java:260)
  |      [java] 	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:199)
  |      [java] 	at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:53)
  |      [java] 	at org.jboss.messaging.integration.transports.netty.VirtualExecutorService$ChildExecutorRunnable.run(VirtualExecutorService.java:184)
  |      [java] 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  |      [java] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  |      [java] 	at java.lang.Thread.run(Thread.java:619)
  |      [java] Caused by: java.nio.channels.ClosedChannelException
  |      [java] 	at java.nio.channels.spi.AbstractSelectableChannel.configureBlocking(AbstractSelectableChannel.java:252)
  |      [java] 	at org.jboss.netty.channel.socket.nio.NioWorker$RegisterTask.run(NioWorker.java:739)
  |      [java] 	... 7 more
  | 

8. Failed connections now are unable to clean up related resource on the server

  |      [java] [Thread-3 (group:JBM-scheduled-threads-2030193044)] 08:47:55,595 SEVERE [org.jboss.messaging.core.management.impl.ManagementServiceImpl]  Failed to call listener
  |      [java] java.lang.IllegalStateException: Cannot find queue info for queue b685ebae-d3b7-4a4d-81f1-862366829b1f8b8e8cc7-8806-11de-9b94-0015178e6d7c
  |      [java] 	at org.jboss.messaging.core.postoffice.impl.PostOfficeImpl.onNotification(PostOfficeImpl.java:387)
  |      [java] 	at org.jboss.messaging.core.management.impl.ManagementServiceImpl.sendNotification(ManagementServiceImpl.java:647)
  |      [java] 	at org.jboss.messaging.core.server.impl.ServerConsumerImpl.close(ServerConsumerImpl.java:272)
  |      [java] 	at org.jboss.messaging.core.server.impl.ServerSessionImpl.close(ServerSessionImpl.java:339)
  |      [java] 	at org.jboss.messaging.core.server.impl.ServerSessionImpl.doHandleClose(ServerSessionImpl.java:2217)
  |      [java] 	at org.jboss.messaging.core.server.impl.ServerSessionImpl.handleClose(ServerSessionImpl.java:882)
  |      [java] 	at org.jboss.messaging.core.server.impl.ServerSessionImpl.connectionFailed(ServerSessionImpl.java:1177)
  |      [java] 	at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.callFailureListeners(RemotingConnectionImpl.java:393)
  |      [java] 	at org.jboss.messaging.core.remoting.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:244)
  |      [java] 	at org.jboss.messaging.core.remoting.server.impl.RemotingServiceImpl$FailedConnectionAction.run(RemotingServiceImpl.java:480)
  |      [java] 	at org.jboss.messaging.core.remoting.impl.Pinger.run(Pinger.java:116)
  |      [java] 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
  |      [java] 	at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
  |      [java] 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
  |      [java] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
  |      [java] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
  |      [java] 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
  |      [java] 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  |      [java] 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  |      [java] 	at java.lang.Thread.run(Thread.java:619)
  | 

9. During that time, the producers started to see missing pings from the server:

10. When things settle down, all clients are stuck

  |      [java] **** Server Dump ****
  |      [java] date:            Thu Aug 13 09:22:10 EDT 2009
  |      [java] heap memory:     used=1.70 GB, max=1.87 GB
  |      [java] non-heap memory: used=1.70 GB, max=1.87 GB
  |      [java] # of thread:     30
  |      [java] # of conns:      16
  |      [java] JMS topics:
  |      [java] 	soakTopic: 2 subscription (2 non-durable) receiving 0 message
  |      [java] ********************
  | 

I did the same run with <block-non-persistent-send> set to true and I did not get any error.
To clear things up, no servers were killed/restarted during these runs.

View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4249412#4249412

Reply to the post : http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=4249412



More information about the jboss-dev-forums mailing list