[jboss-dev-forums] [Design of Messaging on JBoss (Messaging/JBoss)] - Re: Client failover redeliveries discussion

Thu Oct 26 04:46:45 EDT 2006

I thought we had gone over all this already, but here goes again....

"ovidiu.feodorov at jboss.com" wrote : 
  | A client may happen to be sending a message when the failure occurs. If the message is sent individually (not in the context of a transaction), then the on-going synchronous invocation is going to fail, the client code will catch the exception, and most likely will re-try to send the message, 
  | 

What do you mean the "client code" will catch the exception?
Do you mean the application code?
If so, this is incorrect - JBoss Messaging failover is supposed to be automatic - this is one of our selling points.
Applications shouldn't have to catch connection exceptions and retry like in JBossMQ.

"Ovidiu" wrote : 
  | If the client happens to send a message in the context of a transaction when the failure occurs, we could either throw an exception, and discard everything, or go for the more elegant solution of transparently copying the transactional state (the corresponding TxState instance) into the new ResourceManager and send the messages over the new connection when the transaction commits. We're probably not doing this right now, but this is what we should be doing.
  | 

Same reasoning applies as previous comment. It should be transparent.

"Ovidiu" wrote : 
  | It is interesting to consider also what happens when the failure occurs right in the middle of a "sendTransaction()" invocation.
  | 

Again same reasoning applies.

There is a problem here: the client may receive an exception and retry even though the transaction/send actually went through on the previous node.
Please see the discussion on duplicate message detection in a previous thread for more information on this.
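To make the retry problem concrete, here is a minimal sketch of id-based duplicate detection. It is not the scheme from the earlier thread and not JBoss Messaging code: the DuplicateDetector class, its firstTime() method and the fixed-size cache are all assumptions made for illustration. The idea is only that the receiving side remembers recently seen message ids and drops a resend of an id it has already processed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JBoss Messaging: remembers the most
// recently seen message ids so that a retried send can be recognised.
class DuplicateDetector {
    private final Map<String, Boolean> seen;

    DuplicateDetector(final int capacity) {
        // Access-ordered map that evicts its eldest entry once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns true if the id has not been seen (process the message),
    // false if it is a duplicate caused by a retry (ignore it).
    synchronized boolean firstTime(String messageId) {
        return seen.put(messageId, Boolean.TRUE) == null;
    }
}
```

A bounded cache only catches duplicates that arrive while the id is still remembered, so the capacity would have to cover the longest plausible retry window.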

"Ovidiu" wrote : 
  | 
  | What is more interesting is what happens with the messages that are already in the MessageCallbackHandler's buffer.
  | 
  | For a seamless fail-over, they will need to be transferred into the new MessageCallbackHandler's buffer. Also important, immediately after the failover condition is detected, any in-progress read should be completed, and no further reads should be accepted until the client-side fail-over is complete ("client side failover lockdown"). The next post-failover read should be done from the new MessageCallbackHandler's buffer.
  | 

There is no need to copy anything since Clebert is re-using the same connection, consumer, buffer objects before and after failover - he is just changing the ids.
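As a rough illustration of "same objects, new ids", the sketch below keeps one client-side consumer object across failover and only swaps the server-side id it refers to. ClientConsumerState and all of its members are hypothetical names, not the actual classes in Clebert's code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical client-side consumer state; all names are illustrative only.
class ClientConsumerState {
    private int serverConsumerId;
    private final Deque<String> buffer = new ArrayDeque<>();

    ClientConsumerState(int serverConsumerId) {
        this.serverConsumerId = serverConsumerId;
    }

    // Called when the server pushes a message to this consumer.
    synchronized void buffer(String message) {
        buffer.addLast(message);
    }

    // Called by the application; returns null if nothing is buffered.
    synchronized String receive() {
        return buffer.pollFirst();
    }

    // On failover only the id of the server-side endpoint changes.
    // The buffer is untouched, so nothing has to be copied.
    synchronized void failover(int newServerConsumerId) {
        this.serverConsumerId = newServerConsumerId;
    }

    synchronized int getServerConsumerId() {
        return serverConsumerId;
    }
}
```

Because the application holds a reference to the same object before and after failover, messages already buffered remain readable without any transfer step.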

"Ovidiu" wrote : 
  | Contrary to what has been discussed so far on this thread, I think we can also salvage non-persistent messages, with minimum of effort. I'll address this issue again later. The acknowledgments for these messages (persistent and non-persistent) will be sent by the new Connection Delegate.
  | 
  | We also have the acknowledgments accumulated in a transaction on the client-side. The case should be dealt with similarly with the way we handle transacted messages (copy the TxState instance).
  | 

Again, no copying is necessary - just re-use the same object.

"Ovidiu" wrote : 
  | 
  | Tim wrote : 
  |   | Yes - we should send the ids of every persistent message as part of the failover protocol - the server then repopulates the delivery list in the server consumer endpoint
  |   | 
  | 
  | I think we can go a step further and also send the IDs of non-persistent messages that have been "failed-over" on the client side. This way, the client will continue to receive (and successfully acknowledge) non-persistent messages that otherwise would have been lost.
  | 

This makes no sense. When server A fails and server B takes over, only the persistent messages are resurrected into server B's queues.

The non persistent messages are lost.

Therefore it's not possible that the non persistent messages can be successfully acknowledged on server B, since server B won't know about them.

This is why I said the non persistent messages should be removed from the client state so they don't attempt to be acked.
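The client-side step could look something like the sketch below: on failover, drop the non-persistent entries from the session's list of unacked messages and collect the ids of the persistent ones to send to the new server as part of the failover protocol. UnackedMessage, SessionFailoverState and prepareForFailover() are made-up names for illustration, not the real client classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical record of a delivered but not yet acknowledged message.
class UnackedMessage {
    final long id;
    final boolean persistent;

    UnackedMessage(long id, boolean persistent) {
        this.id = id;
        this.persistent = persistent;
    }
}

// Hypothetical per-session client state; names are illustrative only.
class SessionFailoverState {
    private final List<UnackedMessage> unacked = new ArrayList<>();

    synchronized void delivered(UnackedMessage message) {
        unacked.add(message);
    }

    // Removes non-persistent messages (the new server knows nothing about
    // them, so they must never be acked) and returns the ids of the
    // persistent ones, which the new server uses to rebuild its deliveries.
    synchronized List<Long> prepareForFailover() {
        List<Long> persistentIds = new ArrayList<>();
        for (Iterator<UnackedMessage> it = unacked.iterator(); it.hasNext();) {
            UnackedMessage m = it.next();
            if (m.persistent) {
                persistentIds.add(m.id);
            } else {
                it.remove();
            }
        }
        return persistentIds;
    }

    synchronized int unackedCount() {
        return unacked.size();
    }
}
```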

"Ovidiu" wrote : 
  | Tim wrote : 
  |   | Clebert wrote : 
  |   |   | - Should we ignore ACKs for non existent messages on the server?
  |   |   | 
  |   | Non existent messages on the server will be non persistent messages that didn't survive the failover.
  |   | They should be removed from the client side list on failover so the acks will never get sent.
  |   | 
  | 
  | Not necessarily. See my above comment. We could also include the ids of non-persistent messages with the list of message ID sent to the server as part of the failover protocol, and thus be able to "salvage" those messages as well. I don't see any problem if we do that. We get better fault tolerance.
  | 

How can you salvage a message that doesn't exist any more? The non persistent messages will have been lost when server A failed.

"Ovidiu" wrote : 
  | What about the "fail-over protocol"? Your statement above seems to assume that the new server node is called into without any "preparation", as would a completely new client that creates a new connection, session and consumer endpoints. This is not going to work, those server-side objects need to undergo a "post-failover" preparation phase, where deliveries for the client-side failed over messages are created and so forth.
  | 

Correct.

"Ovidiu" wrote : 
  | Tim wrote : 
  |   | So to summarise:
  |   | ...
  |   | 3) Let the server "stall" you until server failover has completed
  |   | ...
  |   | 
  | 
  | What exactly does this mean?
  | 

When a server fails, the server side failover kicks in, and the server loads those queues which it is taking over responsibility for.

This may take several seconds, during which time we do not want a failed over client connection to start sending/consuming from those queues since they might receive a partial state.

Hence we need to stall the connection at reconnection until the server completes its failover protocol. I.e. a "valve".

This is covered in the wiki page I believe (like most of this stuff).
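For illustration only, a valve of this kind can be sketched as a gate that the server closes when its failover starts and reopens once the queues are loaded. Reconnecting clients wait on it before doing any work. FailoverValve and its methods are assumed names, and the timeout is an addition of this sketch rather than something specified on the wiki page.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical server-side gate; names are illustrative only.
class FailoverValve {
    private boolean opened = true;

    // Called when server-side failover starts loading the failed node's queues.
    synchronized void close() {
        opened = false;
    }

    // Called when server-side failover has completed; releases all waiters.
    synchronized void open() {
        opened = true;
        notifyAll();
    }

    // Stalls the caller until the valve is open. Returns false if the
    // timeout elapses or the thread is interrupted while waiting.
    synchronized boolean awaitOpen(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (!opened) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

A reconnecting connection would call awaitOpen() before sending to or consuming from the taken-over queues, so it never observes them in a partially loaded state.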

"Ovidiu" wrote : 
  | Tim wrote : 
  |   | So to summarise:
  |   | ...
  |   | 5) Delete any non persistent messages from the client list of unacked messages in any sessions in the failed connection.
  |   | ...
  |   | 
  | 
  | Why? See my comment above. Why do you think "salvaging" non-persistent messages too isn't going to work?
  | 

Because the new server knows nothing about the non persistent messages, since they won't have survived the server failure.

"Ovidiu" wrote : 
  | 
  | Non-persistent message ids too.
  | 

No point doing that, for the reasons explained twice already in this thread.

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3980937#3980937