[Design of Messaging on JBoss (Messaging/JBoss)] - Re: Client failover redeliveries discussion - jboss-dev-forums

Thursday, 26 October 2006

I thought we had gone over all this already, but here goes again....

"ovidiu.feodorov(a)jboss.com" wrote : 
  | A client may happen to be sending a message when the failure occurs. If the message is
sent individually (not in the context of a transaction), then the on-going synchronous
invocation is going to fail, the client code will catch the exception, and most likely
will re-try to send the message, 
  | 

What do you mean the "client code" will catch the exception?
Do you mean the application code?
If so, this is incorrect - JBoss Messaging failover is supposed to be automatic - this is
one of our selling points.
Applications shouldn't have to catch connection exceptions and retry like in JBossMQ.

"Ovidiu" wrote : 
  | If the client happens to send message in the context of a transaction when the failure
occurs, we could either throw an exception, and discard everything, or go for the more
elegant solution of transparently copying the transactional state (the corresponding
TxState instance) into the new ResourceManager and send the messages over the new
connection when transaction commits. We're probably not doing this right now, but this
is what we should be doing.
  | 

The same reasoning applies as in my previous comment. It should be transparent.

"Ovidiu" wrote : 
  | It is interesting to consider also what happens when the failure occurs right in the
middle of "sentTransaction()" invocation.
  | 

Again, the same reasoning applies.

There is a further problem here: the client may receive an exception and retry even though
the transaction/send actually went through on the previous node.
Please see the discussion on duplicate message detection in a previous thread for more
information on this.

"Ovidiu" wrote : 
  | 
  | What is more interesting is what happens with the messages that are already in the
MessageCallbackHandler's buffer.
  | 
  | For a seamless fail-over, they will need to be transferred in the new
MessageCallbackHandler's buffer. Also important, immediately after the failover
condition is detected, any in-progress read should be completed, and no further reads
should be accepted until the client-side fail-over is complete ("client side failover
lockdown"). The next post-failover read should be done from the new
MessagingCallbackHandler's buffer.
  | 

There is no need to copy anything since Clebert is re-using the same connection, consumer,
buffer objects before and after failover - he is just changing the ids.
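To make the re-use idea concrete, here is a minimal sketch. It is not Clebert's code: ClientConsumerSketch, rebind() and the String payloads are names made up for this example. The only point is that failover swaps the server-side id while the buffer object, and whatever is in it, stays where it is.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a client-side consumer; not the real JBoss Messaging class.
final class ClientConsumerSketch {
    // Id of the server-side consumer endpoint this object currently talks to.
    private int serverConsumerId;
    // Messages already pushed to the client but not yet handed to the application.
    private final Deque<String> buffer = new ArrayDeque<>();

    ClientConsumerSketch(int serverConsumerId) {
        this.serverConsumerId = serverConsumerId;
    }

    synchronized void buffer(String message) {
        buffer.addLast(message);
    }

    // Returns the next buffered message, or null if the buffer is empty.
    synchronized String receiveNoWait() {
        return buffer.pollFirst();
    }

    // Failover: only the id changes; the buffer is deliberately left untouched.
    synchronized void rebind(int newServerConsumerId) {
        this.serverConsumerId = newServerConsumerId;
    }

    synchronized int serverConsumerId() {
        return serverConsumerId;
    }

    synchronized int buffered() {
        return buffer.size();
    }
}
```

Since the application keeps its reference to the same object, nothing has to be transferred between an "old" and a "new" buffer.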

"Ovidiu" wrote : 
  | Contrary to what has been discussed so far on this thread, I think we can also salvage
non-persistent messages, with minimum of effort. I'll address this issue again later.
The acknowledgments for these messages (persistent and non-persistent) will be sent by the
new Connection Delegate.
  | 
  | We also have the acknowledgments accumulated in a transaction on the client-side. The
case should be dealt with similarly with the way we handle transacted messages (copy the
TxState instance).
  | 

Again, no copying is necessary - just re-use the same object.

"Ovidiu" wrote : 
  | 
  | Tim wrote : 
  |   | Yes - we should send the ids of every persistent message as part of the failover
protocol - the server then repopulates the delivery list in the server consumer endpoint
  |   | 
  | 
  | I think we can go a step further and also send the IDs of non-persistent messages that
have been "failed-over" on the client side. This way, the client will continue
to receive (and successfully acknowledge) non-persistent messages that otherwise would
have been lost.
  | 

This makes no sense. When server A fails and server B takes over, only the persistent
messages are resurrected into server B's queues.

The non-persistent messages are lost.

Therefore the non-persistent messages cannot be successfully acknowledged on server B,
since server B won't know about them.

This is why I said the non-persistent messages should be removed from the client state, so
that the client never attempts to ack them.
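Roughly, the client-side step I have in mind looks like the sketch below. The class names (UnackedMessage, SessionAckState) and the method pruneForFailover() are hypothetical and only illustrate the idea; the real client state is organised differently. On failover the session drops its non-persistent entries and keeps the ids of the persistent ones, which are what would be sent to the new server.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical record of a message delivered to the client but not yet acknowledged.
final class UnackedMessage {
    final long messageId;
    final boolean persistent;

    UnackedMessage(long messageId, boolean persistent) {
        this.messageId = messageId;
        this.persistent = persistent;
    }
}

// Hypothetical holder for one session's unacknowledged deliveries.
final class SessionAckState {
    private final List<UnackedMessage> unacked = new ArrayList<>();

    void delivered(UnackedMessage message) {
        unacked.add(message);
    }

    // Called on failover. Removes non-persistent entries (the new server cannot
    // know about them) and returns the ids of the persistent entries, in
    // delivery order, so they can be sent as part of the failover protocol.
    List<Long> pruneForFailover() {
        List<Long> persistentIds = new ArrayList<>();
        Iterator<UnackedMessage> it = unacked.iterator();
        while (it.hasNext()) {
            UnackedMessage message = it.next();
            if (message.persistent) {
                persistentIds.add(message.messageId);
            } else {
                it.remove();
            }
        }
        return persistentIds;
    }

    int unackedCount() {
        return unacked.size();
    }
}
```

After pruning, every ack the session can still send refers to a message that server B was able to reload.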

"Ovidiu" wrote : 
  | Tim wrote : 
  |   | Clebert wrote : 
  |   |   | - Should we ignore ACKs for non existent messages on the server?
  |   |   | 
  |   | Non existent messages on the server will be non persistent messages that
didn't survive the failover.
  |   | They should be removed from the client side list on failover so the acks will
never get sent.
  |   | 
  | 
  | Not necessarily. See my above comment. We could also include the ids of non-persistent
messages with the list of message ID sent to the server as part of the failover protocol,
and thus be able to "salvage" those messages as well. I don't see any
problem if we do that. We get better fault tolerance.
  | 

How can you salvage a message that doesn't exist any more? The non-persistent messages
will have been lost when server A failed.

"Ovidiu" wrote : 
  | What about the "fail-over protocol"? Your statement above seems to assume
that the new server node is called into without any "preparation", as would a
completely new client that creates a new connection, session and consumer endpoints. This
is not going to work, those server-side objects need to undergo a
"post-failover" preparation phase, where deliveries for the client-side failed
over messages are created and so forth.
  | 

Correct.

"Ovidiu" wrote : 
  | Tim wrote : 
  |   | So to summarise:
  |   | ...
  |   | 3) Let the server "stall" you until server failover has completed
  |   | ...
  |   | 
  | 
  | What exactly does this mean?
  | 

When a server fails, the server-side failover kicks in, and the server loads those queues
which it is taking over responsibility for.

This may take several seconds, during which time we do not want a failed-over client
connection to start sending to or consuming from those queues, since it might see only
partial state.

Hence we need to stall the connection at reconnection until the server completes its
failover protocol. I.e. a "valve".
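A valve in this sense can be sketched in a few lines. FailoverValve and its methods are illustrative names, not existing JBoss Messaging classes, and the timeout handling is an assumption on my part. The server closes the valve when it starts loading the failed node's queues and opens it when loading is done; a reconnecting client blocks in awaitOpen() in between.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical valve guarding access to queues that are still being loaded.
final class FailoverValve {
    private boolean open = true;

    // Called when server-side failover starts.
    synchronized void close() {
        open = false;
    }

    // Called when server-side failover has completed; releases all waiters.
    synchronized void open() {
        open = true;
        notifyAll();
    }

    // Blocks the caller until the valve is open or the timeout expires.
    // Returns true if the valve is open, false on timeout or interruption.
    synchronized boolean awaitOpen(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!open) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                // Waits on this object's monitor, which the caller already holds.
                TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A reconnecting connection would call awaitOpen() before its first send or receive and treat a false return as a failed reconnection attempt.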

This is covered in the wiki page I believe (like most of this stuff).

"Ovidiu" wrote : 
  | Tim wrote : 
  |   | So to summarise:
  |   | ...
  |   | 5) Delete any non persistent messages from the client list of unacked messages in
any sessions in the failed connection.
  |   | ...
  |   | 
  | 
  | Why? See my comment above. Why do you think "salvaging" non-persistent
messages too isn't going to work?
  | 

Because the new server knows nothing about the non-persistent messages, since they
won't have survived the server failure.

"Ovidiu" wrote : 
  | 
  | Non-persistent message ids too.
  | 

No point doing that, for the reasons explained twice already in this thread.

View the original post :
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3980937#...
