Manik Surtani wrote:
On 4 Apr 2009, at 16:16, Mircea Markus wrote:
> Hi,
>
> The current implementation of tx in JBC/Infinispan might result in heuristic
> transactions: e.g. if the coordinator cannot send a commit message
> (2nd phase of 2PC) within a given timeout to some of the
> participants, this might result in data being committed on some
> nodes and rolled back on others.
? If the coord (and I assume you mean the transaction coordinator,
not the JGroups channel coordinator) doesn't broadcast a commit, none
of the other nodes would have committed this state. I don't see how
you have a situation where it is committed on some and rolled back on
others.
Perhaps you mean if the tx coordinator has broadcast a commit, some
receive the commit and before all receive the commit the tx
coordinator dies.
Yes, this is the scenario I had in mind.
And you are not using multicast (if you are they all receive the
commit message at the same time). But we recommend you use multicast
anyway so I'm not so sure if this is such a problem.
Generally speaking, not all messages are received *at the same time*.
JGroups only guarantees that they will be received.
Let's say that we have 3 nodes: A, B and C. A starts a tx, does a put
("k","v"), then commits the tx. During the commit the following happens:
1) prepare is broadcast
- B prepares and holds locks
- C prepares and holds locks
2) A sees that B and C voted okay, so it triggers a commit:
- B receives the commit msg and applies the changes (for good!)
- A does not manage to send the message to C *in the given timeout*.
At this point, the RPC call returns and A rolls back; C will also
roll back after a while (tx timeout). But B will have the changes
applied, so atomicity is violated.
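To make the sequence above concrete, here is a minimal in-memory sketch of it. It is only a simulation under assumptions: Node, run and the "unreachable" set are hypothetical names invented for illustration, not JBC/Infinispan classes, and the network timeout is modelled simply as the commit message never being delivered.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class HeuristicCommitDemo {

    /** Hypothetical participant; not an Infinispan or JBoss Cache class. */
    static final class Node {
        final String name;
        final Map<String, String> committed = new HashMap<>();
        private final Map<String, String> pending = new HashMap<>();

        Node(String name) { this.name = name; }

        /** Phase 1: stage the write (stands in for taking locks) and vote okay. */
        boolean prepare(String key, String value) {
            pending.put(key, value);
            return true;
        }

        /** Phase 2, commit: make the staged writes visible. */
        void commit() {
            committed.putAll(pending);
            pending.clear();
        }

        /** Phase 2, rollback (or local tx timeout): drop the staged writes. */
        void rollback() {
            pending.clear();
        }
    }

    /**
     * Runs the put("k","v") transaction started on A. Nodes named in
     * unreachableForCommit never receive the commit message in time.
     * Returns the committed value of "k" on each node (null if absent).
     */
    public static Map<String, String> run(Set<String> unreachableForCommit) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C");
        Node[] remotes = { b, c };

        // 1) prepare is broadcast; every node stages the change
        a.prepare("k", "v");
        for (Node n : remotes) n.prepare("k", "v");

        // 2) everyone voted okay, so A sends the commit to each participant
        boolean allDelivered = true;
        for (Node n : remotes) {
            if (unreachableForCommit.contains(n.name)) {
                allDelivered = false; // the RPC timed out for this node
            } else {
                n.commit();           // applied for good
            }
        }

        // A rolls back if the RPC failed; unreachable nodes roll back on tx timeout
        if (allDelivered) a.commit(); else a.rollback();
        for (Node n : remotes) {
            if (unreachableForCommit.contains(n.name)) n.rollback();
        }

        Map<String, String> view = new LinkedHashMap<>();
        for (Node n : new Node[] { a, b, c }) view.put(n.name, n.committed.get("k"));
        return view;
    }

    public static void main(String[] args) {
        System.out.println(run(Set.of("C"))); // commit to C times out
        System.out.println(run(Set.of()));    // every commit is delivered
    }
}
```

With C unreachable, only B ends up with k=v, which is the heuristic outcome described above; with nothing unreachable, all three nodes agree.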
> Even worse, there is no way to take action and recover from the
> failure. Would it make sense to have tx failure recovery mechanism
> in infinispan?
Well, it depends. If it is used as a cache for a db, then "recovery"
is to just empty the cache. Otherwise, if you want to treat it as a
distributed in-memory db, "recovery" here would mean emptying the
cache instance in question, and doing a state transfer from a
neighbour (REPL) or re-hashing keys (DIST).
Yes. But right now, if a situation like the one I described happens, no
admin will be notified, and inconsistent resources will be exposed to
users. I'm thinking about a recovery mechanism in which (continuing the
previous example):
- C keeps the locks on the resources and does not allow users to see them until it
can take a decision
- when communication between A and C is established, A informs C that
it should roll back the tx
(Of course this is a simplistic solution; the problem is more complex,
e.g. A might die in between.)
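A rough sketch of the participant side of that idea, as seen from C. Everything here is an assumption made for illustration: Participant, isInDoubt and resolve are hypothetical names, not an existing Infinispan API, and the case where A dies before it can report its decision is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

public class InDoubtRecoverySketch {

    /** Hypothetical participant that hides keys touched by an unresolved tx. */
    public static final class Participant {
        private final Map<String, String> committed = new HashMap<>();
        private final Map<String, String> pending = new HashMap<>();

        /** Phase 1: stage the write; the key is now locked. */
        public void prepare(String key, String value) {
            pending.put(key, value);
        }

        /** True while the key belongs to a tx whose outcome is unknown. */
        public boolean isInDoubt(String key) {
            return pending.containsKey(key);
        }

        /** Refuses to expose a key until the coordinator's decision is known. */
        public String read(String key) {
            if (isInDoubt(key)) {
                throw new IllegalStateException(key + " is locked by an in-doubt tx");
            }
            return committed.get(key);
        }

        /** Called once the coordinator is reachable again and reports its decision. */
        public void resolve(boolean commit) {
            if (commit) committed.putAll(pending);
            pending.clear(); // releases the locks either way
        }
    }

    public static void main(String[] args) {
        Participant c = new Participant();
        c.prepare("k", "v");
        // The commit message never arrives: C keeps the lock instead of timing out.
        System.out.println("in doubt: " + c.isInDoubt("k"));
        // Later A re-connects and tells C that it rolled back.
        c.resolve(false);
        System.out.println("value after recovery: " + c.read("k"));
    }
}
```

The trade-off is that C blocks readers for as long as A is unreachable, instead of silently diverging from it.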
> I'm referring here to something similar to the way DBs work, i.e.
> based on a persistent tx log, external notifications, etc. Even
> though I didn't see any such request on the forums, I guess such a
> feature is mandatory for certain systems, e.g. a financial
> application. Wdyt?
Persistent tx logs can be just as error-prone, unless you checkpoint
open files to disk via OS system calls to ensure all kernel and
hardware caches are flushed. But this is *very* slow.
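For reference, in Java the forced flush described above would look roughly like the sketch below, using FileChannel.force from java.nio. The class name, method name and record format are made up for illustration. Note that force only asks the OS to push the data to the storage device; whether the device's own write cache is flushed as well depends on the OS and the hardware.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ForcedTxLogSketch {

    /** Appends one record and returns only after the OS reports it flushed. */
    public static void appendDurably(Path log, String record) throws IOException {
        try (FileChannel ch = FileChannel.open(log,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // Flush file content and metadata; this call is the slow part.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("txlog", ".log");
        appendDurably(log, "tx-1 PREPARED k=v");
        appendDurably(log, "tx-1 COMMIT");
        System.out.println(Files.readAllLines(log));
    }
}
```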
Agreed. But it ensures correctness for those that need it.
AFAIK the way DBs do this - including Oracle - is to checkpoint at
intervals, but this still allows for windows where your persistent tx
log could be out of date or corrupt.
Not sure about that: the logs can be used, in the case of a heuristic tx,
to move the system to a consistent state.
Cheers
--
Manik Surtani
Lead, JBoss Cache
http://www.jbosscache.org
manik@jboss.org