After thinking about it a bit, we can conclude that the behaviour during
a partition depends entirely on the application.
So, why not create a generic interface (like execute(Operation, Key,
isPrimaryPartition)) that is invoked when a partition occurs?
Maybe I'm losing the focus of the first email, but this way the
application can choose what is best for it. For example, if the
application does not care about consistency, it can allow all partitions
to read and write. On the other hand, if the application is more
restrictive, it can allow only one partition to read and write while the
other partitions reject all operations.
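To make the idea concrete, here is a rough sketch of what such a callback could look like. Every name in it (PartitionPolicy, Operation, Decision, PartitionPolicies) is hypothetical and only follows the execute(Operation, Key, isPrimaryPartition) shape suggested above; nothing here is an existing Infinispan or JGroups API. The third policy is the read-only minority case discussed earlier in the thread.

```java
// Hypothetical sketch: none of these types exist in Infinispan or JGroups.
enum Operation { READ, WRITE }

enum Decision { ALLOW, REJECT }

// Assumed shape of the generic interface; invoked while the cluster is partitioned.
interface PartitionPolicy {
    Decision execute(Operation operation, Object key, boolean isPrimaryPartition);
}

final class PartitionPolicies {
    private PartitionPolicies() {}

    // The application does not care about consistency:
    // every partition may read and write.
    static final PartitionPolicy ALLOW_ALL =
            (operation, key, isPrimary) -> Decision.ALLOW;

    // A more restrictive application: only one (primary) partition may
    // read and write, the other partitions reject all operations.
    static final PartitionPolicy PRIMARY_ONLY =
            (operation, key, isPrimary) -> isPrimary ? Decision.ALLOW : Decision.REJECT;

    // Non-primary partitions flip into read-only mode:
    // reads are allowed (possibly stale), writes are rejected.
    static final PartitionPolicy MINORITY_READ_ONLY =
            (operation, key, isPrimary) ->
                    (isPrimary || operation == Operation.READ)
                            ? Decision.ALLOW
                            : Decision.REJECT;
}
```

With something like this, the grid would only have to decide which partition is the primary one; whether an operation goes through would be left to the policy the application plugs in.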
Pedro
On 04/19/2013 10:51 AM, Sanne Grinovero wrote:
TBH I'm not understanding which problem this thread is about :)
Surely network partitions are a problem, but there are many forms of
"partition", and many different opinions of what an "acceptable"
behaviour is that the grid should implement, which largely depend on
assumptions the client application is making.
Since we seem to be discussing a case in which the minority group is
expected to flip into read-only mode, could we step back and describe:
- why is this an acceptable solution for some class of applications?
- what kind of potential network failure do we want to take
compensating actions for?
I'm not an expert on how people physically wire up single nodes, racks
and rooms to allow for our virtual connections, but let's assume that
all nodes are connected with a single "cable" between each other. Or,
if multiple cables are actually used in practice, could we rely on
system configuration to guarantee packets can find alternative routes
if one wire is eaten by mice?
It seems important to me to define what level of network failure we
want to address, for example are we assuming we don't deal with cases
in which nodes can talk to one group but not vice-versa?
If the effect of a network failure is a completely isolated group, can
we assume Hot Rod clients can't reach them either?
If the group is totally isolated, would it still need read-only (with
the risk of outdated reads) or could the whole group just shut down
since it's not reachable by anyone anyway? That is making more
assumptions, like that all produced state change goes via the network
as well, not suited for example to driving an assembly chain in a
manufacturing plant, but then again it might be safer to stop the
production belt rather than going ahead without being able to perform
fresh read operations.
I'm just trying to give an example of an entirely different class of
requirements, not proposing any solution, but it seems to me that,
given the complexity of the problem, we'll always need to make some
trade-off, and which trade-off is acceptable depends on the problem. If
we described a very specific problem, we can work to make sure
Infinispan and JGroups have enough extension points and smart
protocols to deal with it, but I don't think we can resolve this issue
at a one-size-fits-all level.
Sanne