[
https://issues.jboss.org/browse/ISPN-5016?page=com.atlassian.jira.plugin....
]
Radim Vansa edited comment on ISPN-5016 at 12/23/14 7:46 AM:
-------------------------------------------------------------
{quote}If a node is suspected because of a Full GC, it might go from the initial JGroups
view straight to the merge view. If that happens, its topology will be the largest one,
and it will not be wiped, neither will it receive new data. Instead, it will keep the
(possibly stale) entries it had before the Full GC.{quote}
Sorry, this is not very clear to me. Why would its topology ID be the largest one, and
why wouldn't it then propagate its state to all other members (wiping their data)?
{quote}If at least half of the nodes in the stable topology leave in quick
succession{quote}
What counts as quick succession? Does that depend on JGroups view installation, on some
timeout, or on the duration of the rebalance?
{quote}And if some of the nodes in the Available partition’s consistent hash are not
really accessible after the merge, the cache might stay Degraded.{quote}
'The cache'? We should rather be talking about partitions, not the whole cache. I
understand the first part of the special case, but not the latter.
{quote}If a node joins and becomes a backup owner after a write command was sent from the
primary owner to the backups, but before the primary owner updates its own data container,
it may not receive the value neither as a write command nor via state transfer.{quote}
Sounds like a bug to me - is there a JIRA that could be linked? We could tolerate
inconsistencies when a node crashes (if we can't fix it), but a join or a graceful leave
should keep the cluster consistent.
{quote}When a write to the store fails, it will fail the write operation.{quote}
Does this hold for write-behind, too?
{quote}With write-behind or asynchronous replication enabled, store write failures are
hidden from the user (unless the originator is the primary owner, with async
replication).{quote}
When originator == primary and write-behind is enabled, how can a failure to write to the
store be propagated to the user? I thought the user thread initiates both the write to
the store and the async replication, and then returns.
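To make the question concrete, here is a toy sketch of write-behind (plain Java executors, not Infinispan API; the class and method names are made up for illustration). The user call returns before the store write runs, so it is unclear how a later store failure could ever reach the user thread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WriteBehindSketch {
    // background thread that drains queued store writes, as in write-behind
    static ExecutorService storeWriter = Executors.newSingleThreadExecutor();

    static void put(String key, String value) {
        // the in-memory update would happen synchronously here (omitted)
        storeWriter.submit(() -> writeToStore(key, value));
        // put() returns immediately -> success from the caller's perspective
    }

    static void writeToStore(String key, String value) {
        // the exception is captured in the ignored Future and never
        // observed by the user thread that called put()
        throw new RuntimeException("disk full");
    }

    public static void main(String[] args) throws InterruptedException {
        put("k", "v");
        System.out.println("put returned OK despite the pending store failure");
        storeWriter.shutdown();
        storeWriter.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```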
{quote}When the partitions merge back, there is no effort to replicate the values from one
partition to another.{quote}
Why is that different from non-tx mode, where the partitions without the highest topology
ID are wiped? Moreover, in the optimistic-tx section you write {quote}Same as with
pessimistic and non-transactional caches.{quote} - which version wins, then?
{quote}If the primary owners of the keys written by the transaction are all in the local
transaction, {quote}
local partition?
{quote}If one partition stays available, its entries will replace all the other
partitions' entries on merge, undoing partial commits in those partitions.{quote}
Do I understand correctly that a degraded partition may commit a transaction, and that
this transaction will later be ignored (the data will be overwritten with the previous
values)? Why is this behaviour desired?
{quote}Transactions already prepared, however, will commit successfully even in minority
partitions{quote}
Is that true even if the originator is not in this minority partition?
{quote}When a transaction needs to acquire more than one key lock with the same primary
node, they are always acquired in the same order, so this will not cause a
deadlock.{quote}
If the keys have the same hashCode, though, they can be locked in a different order
(ISPN-2491).
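The hashCode collision can be demonstrated with a small sketch (plain Java, not Infinispan's lock manager): ordering by hashCode alone is not a total order, so two transactions holding the same pair of colliding keys in opposite insertion order end up with opposite lock orders.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LockOrderDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are distinct keys with identical String hashCodes (both 2112)
        String k1 = "Aa", k2 = "BB";
        assert k1.hashCode() == k2.hashCode();

        List<String> tx1 = new ArrayList<>(List.of(k1, k2));
        List<String> tx2 = new ArrayList<>(List.of(k2, k1));

        // List.sort is stable, so a hashCode tie preserves each
        // transaction's own insertion order instead of breaking it
        tx1.sort(Comparator.comparingInt(String::hashCode));
        tx2.sort(Comparator.comparingInt(String::hashCode));

        System.out.println(tx1); // [Aa, BB]
        System.out.println(tx2); // [BB, Aa] -> opposite lock order, potential deadlock
    }
}
```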
{quote}The commit is always synchronous on the originator, so a transaction T3 started on
node A after T1 finished will see T1’s updates.{quote}
Will it see all of T1's updates, or only the updates to entries owned by A?
{quote}The write to the attached cache store(s) is performed during the one-phase prepare
command or the commit command, depending on the configuration.{quote}
What configuration, exactly?
Specify and document cache consistency guarantees
-------------------------------------------------
Key: ISPN-5016
URL:
https://issues.jboss.org/browse/ISPN-5016
Project: Infinispan
Issue Type: Task
Components: Documentation-Core
Affects Versions: 7.0.2.Final
Reporter: Radim Vansa
Assignee: Dan Berindei
Priority: Critical
We can't simply use the consistency model defined by the Java Specification and broaden
it to the whole cache (maybe the expression "can't" is too strong, but we
definitely don't want to do that in some cases).
By consistency guarantees/model I mean mostly the order in which writes are
allowed to be observed; we can't boil it down to simply causal, PRAM or any
other consistency model, as writes can be observed as non-atomic in Infinispan.
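The non-atomic observability of a write can be illustrated with a toy sketch (plain Java maps standing in for the two owners' data containers; none of this is Infinispan API):

```java
import java.util.HashMap;
import java.util.Map;

public class NonAtomicWrite {
    public static void main(String[] args) {
        // two replicas of the same entry, on the primary and backup owner
        Map<String, Integer> primary = new HashMap<>(Map.of("k", 1));
        Map<String, Integer> backup  = new HashMap<>(Map.of("k", 1));

        primary.put("k", 2);                  // primary owner has applied the write...
        int seenOnPrimary = primary.get("k"); // a reader hitting the primary sees 2
        int seenOnBackup  = backup.get("k");  // ...but the backup still returns 1:
                                              // the write is not atomic cluster-wide
        System.out.println(seenOnPrimary + " " + seenOnBackup); // 2 1

        backup.put("k", 2);                   // the backup only catches up later
    }
}
```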
The Infinispan documentation is quite scarce about this; the only trace I've
found is in the Glossary [2]: "Infinispan has traditionally followed ACID
principles as far as possible, however an eventually consistent mode
embracing BASE is on the roadmap."
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)