[
https://issues.jboss.org/browse/ISPN-5515?page=com.atlassian.jira.plugin....
]
Dan Berindei commented on ISPN-5515:
------------------------------------
I think these are really tricky ideas which should be discussed on
the mailing list. I noticed this JIRA by pure luck and find it concerning that such
decisions are made without any wider discussion.
We already discussed this on the mailing list, and the conclusion was to implement
[graceful
restart|https://github.com/infinispan/infinispan/wiki/Graceful-shutdown-&...].
This issue is not really about implementing new functionality; it's about automating a
recommendation we already have _for users who want it_.
It's possible the newly starting node starts while
"thinking it's first", but then actually merges with a running cluster. The
cluster detection protocols aren't foolproof, and you're relying on timeouts to be
configured safely (when are they ever?).
If a node starts in a separate partition by itself, the behaviour with "purge on
join" enabled will be exactly as it is now - not better, but not worse either.
It's unrealistic to push such a requirement to
"admin's responsibility", not least because node restarts
might not be under their control.
A node restart will not affect nodes that are already running in any way.
Also, this option will be disabled by default, and if the admin can't control the
order in which nodes start, he should definitely not enable it.
even with this design, the majority of cachestores are cleared so
there is an assumption that "data loss is fine" for the user: so why even bother
trying to keep a small portion of it at risk of consistency trouble?
In a replicated cache, it's not a small portion of the data, it's all the data.
I agree that it makes a lot less sense in a distributed cache: in theory you could shut
down the cluster such that all the state is transferred to a single node and all the data
is preserved in that node's store, but it's definitely not something you'd want to
do on a regular basis.
this design seems to favour something else above correctness, and
I'm not sure what "something else" you're aiming at.. why work hard to
not wipe a single cachestore?
These are my assumptions for using this option:
* The cache is replicated and the number of nodes is small
* Losing data is not fatal, as there is a backing store
* Reading a stale value *is* fatal
* Reading data from the canonical store is slow
I realize these assumptions are quite narrow, and most users will not use it. But for
applications that do fit these assumptions, I think this will help. And it would be
back-portable to 7.2.x, unlike the graceful restart work.
I agree with you that this is an improvement over the current state,
but I don't see why you would implement tricky code to provide a tricky solution when
all that's needed is to remove the preloading option from configuration. You'll be
done with much less work and get a more reliable solution.
I renamed the issue, since it's not really about preload - having preload does
complicate the code, but stale values are possible with or without preload enabled.
Purge store if there is another node already running
----------------------------------------------------
Key: ISPN-5515
URL: https://issues.jboss.org/browse/ISPN-5515
Project: Infinispan
Issue Type: Enhancement
Components: Core, Loaders and Stores
Affects Versions: 7.2.2.Final, 8.0.0.Alpha1
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 8.0.0.Alpha2
Preloading happens before communicating with other nodes that might already have the
cache running. When joining the existing members, the cache then waits to receive the
first CH in which it is a member, and then deletes only the entries in the segments that
it doesn't own in that CH.
The intention of this was to remove as little as possible from the existing data, e.g. if
the first node to start up is not the one that was stopped last. But the preloaded entries
are not replicated to the other nodes, so this can lead to inconsistencies.
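
To make the inconsistency concrete, here is a rough toy model of the join sequence described above. It is not Infinispan code: `segment_of`, `join_current`, the segment count and the sample keys are all made up for illustration, and the model assumes that state transfer only adds or overwrites entries on the joiner.

```python
NUM_SEGMENTS = 4  # hypothetical segment count for the toy model


def segment_of(key):
    """Toy stand-in for the consistent hash (CH): maps a key to a segment."""
    return sum(key.encode()) % NUM_SEGMENTS


def join_current(store, owned_segments, running_state):
    """Toy model of the current join sequence; returns the joiner's data."""
    # 1. Preload from the local store before talking to any other node.
    data = dict(store)
    # 2. On the first CH that includes this node, drop only unowned segments.
    data = {k: v for k, v in data.items() if segment_of(k) in owned_segments}
    # 3. Assumption: state transfer adds or overwrites entries, never removes.
    data.update(running_state)
    return data


# Replicated cache: the joiner owns every segment, so step 2 drops nothing.
all_segments = set(range(NUM_SEGMENTS))
store_on_disk = {"a": "old", "b": "old"}  # written before the node stopped
running_state = {"a": "new"}              # "b" was removed while it was down

joined = join_current(store_on_disk, all_segments, running_state)
print(joined)  # "b" is still present on the joiner, but on no other node
```

In this model the removed key "b" survives on the restarted node only, which is the kind of stale read the issue is about.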
It would be better to delay preloading until we know we are the first node to start up,
but failing that we could clear the data container and the store before receiving the
initial state.
Note that this will only allow preloading data from one node. Restoring data from more
nodes is harder to do, and we will implement it as part of graceful restart.
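
The fallback proposed above (clear the data container and the store before receiving the initial state) could look roughly like the following sketch. Again this is a toy model, not the actual implementation: `join_with_purge` and its parameters are hypothetical names, and the stores are plain dictionaries.

```python
def join_with_purge(store, other_members_running, running_state):
    """Hypothetical join sequence with the purge behaviour enabled."""
    if other_members_running:
        # Another node already runs the cache: discard local data so that
        # only the state received from the cluster is visible afterwards.
        store.clear()
        data = {}
    else:
        # This node is first: its store is the only copy, so preload it.
        data = dict(store)
    data.update(running_state)  # initial state transfer (empty if first)
    store.update(data)          # keep the local store in sync with memory
    return data


# First node to start: keeps everything that was in its store.
first = join_with_purge({"a": "old", "b": "old"}, False, {})

# Later joiner: its old entries are purged, only cluster state remains.
second = join_with_purge({"a": "old", "b": "old"}, True, {"a": "new"})
```

Only the first node's store contributes data here, which matches the note that this allows preloading from a single node.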