[JBoss JIRA] (ISPN-5290) Better automatic merge for caches with enabled partition handling
by Wolf-Dieter Fink (JIRA)
Wolf-Dieter Fink created ISPN-5290:
--------------------------------------
Summary: Better automatic merge for caches with enabled partition handling
Key: ISPN-5290
URL: https://issues.jboss.org/browse/ISPN-5290
Project: Infinispan
Issue Type: Feature Request
Environment: JDG cluster with partitionHandling enabled
Reporter: Wolf-Dieter Fink
At the moment there is no detection of whether a node joining the cluster is one of the nodes known from the "last stable view".
This has the drawback that the cluster stays in DEGRADED_MODE if some nodes are restarted during the split-brain.
Assume the cluster split is caused by a power failure of some nodes; the remaining nodes become DEGRADED because >= numOwners are lost.
If the failed nodes are restarted, say for an application which uses library mode in EAP, these instances are identified as new nodes because their node IDs are different.
When these nodes join the 'cluster', all nodes remain degraded: the restarted instances are seen as different nodes and not as the lost ones, so the cluster will not heal and come back to AVAILABLE.
Server hinting can prevent some of these scenarios by ensuring that at least one owner survives.
But there are other cases where a different strategy would be needed to get the cluster back to AVAILABLE mode.
During the split-brain there is no way to continue, as there is no possibility to know whether "the other" part is gone or still alive but unreachable.
With a shared persistence it might be possible to detect this, but synchronizing it with locking and version columns is a huge drawback during normal operation.
If the node ID can be kept across restarts, I see the following enhancements:
- with a shared persistence there should be no data loss; once all nodes are back in the cluster it can go AVAILABLE and reload the missing entries
- for a 'side' cache, the values are calculated or retrieved from other (slow) systems, so the cluster can go AVAILABLE and reload the entries
- in other cases there might be a WARNING/ERROR that all members are back from the split and some data may have been lost, and the cluster is set back to AVAILABLE automatically or manually
It might be complicated to compute these modes, but a partition-handling configuration option would give the administrator the possibility to decide which behaviour is appropriate for a cache, e.g.
<partition-handling enabled="true" healing="HEALING.MODE"/>
where the modes are:
- AVAILABLE_NO_WARNING: back to AVAILABLE after all nodes from the "last stable view" are back
- AVAILABLE_WARNING_DATALOST: ditto, but log a warning that some data may have been lost
- WARNING_DATALOST: only log a warning and a hint how to set AVAILABLE manually
- NONE: same as the current behaviour (if necessary; WARNING_DATALOST might be similar or better)
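For illustration, the proposed attribute could sit on the existing partition-handling element inside a cache definition. This is a sketch only: the {{healing}} attribute does not exist in the current schema, and the surrounding cache definition is a made-up example.

```xml
<distributed-cache name="myCache" owners="2">
   <!-- enabled="true" exists today; healing="..." is the proposed
        attribute, taking one of the modes listed above -->
   <partition-handling enabled="true" healing="AVAILABLE_WARNING_DATALOST"/>
</distributed-cache>
```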
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
[JBoss JIRA] (ISPN-5270) Deadlock in InfinispanDirectoryProvider startup
by Sanne Grinovero (JIRA)
[ https://issues.jboss.org/browse/ISPN-5270?page=com.atlassian.jira.plugin.... ]
Sanne Grinovero commented on ISPN-5270:
---------------------------------------
Great detective work :) Now I understand.
The Infinispan {{DirectoryProvider}} looks up the 3 caches with the conventional default names used to store the index, and since the default configuration is indexed, each of those 3 caches triggers another bootstrap of the Search engine, which in turn triggers the DP initialization again and requires the 3 caches to be started. We normally don't initialize the DP eagerly, but the {{ProgrammaticSearchMappingProvider}} requires us to start it.
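The cycle can be modeled with a small self-contained sketch (all class and method names here are invented for illustration; this is not the real Infinispan code): the start callback running on a starter thread re-enters the start method for the same cache and therefore joins the very thread it is running on. The real code joins without a timeout and so hangs forever; the sketch uses a short timeout so it terminates and reports the failure instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReentrantStartDemo {
    // One starter thread per cache, as in startCaches()-style APIs.
    static final Map<String, Thread> STARTERS = new ConcurrentHashMap<>();
    static volatile boolean reentrantTimedOut = false;

    // Returns true only if the cache start completed within the timeout.
    static boolean startCache(String name, long timeoutMs) throws InterruptedException {
        Thread starter = STARTERS.computeIfAbsent(name, n -> {
            Thread t = new Thread(() -> cacheStarting(n), "CacheStartThread," + n);
            t.start();
            return t;
        });
        starter.join(timeoutMs); // the real code joins with no timeout
        return !starter.isAlive();
    }

    // Models the cacheStarting() callback: the indexing bootstrap
    // re-enters startCache() for the cache that is currently starting,
    // so it ends up joining the very thread it is running on.
    static void cacheStarting(String name) {
        try {
            reentrantTimedOut = !startCache(name, 200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try {
            startCache("LuceneIndexesMetadata", 5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // prints "re-entrant start timed out: true"
        System.out.println("re-entrant start timed out: " + reentrantTimedOut);
    }
}
```

A thread joining itself never returns without a timeout, which is exactly the WAITING-on-join state visible in the attached stack trace.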
The workaround is simple: never ever enable indexing on the default cache.
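A declarative sketch of that workaround (assuming the Infinispan 7.x schema; the cache names are examples): leave the default cache un-indexed, so the index-storage caches that inherit it are not indexed themselves, and enable indexing only on the caches that hold user data.

```xml
<infinispan xmlns="urn:infinispan:config:7.1">
   <cache-container default-cache="default">
      <!-- no <indexing> on the default cache, so LuceneIndexesMetadata,
           LuceneIndexesData and LuceneIndexesLocking, which inherit this
           configuration by default, are not themselves indexed -->
      <replicated-cache name="default"/>
      <!-- enable indexing only where entries are actually indexed -->
      <replicated-cache name="books">
         <indexing index="ALL"/>
      </replicated-cache>
   </cache-container>
</infinispan>
```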
I see different solutions possible:
# we actively validate that the caches used for index storage are not themselves indexed
# initialization of the DP is wrapped into a Future?
# it could be very helpful to users if we could start the 3 "conventional" index storage caches using a configuration which we control (not inherit)
I don't like the first solution: even if it makes sense for production use, it makes for yet another thing people need to take care of, and for a quick POC I don't see why we should prevent people from storing it all in one cache.
More ideas?
> Deadlock in InfinispanDirectoryProvider startup
> -----------------------------------------------
>
> Key: ISPN-5270
> URL: https://issues.jboss.org/browse/ISPN-5270
> Project: Infinispan
> Issue Type: Bug
> Components: Embedded Querying
> Affects Versions: 7.2.0.Alpha1, 7.1.1.Final
> Reporter: Dan Berindei
> Assignee: Gustavo Fernandes
> Priority: Minor
> Attachments: surefire.stacks, surefire2.stacks
>
>
> The InfinispanDirectoryProvider tries to start the metadata, data, and locking caches when it starts up, with {{DefaultCacheManager.startCaches()}}.
> However, when one of these caches (e.g. the metadata cache) starts, {{LifecycleManager.cacheStarting()}} is invoked, which can then try to start the InfinispanDirectoryProvider again:
> {noformat}
> "CacheStartThread,null,LuceneIndexesMetadata" prio=10 tid=0x00007f5f74484000 nid=0xe42 in Object.wait() [0x00007f5efff48000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000c2180000> (a org.infinispan.manager.DefaultCacheManager$1)
> at java.lang.Thread.join(Thread.java:1281)
> - locked <0x00000000c2180000> (a org.infinispan.manager.DefaultCacheManager$1)
> at java.lang.Thread.join(Thread.java:1355)
> at org.infinispan.manager.DefaultCacheManager.startCaches(DefaultCacheManager.java:465)
> at org.hibernate.search.infinispan.spi.InfinispanDirectoryProvider.start(InfinispanDirectoryProvider.java:84)
> at org.hibernate.search.indexes.spi.DirectoryBasedIndexManager.initialize(DirectoryBasedIndexManager.java:88)
> at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManager(IndexManagerHolder.java:256)
> at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManager(IndexManagerHolder.java:513)
> - locked <0x00000000ce6001d0> (a org.hibernate.search.indexes.impl.IndexManagerHolder)
> at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManagers(IndexManagerHolder.java:482)
> at org.hibernate.search.indexes.impl.IndexManagerHolder.buildEntityIndexBinding(IndexManagerHolder.java:91)
> - locked <0x00000000ce6001d0> (a org.hibernate.search.indexes.impl.IndexManagerHolder)
> at org.hibernate.search.spi.SearchIntegratorBuilder.initDocumentBuilders(SearchIntegratorBuilder.java:366)
> at org.hibernate.search.spi.SearchIntegratorBuilder.buildNewSearchFactory(SearchIntegratorBuilder.java:204)
> at org.hibernate.search.spi.SearchIntegratorBuilder.buildSearchIntegrator(SearchIntegratorBuilder.java:122)
> at org.hibernate.search.spi.SearchFactoryBuilder.buildSearchFactory(SearchFactoryBuilder.java:35)
> at org.infinispan.query.impl.LifecycleManager.getSearchFactory(LifecycleManager.java:260)
> at org.infinispan.query.impl.LifecycleManager.cacheStarting(LifecycleManager.java:102)
> at org.infinispan.factories.ComponentRegistry.notifyCacheStarting(ComponentRegistry.java:230)
> at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:216)
> at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:814)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:591)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:546)
> at org.infinispan.manager.DefaultCacheManager.access$100(DefaultCacheManager.java:115)
> at org.infinispan.manager.DefaultCacheManager$1.run(DefaultCacheManager.java:452)
> {noformat}
> This can hang the test; the attached thread dumps show {{EmbeddedCompatTest}} and {{IndexCacheStopTest}} hanging.