[infinispan-issues] [JBoss JIRA] (ISPN-5270) Deadlock in InfinispanDirectoryProvider startup
Sanne Grinovero (JIRA)
issues at jboss.org
Thu Mar 12 14:12:19 EDT 2015
[ https://issues.jboss.org/browse/ISPN-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049572#comment-13049572 ]
Sanne Grinovero commented on ISPN-5270:
---------------------------------------
Pasting a conversation on this topic; my conclusion is that checking the index-storing cache when the indexed cache is starting is evil as it leads to some configuration to be valid depending on the order in which Cache(s) are initialized.
One could have a perfectly starting cluster, running fine for weeks, then reboots and is unable to get it started again because we refuse the configuration in the current context.
{noformat}[17:02] <sannegrinovero> gustavonalle: regarding ISPN-5270, your test makes sense but do you have an exaplantion for why the testsuite would trigger it "occasionally" ?
[17:02] <jbossbot> jira [ISPN-5270] Deadlock in InfinispanDirectoryProvider startup [Coding In Progress (Unresolved) Bug, Minor, Gustavo Fernandes] https://issues.jboss.org/browse/ISPN-5270
[17:02] <gustavonalle> sannegrinovero, that the weird thing.
[17:03] <gustavonalle> sannegrinovero, no idea. The tests uses RAM, but somehow the stack trace contradicts it
[17:04] <sannegrinovero> gustavonalle: ok. What's the priority you're giving this issue? It seems a problem we had since ages and didn't really bite any user so far
[17:04] <sannegrinovero> I'm questioning priority not because I don't want to fix it, but because I'd rather focus on some of the other tasks
[17:05] <sannegrinovero> also it's definitely a configuration problem and we can't validate all of them
[17:08] <gustavonalle> sannegrinovero, a deadlock in the testsuite is annoying
[17:08] <sannegrinovero> sure but you have no idea how it happens :)
[17:09] <sannegrinovero> so it looks like we might be fixing the wrong issue
[17:10] <sannegrinovero> gustavonalle: on JIRA you say "Enabling indexing on Lucene* caches sounds a mistake, and disabling it is a good thing , since it won't leave many threads hanging around and intercepting every single index call."
[17:10] <sannegrinovero> and while I agree one shouldn't enable indexing on a Lucene cache in a well-tuned production system, I don't see why we should fail hard on that for someone just trying it out.
[17:11] <sannegrinovero> also, there won't be any thread started as long as no indexed type is actually stored.
[17:11] <gustavonalle> sannegrinovero, there will be an entity because of the ProgrammaticMapping
[17:12] <sannegrinovero> in terms of usability, it's annoying that people need-to-know about these special considerations
[17:12] <sannegrinovero> well most people won't have an auto-discovered ProgrammaticMapping
[17:12] <gustavonalle> sannegrinovero, what about people using the server?
[17:13] <sannegrinovero> we should ship server with sensible configurations for the index storage?
[17:13] <sannegrinovero> Is this currently all up to user needing to make the configuration?
[17:14] <sannegrinovero> gustavonalle: in Search/ORM we include this, and it's being used by default: https://github.com/hibernate/hibernate-search/blob/953fba0a641ac8b7d3aa9b634d63874aa1dcedb9/infinispan/src/main/resources/default-hibernatesearch-infinispan.xml
[17:15] <sannegrinovero> the problem is that in Infinispan we don't use this, but rather inject the same Infinispan CacheManager
[17:15] <gustavonalle> sannegrinovero, remote-query-server.jar ships with the META-INF/services
[17:16] <sannegrinovero> so the Search/Infinispan DP is expecting that configuration to be applied
[17:16] <gustavonalle> sannegrinovero, So I wonder if someone enables indexing on server with the infinispan directory and gets the cache, it will deadlock :)
[17:16] <sannegrinovero> but is allowing the user to override
[17:16] <sannegrinovero> problem is that in Infinispan Query, that correct configuration is not being set by default
[17:17] <sannegrinovero> gustavonalle: right so the solution is to paste that same configuration in server?
[17:17] <sannegrinovero> define the three caches in a sensible way by default
[17:17] <sannegrinovero> rather than having them set by the user - or worse taking the (wrong) defaults from the default cache
[17:18] <sannegrinovero> the user will still be able to shoot himself though. Which is why I was thinking in somehow initializing the DP asynchronously
[17:21] <sannegrinovero> gustavonalle: BTW people should be using org.infinispan.query.indexmanager.InfinispanIndexManager , not the DP directly.. so you are in control of how to initialize the DP.
[17:21] <sannegrinovero> does that help?
[17:27] <gustavonalle> sannegrinovero, not always the InfinispanIndexManager makes sense, for example, single node indexes or local indexes
[17:27] <sannegrinovero> gustavonalle: I guess technically we're also in the position of being able to adjust infinispan-core to be able to return us a Cache which has storage capabilities initialized, even though the Search engine wasn't initialized yet.
[17:28] <sannegrinovero> gustavonalle: ok good point, but generally speaking we should stop exposing all of DP and rather create an X IndexManagers suited for each various use case
[17:28] <sannegrinovero> if we do that, we can simplify configuration by removing the DP related options
[17:29] <sannegrinovero> and also you then have a better hook to deal with such situations
[17:29] <sannegrinovero> for the issue at hand
[17:29] <sannegrinovero> it looks like we agree that all easy workarounds have nasty usability effects
[17:29] <sannegrinovero> so WDYT of making the DP initialization asynch?
[17:30] <gustavonalle> sannegrinovero, I have on easy woraround
[17:30] <gustavonalle> sannegrinovero, one sec
[17:31] <gustavonalle> sannegrinovero, https://github.com/infinispan/infinispan/compare/infinispan:master...gustavonalle:start_deadlock?expand=1
[17:32] <gustavonalle> sannegrinovero, that's the one you did not like :)
[17:33] <sannegrinovero> indeed I think it's very confusing, but I could live with that if we agree it's a temporary solution
[17:34] <sannegrinovero> just asked if emmanuel has an opinion
[17:34] <gustavonalle> sannegrinovero, pros: will never deadlock cons: cannot have data and index in the same cache
[17:34] <gustavonalle> sannegrinovero, but the cons are more like a corner case
[17:34] <sannegrinovero> consider it's quite a change of behaviour, I'm not sure the WARN will be noted
[17:35] <gustavonalle> sannegrinovero, you think so? What behaviour it changes? Apart from not deadlocking on certain circumstances
[17:36] <gustavonalle> sannegrinovero, we are doing something similar when the user sets both InfinispanIndexManager and DP to anything else that is not 'infinispan'
[17:37] <sannegrinovero> the thing is I suspect lots of people might be doing this, and not having any issue because nobody uses programmatic configuration
[17:38] <sannegrinovero> so why not throwing a fatal exception?
[17:39] <sannegrinovero> If you really must punish them for such a configuration, do it with style :D
[17:40] <sannegrinovero> but I have to say that as a user not interested in auto-discovery of programmatic configuration, I'd be a bit annoyed with you breaking my shiny demo :)
[17:41] <gustavonalle> sannegrinovero, that patch did not break any test on the suite
[17:41] <sannegrinovero> do you want me to add some tests :D
[17:42] <gustavonalle> sannegrinovero, :)
[17:42] <gustavonalle> sannegrinovero, a failing test would be get the default cache and manually define 3 caches pointing back to the default cache
[17:43] <gustavonalle> all over configurations would work
[17:43] <sannegrinovero> no I don't think you need to define the configurations.. going with the default would be enough to break. Your own test proves it :)
[17:44] <sannegrinovero> I mean, your test the one you pasted in comments of ISPN-5270
[17:44] <jbossbot> jira [ISPN-5270] Deadlock in InfinispanDirectoryProvider startup [Coding In Progress (Unresolved) Bug, Minor, Gustavo Fernandes] https://issues.jboss.org/browse/ISPN-5270
[17:45] <sannegrinovero> so let's try to converge to a decision
[17:45] <gustavonalle> sannegrinovero, that one deadlocks on master, but not on https://github.com/infinispan/infinispan/compare/infinispan:master...gustavonalle:start_deadlock
[17:45] <sannegrinovero> I didn't mean to strongly disagree on your solution, just make sure you see the perspective of those users which will swear at you
[17:47] <sannegrinovero> gustavonalle: suppose you're starting a Cache A with indexing enabled. All fine so far.
[17:47] <sannegrinovero> gustavonalle: then next day you start a second Cache "B" which also has indexing enabled..
[17:47] <sannegrinovero> but this one is configured to store indexes in a Cache named "A"
[17:48] <sannegrinovero> what is supposed to happen?
[17:51] <gustavonalle> sannegrinovero, and then I decide to do a cacheA.size() :)
[17:52] <gustavonalle> sannegrinovero, this is a use case, although weird I agree
[17:53] <sannegrinovero> gustavonalle: ? I might have lost some of your messages, as I'm not understanding the cacheA.size() answer
[17:54] <sannegrinovero> my point is that in this specific order, you will refuse to enable indexing
[17:54] <sannegrinovero> but in a different order, you will allow it
[17:54] <gustavonalle> sannegrinovero, ignore that, I was just saying what else the use would want to do in that situation: a cacheA.size would give a weird result
[17:55] <sannegrinovero> ok but we're trying to fix that too :)
[17:55] <sannegrinovero> but it would be very weird that the indexing being enabled/not enabled would depend on order of startup of services!
[17:56] <sannegrinovero> imagine yourself trying to restart the cluster and not being able to get it up as it was before because of some weird rule in indexing order
[17:57] <sannegrinovero> gustavonalle: I guess I just convinced myself that this definitely isn't the way to go :)
[17:57] <sannegrinovero> unpredictable is never good
[17:59] <sannegrinovero> are you convinced now?
[18:01] <gustavonalle> hmm...A and B are always indexed?
[18:02] <gustavonalle> if so, it would fail always
[18:03] <sannegrinovero> no, A is indexed but not storing indexes in B so it's valid if it's started first.
[18:06] <gustavonalle> sannegrinovero, yes, you are right maybe this flexibility is too much
[18:08] <sannegrinovero> ok. So what can we do?
[18:09] <sannegrinovero> either we refuse to store an index in an indexed Cache - and by that we refuse starting -> exception
[18:09] <sannegrinovero> or we make sure it just works, making the DP startup asynch
[18:09] <sannegrinovero> am I missing other options?
[18:10] <sannegrinovero> I think I'd make DP async, just because it provides the better transparency to end users
[18:10] <sannegrinovero> but of course we agree it's not an optimal configuration ;){noformat}
> Deadlock in InfinispanDirectoryProvider startup
> -----------------------------------------------
>
> Key: ISPN-5270
> URL: https://issues.jboss.org/browse/ISPN-5270
> Project: Infinispan
> Issue Type: Bug
> Components: Embedded Querying
> Affects Versions: 7.2.0.Alpha1, 7.1.1.Final
> Reporter: Dan Berindei
> Assignee: Gustavo Fernandes
> Priority: Minor
> Attachments: surefire.stacks, surefire2.stacks
>
>
> The InfinispanDirectoryProvider tries to start the metadata, data, and locking caches when it starts up, with {{DefaultCacheManager.startCaches()}}.
> However, when one of these caches (e.g. the metadata cache) starts, the {{LifecycleManager.cacheStarting()}}, which can then try to start the InfinispanDirectoryProvider again:
> {noformat}
> "CacheStartThread,null,LuceneIndexesMetadata" prio=10 tid=0x00007f5f74484000 nid=0xe42 in Object.wait() [0x00007f5efff48000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000000c2180000> (a org.infinispan.manager.DefaultCacheManager$1)
> at java.lang.Thread.join(Thread.java:1281)
> - locked <0x00000000c2180000> (a org.infinispan.manager.DefaultCacheManager$1)
> at java.lang.Thread.join(Thread.java:1355)
> at org.infinispan.manager.DefaultCacheManager.startCaches(DefaultCacheManager.java:465)
> at org.hibernate.search.infinispan.spi.InfinispanDirectoryProvider.start(InfinispanDirectoryProvider.java:84)
> at org.hibernate.search.indexes.spi.DirectoryBasedIndexManager.initialize(DirectoryBasedIndexManager.java:88)
> at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManager(IndexManagerHolder.java:256)
> at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManager(IndexManagerHolder.java:513)
> - locked <0x00000000ce6001d0> (a org.hibernate.search.indexes.impl.IndexManagerHolder)
> at org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManagers(IndexManagerHolder.java:482)
> at org.hibernate.search.indexes.impl.IndexManagerHolder.buildEntityIndexBinding(IndexManagerHolder.java:91)
> - locked <0x00000000ce6001d0> (a org.hibernate.search.indexes.impl.IndexManagerHolder)
> at org.hibernate.search.spi.SearchIntegratorBuilder.initDocumentBuilders(SearchIntegratorBuilder.java:366)
> at org.hibernate.search.spi.SearchIntegratorBuilder.buildNewSearchFactory(SearchIntegratorBuilder.java:204)
> at org.hibernate.search.spi.SearchIntegratorBuilder.buildSearchIntegrator(SearchIntegratorBuilder.java:122)
> at org.hibernate.search.spi.SearchFactoryBuilder.buildSearchFactory(SearchFactoryBuilder.java:35)
> at org.infinispan.query.impl.LifecycleManager.getSearchFactory(LifecycleManager.java:260)
> at org.infinispan.query.impl.LifecycleManager.cacheStarting(LifecycleManager.java:102)
> at org.infinispan.factories.ComponentRegistry.notifyCacheStarting(ComponentRegistry.java:230)
> at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:216)
> at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:814)
> at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:591)
> at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:546)
> at org.infinispan.manager.DefaultCacheManager.access$100(DefaultCacheManager.java:115)
> at org.infinispan.manager.DefaultCacheManager$1.run(DefaultCacheManager.java:452)
> {noformat}
> This can hang the test, the attached thread dumps show {{EmbeddedCompatTest}} and {{IndexCacheStopTest}}.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
More information about the infinispan-issues
mailing list