[JBoss JIRA] (ISPN-10343) LocalCacheStateTransferTest random failures
by Dan Berindei (Jira)
Dan Berindei created ISPN-10343:
-----------------------------------
Summary: LocalCacheStateTransferTest random failures
Key: ISPN-10343
URL: https://issues.jboss.org/browse/ISPN-10343
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Affects Versions: 10.0.0.Beta3
Reporter: Dan Berindei
Fix For: 10.0.0.CR1
NodeA starts the xsite state transfer before its bridge cluster view is updated, so the push start command ({{START_RECEIVE}}) is dropped without reaching NodeB. NodeA then sends a cancel command ({{FINISH_RECEIVE}}), which does reach NodeB, but NodeB has not yet updated its own bridge cluster view, so its response is dropped. NodeA then waits for that response for 20 minutes (unless the JVM is killed first).
{noformat}
01:40:54,271 INFO (testng-Test:[]) [TestSuiteProgress] Test starting: org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle
01:40:54,274 INFO (testng-Test:[]) [CLUSTER] [Context=Test][Context=Test-NodeA-48836] ISPN100005: Site 'NYC-2' is online.
01:40:54,277 TRACE (testng-Test:[]) [JGroupsTransport] Test-NodeA-48836 sending backup request 2 to SiteMaster(NYC-2): XSiteStateTransferControlCommand{control=START_RECEIVE, siteName='null', statusOk=false, cacheName='Test'}
01:40:54,277 ERROR (testng-Test:[]) [TEST_RELAY2] Test-NodeA-48836: no route to NYC-2: dropping message
01:40:54,313 TRACE (jgroups-5,bridge-org.infinispan.xsite.statetransfer.Test,_Test-NodeA-48836:LON-1:[]) [TEST_RELAY2] [Relayer _Test-NodeA-48836:LON-1] view: [_Test-NodeA-48836:LON-1|1] (2) [_Test-NodeA-48836:LON-1, _Test-NodeB-37463:NYC-2]
01:40:54,313 TRACE (jgroups-5,bridge-org.infinispan.xsite.statetransfer.Test,_Test-NodeA-48836:LON-1:[]) [JGroupsTransport] Sites view changed: up [NYC-2], down [], new view is [NYC-2, LON-1]
01:40:54,347 TRACE (testng-Test:[]) [JGroupsBackupResponse] Communication error with site NYC-2
org.infinispan.remoting.transport.jgroups.SuspectException: ISPN000400: Node null was suspected
at org.infinispan.remoting.transport.ResponseCollectors.remoteNodeSuspected(ResponseCollectors.java:34) ~[classes/:?]
at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:31) ~[classes/:?]
at org.infinispan.remoting.transport.impl.SingleResponseCollector.targetNotFound(SingleResponseCollector.java:17) ~[classes/:?]
at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:23) ~[classes/:?]
at org.infinispan.remoting.transport.jgroups.SingleSiteRequest.receiveResponse(SingleSiteRequest.java:50) ~[classes/:?]
at org.infinispan.remoting.transport.jgroups.SingleSiteRequest.sitesUnreachable(SingleSiteRequest.java:68) ~[classes/:?]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$siteUnreachable$7(JGroupsTransport.java:1229) ~[classes/:?]
at org.infinispan.remoting.transport.impl.RequestRepository.lambda$forEach$0(RequestRepository.java:60) ~[classes/:?]
at java.util.concurrent.ConcurrentHashMap.forEach(ConcurrentHashMap.java:1603) ~[?:?]
at org.infinispan.remoting.transport.impl.RequestRepository.forEach(RequestRepository.java:60) ~[classes/:?]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.siteUnreachable(JGroupsTransport.java:1227) ~[classes/:?]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$200(JGroupsTransport.java:130) ~[classes/:?]
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1446) ~[classes/:?]
at org.jgroups.JChannel.up(JChannel.java:756) ~[jgroups-4.1.1.Final.jar:4.1.1.Final]
at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:914) ~[jgroups-4.1.1.Final.jar:4.1.1.Final]
at org.jgroups.protocols.relay.RELAY2.handleMessage(RELAY2.java:533) ~[jgroups-4.1.1.Final.jar:4.1.1.Final]
Suppressed: org.infinispan.util.logging.TraceException
at org.infinispan.remoting.transport.jgroups.JGroupsBackupResponse.waitForBackupToFinish(JGroupsBackupResponse.java:93) [classes/:?]
at org.infinispan.remoting.transport.RetryOnFailureXSiteCommand.execute(RetryOnFailureXSiteCommand.java:64) [classes/:?]
at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.controlStateTransferOnRemoteSite(XSiteStateTransferManagerImpl.java:343) [classes/:?]
at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.startPushState(XSiteStateTransferManagerImpl.java:136) [classes/:?]
at org.infinispan.xsite.XSiteAdminOperations.pushState(XSiteAdminOperations.java:276) [classes/:?]
at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.startStateTransfer(LocalCacheStateTransferTest.java:99) [test-classes/:?]
at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle(LocalCacheStateTransferTest.java:53) [test-classes/:?]
...
01:40:54,348 TRACE (testng-Test:[]) [JGroupsTransport] Test-NodeA-48836 sending backup request 4 to SiteMaster(NYC-2): XSiteStateTransferControlCommand{control=FINISH_RECEIVE, siteName='null', statusOk=false, cacheName='Test'}
01:40:54,348 TRACE (testng-Test:[]) [TEST_RELAY2] routing message to SiteMaster(NYC-2) via _Test-NodeB-37463:NYC-2
01:40:54,349 DEBUG (remote-thread-Test-NodeB-p37359-t2:[]) [XSiteStateConsumerImpl] Ending state transfer from LON-1
01:40:54,349 TRACE (remote-thread-Test-NodeB-p37359-t2:[]) [JGroupsTransport] Test-NodeB-37463 sending response for request 4 to Test-NodeA-48836:LON-1: SuccessfulResponse(null)
01:40:54,349 ERROR (remote-thread-Test-NodeB-p37359-t2:[]) [TEST_RELAY2] Test-NodeB-37463: no route to LON-1: dropping message
01:40:54,350 TRACE (jgroups-6,Test-NodeB-37463:[]) [TEST_RELAY2] [Relayer _Test-NodeB-37463:NYC-2] view: [_Test-NodeA-48836:LON-1|1] (2) [_Test-NodeA-48836:LON-1, _Test-NodeB-37463:NYC-2]
01:40:54,350 TRACE (jgroups-6,Test-NodeB-37463:[]) [JGroupsTransport] Sites view changed: up [NYC-2, LON-1], down [], new view is [NYC-2, LON-1]
... 5 mins later ...
[ERROR] Test org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle has been running for more than 300 seconds. Interrupting the test thread and dumping threads of the test suite process and its children.
"testng-LocalCacheStateTransferTest" #17 prio=5 os_prio=0 cpu=26949.68ms elapsed=898.86s tid=0x00007f527d399800 nid=0x7147 waiting on condition [0x00007f5203cfb000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.3/Native Method)
- parking to wait for <0x00000000c8300010> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.3/LockSupport.java:234)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.3/CompletableFuture.java:1798)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.3/ForkJoinPool.java:3128)
at java.util.concurrent.CompletableFuture.timedGet(java.base@11.0.3/CompletableFuture.java:1868)
at java.util.concurrent.CompletableFuture.get(java.base@11.0.3/CompletableFuture.java:2021)
at org.infinispan.remoting.transport.jgroups.JGroupsBackupResponse.waitForBackupToFinish(JGroupsBackupResponse.java:87)
at org.infinispan.remoting.transport.RetryOnFailureXSiteCommand.execute(RetryOnFailureXSiteCommand.java:64)
at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.controlStateTransferOnRemoteSite(XSiteStateTransferManagerImpl.java:343)
at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.handleFailure(XSiteStateTransferManagerImpl.java:328)
at org.infinispan.xsite.statetransfer.XSiteStateTransferManagerImpl.startPushState(XSiteStateTransferManagerImpl.java:147)
at org.infinispan.xsite.XSiteAdminOperations.pushState(XSiteAdminOperations.java:276)
at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.startStateTransfer(LocalCacheStateTransferTest.java:99)
at org.infinispan.xsite.statetransfer.LocalCacheStateTransferTest.testStateTransferWithClusterIdle(LocalCacheStateTransferTest.java:53)
{noformat}
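The ordering problem in the log above can be reproduced with a toy model. This is only an illustrative sketch, not Infinispan or JGroups code: {{ToyRelay}}, {{installView}} and {{send}} are hypothetical names, and the model assumes nothing more than "a relay routes only to sites that are in its current bridge view".

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BridgeViewRace {
    /** Toy stand-in for a relay: it routes only to sites present in its current bridge view. */
    static final class ToyRelay {
        private final Set<String> sitesInView = ConcurrentHashMap.newKeySet();

        /** Simulates the arrival of a bridge cluster view listing the reachable sites. */
        void installView(String... sites) {
            Collections.addAll(sitesInView, sites);
        }

        /** Returns true if the command was routed, false if it was dropped for lack of a route. */
        boolean send(String site, String command) {
            return sitesInView.contains(site);
        }
    }

    public static void main(String[] args) {
        ToyRelay nodeA = new ToyRelay();

        // Sent before the view is installed: no route, so the command is dropped.
        boolean early = nodeA.send("NYC-2", "START_RECEIVE");

        // Once the view is installed, the same command is routed.
        nodeA.installView("LON-1", "NYC-2");
        boolean late = nodeA.send("NYC-2", "START_RECEIVE");

        System.out.println("before view: routed=" + early + ", after view: routed=" + late);
    }
}
```

In the real failure the same thing happens a second time in the opposite direction: NodeB's response is sent before NodeB's own view is installed.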
--
This message was sent by Atlassian Jira
(v7.12.1#712002)
[JBoss JIRA] (ISPN-10342) Test Thread Leaks aren't clearing in time
by Will Burns (Jira)
[ https://issues.jboss.org/browse/ISPN-10342?page=com.atlassian.jira.plugin... ]
Will Burns updated ISPN-10342:
------------------------------
Status: Open (was: New)
> Test Thread Leaks aren't clearing in time
> -----------------------------------------
>
> Key: ISPN-10342
> URL: https://issues.jboss.org/browse/ISPN-10342
> Project: Infinispan
> Issue Type: Bug
> Components: Test Suite - Core
> Reporter: Will Burns
> Assignee: Will Burns
> Priority: Major
> Fix For: 10.0.0.Beta4
>
>
> Many tests on current master and in open PRs are reported as leaking threads. Judging from the stack traces, however, they all appear to be waiting for state transfer to complete. The likely cause is that thread context switching, combined with the number of concurrent operations we run, delays their completion. We should confirm this; if it is not the cause, there may be a real problem with the threads completing.
[JBoss JIRA] (ISPN-10342) Test Thread Leaks aren't clearing in time
by Will Burns (Jira)
Will Burns created ISPN-10342:
---------------------------------
Summary: Test Thread Leaks aren't clearing in time
Key: ISPN-10342
URL: https://issues.jboss.org/browse/ISPN-10342
Project: Infinispan
Issue Type: Bug
Components: Test Suite - Core
Reporter: Will Burns
Assignee: Will Burns
Fix For: 10.0.0.Beta4
Many tests on current master and in open PRs are reported as leaking threads. Judging from the stack traces, however, they all appear to be waiting for state transfer to complete. The likely cause is that thread context switching, combined with the number of concurrent operations we run, delays their completion. We should confirm this; if it is not the cause, there may be a real problem with the threads completing.
[JBoss JIRA] (ISPN-10337) JDBC Purge consistency issues
by Will Burns (Jira)
[ https://issues.jboss.org/browse/ISPN-10337?page=com.atlassian.jira.plugin... ]
Will Burns commented on ISPN-10337:
-----------------------------------
Actually we probably want this backported to 9.4.x as well.
> JDBC Purge consistency issues
> -----------------------------
>
> Key: ISPN-10337
> URL: https://issues.jboss.org/browse/ISPN-10337
> Project: Infinispan
> Issue Type: Bug
> Components: Loaders and Stores
> Affects Versions: 10.0.0.Beta3, 9.4.15.Final
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 10.0.0.Beta4, 9.4.16.Final
>
>
> The JdbcStringBasedStore's purge method requires the following enhancements:
> # The connection should be enrolled in a transaction via {{setAutoCommit(false)}}, with an explicit commit/rollback.
> # Implementations of {{TableManager#getSelectOnlyExpiredRowsSql}} should use a {{SELECT ... FOR UPDATE}} statement to ensure that the expired entries stay locked for the duration of the transaction.
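The two enhancements listed in the description can be sketched with plain JDBC as follows. This is a minimal illustration under stated assumptions, not the actual {{JdbcStringBasedStore}} code: the class name, the table {{cache_entries}} and the columns {{id}} and {{expiry_ts}} are hypothetical, and the exact {{FOR UPDATE}} syntax depends on the database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PurgeInTransaction {
    // Hypothetical table and column names, used only for illustration.
    static final String SELECT_EXPIRED_FOR_UPDATE =
            "SELECT id FROM cache_entries WHERE expiry_ts < ? FOR UPDATE";
    static final String DELETE_BY_ID =
            "DELETE FROM cache_entries WHERE id = ?";

    /** Deletes expired rows inside one transaction and returns the number of rows removed. */
    static int purgeExpired(Connection conn, long now) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // start an explicit transaction
        try (PreparedStatement select = conn.prepareStatement(SELECT_EXPIRED_FOR_UPDATE);
             PreparedStatement delete = conn.prepareStatement(DELETE_BY_ID)) {
            select.setLong(1, now);
            int purged = 0;
            // FOR UPDATE keeps the selected rows locked until commit or rollback.
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    delete.setString(1, rs.getString(1));
                    purged += delete.executeUpdate();
                }
            }
            conn.commit();
            return purged;
        } catch (SQLException | RuntimeException e) {
            // Undo partial work; keep the original failure as the primary exception.
            try {
                conn.rollback();
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure);
            }
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // restore the caller's setting
        }
    }
}
```

Restoring the previous auto-commit setting in the {{finally}} block matters when the connection comes from a pool and is reused by other callers.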
[JBoss JIRA] (ISPN-10337) JDBC Purge consistency issues
by Will Burns (Jira)
[ https://issues.jboss.org/browse/ISPN-10337?page=com.atlassian.jira.plugin... ]
Will Burns reopened ISPN-10337:
-------------------------------
> JDBC Purge consistency issues
> -----------------------------
>
> Key: ISPN-10337
> URL: https://issues.jboss.org/browse/ISPN-10337
> Project: Infinispan
> Issue Type: Bug
> Components: Loaders and Stores
> Affects Versions: 10.0.0.Beta3, 9.4.15.Final
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 10.0.0.Beta4, 9.4.16.Final
>
>
> The JdbcStringBasedStore's purge method requires the following enhancements:
> # The connection should be enrolled in a transaction via {{setAutoCommit(false)}}, with an explicit commit/rollback.
> # Implementations of {{TableManager#getSelectOnlyExpiredRowsSql}} should use a {{SELECT ... FOR UPDATE}} statement to ensure that the expired entries stay locked for the duration of the transaction.
[JBoss JIRA] (ISPN-10337) JDBC Purge consistency issues
by Will Burns (Jira)
[ https://issues.jboss.org/browse/ISPN-10337?page=com.atlassian.jira.plugin... ]
Will Burns updated ISPN-10337:
------------------------------
Fix Version/s: 9.4.16.Final
> JDBC Purge consistency issues
> -----------------------------
>
> Key: ISPN-10337
> URL: https://issues.jboss.org/browse/ISPN-10337
> Project: Infinispan
> Issue Type: Bug
> Components: Loaders and Stores
> Affects Versions: 10.0.0.Beta3, 9.4.15.Final
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 10.0.0.Beta4, 9.4.16.Final
>
>
> The JdbcStringBasedStore's purge method requires the following enhancements:
> # The connection should be enrolled in a transaction via {{setAutoCommit(false)}}, with an explicit commit/rollback.
> # Implementations of {{TableManager#getSelectOnlyExpiredRowsSql}} should use a {{SELECT ... FOR UPDATE}} statement to ensure that the expired entries stay locked for the duration of the transaction.
[JBoss JIRA] (ISPN-10337) JDBC Purge consistency issues
by Will Burns (Jira)
[ https://issues.jboss.org/browse/ISPN-10337?page=com.atlassian.jira.plugin... ]
Will Burns updated ISPN-10337:
------------------------------
Status: Resolved (was: Pull Request Sent)
Fix Version/s: 10.0.0.Beta4
Resolution: Done
> JDBC Purge consistency issues
> -----------------------------
>
> Key: ISPN-10337
> URL: https://issues.jboss.org/browse/ISPN-10337
> Project: Infinispan
> Issue Type: Bug
> Components: Loaders and Stores
> Affects Versions: 10.0.0.Beta3, 9.4.15.Final
> Reporter: Ryan Emerson
> Assignee: Ryan Emerson
> Priority: Major
> Fix For: 10.0.0.Beta4
>
>
> The JdbcStringBasedStore's purge method requires the following enhancements:
> # The connection should be enrolled in a transaction via {{setAutoCommit(false)}}, with an explicit commit/rollback.
> # Implementations of {{TableManager#getSelectOnlyExpiredRowsSql}} should use a {{SELECT ... FOR UPDATE}} statement to ensure that the expired entries stay locked for the duration of the transaction.
[JBoss JIRA] (ISPN-10341) Queries should be based on Distributed Streams
by Wolf-Dieter Fink (Jira)
Wolf-Dieter Fink created ISPN-10341:
---------------------------------------
Summary: Queries should be based on Distributed Streams
Key: ISPN-10341
URL: https://issues.jboss.org/browse/ISPN-10341
Project: Infinispan
Issue Type: Feature Request
Components: Embedded Querying, Remote Querying
Reporter: Wolf-Dieter Fink
To allow more efficient operations and to simplify the API, it should be possible to run queries through stream operations, as in this example:
cache.stream().filter("from MyObject where attr = 'bla' ").forEach(cache::delete)
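For comparison, the same intent can be expressed today with a predicate instead of a query string. The sketch below is an assumption-laden illustration, not Infinispan API: it uses a plain {{ConcurrentMap}} as a stand-in for the cache, and {{MyObject}}, {{attr}} and {{removeMatching}} are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class StreamDeleteSketch {
    /** Hypothetical value type with a single queryable attribute. */
    static final class MyObject {
        final String attr;

        MyObject(String attr) {
            this.attr = attr;
        }
    }

    /** Removes every entry whose value has the given attr, using stream operations. */
    static <K> void removeMatching(ConcurrentMap<K, MyObject> cache, String attrValue) {
        // A ConcurrentMap's iterators are weakly consistent, so removing entries
        // while streaming over them does not throw ConcurrentModificationException.
        cache.entrySet().stream()
             .filter(e -> attrValue.equals(e.getValue().attr))
             .forEach(e -> cache.remove(e.getKey()));
    }

    public static void main(String[] args) {
        ConcurrentMap<String, MyObject> cache = new ConcurrentHashMap<>();
        cache.put("a", new MyObject("bla"));
        cache.put("b", new MyObject("other"));

        removeMatching(cache, "bla");
        System.out.println(cache.keySet());
    }
}
```

The feature request goes further than this: it asks for the filter to accept a query string, so that the selection could be handled by the query engine rather than by a Java predicate applied to each entry.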
[JBoss JIRA] (ISPN-10340) Queries should be based on Distributed Streams
by Wolf-Dieter Fink (Jira)
Wolf-Dieter Fink created ISPN-10340:
---------------------------------------
Summary: Queries should be based on Distributed Streams
Key: ISPN-10340
URL: https://issues.jboss.org/browse/ISPN-10340
Project: Infinispan
Issue Type: Feature Request
Components: Embedded Querying, Remote Querying
Reporter: Wolf-Dieter Fink
To allow more efficient operations and to simplify the API, it should be possible to run queries through stream operations, as in this example:
cache.stream().filter("from MyObject where attr = 'bla' ").forEach(cache::delete)