[JBoss JIRA] (ISPN-4498) OutOfMemoryError when CI is run with tracing
by William Burns (JIRA)
William Burns created ISPN-4498:
-----------------------------------
Summary: OutOfMemoryError when CI is run with tracing
Key: ISPN-4498
URL: https://issues.jboss.org/browse/ISPN-4498
Project: Infinispan
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: Test Suite - Server
Affects Versions: 7.0.0.Alpha4
Reporter: William Burns
Assignee: Mircea Markus
The CI very often fails with some sort of OOM error when tracing is enabled.
http://ci.infinispan.org/viewType.html?buildTypeId=Infinispan_MasterHotsp...
Looking closer, this appears to be an issue with an accumulation of threads. These threads are not running; they are simply held in memory unnecessarily. It appears they are being retained in the log4j 1.2 NDC class, in its internal hashtable. Upon further investigation, we never call NDC.remove, which would clean up the entries for dead threads.
I have created JBLOGGING-106 to fix this as well. In the meantime we shouldn't use NDC.
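For context, a minimal sketch of the missing cleanup, assuming the stock log4j 1.2 org.apache.log4j.NDC API; the worker thread and the context value are illustrative only:
{code:java}
import org.apache.log4j.NDC;

public class NdcCleanupSketch {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            NDC.push("request-42");   // registers an entry for this thread in NDC's internal Hashtable
            try {
                // ... work that should carry the diagnostic context ...
            } finally {
                NDC.pop();
                NDC.remove();         // per the report, omitting this call leaves the entry for the
                                      // (eventually dead) thread pinned in the Hashtable
            }
        });
        worker.start();
        worker.join();
    }
}
{code}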
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
[JBoss JIRA] (ISPN-4497) Race condition in LocalLockMergingSegmentReadLocker results in file content being deleted
by Anuj Shah (JIRA)
[ https://issues.jboss.org/browse/ISPN-4497?page=com.atlassian.jira.plugin.... ]
Anuj Shah commented on ISPN-4497:
---------------------------------
I have a 'test' of sorts to demonstrate the problem:
{code:java}
@Test
public void testMultiThreaded() throws InterruptedException {
   EmbeddedCacheManager cacheManager = new DefaultCacheManager();
   Cache<Object, Object> metadata = cacheManager.getCache("metadata");
   Cache<Object, Object> chunks = cacheManager.getCache("chunks");
   Cache<Object, Integer> locks = cacheManager.getCache("locks");
   metadata.put(new FileCacheKey("indexName", "fileName"), new FileMetadata(10));

   int numThreads = 10;
   final LocalLockMergingSegmentReadLocker llmsrl = new LocalLockMergingSegmentReadLocker(locks, chunks, metadata, "indexName");
   for (int i = 0; i < numThreads; i++) {
      Thread thread = new Thread(new Runnable() {
         int counter = 0;

         @Override
         public void run() {
            try {
               while (true) {
                  llmsrl.acquireReadLock("fileName");
                  Thread.sleep(10);
                  llmsrl.deleteOrReleaseReadLock("fileName");
                  // Take a break every now and again to try to avoid the same LocalReadLock being used constantly
                  if (counter++ % 10 == 0) {
                     Thread.sleep(100);
                  }
               }
            } catch (InterruptedException e) {
            }
         }
      });
      thread.setDaemon(true);
      thread.start();
   }

   // Keep checking every 100ms. The file should never be deleted.
   while (true) {
      Thread.sleep(100);
      assertNotNull(metadata.get(new FileCacheKey("indexName", "fileName")));
   }
}
{code}
For me this fails in around 8 minutes.
> Race condition in LocalLockMergingSegmentReadLocker results in file content being deleted
> -----------------------------------------------------------------------------------------
>
> Key: ISPN-4497
> URL: https://issues.jboss.org/browse/ISPN-4497
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Lucene Directory
> Affects Versions: 5.2.6.Final
> Reporter: Anuj Shah
> Assignee: Sanne Grinovero
>
> There is a race condition in LocalLockMergingSegmentReadLocker which can lead to extra delete calls on the underlying DistributedSegmentReadLocker, resulting in the file being removed from the caches.
> This happens with three or more threads acquiring and releasing locks on the same file simultaneously:
> # Thread 1 (T1) acquires a lock and creates a {{LocalReadLock}}, call it L1 - the underlying lock is acquired
> # T2 starts to acquire and holds a reference to L1
> # T3 starts to acquire and holds a reference to L1
> # T1 releases - L1 at this stage has a value of only 1, so the underlying lock is released and L1 is removed from the map
> # T2 continues - finds L1 with a value of 0 and acquires the underlying lock
> # T3 continues - increments L1's value to 2
> # T2 releases - creates a new {{LocalReadLock}}, L2 - this has a value of zero, so the underlying lock is released and L2 is removed from the map
> # T3 releases - creates a new {{LocalReadLock}}, L3 - this has a value of zero, so the underlying lock is released and L3 is removed from the map
> # The final step triggers a real file delete, as the underlying lock is released one too many times
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
[JBoss JIRA] (ISPN-4497) Race condition in LocalLockMergingSegmentReadLocker results in file content being deleted
by Anuj Shah (JIRA)
Anuj Shah created ISPN-4497:
-------------------------------
Summary: Race condition in LocalLockMergingSegmentReadLocker results in file content being deleted
Key: ISPN-4497
URL: https://issues.jboss.org/browse/ISPN-4497
Project: Infinispan
Issue Type: Bug
Security Level: Public (Everyone can see)
Components: Lucene Directory
Affects Versions: 5.2.6.Final
Reporter: Anuj Shah
Assignee: Sanne Grinovero
There is a race condition in LocalLockMergingSegmentReadLocker which can lead to extra delete calls on the underlying DistributedSegmentReadLocker, resulting in the file being removed from the caches.
This happens with three or more threads acquiring and releasing locks on the same file simultaneously; a simplified sketch of the racy pattern follows the numbered steps below:
# Thread 1 (T1) acquires a lock and creates a {{LocalReadLock}}, call it L1 - the underlying lock is acquired
# T2 starts to acquire and holds a reference to L1
# T3 starts to acquire and holds a reference to L1
# T1 releases - L1 at this stage has a value of only 1, so the underlying lock is released and L1 is removed from the map
# T2 continues - finds L1 with a value of 0 and acquires the underlying lock
# T3 continues - increments L1's value to 2
# T2 releases - creates a new {{LocalReadLock}}, L2 - this has a value of zero, so the underlying lock is released and L2 is removed from the map
# T3 releases - creates a new {{LocalReadLock}}, L3 - this has a value of zero, so the underlying lock is released and L3 is removed from the map
# The final step triggers a real file delete, as the underlying lock is released one too many times
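For illustration only, here is a simplified, hypothetical sketch of the check-then-act shape the steps above describe; the class, field, and method names are invented for this sketch and are not the actual LocalLockMergingSegmentReadLocker code:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a reference-counted "local lock" map, written
// only to illustrate the interleaving in the numbered steps above.
class RacyLocalLockMap {

   static final class LocalReadLock {
      int value;   // local acquisition count
   }

   interface DistributedLock {          // hypothetical stand-in for the distributed read locker
      void acquire(String name);
      void release(String name);        // the last release triggers the real file delete
   }

   private final Map<String, LocalReadLock> locks = new HashMap<>();
   private final DistributedLock underlying;

   RacyLocalLockMap(DistributedLock underlying) {
      this.underlying = underlying;
   }

   void acquireReadLock(String name) {
      LocalReadLock local;
      synchronized (locks) {
         local = locks.computeIfAbsent(name, k -> new LocalReadLock());
      }
      // Window: a concurrent release can drop the count to 0, release the underlying
      // lock and remove 'local' from the map before the block below runs (steps 4-5).
      synchronized (local) {
         if (local.value == 0) {
            underlying.acquire(name);
         }
         local.value++;
      }
   }

   void deleteOrReleaseReadLock(String name) {
      synchronized (locks) {
         // If the entry was already removed, a fresh zero-count lock is created here
         // and immediately treated as "fully released" (steps 7-8).
         LocalReadLock local = locks.computeIfAbsent(name, k -> new LocalReadLock());
         if (--local.value <= 0) {
            locks.remove(name);
            underlying.release(name);   // one release per thread -> one release too many overall
         }
      }
   }
}
{code}
With this shape, three threads interleaving as in the steps above acquire the underlying lock twice but release it three times, which is what ultimately triggers the file delete.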
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
[JBoss JIRA] (ISPN-4343) Rest rolling upgrades, distributed -- new cluster can't load from old cluster properly
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-4343?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-4343:
-----------------------------------------------
Tomas Sykora <tsykora(a)redhat.com> changed the Status of [bug 1104659|https://bugzilla.redhat.com/show_bug.cgi?id=1104659] from NEW to CLOSED
> Rest rolling upgrades, distributed -- new cluster can't load from old cluster properly
> --------------------------------------------------------------------------------------
>
> Key: ISPN-4343
> URL: https://issues.jboss.org/browse/ISPN-4343
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Loaders and Stores, Server
> Affects Versions: 7.0.0.Alpha4
> Reporter: Tomas Sykora
> Assignee: Tomas Sykora
> Priority: Critical
> Labels: rolling_upgrade
> Attachments: cannot_be_cast.txt, clustered-rest-rolling-upgrade.xml, clustered.xml, ISPN-4343.txt, ISPN-4343.zip, restRollUpsTraceLog.zip
>
>
> An attempt to mimic the REST rolling upgrades process for one old and one new server in a clustered environment failed.
> The scenario is quite simple: we start 2 old servers, store some data in them, start 2 new servers, and point clients at the new cluster.
> When issuing a get on the new cluster (wanting to fetch an old entry from the old store), the operation fails with the attached stack trace.
> I also include the current ISPN test suite with the testRestRollingUpgradesDiffVersionsDist test added as a reproducer.
> Respective changes are mirrored in my remote branch: https://github.com/tsykora/infinispan/tree/ISPN-4330
> You can run the test like this:
> mvn clean verify -P suite.rolling.upgrades -Dzip.dist.old=/home/you/servers/previous-ispn-server-version.zip -Dtest=RestRollingUpgradesTest#testRestRollingUpgradesDiffVersionsDist
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)
[JBoss JIRA] (ISPN-4343) Rest rolling upgrades, distributed -- new cluster can't load from old cluster properly
by Tomas Sykora (JIRA)
[ https://issues.jboss.org/browse/ISPN-4343?page=com.atlassian.jira.plugin.... ]
Tomas Sykora resolved ISPN-4343.
--------------------------------
Resolution: Rejected
Update:
I have 2 important pieces of information:
1) This use case is now working for me as well.
2) Big thanks to [~pruivo], who helped me a lot with this damn easily overlooked copy-paste issue!
Closing, not a bug.
Thanks!
> Rest rolling upgrades, distributed -- new cluster can't load from old cluster properly
> --------------------------------------------------------------------------------------
>
> Key: ISPN-4343
> URL: https://issues.jboss.org/browse/ISPN-4343
> Project: Infinispan
> Issue Type: Bug
> Security Level: Public(Everyone can see)
> Components: Loaders and Stores, Server
> Affects Versions: 7.0.0.Alpha4
> Reporter: Tomas Sykora
> Assignee: Tomas Sykora
> Priority: Critical
> Labels: rolling_upgrade
> Attachments: cannot_be_cast.txt, clustered-rest-rolling-upgrade.xml, clustered.xml, ISPN-4343.txt, ISPN-4343.zip, restRollUpsTraceLog.zip
>
>
> An attempt to mimic the REST rolling upgrades process for one old and one new server in a clustered environment failed.
> The scenario is quite simple: we start 2 old servers, store some data in them, start 2 new servers, and point clients at the new cluster.
> When issuing a get on the new cluster (wanting to fetch an old entry from the old store), the operation fails with the attached stack trace.
> I also include the current ISPN test suite with the testRestRollingUpgradesDiffVersionsDist test added as a reproducer.
> Respective changes are mirrored in my remote branch: https://github.com/tsykora/infinispan/tree/ISPN-4330
> You can run the test like this:
> mvn clean verify -P suite.rolling.upgrades -Dzip.dist.old=/home/you/servers/previous-ispn-server-version.zip -Dtest=RestRollingUpgradesTest#testRestRollingUpgradesDiffVersionsDist
--
This message was sent by Atlassian JIRA
(v6.2.6#6264)