[
https://issues.jboss.org/browse/ISPN-1602?page=com.atlassian.jira.plugin....
]
Erik Salter edited comment on ISPN-1602 at 12/19/11 6:04 PM:
-------------------------------------------------------------
With ISPN-1584 applied, lock timeouts now occur even with a stable cluster (no rehash).
http://dl.dropbox.com/u/50401510/5.1.0.CR2/200threads-withoutrehash.7z (I apologize in
advance for the size of this file.)
A good place to start is ServiceGroupKey[edgeDeviceId=45,serviceGroupNo=144] on dht14.
This test runs 200 threads against 252 keys. The only notable thing about this
cache is that it is purely an aggregator: no data is stored in it.
For reference, the same 200 threads access 4 keys in another cache (SopSegment)
without issue.
EDIT:
Here is the issue reproduced with a more manageable log size and only one cluster under test.
All the locks become stale once two nodes drop out of the cluster.
http://dl.dropbox.com/u/50401510/5.1.0.CR2/stalelock.tgz
Single view change causes stale locks
-------------------------------------
Key: ISPN-1602
URL: https://issues.jboss.org/browse/ISPN-1602
Project: Infinispan
Issue Type: Bug
Components: Core API
Affects Versions: 5.1.0.CR1
Reporter: Erik Salter
Assignee: Dan Berindei
Priority: Critical
Fix For: 5.1.0.CR2
Attachments: erm_tcp.xml, session_udp.xml
During load testing of 5.1.0.CR1, we're encountering JGroups 3.x dropping views. We
know from ISPN-1581 that if there are more than 3 view changes, a failed commit can
leave a stale lock. However, we're seeing stale locks occur after a single view change.
In the following logs, the affected cluster is erm-cluster-xxxx.
(We also don't know why JGroups 3.x is unstable. We suspected FLUSH and incorrect FD
settings, but we removed them and are still getting dropped messages.)
The trace logs (the issue occurs very early in each) are at:
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht10/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht11/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht12/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht13/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht14/server.l...
http://dl.dropbox.com/u/50401510/5.1.0.CR1/dec08viewchange/dht15/server.l...