[JBoss JIRA] (ISPN-4810) Local Transactional Cache loses data when eviction is enabled and there are multiple readers and one writer
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-4810?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-4810:
-------------------------------------
[~dan.berindei] Unfortunately I can't say, but yes, I assume it was a null returned from polling the queue. However, if the size is greater than the max there should be something in the queue to remove, since the caller should have exclusive access (fullMiss is only ever invoked from a miss, which holds the lock to the segment). A hit could come in, but it would have to push the BatchWrapper over its maximum queue size, and that can cause havoc if more than one thread is running in a LIRS segment.
Regarding our existing LIRS implementation: it doesn't use HIR_NONRESIDENT in a meaningful way, so a value shouldn't be able to be null. The only way an entry is set to non-resident is when it is evicted, in which case it is removed from both the stack and the queue (in classic LIRS, non-resident entries are supposed to stay in the stack).
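To make the distinction concrete, here is a toy model (plain Java, not Infinispan code; all names are illustrative) of the two eviction behaviors described above: the implementation discussed here drops an evicted entry from both the LIRS stack and the resident-HIR queue, so no HIR_NONRESIDENT ghost with a null value is ever observable, whereas classic LIRS keeps the non-resident marker in the stack:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy sketch of LIRS bookkeeping (hypothetical, simplified): a recency
// stack "S" and a resident-HIR queue "Q", as in the LIRS algorithm.
public class LirsEvictionSketch {
    enum State { LIR, HIR_RESIDENT, HIR_NONRESIDENT }

    final Map<String, State> entries = new HashMap<>();
    final Deque<String> stack = new ArrayDeque<>(); // recency stack S
    final Deque<String> queue = new ArrayDeque<>(); // resident-HIR queue Q

    void addHir(String key) {
        entries.put(key, State.HIR_RESIDENT);
        stack.push(key);
        queue.addLast(key);
    }

    // Behavior described in the comment above: eviction removes the entry
    // from BOTH stack and queue, so no non-resident (null-valued) ghost remains.
    String evict() {
        String victim = queue.pollFirst();
        if (victim != null) {
            entries.remove(victim);
            stack.remove(victim);
        }
        return victim;
    }

    // Classic LIRS for contrast: the victim becomes a non-resident HIR
    // entry that stays in the stack as history.
    String evictClassic() {
        String victim = queue.pollFirst();
        if (victim != null) {
            entries.put(victim, State.HIR_NONRESIDENT); // ghost stays in S
        }
        return victim;
    }
}
```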
> Local Transactional Cache loses data when eviction is enabled and there are multiple readers and one writer
> -----------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4810
> URL: https://issues.jboss.org/browse/ISPN-4810
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 6.0.2.Final
> Environment: Windows 7 x64 (NTFS)
> Oracle JDK1.7.0_40
> Apache Maven 3.0.5
> Reporter: Horia Chiorean
> Assignee: William Burns
> Labels: modeshape
> Attachments: ispn_concurrent.zip
>
>
> Using Infinispan 6.0.2 and a local, transactional cache backed by a <singleFile> store, with eviction enabled and a small {{max-entries}} setting, we have the following scenario:
> * the main thread (i.e. the "writer") starts a transaction, adds a batch of strings into the cache and also appends the same strings into a List cache entry and then commits the transaction
> * after the above has finished (i.e. after tx.commit) it fires a number of reader threads where each reader thread
> ** checks that the string entries were added into the cache and
> ** checks that the entries were correctly appended to the List entry
> * the above steps are repeated a number of times
> On any given run, depending on timing, we see that at some point some of the reader threads will not see the latest version of the List entry (i.e. will not see the latest elements added to the list) but rather an old, stale List (a sort of "lost update" scenario).
> If we either:
> * disable eviction or
> * set the {{max-entries}} to a large enough value (which I suspect has the same effect - not evicting anything) the problem doesn't show up.
--
This message was sent by Atlassian JIRA
(v6.3.11#6341)
9 years, 3 months
[JBoss JIRA] (ISPN-3395) ISPN000196: Failed to recover cluster state after the current node became the coordinator
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3395?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-3395:
------------------------------------
Fixed in 6.0.0.Beta1 with ISPN-3051.
> ISPN000196: Failed to recover cluster state after the current node became the coordinator
> -----------------------------------------------------------------------------------------
>
> Key: ISPN-3395
> URL: https://issues.jboss.org/browse/ISPN-3395
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 5.3.0.Final
> Reporter: Mayank Agarwal
> Fix For: 6.0.2.Final, 7.0.0.Final
>
>
> We are using Infinispan 5.3.0.Final in our distributed application. We are testing Infinispan in HA scenarios and getting the following exception when a new node becomes coordinator.
> ISPN000196: Failed to recover cluster state after the current node became the coordinator
> java.lang.NullPointerException: null
> at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:455) ~[infinispan-core-5.3.0.1.Final.jar:5.3.0.1.Final]
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:235) ~[infinispan-core-5.3.0.1.Final.jar:5.3.0.1.Final]
> at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:647) ~[infinispan-core-5.3.0.1.Final.jar:5.3.0.1.Final]
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.6.0_25]
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) ~[na:1.6.0_25]
> at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.6.0_25]
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) [na:1.6.0_25]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.6.0_25]
> at java.lang.Thread.run(Unknown Source) [na:1.6.0_25]
> This is happening because cacheTopology is null at ClusterTopologyManagerImpl.java:455.
> The code at line 449 already checks cacheTopology for null; the for loop that updates cacheStatusMap at line 457 should be inside that same check.
> Fix:
> --- a/core/src/main/java/org/infinispan/topology/ClusterTopologyManagerImpl.java
> +++ b/core/src/main/java/org/infinispan/topology/ClusterTopologyManagerImpl.java
> @@ -448,7 +448,7 @@ public class ClusterTopologyManagerImpl implements ClusterTopologyManager {
> // but didn't get a response back yet
> if (cacheTopology != null) {
> topologyList.add(cacheTopology);
> - }
> +
>
> // Add all the members of the topology that have sent responses first
> // If we only added the sender, we could end up with a different member order
> @@ -457,6 +457,7 @@ public class ClusterTopologyManagerImpl implements ClusterTopologyManager {
> cacheStatusMap.get(cacheName).addMember(member);
> }
> }
> + }
>
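The intent of the quoted patch can be sketched in isolation (simplified, hypothetical names; not the actual Infinispan code): the member loop is moved inside the null check, so a joiner that has not yet sent its topology response no longer triggers the NullPointerException:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the control flow the patch above intends.
// "topologyMembers" stands in for cacheTopology's member list.
public class RecoverSketch {
    static List<String> collectMembers(List<String> topologyMembers,
                                       List<List<String>> topologyList) {
        List<String> added = new ArrayList<>();
        // A node may have joined but not sent a topology response yet,
        // in which case its topology is null and must be skipped entirely.
        if (topologyMembers != null) {
            topologyList.add(topologyMembers);
            // The patch moves this loop inside the null check: iterating
            // a null topology's members was the source of the NPE.
            for (String member : topologyMembers) {
                added.add(member);
            }
        }
        return added;
    }
}
```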
--
[JBoss JIRA] (ISPN-3395) ISPN000196: Failed to recover cluster state after the current node became the coordinator
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3395?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-3395:
-------------------------------
Fix Version/s: 6.0.2.Final
> ISPN000196: Failed to recover cluster state after the current node became the coordinator
> -----------------------------------------------------------------------------------------
>
> Key: ISPN-3395
> URL: https://issues.jboss.org/browse/ISPN-3395
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 5.3.0.Final
> Reporter: Mayank Agarwal
> Fix For: 6.0.2.Final, 7.0.0.Final
>
>
> We are using Infinispan 5.3.0.Final in our distributed application. We are testing Infinispan in HA scenarios and getting the following exception when a new node becomes coordinator.
> ISPN000196: Failed to recover cluster state after the current node became the coordinator
> java.lang.NullPointerException: null
> at org.infinispan.topology.ClusterTopologyManagerImpl.recoverClusterStatus(ClusterTopologyManagerImpl.java:455) ~[infinispan-core-5.3.0.1.Final.jar:5.3.0.1.Final]
> at org.infinispan.topology.ClusterTopologyManagerImpl.handleNewView(ClusterTopologyManagerImpl.java:235) ~[infinispan-core-5.3.0.1.Final.jar:5.3.0.1.Final]
> at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener$1.run(ClusterTopologyManagerImpl.java:647) ~[infinispan-core-5.3.0.1.Final.jar:5.3.0.1.Final]
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.6.0_25]
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) ~[na:1.6.0_25]
> at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.6.0_25]
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) [na:1.6.0_25]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.6.0_25]
> at java.lang.Thread.run(Unknown Source) [na:1.6.0_25]
> This is happening because cacheTopology is null at ClusterTopologyManagerImpl.java:455.
> The code at line 449 already checks cacheTopology for null; the for loop that updates cacheStatusMap at line 457 should be inside that same check.
> Fix:
> --- a/core/src/main/java/org/infinispan/topology/ClusterTopologyManagerImpl.java
> +++ b/core/src/main/java/org/infinispan/topology/ClusterTopologyManagerImpl.java
> @@ -448,7 +448,7 @@ public class ClusterTopologyManagerImpl implements ClusterTopologyManager {
> // but didn't get a response back yet
> if (cacheTopology != null) {
> topologyList.add(cacheTopology);
> - }
> +
>
> // Add all the members of the topology that have sent responses first
> // If we only added the sender, we could end up with a different member order
> @@ -457,6 +457,7 @@ public class ClusterTopologyManagerImpl implements ClusterTopologyManager {
> cacheStatusMap.get(cacheName).addMember(member);
> }
> }
> + }
>
--
[JBoss JIRA] (ISPN-5179) Add distributed execution and map/reduce job statistics
by Vladimir Blagojevic (JIRA)
[ https://issues.jboss.org/browse/ISPN-5179?page=com.atlassian.jira.plugin.... ]
Vladimir Blagojevic edited comment on ISPN-5179 at 2/4/15 5:28 AM:
-------------------------------------------------------------------
For the first iteration I propose the following levels of statistics information:
- System wide level
- MapReduceTask level
System wide level includes the MapReduceTask ids of the running and completed map/reduce tasks. We will keep a limited history of completed tasks in the cluster registry. All map/reduce stats are cluster-wide; in other words, a user/admin does not need to connect to a master task node N and query it for map/reduce jobs originating on that node N.
A sample output for system wide statistics would be:
{code:title=Sample response|borderStyle=solid}
{
"masterNode":"IP address",
"inprogressCount":"2",
"completedCount":"2",
"inprogress":[
{"id":"actualID1"},
{"id":"actualID2"}
],
"completed":[
{"id":"actualID3"},
{"id":"actualID4"}
]
}
{code}
We could include any other system wide statistic related to map/reduce as the need arises and as we see fit.
MapReduceTask level - given the id of an in-progress or completed task, it would return statistics for that particular map/reduce task. A sample output would be:
{code:title=Sample response|borderStyle=solid}
{
"reduceProgress" : 72.104515,
"failedReduceAttempts" : 0,
"mapsRunning" : 0,
"state" : "RUNNING",
"reducesRunning" : 1,
"mapsCompleted" : 1,
"startTime" : 1326860720902,
"id" : "actualID3",
"elapsedTime" : 64432,
"reducesCompleted" : 0,
"mapProgress" : 100,
"failedMapAttempts" : 0,
"finishTime" : 0
}
{code}
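As a rough sketch of how the per-task fields above relate to each other (field names taken from the sample JSON; the class itself is hypothetical, not an existing Infinispan API), the derived values {{state}} and {{elapsedTime}} could be computed from the raw timestamps like this:

```java
// Hypothetical holder for the per-task statistics shown in the sample
// response above; only the time/state bookkeeping is sketched here.
public class MapReduceTaskStats {
    final String id;
    final long startTime; // epoch millis, as in the sample JSON
    long finishTime;      // 0 while the task is still running
    int mapsCompleted, mapsRunning;
    int reducesCompleted, reducesRunning;
    float mapProgress, reduceProgress; // percent

    MapReduceTaskStats(String id, long startTime) {
        this.id = id;
        this.startTime = startTime;
    }

    // "RUNNING" until a finish timestamp is recorded.
    String state() {
        return finishTime == 0 ? "RUNNING" : "COMPLETED";
    }

    // Wall-clock time so far; uses "now" while the task is still running,
    // the recorded finish time once it has completed.
    long elapsedTime(long now) {
        return (finishTime == 0 ? now : finishTime) - startTime;
    }
}
```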
was (Author: vblagojevic):
For the first iteration I propose the following levels of statistics information:
- System wide level
- MapReduceTask level
System wide level includes the MapReduceTask ids of the running and completed tasks. We could keep a limited history of completed tasks. A sample output for system wide statistics could be:
{code:title=Sample response|borderStyle=solid}
{
"masterNode":"IP address",
"inprogressCount":"2",
"completedCount":"2",
"inprogress":[
{"id":"actualID1"},
{"id":"actualID2"}
],
"completed":[
{"id":"actualID3"},
{"id":"actualID4"}
]
}
{code}
We could include any other system wide statistic related to map/reduce as the need arises and as we see fit.
MapReduceTask level - given the id of an in-progress or completed task, it would return statistics for that particular task. A sample output could be:
{code:title=Sample response|borderStyle=solid}
{
"reduceProgress" : 72.104515,
"failedReduceAttempts" : 0,
"mapsRunning" : 0,
"state" : "RUNNING",
"reducesRunning" : 1,
"mapsCompleted" : 1,
"startTime" : 1326860720902,
"id" : "actualID3",
"elapsedTime" : 64432,
"reducesCompleted" : 0,
"mapProgress" : 100,
"failedMapAttempts" : 0,
"finishTime" : 0
}
{code}
> Add distributed execution and map/reduce job statistics
> --------------------------------------------------------
>
> Key: ISPN-5179
> URL: https://issues.jboss.org/browse/ISPN-5179
> Project: Infinispan
> Issue Type: Feature Request
> Components: JMX, reporting and management
> Reporter: Vladimir Blagojevic
> Assignee: Vladimir Blagojevic
> Fix For: 7.2.0.Final
>
>
> We should add DMR/JMX statistics for running distributed execution jobs as well as map/reduce jobs. The statistics will also include overview/total system statistics of previously executed jobs; we might store statistics of individual executed jobs in some internal cache. However, the primary objective is to calculate and maintain dist.exec and map/reduce job statistics for the Infinispan admin console.
--
[JBoss JIRA] (ISPN-5179) Add distributed execution and map/reduce job statistics
by Vladimir Blagojevic (JIRA)
[ https://issues.jboss.org/browse/ISPN-5179?page=com.atlassian.jira.plugin.... ]
Vladimir Blagojevic edited comment on ISPN-5179 at 2/4/15 5:23 AM:
-------------------------------------------------------------------
For the first iteration I propose the following levels of statistics information:
- System wide level
- MapReduceTask level
System wide level includes the MapReduceTask ids of the running and completed tasks. We could keep a limited history of completed tasks. A sample output for system wide statistics could be:
{code:title=Sample response|borderStyle=solid}
{
"masterNode":"IP address",
"inprogressCount":"2",
"completedCount":"2",
"inprogress":[
{"id":"actualID1"},
{"id":"actualID2"}
],
"completed":[
{"id":"actualID3"},
{"id":"actualID4"}
]
}
{code}
We could include any other system wide statistic related to map/reduce as the need arises and as we see fit.
MapReduceTask level - given the id of an in-progress or completed task, it would return statistics for that particular task. A sample output could be:
{code:title=Sample response|borderStyle=solid}
{
"reduceProgress" : 72.104515,
"failedReduceAttempts" : 0,
"mapsRunning" : 0,
"state" : "RUNNING",
"reducesRunning" : 1,
"mapsCompleted" : 1,
"startTime" : 1326860720902,
"id" : "actualID3",
"elapsedTime" : 64432,
"reducesCompleted" : 0,
"mapProgress" : 100,
"failedMapAttempts" : 0,
"finishTime" : 0
}
{code}
was (Author: vblagojevic):
For the first iteration I propose the following levels of statistics information:
- System wide level
- MapReduceTask level
System wide level includes the MapReduceTask ids of the running and completed tasks. We could keep a limited history of completed tasks. A sample output for system wide statistics could be:
{code:title=Sample response|borderStyle=solid}
{
"masterNode":"IP address",
"inprogressCount":"2",
"completedCount":"2",
"inprogress":[
{"id":"actualID1"},
{"id":"actualID2"}
],
"completed":[
{"id":"actualID3"},
{"id":"actualID4"}
]
}
{code}
We could include any other system wide statistic related to map/reduce as the need arises and as we see fit.
MapReduceTask level - given the id of an in-progress or completed task, it would return statistics for that particular task. A sample output could be:
{code:title=Sample response|borderStyle=solid}
{
"runningReduceAttempts" : 1,
"reduceProgress" : 72.104515,
"failedReduceAttempts" : 0,
"mapsRunning" : 0,
"state" : "RUNNING",
"reducesRunning" : 1,
"reducesPending" : 0,
"mapsCompleted" : 1,
"startTime" : 1326860720902,
"id" : "actualID3",
"runningMapAttempts" : 0,
"mapsPending" : 0,
"elapsedTime" : 64432,
"reducesCompleted" : 0,
"mapProgress" : 100,
"failedMapAttempts" : 0,
"finishTime" : 0
}
{code}
> Add distributed execution and map/reduce job statistics
> --------------------------------------------------------
>
> Key: ISPN-5179
> URL: https://issues.jboss.org/browse/ISPN-5179
> Project: Infinispan
> Issue Type: Feature Request
> Components: JMX, reporting and management
> Reporter: Vladimir Blagojevic
> Assignee: Vladimir Blagojevic
> Fix For: 7.2.0.Final
>
>
> We should add DMR/JMX statistics for running distributed execution jobs as well as map/reduce jobs. The statistics will also include overview/total system statistics of previously executed jobs; we might store statistics of individual executed jobs in some internal cache. However, the primary objective is to calculate and maintain dist.exec and map/reduce job statistics for the Infinispan admin console.
--
[JBoss JIRA] (ISPN-4810) Local Transactional Cache loses data when eviction is enabled and there are multiple readers and one writer
by Horia Chiorean (JIRA)
[ https://issues.jboss.org/browse/ISPN-4810?page=com.atlassian.jira.plugin.... ]
Horia Chiorean edited comment on ISPN-4810 at 2/4/15 5:16 AM:
--------------------------------------------------------------
[~dan.berindei]: unfortunately no, for a couple of reasons:
1. https://issues.jboss.org/browse/ISPN-4983 - which seems to be fixed only in {{7.1.0}}. Without it we can't even compile our code against 7.x (this was an ISPN SPI change between 6.x and 7.x where certain functionality became "non-public").
2. We need to be able to support WildFly. At the moment the latest final version of WildFly is {{8.2.0}}, which uses ISPN 6.x.
So after this issue is fixed, it would help us if the fix were backported (at least to 6.x). If that's not possible, we have to wait until there's a final WildFly release that uses ISPN {{7.1.0}} or newer.
was (Author: hchiorean):
[~dan.berindei]: unfortunately no, for a couple of reasons:
1. https://issues.jboss.org/browse/ISPN-4983 - which seems to be fixed only in {{7.1.0}}. Without it we can't even compile our code against 7.x.
2. We need to be able to support WildFly. At the moment the latest final version of WildFly is {{8.2.0}}, which uses ISPN 6.x.
So after this issue is fixed, it would help us if the fix were backported (at least to 6.x). If that's not possible, we have to wait until there's a final WildFly release that uses ISPN {{7.1.0}} or newer.
> Local Transactional Cache loses data when eviction is enabled and there are multiple readers and one writer
> -----------------------------------------------------------------------------------------------------------
>
> Key: ISPN-4810
> URL: https://issues.jboss.org/browse/ISPN-4810
> Project: Infinispan
> Issue Type: Bug
> Affects Versions: 6.0.2.Final
> Environment: Windows 7 x64 (NTFS)
> Oracle JDK1.7.0_40
> Apache Maven 3.0.5
> Reporter: Horia Chiorean
> Assignee: William Burns
> Labels: modeshape
> Attachments: ispn_concurrent.zip
>
>
> Using Infinispan 6.0.2 and a local, transactional cache backed by a <singleFile> store, with eviction enabled and a small {{max-entries}} setting, we have the following scenario:
> * the main thread (i.e. the "writer") starts a transaction, adds a batch of strings into the cache and also appends the same strings into a List cache entry and then commits the transaction
> * after the above has finished (i.e. after tx.commit) it fires a number of reader threads where each reader thread
> ** checks that the string entries were added into the cache and
> ** checks that the entries were correctly appended to the List entry
> * the above steps are repeated a number of times
> On any given run, depending on timing, we see that at some point some of the reader threads will not see the latest version of the List entry (i.e. will not see the latest elements added to the list) but rather an old, stale List (a sort of "lost update" scenario).
> If we either:
> * disable eviction or
> * set the {{max-entries}} to a large enough value (which I suspect has the same effect - not evicting anything) the problem doesn't show up.
--