[JBoss JIRA] (ISPN-4619) Partition handling support for asymmetric clusters
by Paul Ferraro (JIRA)
[ https://issues.jboss.org/browse/ISPN-4619?page=com.atlassian.jira.plugin.... ]
Paul Ferraro commented on ISPN-4619:
------------------------------------
I'd like to increase the priority of this, since WF/EAP can't realistically use partition handling until asymmetric clusters are supported (since we generally scope deployments to caches).
> Partition handling support for asymmetric clusters
> --------------------------------------------------
>
> Key: ISPN-4619
> URL: https://issues.jboss.org/browse/ISPN-4619
> Project: Infinispan
> Issue Type: Feature Request
> Components: Core
> Affects Versions: 7.0.0.Beta1
> Reporter: Dan Berindei
> Priority: Optional
> Labels: partition_handling
> Fix For: 7.0.0.CR2
>
>
> Partition handling currently relies on every cache being defined and running on the coordinator. We could relax this, as the partition handling configuration can be easily added to the cache join command.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
11 years, 5 months
[JBoss JIRA] (ISPN-4575) Map/Reduce incorrect results with a non-shared non-tx intermediate cache
by Vladimir Blagojevic (JIRA)
[ https://issues.jboss.org/browse/ISPN-4575?page=com.atlassian.jira.plugin.... ]
Vladimir Blagojevic commented on ISPN-4575:
-------------------------------------------
Ok Dan, will do so!
> Map/Reduce incorrect results with a non-shared non-tx intermediate cache
> ------------------------------------------------------------------------
>
> Key: ISPN-4575
> URL: https://issues.jboss.org/browse/ISPN-4575
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Distributed Execution and Map/Reduce
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Vladimir Blagojevic
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.CR2
>
>
> In a non-tx cache, if a command is started with topology id {{T}}, and when it is replicated on another node the distribution interceptor sees topology {{T+1}}, it throws an {{OutdatedTopologyException}}. The originator of the command will then retry the command, setting topology {{T+1}}.
> When this happens with a {{PutKeyValueCommand(k, MapReduceManagerImpl.DeltaAwareList)}}, it can lead to duplicate intermediate values.
> Say _A_ is the primary owner of {{k}} in {{T}}, _B_ is a backup owner both in {{T}} and {{T+1}}, and _C_ is the backup owner in {{T}} and the primary owner in {{T+1}} (i.e. _C_ just joined and a rebalance is in progress during {{T}} - see {{NonTxBackupOwnerBecomingPrimaryOwnerTest}}).
> _A_ starts the {{PutKeyValueCommand}} and replicates it to _B_ and _C_. _C_ applies the command, but _B_ already has topology {{T+1}} and throws an {{OutdatedTopologyException}}. _A_ installs topology {{T+1}}, sends the command to _C_ (as the new primary owner), which replicates it to _B_ and then applies it locally a second time.
> This scenario can happen during a M/R task even without nodes joining or leaving. That's because {{CreateCacheCommand}} only calls {{getCache()}} on each member, it doesn't wait for the cache to have a certain number of members or for state transfer to be complete for all the members. The last member to join the intermediate cache is guaranteed to have topology {{T+1}}, but the others may have topology {{T}} by the time the combine phase starts inserting values in the intermediate cache.
> I have seen the {{OutdatedTopologyException}} happen pretty often during the test suite, especially after I removed the duplicate {{invokeRemotely}} call in {{MapReduceTask.executeTaskInit()}}. Most of them were harmless, but there was one failure in CI: http://ci.infinispan.org/viewLog.html?buildId=9811&tab=buildResultsDiv&bu...
> A short-term fix would be to wait for all the members to finish joining in {{CreateCacheCommand}}. Long-term, M/R tasks should be resilient to topology changes, so we should investigate making {{PutKeyValue(k, DeltaAwareList)}} handle {{OutdatedTopologyException}} s.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
11 years, 5 months
[JBoss JIRA] (ISPN-4814) Publishing the site via awestruct doesn't work
by Gustavo Fernandes (JIRA)
[ https://issues.jboss.org/browse/ISPN-4814?page=com.atlassian.jira.plugin.... ]
Gustavo Fernandes updated ISPN-4814:
------------------------------------
Status: Resolved (was: Pull Request Sent)
Resolution: Done
> Publishing the site via awestruct doesn't work
> ----------------------------------------------
>
> Key: ISPN-4814
> URL: https://issues.jboss.org/browse/ISPN-4814
> Project: Infinispan
> Issue Type: Bug
> Components: Build process
> Affects Versions: 7.0.0.CR1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Labels: site
> Fix For: 7.0.0.CR2
>
>
> The {{publish_production.sh}} script uses {{awestruct --deploy}} to publish the site. Normally it should do something like this:
> {noformat}
> git checkout master
> cp -r _site/* .
> git commit -a -m "Published master to GitHub pages."
> git push origin master
> {noformat}
> Instead, it deletes everything on the master branch and pushes that, taking the site down (because GitHub relies on a CNAME file to find the domain name). The solution is to revert the commit (since some files are not generated) and publish the site manually.
> This could be caused by an incompatibility between the version of git on the CI machine and the 'git' gem used by awestruct.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
11 years, 5 months
[JBoss JIRA] (ISPN-4575) Map/Reduce incorrect results with a non-shared non-tx intermediate cache
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4575?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-4575:
------------------------------------
Vladimir, I integrated the PR but then I noticed that the duplicate CreateCacheCommand RPC that I mentioned in the description is still there:
{code}
rpc.invokeRemotely(cache.getRpcManager().getMembers(), ccc, rpcOptionsBuilder.build());
Map<Address, Response> map = rpc.invokeRemotely(cache.getRpcManager().getMembers(), ccc, rpcOptionsBuilder.build());
{code}
Could you issue another PR to remove the first line?
> Map/Reduce incorrect results with a non-shared non-tx intermediate cache
> ------------------------------------------------------------------------
>
> Key: ISPN-4575
> URL: https://issues.jboss.org/browse/ISPN-4575
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Distributed Execution and Map/Reduce
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Vladimir Blagojevic
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.CR2
>
>
> In a non-tx cache, if a command is started with topology id {{T}}, and when it is replicated on another node the distribution interceptor sees topology {{T+1}}, it throws an {{OutdatedTopologyException}}. The originator of the command will then retry the command, setting topology {{T+1}}.
> When this happens with a {{PutKeyValueCommand(k, MapReduceManagerImpl.DeltaAwareList)}}, it can lead to duplicate intermediate values.
> Say _A_ is the primary owner of {{k}} in {{T}}, _B_ is a backup owner both in {{T}} and {{T+1}}, and _C_ is the backup owner in {{T}} and the primary owner in {{T+1}} (i.e. _C_ just joined and a rebalance is in progress during {{T}} - see {{NonTxBackupOwnerBecomingPrimaryOwnerTest}}).
> _A_ starts the {{PutKeyValueCommand}} and replicates it to _B_ and _C_. _C_ applies the command, but _B_ already has topology {{T+1}} and throws an {{OutdatedTopologyException}}. _A_ installs topology {{T+1}}, sends the command to _C_ (as the new primary owner), which replicates it to _B_ and then applies it locally a second time.
> This scenario can happen during a M/R task even without nodes joining or leaving. That's because {{CreateCacheCommand}} only calls {{getCache()}} on each member, it doesn't wait for the cache to have a certain number of members or for state transfer to be complete for all the members. The last member to join the intermediate cache is guaranteed to have topology {{T+1}}, but the others may have topology {{T}} by the time the combine phase starts inserting values in the intermediate cache.
> I have seen the {{OutdatedTopologyException}} happen pretty often during the test suite, especially after I removed the duplicate {{invokeRemotely}} call in {{MapReduceTask.executeTaskInit()}}. Most of them were harmless, but there was one failure in CI: http://ci.infinispan.org/viewLog.html?buildId=9811&tab=buildResultsDiv&bu...
> A short-term fix would be to wait for all the members to finish joining in {{CreateCacheCommand}}. Long-term, M/R tasks should be resilient to topology changes, so we should investigate making {{PutKeyValue(k, DeltaAwareList)}} handle {{OutdatedTopologyException}} s.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
11 years, 5 months
[JBoss JIRA] (ISPN-4575) Map/Reduce incorrect results with a non-shared non-tx intermediate cache
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-4575?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-4575:
-------------------------------
Status: Pull Request Sent (was: Open)
> Map/Reduce incorrect results with a non-shared non-tx intermediate cache
> ------------------------------------------------------------------------
>
> Key: ISPN-4575
> URL: https://issues.jboss.org/browse/ISPN-4575
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Distributed Execution and Map/Reduce
> Affects Versions: 7.0.0.Alpha5
> Reporter: Dan Berindei
> Assignee: Vladimir Blagojevic
> Priority: Blocker
> Labels: testsuite_stability
> Fix For: 7.0.0.CR2
>
>
> In a non-tx cache, if a command is started with topology id {{T}}, and when it is replicated on another node the distribution interceptor sees topology {{T+1}}, it throws an {{OutdatedTopologyException}}. The originator of the command will then retry the command, setting topology {{T+1}}.
> When this happens with a {{PutKeyValueCommand(k, MapReduceManagerImpl.DeltaAwareList)}}, it can lead to duplicate intermediate values.
> Say _A_ is the primary owner of {{k}} in {{T}}, _B_ is a backup owner both in {{T}} and {{T+1}}, and _C_ is the backup owner in {{T}} and the primary owner in {{T+1}} (i.e. _C_ just joined and a rebalance is in progress during {{T}} - see {{NonTxBackupOwnerBecomingPrimaryOwnerTest}}).
> _A_ starts the {{PutKeyValueCommand}} and replicates it to _B_ and _C_. _C_ applies the command, but _B_ already has topology {{T+1}} and throws an {{OutdatedTopologyException}}. _A_ installs topology {{T+1}}, sends the command to _C_ (as the new primary owner), which replicates it to _B_ and then applies it locally a second time.
> This scenario can happen during a M/R task even without nodes joining or leaving. That's because {{CreateCacheCommand}} only calls {{getCache()}} on each member, it doesn't wait for the cache to have a certain number of members or for state transfer to be complete for all the members. The last member to join the intermediate cache is guaranteed to have topology {{T+1}}, but the others may have topology {{T}} by the time the combine phase starts inserting values in the intermediate cache.
> I have seen the {{OutdatedTopologyException}} happen pretty often during the test suite, especially after I removed the duplicate {{invokeRemotely}} call in {{MapReduceTask.executeTaskInit()}}. Most of them were harmless, but there was one failure in CI: http://ci.infinispan.org/viewLog.html?buildId=9811&tab=buildResultsDiv&bu...
> A short-term fix would be to wait for all the members to finish joining in {{CreateCacheCommand}}. Long-term, M/R tasks should be resilient to topology changes, so we should investigate making {{PutKeyValue(k, DeltaAwareList)}} handle {{OutdatedTopologyException}} s.
--
This message was sent by Atlassian JIRA
(v6.3.1#6329)
11 years, 5 months