[JBoss JIRA] (ISPN-5570) Cross-site: retry backup commands
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-5570?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-5570:
--------------------------------
Fix Version/s: 8.2.0.Alpha1
(was: 8.1.0.Final)
> Cross-site: retry backup commands
> ---------------------------------
>
> Key: ISPN-5570
> URL: https://issues.jboss.org/browse/ISPN-5570
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Cross-Site Replication
> Affects Versions: 7.2.3.Final
> Reporter: Dan Berindei
> Fix For: 8.2.0.Alpha1
>
>
> There are 3 phases in a backup RPC:
> 1. Sender -> Local site master: failures can be caused by the site master shutting down or crashing, or by a network split.
> 2. Local site master -> Remote site master:
> 2.1. Local site master is no longer a site master, e.g. because it's shutting down or because it's no longer coordinator after a merge.
> 2.2. Remote site master is no longer a site master.
> 2.3. Link between local site and remote site is down.
> 3. Remote site master -> Backup targets
> Replication failures in phase 3 are handled by retrying (except for TimeoutExceptions), because {{BaseBackupReceiver}} uses regular cache methods to perform the updates.
> But replication failures in phases 1 and 2 are not handled in any way, except for causing the remote site to be taken offline after a certain number of replication failures (if backup is synchronous). We should instead retry backup RPCs when we get a {{SuspectException}} or {{UnreachableException}}, and perhaps even when we get no response (2.2?), and only stop when the timeout expires or when the backup is taken offline.
> Async backup probably needs retrying as well, and perhaps even a more sophisticated approach like I-RAC (ISPN-2634).
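The retry policy proposed above can be sketched in Java. This is a minimal illustration, not Infinispan's implementation: `SuspectException` and `UnreachableException` are hypothetical local stand-ins for the exceptions named in the issue, and `invokeWithRetry`, the `siteOffline` check and the fixed 10 ms pause are assumptions made for the example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class BackupRetrySketch {

    // Hypothetical stand-ins for the exceptions named in the issue; the real
    // Infinispan/JGroups classes are not used here.
    public static class SuspectException extends RuntimeException {}
    public static class UnreachableException extends RuntimeException {}

    // Retries a backup RPC while the failure looks like a lost or unreachable site
    // master (phases 1 and 2), until it succeeds, the timeout expires, or the
    // backup site has been taken offline.
    public static <T> T invokeWithRetry(Supplier<T> backupRpc, long timeoutMillis,
                                        BooleanSupplier siteOffline)
            throws TimeoutException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            try {
                return backupRpc.get();
            } catch (SuspectException | UnreachableException e) {
                if (siteOffline.getAsBoolean()) {
                    throw new IllegalStateException("Backup site was taken offline", e);
                }
                if (System.nanoTime() - deadline >= 0) {
                    throw new TimeoutException("Backup RPC still failing when the timeout expired");
                }
                // Short pause before retrying; a real implementation would more
                // likely wait for a new site master to be elected.
                TimeUnit.MILLISECONDS.sleep(10);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String reply = invokeWithRetry(() -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new SuspectException(); // site master lost on the first two tries
            }
            return "ack";
        }, 1_000, () -> false);
        System.out.println(reply + " after " + attempts[0] + " attempts"); // ack after 3 attempts
    }
}
```

Checking the offline flag before the deadline mirrors the issue's two stop conditions: a site that has already been taken offline should not keep consuming retries until the full timeout.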
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
[JBoss JIRA] (ISPN-5533) M/R DeltaAwareList can add duplicate values because of topology changes
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-5533?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-5533:
--------------------------------
Fix Version/s: 8.2.0.Alpha1
(was: 8.1.0.Final)
> M/R DeltaAwareList can add duplicate values because of topology changes
> -----------------------------------------------------------------------
>
> Key: ISPN-5533
> URL: https://issues.jboss.org/browse/ISPN-5533
> Project: Infinispan
> Issue Type: Bug
> Components: Core, Distributed Execution and Map/Reduce
> Affects Versions: 7.2.2.Final, 8.0.0.Alpha1
> Reporter: Dan Berindei
> Fix For: 8.2.0.Alpha1
>
>
> By default, the intermediate cache is non-transactional, so a topology change will cause write commands to be retried. Because a {{PutKeyValueCommand(K, DeltaAwareList)}} command is not idempotent, a retried command will append extra intermediate values to the list.
> The M/R framework tries to guard against this by waiting for all the nodes to initialize the intermediate cache before starting the reduce phase, but it cannot guard against nodes joining or leaving during the reduce phase.
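The idempotency problem can be illustrated with plain Java collections. This is a sketch under assumptions: `appendDelta` models the current append-style merge of a `DeltaAwareList`, while `mergeDelta`, which keys each delta by a hypothetical per-command `sourceId`, is only one possible idempotent alternative and not the fix chosen for the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IntermediateValuesSketch {

    // Non-idempotent: applying the same delta twice (e.g. when a write is retried
    // after a topology change) leaves duplicate intermediate values in the list.
    static void appendDelta(List<String> target, List<String> delta) {
        target.addAll(delta);
    }

    // Hypothetical idempotent alternative: each delta is tagged with the id of the
    // command that produced it, so replaying the same delta replaces, not appends.
    static void mergeDelta(Map<String, List<String>> target, String sourceId, List<String> delta) {
        target.put(sourceId, new ArrayList<>(delta));
    }

    // Collects all intermediate values, in insertion order of their sources.
    static List<String> flatten(Map<String, List<String>> bySource) {
        List<String> all = new ArrayList<>();
        bySource.values().forEach(all::addAll);
        return all;
    }

    public static void main(String[] args) {
        List<String> delta = List.of("a", "b");

        List<String> list = new ArrayList<>();
        appendDelta(list, delta);
        appendDelta(list, delta); // retried command
        System.out.println(list); // [a, b, a, b]

        Map<String, List<String>> bySource = new LinkedHashMap<>();
        mergeDelta(bySource, "task-1", delta);
        mergeDelta(bySource, "task-1", delta); // retried command
        System.out.println(flatten(bySource)); // [a, b]
    }
}
```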
[JBoss JIRA] (ISPN-5557) Core threading redesign
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-5557?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-5557:
--------------------------------
Fix Version/s: 8.2.0.Alpha1
(was: 8.1.0.Final)
> Core threading redesign
> -----------------------
>
> Key: ISPN-5557
> URL: https://issues.jboss.org/browse/ISPN-5557
> Project: Infinispan
> Issue Type: Task
> Components: Core
> Affects Versions: 7.2.2.Final
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Critical
> Fix For: 8.2.0.Alpha1
>
>
> Infinispan needs a lot of threads, because everything is synchronous: locking, remote command invocations, cache writers. This causes various issues, from general context switching overhead to the thread pools getting full and causing deadlocks.
> We should redesign the core so that most blocking happens on the application threads, and the number of internal threads is kept to a minimum.
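One common way to keep internal threads from blocking is to chain asynchronous stages instead of waiting on each step. The sketch below uses the JDK's `CompletableFuture` to illustrate that idea; `remoteGet` and `getAndStore` are hypothetical names, and this is not the design eventually adopted for Infinispan's core.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncInvocationSketch {

    // Hypothetical remote call: returns a stage that completes later instead of
    // parking the calling thread until the response arrives.
    static CompletionStage<String> remoteGet(String key, ExecutorService transport) {
        return CompletableFuture.supplyAsync(() -> "value-of-" + key, transport);
    }

    // Hypothetical follow-up step (e.g. a cache writer) chained on the RPC result;
    // no internal thread waits between the two steps.
    static CompletionStage<String> getAndStore(String key, ExecutorService transport) {
        return remoteGet(key, transport).thenApply(v -> v + " (stored)");
    }

    public static void main(String[] args) {
        ExecutorService transport = Executors.newSingleThreadExecutor();
        try {
            // Only the application thread blocks, and only once, at the very end.
            String result = getAndStore("k", transport).toCompletableFuture().join();
            System.out.println(result); // value-of-k (stored)
        } finally {
            transport.shutdown();
        }
    }
}
```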
[JBoss JIRA] (ISPN-5521) Upgrade to Hibernate ORM 5.0.1.Final
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-5521?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-5521:
--------------------------------
Fix Version/s: 8.2.0.Alpha1
(was: 8.1.0.Final)
> Upgrade to Hibernate ORM 5.0.1.Final
> ------------------------------------
>
> Key: ISPN-5521
> URL: https://issues.jboss.org/browse/ISPN-5521
> Project: Infinispan
> Issue Type: Component Upgrade
> Components: Loaders and Stores
> Reporter: Sanne Grinovero
> Assignee: Tristan Tarrant
> Fix For: 8.2.0.Alpha1
>
>
> I'm opening this to make sure we keep Infinispan aligned with the other platforms, now moving to Hibernate 5.
> This affects at least the JPA CacheStore; I'm not sure whether it affects other components too.
> Among the many improvements, the most noticeable one for Infinispan is better OSGi support.
[JBoss JIRA] (ISPN-5513) State Transfer can miss entries that are concurrently activated
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-5513?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-5513:
--------------------------------
Fix Version/s: 8.2.0.Alpha1
(was: 8.1.0.Final)
> State Transfer can miss entries that are concurrently activated
> ---------------------------------------------------------------
>
> Key: ISPN-5513
> URL: https://issues.jboss.org/browse/ISPN-5513
> Project: Infinispan
> Issue Type: Bug
> Components: State Transfer
> Affects Versions: 8.0.0.Alpha1
> Reporter: William Burns
> Fix For: 8.2.0.Alpha1
>
>
> Currently, OutboundTransferTask iterates over the data container and then processes the entries in the cache loader. However, if an entry is activated during or after the data container iteration, it may not be seen by that iteration and may no longer be present in the store when the store is processed.
> EntryRetriever had the same issue, and it had to register a cache listener for activations and replay the activated entries after finishing with the store.
> This approach can also send duplicate values; however, replacing an entry with exactly the same value is harmless, and if a non-state-transfer write occurs, the transferred state is ignored anyway.
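The race and the listener-and-replay approach borrowed from EntryRetriever can be modelled in a single thread. This is a simplified sketch assuming passivation, where activating an entry moves it from the store into memory. The class, its fields, and the `Runnable` hook that injects the "concurrent" activation are hypothetical, and a real implementation would need thread-safe structures.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class StateTransferSketch {
    // Simplified model: an entry is either in memory (data container) or in the store.
    final Map<String, String> dataContainer = new LinkedHashMap<>();
    final Map<String, String> store = new LinkedHashMap<>();
    // Keys activated while a transfer runs, as a hypothetical activation listener would record them.
    final Set<String> activatedDuringTransfer = new LinkedHashSet<>();
    boolean transferInProgress;

    // Activation under passivation: the entry moves from the store into memory.
    void activate(String key) {
        String value = store.remove(key);
        if (value != null) {
            dataContainer.put(key, value);
            if (transferInProgress) {
                activatedDuringTransfer.add(key); // listener callback
            }
        }
    }

    Map<String, String> transfer(Runnable concurrentActivity, boolean replayActivations) {
        Map<String, String> sent = new LinkedHashMap<>();
        transferInProgress = true;
        sent.putAll(dataContainer);   // 1. iterate the data container
        concurrentActivity.run();     // an activation sneaks in between the two iterations
        sent.putAll(store);           // 2. iterate the store: the activated entry is gone
        if (replayActivations) {      // 3. replay entries activated during the transfer
            for (String key : activatedDuringTransfer) {
                String value = dataContainer.get(key);
                if (value != null) {
                    sent.put(key, value);
                }
            }
        }
        transferInProgress = false;
        activatedDuringTransfer.clear();
        return sent;
    }

    public static void main(String[] args) {
        StateTransferSketch node = new StateTransferSketch();
        node.dataContainer.put("a", "1");
        node.store.put("b", "2");
        System.out.println(node.transfer(() -> node.activate("b"), true)); // {a=1, b=2}
    }
}
```

Without the replay step the same run sends only `{a=1}`, which is the missed-entry scenario described in the issue.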
[JBoss JIRA] (ISPN-5515) Purge store if there is another node already running
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-5515?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-5515:
--------------------------------
Fix Version/s: 8.2.0.Alpha1
(was: 8.1.0.Final)
> Purge store if there is another node already running
> ----------------------------------------------------
>
> Key: ISPN-5515
> URL: https://issues.jboss.org/browse/ISPN-5515
> Project: Infinispan
> Issue Type: Enhancement
> Components: Core, Loaders and Stores
> Affects Versions: 7.2.2.Final, 8.0.0.Alpha1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 8.2.0.Alpha1
>
>
> Preloading happens before communicating with other nodes that might already have the cache running. When joining the existing members, the cache then waits to receive the first CH in which it is a member, and then deletes only the entries in the segments that it doesn't own in that CH.
> The intention of this was to remove as little as possible from the existing data, e.g. if the first node to start up is not the one that was stopped last. But the preloaded entries are not replicated to the other nodes, so this can lead to inconsistencies.
> It would be better to delay preloading until we know we are the first node to start up, but failing that we could clear the data container and the store before receiving the initial state.
> Note that this will only allow preloading data from one node. Restoring data from more nodes is harder to do, and we will implement it as part of graceful restart.
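The proposed startup decision can be reduced to a small sketch. This assumes the node already knows whether it is the first member, and it treats the initial state as a single map, ignoring segment ownership. `startCache` and its parameters are hypothetical and do not correspond to Infinispan's API.

```java
import java.util.HashMap;
import java.util.Map;

public class JoinPreloadSketch {

    // Hypothetical startup decision: preload only if no other node runs the cache;
    // otherwise purge local data so it cannot diverge from the existing members.
    static Map<String, String> startCache(boolean firstMember,
                                          Map<String, String> store,
                                          Map<String, String> initialState) {
        // A fresh map stands in for the (cleared) data container.
        Map<String, String> dataContainer = new HashMap<>();
        if (firstMember) {
            dataContainer.putAll(store);        // preload: nobody else has newer data
        } else {
            store.clear();                      // purge the store before state transfer
            dataContainer.putAll(initialState); // initial state received from the cluster
        }
        return dataContainer;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>(Map.of("stale", "old"));
        Map<String, String> joined = startCache(false, store, Map.of("k", "v"));
        System.out.println(joined + " store=" + store); // {k=v} store={}
    }
}
```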
[JBoss JIRA] (ISPN-5510) Provide better Hot Rod client socket timeout and retry defaults
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-5510?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-5510:
--------------------------------
Fix Version/s: 8.2.0.Alpha1
(was: 8.1.0.Final)
> Provide better Hot Rod client socket timeout and retry defaults
> ---------------------------------------------------------------
>
> Key: ISPN-5510
> URL: https://issues.jboss.org/browse/ISPN-5510
> Project: Infinispan
> Issue Type: Enhancement
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Fix For: 8.2.0.Alpha1
>
>
> The current defaults are:
> * Socket timeout = 60 seconds
> * Max retries = 10
> As a result of these defaults, if the server hangs an operation, it'd take 10 minutes (60 second timeout x 10 retries) for the operation to finally return an exception to the client, which is way too much.
> So these default values should change to be more aggressive: a 30 second socket timeout and 3 max retries.
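The worst-case arithmetic from the issue can be checked with `java.time.Duration`. Like the issue, the sketch below assumes that each retry waits for one full socket timeout. `worstCase` is a hypothetical helper and does not query the Hot Rod client configuration.

```java
import java.time.Duration;

public class HotRodWorstCase {

    // Worst case as estimated in the issue: every retry waits a full socket timeout.
    static Duration worstCase(Duration socketTimeout, int maxRetries) {
        return socketTimeout.multipliedBy(maxRetries);
    }

    public static void main(String[] args) {
        System.out.println(worstCase(Duration.ofSeconds(60), 10)); // PT10M (current defaults)
        System.out.println(worstCase(Duration.ofSeconds(30), 3));  // PT1M30S (proposed defaults)
    }
}
```

Under the same estimate, the proposed defaults cut the worst-case wait from 10 minutes to 90 seconds.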
[JBoss JIRA] (ISPN-5506) Dist/ReplEmbeddedRestHotRodTest failing randomly with "java.net.SocketException: Socket closed"
by Adrian Nistor (JIRA)
[ https://issues.jboss.org/browse/ISPN-5506?page=com.atlassian.jira.plugin.... ]
Adrian Nistor updated ISPN-5506:
--------------------------------
Fix Version/s: 8.2.0.Alpha1
(was: 8.1.0.Final)
> Dist/ReplEmbeddedRestHotRodTest failing randomly with "java.net.SocketException: Socket closed"
> -----------------------------------------------------------------------------------------------
>
> Key: ISPN-5506
> URL: https://issues.jboss.org/browse/ISPN-5506
> Project: Infinispan
> Issue Type: Bug
> Components: Remote Protocols
> Affects Versions: 7.2.1.Final, 8.0.0.Alpha1
> Reporter: Galder Zamarreño
> Assignee: Galder Zamarreño
> Fix For: 8.2.0.Alpha1
>
> Attachments: infinispan_5506.tgz
>
>
> {code}
> testEmbeddedPutRestHotRodGet(org.infinispan.it.compatibility.DistEmbeddedRestHotRodTest) Time elapsed: 0.002 sec <<< FAILURE!
> java.net.SocketException: Socket closed
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
> at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
> at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at java.net.Socket.connect(Socket.java:538)
> at java.net.Socket.<init>(Socket.java:434)
> at java.net.Socket.<init>(Socket.java:286)
> at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80)
> at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122)
> at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707)
> at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387)
> at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
> at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
> at org.infinispan.it.compatibility.ReplEmbeddedRestHotRodTest.testEmbeddedPutRestHotRodGet(ReplEmbeddedRestHotRodTest.java:80)
> {code}