[infinispan-issues] [JBoss JIRA] (ISPN-9111) Internal caches should be replicated across sites

Tue May 15 12:13:00 EDT 2018

     [ https://issues.jboss.org/browse/ISPN-9111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Galder Zamarreño updated ISPN-9111:
-----------------------------------
    Description: 
Given a cache manager, we should look for all enabled x-site locations and add those sites as SYNC backups for the protobuf metadata cache. Without this, users have to write their own code to make sure the metadata is added in each site, which is troublesome.

Using the SYNC/FAIL combination turns out to be very buggy. In the initial test, only one site was up and the other was not. The put call that replicates the metadata was failing (as a result of ISPN-9113), but the failure went unnoticed (more tests needed!), and the call ended up waiting for the replication timeout to expire.

Even after the replication timeout expired, the put call completed successfully. This is because invocation batching is enabled for the protobuf metadata cache, which means update failures do not make the cache operation fail. It's unclear whether this is a bug in invocation batching itself, or whether it is caused by the combination of invocation batching being enabled and the location where x-site backup replication is called. This can easily be reproduced by modifying JGroupsTransport.ChannelCallbacks.up to throw a runtime exception when handling a SITE_UNREACHABLE event, and then running ProtobufMetadataXSiteStateTransferTest.
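
The symptom described above (a backup failure that never reaches the caller of put) can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: BatchMaskingSketch, ToyCache and BackupSender are hypothetical names, none of them are Infinispan classes, and the "batching" flag simply models a code path that records the backup failure instead of rethrowing it. It does not claim to show where the real problem lies, which is still unclear.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Infinispan classes.
class BatchMaskingSketch {

    /** Hypothetical stand-in for a synchronous cross-site backup call. */
    interface BackupSender {
        void send(String key, String value);
    }

    /** Hypothetical cache whose writes are also sent to a backup site. */
    static final class ToyCache {
        private final Map<String, String> data = new HashMap<>();
        private final List<String> swallowed = new ArrayList<>();
        private final BackupSender backup;
        private final boolean batching;

        ToyCache(BackupSender backup, boolean batching) {
            this.backup = backup;
            this.batching = batching;
        }

        void put(String key, String value) {
            if (!batching) {
                // Assumed SYNC/FAIL-like behaviour: a backup failure
                // propagates to the caller and the local write is skipped.
                backup.send(key, value);
                data.put(key, value);
                return;
            }
            // Assumed batching-like behaviour: the local write is applied
            // and a backup failure is recorded but never rethrown.
            data.put(key, value);
            try {
                backup.send(key, value);
            } catch (RuntimeException e) {
                swallowed.add(e.getMessage());
            }
        }

        boolean containsKey(String key) {
            return data.containsKey(key);
        }

        List<String> swallowedFailures() {
            return swallowed;
        }
    }

    public static void main(String[] args) {
        // A backup site that is always down.
        BackupSender down = (k, v) -> {
            throw new IllegalStateException("site unreachable");
        };

        ToyCache plain = new ToyCache(down, false);
        try {
            plain.put("person.proto", "message Person {}");
            System.out.println("plain put: completed");
        } catch (IllegalStateException e) {
            System.out.println("plain put: failed with " + e.getMessage());
        }

        ToyCache batched = new ToyCache(down, true);
        batched.put("person.proto", "message Person {}");
        System.out.println("batched put: completed, swallowed failures = "
                + batched.swallowedFailures());
    }
}
```

In this model the caller of the batched cache has no way to tell that the backup site never received the metadata, which is why a test that only checks that put returns would not catch the problem.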

  was:Given a cache manager, we should look for all enabled x-site locations and add those sites as SYNC backups for the protobuf metadata cache. Without this data, the user has to implement its own code to make sure the data is added in each site which is troublesome.

> Internal caches should be replicated across sites
> -------------------------------------------------
>
>                 Key: ISPN-9111
>                 URL: https://issues.jboss.org/browse/ISPN-9111
>             Project: Infinispan
>          Issue Type: Enhancement
>          Components: Cross-Site Replication, Remote Querying
>            Reporter: Galder Zamarreño
>            Assignee: Galder Zamarreño
>              Labels: redhat-summit-18
