[JBoss JIRA] (ISPN-12191) Remove GSON
by Gustavo Fernandes (Jira)
Gustavo Fernandes created ISPN-12191:
----------------------------------------
Summary: Remove GSON
Key: ISPN-12191
URL: https://issues.redhat.com/browse/ISPN-12191
Project: Infinispan
Issue Type: Bug
Components: Hot Rod
Affects Versions: 12.0.0.Dev01
Reporter: Gustavo Fernandes
Assignee: Gustavo Fernandes
The GSON library is used in the Hot Rod client to serialize the Query Request and Query Response as JSON instead of protobuf. The client should use the internal JSON library instead.
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
3 years, 8 months
[JBoss JIRA] (ISPN-12190) Re-add hibernate search for wildfly modules
by Gustavo Fernandes (Jira)
Gustavo Fernandes created ISPN-12190:
----------------------------------------
Summary: Re-add hibernate search for wildfly modules
Key: ISPN-12190
URL: https://issues.redhat.com/browse/ISPN-12190
Project: Infinispan
Issue Type: Task
Components: WildFly modules
Affects Versions: 12.0.0.Dev01
Reporter: Gustavo Fernandes
Assignee: Gustavo Fernandes
They were commented out to speed up the Hibernate Search 6 integration.
--
[JBoss JIRA] (ISPN-12188) TcpConnection.Receiver.run() blocking call
by Pedro Ruivo (Jira)
Pedro Ruivo created ISPN-12188:
----------------------------------
Summary: TcpConnection.Receiver.run() blocking call
Key: ISPN-12188
URL: https://issues.redhat.com/browse/ISPN-12188
Project: Infinispan
Issue Type: Bug
Components: Core, Test Suite
Affects Versions: 11.0.1.Final
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 12.0.0.Final, 11.0.2.Final
{noformat}
[TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.WorkDuringJoinTest[DIST_SYNC, tx=false].BlockingChecker
22:28:37.967 [Connection.Receiver [127.0.0.1:34169 - 127.0.0.1:8001]-8,WorkDuringJoinTest-NodeC] ERROR org.infinispan.commons.test.TestSuiteProgress - Test failed: org.infinispan.distribution.rehash.WorkDuringJoinTest[DIST_SYNC, tx=false].BlockingChecker
java.lang.AssertionError: Blocking call! java.net.SocketInputStream#socketRead0 on thread Thread[Connection.Receiver [127.0.0.1:34169 - 127.0.0.1:8001]-8,WorkDuringJoinTest-NodeC,5,ISPN-non-blocking-thread-group]
at org.infinispan.util.CoreTestBlockHoundIntegration.lambda$applyTo$0(CoreTestBlockHoundIntegration.java:45) ~[test-classes/:?]
at reactor.blockhound.BlockHound$Builder.lambda$install$8(BlockHound.java:383) ~[blockhound-1.0.3.RELEASE.jar:?]
at reactor.blockhound.BlockHoundRuntime.checkBlocking(BlockHoundRuntime.java:89) [?:?]
at java.net.SocketInputStream.socketRead0(SocketInputStream.java) [?:?]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:115) [?:?]
at java.net.SocketInputStream.read(SocketInputStream.java:168) [?:?]
at java.net.SocketInputStream.read(SocketInputStream.java:140) [?:?]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:252) [?:?]
at java.io.BufferedInputStream.read(BufferedInputStream.java:271) [?:?]
at java.io.DataInputStream.readInt(DataInputStream.java:392) [?:?]
at org.jgroups.blocks.cs.TcpConnection$Receiver.run(TcpConnection.java:301) [jgroups-4.2.1.Final.jar:4.2.1.Final]
at java.lang.Thread.run(Thread.java:834) [?:?]
{noformat}
The blocking call doesn't always happen, but it appears to be more common in the unstable CI builds:
https://ci.infinispan.org/job/InfinispanAlternateBuilds/job/InfinispanUns...
--
[JBoss JIRA] (ISPN-11176) XSite Max Idle
by Will Burns (Jira)
[ https://issues.redhat.com/browse/ISPN-11176?page=com.atlassian.jira.plugi... ]
Will Burns commented on ISPN-11176:
-----------------------------------
To expand on choice #1, here is some more detailed info.
XSite Max Idle is a feature that builds upon the max idle feature introduced in https://issues.redhat.com/browse/ISPN-11020 but extends this to also provide max idle across multiple sites.
We assume that clustered max idle will invoke the remove expired command when a key is found to be expired on its local node and that any such command is valid when invoked.
When a node finds that an entry has expired via max idle, it does not immediately replicate the remove expired command in the local site. Instead, it first sends a synchronous xsite message to all backups, asking whether they have a more recent access for the given key.
> If any site has a more recent access, it responds that the access is valid. The remote node also sends a touch command to the other nodes in its local site to update their access times (this can be done asynchronously). The originating node receives the response, sends a touch command to update all of its local nodes, and returns the value to the user.
> If all sites respond that the key hasn't been accessed recently, the originating node adds the remove expired command to the IRAC pending queue and lets the command remove the entry from the local site as normal, returning null to the user. The remove expired command does not increment the version and instead uses the current version, allowing any conflicting write to overwrite it.
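The two outcomes above can be sketched as a small decision function. This is only an illustration under stated assumptions, not Infinispan's implementation: the class XSiteMaxIdleSketch, the Outcome enum, the expiry comparison, and the convention that a remote site answers with a value > 0 for a valid access (and -1 or 0 otherwise, as in the use cases below) are all hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the xsite max idle decision; none of these names exist in Infinispan.
final class XSiteMaxIdleSketch {
    enum Outcome { TOUCH_AND_RETURN_VALUE, REMOVE_EXPIRED }

    // Assumption: an entry is expired once it has been idle for at least maxIdle.
    static boolean isExpired(long lastAccessMillis, long maxIdleMillis, long nowMillis) {
        return nowMillis - lastAccessMillis >= maxIdleMillis;
    }

    // Each element is one backup site's answer to the synchronous "check last access" request.
    // Assumption: a value > 0 means the site saw a valid, more recent access.
    static Outcome decide(List<Long> remoteResponses) {
        for (long response : remoteResponses) {
            if (response > 0) {
                // At least one site still considers the entry alive: touch local owners, return the value.
                return Outcome.TOUCH_AND_RETURN_VALUE;
            }
        }
        // Every site agrees the entry is expired: queue the remove expired for IRAC and return null.
        return Outcome.REMOVE_EXPIRED;
    }

    public static void main(String[] args) {
        long now = 10_000, maxIdle = 1_000, localLastAccess = 8_000;
        if (isExpired(localLastAccess, maxIdle, now)) {
            System.out.println(decide(List.of(-1L, 9_500L))); // one site has a recent access
            System.out.println(decide(List.of(-1L, 0L)));     // all sites report expired
        }
    }
}
```

The sketch leaves out the touch commands and the IRAC queue themselves; it only shows how the collected xsite responses select between the two paths.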
The use cases below assume the following site topology for a given key k:
Site 1:
Node A (primary owner)
Node B (backup owner)
Node C
Site 2:
Node D (primary owner)
Node E (backup owner)
Node F
Use Case 1: Read of non expired entry
NodeB reads the entry and finds it is not expired. NodeB synchronously sends a touch command to Node A (any other owner of the same key in the same site). Once the touch command is complete, the non-null value is returned to the user.
Use Case 2: Read of expired entry w/ no concurrent access and expired remote site
NodeB reads the entry and finds it is expired. It then invokes a remove expired command. Before replicating or removing the entry locally, NodeB sends an xsite "check last access" command to Site 2, which ensures the command is run on any owner. In this case an owner (Node D or Node E) says the entry is expired and returns a value to symbolize this (-1, 0, etc.), which is received by NodeB. NodeB then registers the remove expired with the IRAC replication queue and processes the remove expired as normal in the local site, removing the value and returning null to the user once complete. IRAC then removes the entry from the remote site asynchronously.
Use Case 3: Read of expired entry w/ no concurrent access and not expired remote site
NodeB reads the entry and finds it is expired. It then invokes a remove expired command. Before replicating or removing the entry locally, NodeB sends an xsite "check last access" command to Site 2, which ensures the command is run on any owner. In this case an owner, Node E, says the entry is not expired. It sends a touch command to the other owner in its site, Node D, and returns a value > 0, which is received by NodeB. NodeB then broadcasts a touch command with the updated access time to all owners in its local cluster (NodeA) and, once completed, returns the non-null value to the caller.
Use Case 4: Concurrent reads from same site that are both expired
NodeB and NodeA both read the entry and find it is expired. They both send xsite checks and operate the same as above, just with the work duplicated. If the concurrent read is done after the touch command is replicated, the work is not duplicated. Note that if the reads are on the same node, only one remove expired command is performed, as the later one waits for the prior one to complete first.
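One possible way to get the same-node behavior described above is to track in-flight remove expired operations per key and let later callers wait on the first one. The following is a sketch under assumptions, not the actual Infinispan mechanism: the class RemoveExpiredDeduplicator and the supplier parameter xsiteCheckAndRemove are hypothetical names, and the Boolean result simply stands for "the entry was removed".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical helper: at most one remove expired per key runs on this node at a time.
final class RemoveExpiredDeduplicator<K> {
    private final ConcurrentMap<K, CompletableFuture<Boolean>> inFlight = new ConcurrentHashMap<>();

    CompletableFuture<Boolean> removeExpired(K key, Supplier<CompletableFuture<Boolean>> xsiteCheckAndRemove) {
        CompletableFuture<Boolean> mine = new CompletableFuture<>();
        CompletableFuture<Boolean> prior = inFlight.putIfAbsent(key, mine);
        if (prior != null) {
            // A remove expired for this key is already running: wait on it instead of repeating the xsite check.
            return prior;
        }
        try {
            xsiteCheckAndRemove.get().whenComplete((removed, error) -> {
                // Unregister first so a later expiration of the same key starts a fresh check.
                inFlight.remove(key, mine);
                if (error != null) {
                    mine.completeExceptionally(error);
                } else {
                    mine.complete(removed);
                }
            });
        } catch (RuntimeException e) {
            inFlight.remove(key, mine);
            mine.completeExceptionally(e);
        }
        return mine;
    }
}
```

This covers only reads on the same node. Reads on different nodes of the same site (NodeA and NodeB) would still each send their own xsite check, as the use case notes.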
Use Case 5: Concurrent reads from different sites that are both expired
If NodeB and NodeD both read an expired entry, they will each ask the other site whether the entry is expired, causing duplicated xsite messages (assuming the IRAC replication is not done before the second access).
Use Case 6: Concurrent expired read with write in same site and expired in remote site
In this case the behavior depends on whether the read of the expired value is done on the primary or not.
If the read is on the primary, then the ordering is handled by locking, as the remove expired and the other write operation are not performed concurrently.
If the read is not on the primary, then we may have an issue with the xsite check being performed concurrently with the write, possibly losing the write. We may have to promote the xsite check to be done while holding the lock on the primary.
Use Case 7: Concurrent expired read with write in same site and not expired in remote site
Same as Case 6 above, in that we will most likely need to prevent the xsite max idle check and the write from running concurrently.
Use Case 8: Concurrent expired read with write in different site and expired in remote site
The read may see the entry as not expired (if the write was already applied), in which case it behaves the same as Use Case 3.
However, if it does see the entry as expired, it will add an entry to the IRAC replication queue, which may conflict with the write. The write should win, as it will have a newer version, because we don't use an incremented version for the remove.
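The reason the write wins can be illustrated with a simplified version comparison. This is a sketch under assumptions: a single long stands in for the IRAC version, the rule "higher version wins, local state kept on a tie" is a placeholder for the real conflict handling, and the class RemoveExpiredVersionSketch and its methods are hypothetical.

```java
// Hypothetical model: a plain long replaces the real IRAC version for illustration only.
final class RemoveExpiredVersionSketch {
    /** Entry state; value == null stands for "removed". */
    static final class State {
        final String value;
        final long version;

        State(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** A regular write increments the version. */
    static State write(State current, String newValue) {
        return new State(newValue, current.version + 1);
    }

    /** Remove expired reuses the current version instead of incrementing it. */
    static State removeExpired(State current) {
        return new State(null, current.version);
    }

    /** Assumed rule: the incoming update is applied only if its version is strictly newer. */
    static State resolve(State local, State incoming) {
        return incoming.version > local.version ? incoming : local;
    }

    public static void main(String[] args) {
        State original = new State("v1", 5);
        State expiredRemoval = removeExpired(original); // version stays 5
        State concurrentWrite = write(original, "v2");  // version becomes 6
        // Regardless of which site resolves the conflict, the write carries the newer version.
        System.out.println(resolve(expiredRemoval, concurrentWrite).value);
        System.out.println(resolve(concurrentWrite, expiredRemoval).value);
    }
}
```

Had the remove expired incremented the version as well, both updates would carry the same version and the write could be lost, which is the situation the non-incremented version avoids.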
Use Case 9: Concurrent expired read with write in different site and not expired in remote site
Same as Case 8, except that the xsite response will always say the entry is valid. Do we care that the value is different, though?
Consistency Issues
1. Notification of previous value during a write conflict with IRAC may not be updated for all nodes in the same fashion.
2. Write operations will not return the non-expired previous value if the entry is expired in the local site (but not in the other site). Should we do this?
3. Case 6 above shows that we may require the xsite check to be done while holding the lock on primary. Need to confirm.
4. Case 9 shows that a read that says it is valid may actually be a read for a new value that isn't yet replicated from the other site. Is that okay?
> XSite Max Idle
> --------------
>
> Key: ISPN-11176
> URL: https://issues.redhat.com/browse/ISPN-11176
> Project: Infinispan
> Issue Type: Enhancement
> Components: Cross-Site Replication, Expiration
> Reporter: Will Burns
> Assignee: Will Burns
> Priority: Major
> Fix For: 12.0.0.Final
>
>
> Max idle expiration currently doesn't work with xsite. For example, if an entry is written and replicated to both sites, and one site reads the value regularly while the other never does, a later read from the site that never read it will find the entry expired (assuming the max idle time has elapsed).
> There are a few ways we can do this.
> 1. Keep access times local to every site. When a site finds an entry is expired it asks the other site(s) if it has a more recent access. If a site is known to have gone down we should touch all entries, since they may not have updated access times. Requires very little additional xsite communication.
> 2. Batch touch commands and only send every so often. Has window of loss, but should be small. Requires more site usage. Wouldn't work for really low max idle times as an entry could expire before the touch command is replicated.
--
[JBoss JIRA] (ISPN-12179) Allow cache to start in degraded mode
by Dan Berindei (Jira)
[ https://issues.redhat.com/browse/ISPN-12179?page=com.atlassian.jira.plugi... ]
Dan Berindei updated ISPN-12179:
--------------------------------
Status: Pull Request Sent (was: Open)
Git Pull Request: https://github.com/infinispan/infinispan/pull/8603
> Allow cache to start in degraded mode
> -------------------------------------
>
> Key: ISPN-12179
> URL: https://issues.redhat.com/browse/ISPN-12179
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 11.0.1.Final, 12.0.0.Dev01
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Priority: Major
> Fix For: 12.0.0.Final, 12.0.0.Dev02
>
>
> Caches expect to receive an initial state from the other nodes during startup ({{await-initial-transfer="true"}} by default).
> There is an exception to this rule when rebalancing is suspended: the cache is able to start on a joiner without receiving any state.
> However, this exception does not apply when a cache is in DEGRADED mode and rebalancing is implicitly suspended. Instead a joiner will wait until the cache goes back to AVAILABLE mode in order to start the initial state transfer, or fail to start with a generic {{Initial state transfer timed out for cache %s on %s}} error message.
> We should handle caches in DEGRADED mode the same way we handle caches with rebalancing suspended, and allow joiners to start without waiting.
>
>
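The change proposed in the description above can be sketched as a pair of predicates. This is a hypothetical illustration, not the code from the pull request: the class JoinerStartSketch, the Availability enum, and both method names are made up for the example.

```java
// Hypothetical sketch of when a joiner waits for the initial state transfer.
final class JoinerStartSketch {
    enum Availability { AVAILABLE, DEGRADED }

    /** Reported behavior: only explicitly suspended rebalancing lets a joiner start without state. */
    static boolean waitsForInitialStateCurrent(boolean awaitInitialTransfer,
                                               boolean rebalancingSuspended,
                                               Availability mode) {
        // The availability mode is ignored here, so a joiner in DEGRADED mode keeps waiting.
        return awaitInitialTransfer && !rebalancingSuspended;
    }

    /** Proposed behavior: DEGRADED mode is treated like suspended rebalancing. */
    static boolean waitsForInitialStateProposed(boolean awaitInitialTransfer,
                                                boolean rebalancingSuspended,
                                                Availability mode) {
        return awaitInitialTransfer && !rebalancingSuspended && mode != Availability.DEGRADED;
    }

    public static void main(String[] args) {
        // A joiner with await-initial-transfer="true" joining a DEGRADED cache.
        System.out.println(waitsForInitialStateCurrent(true, false, Availability.DEGRADED));
        System.out.println(waitsForInitialStateProposed(true, false, Availability.DEGRADED));
    }
}
```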
--