[JBoss JIRA] (ISPN-11299) Stale values can be indexed during State Transfer
by Gustavo Fernandes (Jira)
Gustavo Fernandes created ISPN-11299:
----------------------------------------
Summary: Stale values can be indexed during State Transfer
Key: ISPN-11299
URL: https://issues.redhat.com/browse/ISPN-11299
Project: Infinispan
Issue Type: Bug
Components: Core
Affects Versions: 10.1.1.Final
Reporter: Gustavo Fernandes
If an operation is discarded during state transfer because it was already modified locally, the QueryInterceptor still propagates it to the index, leaving the index out of sync.
Sequence of events:
* T1: State Transfer starts
* T2: The EWI (EntryWrappingInterceptor) starts tracking all non-state-transfer operations
* T3: An entry is added locally
* T4: EWI stores the key
* T5: The same key arrives from State Transfer
* T6: The QueryInterceptor indexes it (the QueryInterceptor is installed after the EWI, but indexing happens before the entry is stored in the data container)
* T7: The entry operation is not committed to the data container, because the key was tracked earlier as `DiscardPolicy{discardStateTransfer=true, discardXSiteStateTransfer=false}`
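The race is easier to see in miniature. The following self-contained sketch stands in for the data container, the index, and the EWI's tracking set with plain collections — illustrative only, not Infinispan's actual classes or API:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Miniature model of the interleaving described above.
class StateTransferSketch {
    final Map<String, String> dataContainer = new HashMap<>();
    final Map<String, String> index = new HashMap<>();
    final Set<String> locallyModified = new HashSet<>();

    // T3/T4: a local write is applied and the EWI tracks the key.
    void localWrite(String key, String value) {
        dataContainer.put(key, value);
        index.put(key, value);
        locallyModified.add(key);
    }

    // Buggy order (T6 before T7): the index is updated even though
    // the data-container commit is later discarded.
    void stateTransferWriteBuggy(String key, String value) {
        index.put(key, value);
        if (!locallyModified.contains(key)) {
            dataContainer.put(key, value);
        }
    }

    // Fixed order: only index entries that are actually committed.
    void stateTransferWriteFixed(String key, String value) {
        if (!locallyModified.contains(key)) {
            dataContainer.put(key, value);
            index.put(key, value);
        }
    }
}
```

With the buggy ordering, a state-transfer write of a stale value leaves the index holding the stale value while the data container keeps the newer local one — exactly the out-of-sync state reported above.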
--
This message was sent by Atlassian Jira
(v7.13.8#713008)
6 years, 2 months
[JBoss JIRA] (ISPN-11017) Cluster fails and doesn't recover under load
by Ryan Emerson (Jira)
[ https://issues.redhat.com/browse/ISPN-11017?page=com.atlassian.jira.plugi... ]
Ryan Emerson commented on ISPN-11017:
-------------------------------------
[~jreimann-1] Is it OK to close this issue? I believe your issue was resolved by a configuration change.
> Cluster fails and doesn't recover under load
> --------------------------------------------
>
> Key: ISPN-11017
> URL: https://issues.redhat.com/browse/ISPN-11017
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 10.0.1.Final
> Environment: Running in OpenShift, with a stateful set of 12 nodes, a distributed cache with 3 owners, async indexing enabled, persistence with rocksdb.
> Reporter: Jens Reimann
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: infinispan.xml
>
>
> After running the load test for a few seconds, the Infinispan cluster stops accepting requests and the nodes start to split off from the cluster. The server log contains many exceptions like:
> {code:java}
> 10:42:26,939 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p4-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache '___protobuf_metadata', writing keys [deviceRegistry.proto]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key deviceRegistry.proto and requestor GlobalTx:infinispan-2-61958:249. Lock is held by GlobalTx:infinispan-2-61958:248
> {code}
> Stopping the load test doesn't let the cluster recover. Most (not all) of the liveness checks fail and pods get restarted. But even after 1 hour, the cluster is still in a non-working state.
[JBoss JIRA] (ISPN-10985) Liveness/readiness scripts don't work with custom configuration
by Ryan Emerson (Jira)
[ https://issues.redhat.com/browse/ISPN-10985?page=com.atlassian.jira.plugi... ]
Ryan Emerson updated ISPN-10985:
--------------------------------
Sprint: (was: DataGrid Sprint #40)
> Liveness/readiness scripts don't work with custom configuration
> ---------------------------------------------------------------
>
> Key: ISPN-10985
> URL: https://issues.redhat.com/browse/ISPN-10985
> Project: Infinispan
> Issue Type: Bug
> Components: OpenShift
> Affects Versions: 10.0.1.Final
> Environment: OpenShift, custom configuration
> Reporter: Jens Reimann
> Assignee: Ryan Emerson
> Priority: Critical
>
> With a custom configuration, the liveness/readiness scripts (`/opt/infinispan/bin/readinessProbe.sh`) no longer work. They use the in-image config files to determine whether HTTPS is enabled; however, since the in-image config file is not the one actually in use, this can result in the following error:
> ~~~
> sh-4.4$ /opt/infinispan/bin/readinessProbe.sh
> curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
> ~~~
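The SSL error above is a scheme mismatch: the probe speaks plain HTTP to a TLS port (or vice versa) because it reads the in-image defaults instead of the effective configuration. A minimal sketch of deriving the probe URL from the effective TLS setting — the endpoint path and port here are assumptions for illustration, not the script's actual values:

```java
import java.net.URI;

// Build the health-check URL from the *effective* TLS setting instead of
// assuming the in-image defaults. Path and port are illustrative only.
final class ProbeUrl {
    static URI forConfig(boolean tlsEnabled, String host, int port) {
        String scheme = tlsEnabled ? "https" : "http";
        return URI.create(scheme + "://" + host + ":" + port + "/rest/v2/health/status");
    }
}
```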
[JBoss JIRA] (ISPN-5547) Get MarshalledValue when iterating the persistent cache with storeAsBinary set to true
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-5547?page=com.atlassian.jira.plugin... ]
Pedro Zapata Fernandez closed ISPN-5547.
----------------------------------------
Resolution: Out of Date
> Get MarshalledValue when iterating the persistent cache with storeAsBinary set to true
> --------------------------------------------------------------------------------------
>
> Key: ISPN-5547
> URL: https://issues.redhat.com/browse/ISPN-5547
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 7.1.1.Final
> Reporter: Fei Chen
> Priority: Blocker
>
> Please see https://developer.jboss.org/thread/258545 for details. The key issue is:
> The ClassCastException happens when trying to iterate over the cache:
> 06:54:47,057 ERROR [org.jboss.as.ejb3.invocation] (http-localhost/127.0.0.1:8680-1) JBAS014134: EJB Invocation failed on component PublishManagerLocalBean for method public abstract java.util.Set com.test.PublishManager.getChannels(): javax.ejb.EJBException: java.lang.ClassCastException: org.infinispan.marshall.core.MarshalledValue cannot be cast to com.test.Channel
> The problem is: I store only one entry in the cache, but when iterating it afterwards, 2 entries are returned. One entry has type Channel, which looks correct, but the other has type MarshalledValue.
> After some investigation, we see this issue only happens when 1. the cache is persistent and 2. storeAsBinary is enabled.
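For reference, the two conditions can be combined in a cache definition along these lines (sketched against the Infinispan 7.x configuration schema from the description; element names and the cache name are illustrative and should be checked against the actual XSD):

```xml
<local-cache name="channels">
    <persistence>
        <file-store path="channel-store"/>
    </persistence>
    <!-- store-as-binary keeps entries in serialized (MarshalledValue) form -->
    <store-as-binary/>
</local-cache>
```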
[JBoss JIRA] (ISPN-6992) TimeoutException: Replication timeout when handling request with DIST cache
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-6992?page=com.atlassian.jira.plugin... ]
Pedro Zapata Fernandez closed ISPN-6992.
----------------------------------------
Resolution: Out of Date
> TimeoutException: Replication timeout when handling request with DIST cache
> ---------------------------------------------------------------------------
>
> Key: ISPN-6992
> URL: https://issues.redhat.com/browse/ISPN-6992
> Project: Infinispan
> Issue Type: Bug
> Components: Core
> Affects Versions: 8.2.4.Final
> Environment: 2 data owner on 3 machine
> Reporter: yingming jiang
> Priority: Blocker
>
> With an 8.2.4 distributed cache, when the data is not on the local machine, the remote call raises a timeout exception.
> My configuration is as follows:
> infinispan.xml
> {code:xml}
> <infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>             xsi:schemaLocation="urn:infinispan:config:8.2 http://www.infinispan.org/schemas/infinispan-config-8.2.xsd"
>             xmlns="urn:infinispan:config:8.2">
>    <jgroups>
>       <stack-file name="tcp" path="jgroups-my.xml" />
>    </jgroups>
>    <cache-container default-cache="default">
>       <transport stack="tcp" />
>       <distributed-cache name="links">
>       </distributed-cache>
>    </cache-container>
> </infinispan>
> {code}
> jgroups-my.xml
> {code:xml}
> <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xmlns="urn:org:jgroups"
>         xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups-3.6.xsd">
>    <TCP bind_addr="${jgroups.tcp.address:127.0.0.1}"
>         bind_port="${jgroups.tcp.port:7800}"
>         recv_buf_size="${tcp.recv_buf_size:20M}"
>         send_buf_size="${tcp.send_buf_size:2M}"
>         sock_conn_timeout="300"/>
>    <TCPGOSSIP initial_hosts="${jgroups.tcpgossip.initial_hosts}"/>
>    <MERGE3/>
>    <FD/>
>    <VERIFY_SUSPECT/>
>    <pbcast.NAKACK2 use_mcast_xmit="false"/>
>    <UNICAST3/>
>    <pbcast.STABLE/>
>    <pbcast.GMS/>
>    <MFC/>
>    <FRAG2/>
> </config>
> {code}
> error message:
> {code}
> 2016-09-01 11:16:35,380 ERROR [InvocationContextInterceptor] (main) ISPN000136: Error executing command GetKeyValueCommand, writing keys []
> org.infinispan.util.concurrent.TimeoutException: Replication timeout for sptn-win-63-48742
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:801)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.staggeredProcessNext(CommandAwareRpcDispatcher.java:375)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.lambda$processCallsStaggered$3(CommandAwareRpcDispatcher.java:357)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "main" org.infinispan.util.concurrent.TimeoutException: Replication timeout for sptn-win-63-48742
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.checkRsp(JGroupsTransport.java:801)
> at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$1(JGroupsTransport.java:642)
> at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.staggeredProcessNext(CommandAwareRpcDispatcher.java:375)
> at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.lambda$processCallsStaggered$3(CommandAwareRpcDispatcher.java:357)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> This is the same as the following issue; the problem goes away when using {{-Dinfinispan.stagger.delay=0}}:
> https://issues.jboss.org/browse/WFLY-6926
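The workaround above is a plain JVM system property passed on the command line ({{java -Dinfinispan.stagger.delay=0 ...}}). As a hedged sketch of how such a property is typically read — the property name is from the report, but the default value below is an assumption, not Infinispan's actual default:

```java
// Illustrative reader for the stagger-delay property; a value of 0
// effectively disables staggered remote gets. The 5 ms default here
// is an assumption for the sketch only.
final class StaggerConfig {
    static long staggerDelayMillis() {
        return Long.getLong("infinispan.stagger.delay", 5L);
    }

    static boolean staggeringDisabled() {
        return staggerDelayMillis() <= 0;
    }
}
```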
[JBoss JIRA] (ISPN-11017) Cluster fails and doesn't recover under load
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-11017?page=com.atlassian.jira.plugi... ]
Pedro Zapata Fernandez reassigned ISPN-11017:
---------------------------------------------
Assignee: Ryan Emerson
> Cluster fails and doesn't recover under load
> --------------------------------------------
>
> Key: ISPN-11017
> URL: https://issues.redhat.com/browse/ISPN-11017
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 10.0.1.Final
> Environment: Running in OpenShift, with a stateful set of 12 nodes, a distributed cache with 3 owners, async indexing enabled, persistence with rocksdb.
> Reporter: Jens Reimann
> Assignee: Ryan Emerson
> Priority: Blocker
> Attachments: infinispan.xml
>
>
> After running the load test for a few seconds, the Infinispan cluster stops accepting requests and the nodes start to split off from the cluster. The server log contains many exceptions like:
> {code:java}
> 10:42:26,939 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p4-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache '___protobuf_metadata', writing keys [deviceRegistry.proto]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key deviceRegistry.proto and requestor GlobalTx:infinispan-2-61958:249. Lock is held by GlobalTx:infinispan-2-61958:248
> {code}
> Stopping the load test doesn't let the cluster recover. Most (not all) of the liveness checks fail and pods get restarted. But even after 1 hour, the cluster is still in a non-working state.
[JBoss JIRA] (ISPN-11017) Cluster fails and doesn't recover under load
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-11017?page=com.atlassian.jira.plugi... ]
Pedro Zapata Fernandez reassigned ISPN-11017:
---------------------------------------------
Assignee: Dan Berindei (was: Ryan Emerson)
> Cluster fails and doesn't recover under load
> --------------------------------------------
>
> Key: ISPN-11017
> URL: https://issues.redhat.com/browse/ISPN-11017
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 10.0.1.Final
> Environment: Running in OpenShift, with a stateful set of 12 nodes, a distributed cache with 3 owners, async indexing enabled, persistence with rocksdb.
> Reporter: Jens Reimann
> Assignee: Dan Berindei
> Priority: Blocker
> Attachments: infinispan.xml
>
>
> After running the load test for a few seconds, the Infinispan cluster stops accepting requests and the nodes start to split off from the cluster. The server log contains many exceptions like:
> {code:java}
> 10:42:26,939 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p4-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache '___protobuf_metadata', writing keys [deviceRegistry.proto]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key deviceRegistry.proto and requestor GlobalTx:infinispan-2-61958:249. Lock is held by GlobalTx:infinispan-2-61958:248
> {code}
> Stopping the load test doesn't let the cluster recover. Most (not all) of the liveness checks fail and pods get restarted. But even after 1 hour, the cluster is still in a non-working state.
[JBoss JIRA] (ISPN-11017) Cluster fails and doesn't recover under load
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-11017?page=com.atlassian.jira.plugi... ]
Pedro Zapata Fernandez updated ISPN-11017:
------------------------------------------
Priority: Critical (was: Blocker)
> Cluster fails and doesn't recover under load
> --------------------------------------------
>
> Key: ISPN-11017
> URL: https://issues.redhat.com/browse/ISPN-11017
> Project: Infinispan
> Issue Type: Bug
> Components: Server
> Affects Versions: 10.0.1.Final
> Environment: Running in OpenShift, with a stateful set of 12 nodes, a distributed cache with 3 owners, async indexing enabled, persistence with rocksdb.
> Reporter: Jens Reimann
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: infinispan.xml
>
>
> After running the load test for a few seconds, the Infinispan cluster stops accepting requests and the nodes start to split off from the cluster. The server log contains many exceptions like:
> {code:java}
> 10:42:26,939 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (timeout-thread--p4-t1) ISPN000136: Error executing command PutKeyValueCommand on Cache '___protobuf_metadata', writing keys [deviceRegistry.proto]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 10 seconds for key deviceRegistry.proto and requestor GlobalTx:infinispan-2-61958:249. Lock is held by GlobalTx:infinispan-2-61958:248
> {code}
> Stopping the load test doesn't let the cluster recover. Most (not all) of the liveness checks fail and pods get restarted. But even after 1 hour, the cluster is still in a non-working state.
[JBoss JIRA] (ISPN-10985) Liveness/readiness scripts don't work with custom configuration
by Pedro Zapata Fernandez (Jira)
[ https://issues.redhat.com/browse/ISPN-10985?page=com.atlassian.jira.plugi... ]
Pedro Zapata Fernandez reassigned ISPN-10985:
---------------------------------------------
Assignee: Ryan Emerson
> Liveness/readiness scripts don't work with custom configuration
> ---------------------------------------------------------------
>
> Key: ISPN-10985
> URL: https://issues.redhat.com/browse/ISPN-10985
> Project: Infinispan
> Issue Type: Bug
> Components: OpenShift
> Affects Versions: 10.0.1.Final
> Environment: OpenShift, custom configuration
> Reporter: Jens Reimann
> Assignee: Ryan Emerson
> Priority: Blocker
>
> With a custom configuration, the liveness/readiness scripts (`/opt/infinispan/bin/readinessProbe.sh`) no longer work. They use the in-image config files to determine whether HTTPS is enabled; however, since the in-image config file is not the one actually in use, this can result in the following error:
> ~~~
> sh-4.4$ /opt/infinispan/bin/readinessProbe.sh
> curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
> ~~~