[JBoss JIRA] (ISPN-3140) JMX operation to suppress state transfer
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-3140?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-3140:
-----------------------------------------------
Tristan Tarrant <ttarrant(a)redhat.com> made a comment on [bug 974402|https://bugzilla.redhat.com/show_bug.cgi?id=974402]
Anna, disabling
> JMX operation to suppress state transfer
> ----------------------------------------
>
> Key: ISPN-3140
> URL: https://issues.jboss.org/browse/ISPN-3140
> Project: Infinispan
> Issue Type: Feature Request
> Components: Distributed Cache, State transfer
> Affects Versions: 5.2.6.Final
> Reporter: Manik Surtani
> Assignee: Dan Berindei
> Fix For: 5.2.7.Final, 5.3.0.CR2, 5.3.0.Final
>
>
> This feature request is to expose a JMX operation on each node, to suppress state transfer for a period of time. This flag would be {{false}} by default.
> The use case of this flag would be to ease bringing down (and up) a cluster for maintenance work. A typical workflow would be:
> 1) Shut down application requests to the data grid
> 2) Suppress state transfer on all nodes via JMX
> 3) Bring down all nodes
> 4) Perform maintenance work
> 5) Bring up nodes, one at a time. As each node comes up, disable state transfer for the node via JMX.
> 6) Once all nodes are up, enable state transfer for each node again via JMX
> 7) Allow application requests to reach the grid again.
> The purpose of this is to allow smooth and fast shutdown and startup, remove the risk of OOM errors (when bringing a grid down).
> This is a small but useful subset of full manual state transfer as defined in ISPN-1394.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 5 months
[JBoss JIRA] (ISPN-3373) HotRod topology cache should allow disabling awaitInitialTransfer
by Tristan Tarrant (JIRA)
Tristan Tarrant created ISPN-3373:
-------------------------------------
Summary: HotRod topology cache should allow disabling awaitInitialTransfer
Key: ISPN-3373
URL: https://issues.jboss.org/browse/ISPN-3373
Project: Infinispan
Issue Type: Feature Request
Components: Remote protocols, Server
Affects Versions: 6.0.0.Alpha1
Reporter: Tristan Tarrant
Assignee: Tristan Tarrant
Fix For: 6.0.0.Alpha2
When a server with a hotrod endpoint joins a cluster where state transfer has been disabled (ISPN-3140), waiting for initial state transfer should also be disabled on the topology cache, otherwise failures occur.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 5 months
[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Takayoshi Kimura (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Takayoshi Kimura commented on ISPN-3366:
----------------------------------------
2nd run with array debug option: https://www.dropbox.com/s/ayqttazfrljloqn/ISPN-3366-full-logs-2nd.zip
* node1: 26698
* node2: 26735
* node3: 26551
* node4: 0
* Total: 79984
8 entries missing, see missingkeys.txt
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 5 months
[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Takayoshi Kimura (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Takayoshi Kimura edited comment on ISPN-3366 at 7/28/13 9:24 PM:
-----------------------------------------------------------------
2nd run with array debug option: https://www.dropbox.com/s/ayqttazfrljloqn/ISPN-3366-full-logs-2nd.zip
* node1: 26698
* node2: 26735
* node3: 26551
* node4: 0
* Total: 79984
No error on the client side and 8 entries missing on the server side, see missingkeys.txt
was (Author: tkimura):
2nd run with array debug option: https://www.dropbox.com/s/ayqttazfrljloqn/ISPN-3366-full-logs-2nd.zip
* node1: 26698
* node2: 26735
* node3: 26551
* node4: 0
* Total: 79984
8 entries missing, see missingkeys.txt
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 5 months
[JBoss JIRA] (ISPN-3357) Insufficient owners with putIfAbsent during node join rebalance
by Takayoshi Kimura (JIRA)
[ https://issues.jboss.org/browse/ISPN-3357?page=com.atlassian.jira.plugin.... ]
Takayoshi Kimura commented on ISPN-3357:
----------------------------------------
Sure, full TRACE logs here: https://www.dropbox.com/s/1tnrg0osxfghu2d/ISPN-3357-full-logs.zip
> Insufficient owners with putIfAbsent during node join rebalance
> ---------------------------------------------------------------
>
> Key: ISPN-3357
> URL: https://issues.jboss.org/browse/ISPN-3357
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: 7c29bccb.log
>
>
> Here is test scenario:
> * DIST numOwners=2, start with 3 nodes cluster then join 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 20074
> * node2: 19888
> * node3: 20114
> * node4: 18885
> Total is 78961, 1039 entries are missing. No error on HotRod client side so 80000 entries should be there.
> Let's take a look at example missing entry, hash(thread01key151) = 7c29bccb.
> Current CH: owners(7c29bccb) are [node1, node2]
> Pending CH: owners(7c29bccb) are [node1, node2, node4]
> Balanced CH: owners(7c29bccb) are [node1, node4]
> The events sequence is:
> * hotrod -> node1
> * node1 -> node2, node4
> * node2 committed entry
> * node4 performed clustered get before write, got a value from node2 and will not commit the entry because this node thinks it's not changed/created
> * node1 committed entry
> * node2 invalidates the entry because it's no longer an owner
> Result owners(7c29bccb) are only node1 and node4 is missing. This entry may be completely lost by further rebalances when node4 is donor for this segment.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 5 months
[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Takayoshi Kimura (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Takayoshi Kimura commented on ISPN-3366:
----------------------------------------
Full TRACE logs of the first run (no array debug option): https://www.dropbox.com/s/tef5isxeor5519t/ISPN-3366-full-logs.zip
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 5 months
[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Takayoshi Kimura (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Takayoshi Kimura commented on ISPN-3366:
----------------------------------------
The key format is "threadXXkeyYYY" so the beginning of the array always matches between keys. I don't know what exactly the prefix "033e0d" means right now but the "7468726561" is "threa". Probably they are different entries.
Anyway I'll upload full logs and re-run with array debug option.
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
11 years, 5 months