[JBoss JIRA] (ISPN-3357) Insufficient owners with putIfAbsent during node join rebalance
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3357?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-3357:
-----------------------------------
I have written down a test in library mode exhibiting this behaviour. Please, let me know as fix is ready, I'll try to check that.
> Insufficient owners with putIfAbsent during node join rebalance
> ---------------------------------------------------------------
>
> Key: ISPN-3357
> URL: https://issues.jboss.org/browse/ISPN-3357
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: 7c29bccb.log
>
>
> Here is test scenario:
> * DIST numOwners=2, start with 3 nodes cluster then join 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 20074
> * node2: 19888
> * node3: 20114
> * node4: 18885
> Total is 78961, 1039 entries are missing. No error on HotRod client side so 80000 entries should be there.
> Let's take a look at example missing entry, hash(thread01key151) = 7c29bccb.
> Current CH: owners(7c29bccb) are [node1, node2]
> Pending CH: owners(7c29bccb) are [node1, node2, node4]
> Balanced CH: owners(7c29bccb) are [node1, node4]
> The events sequence is:
> * hotrod -> node1
> * node1 -> node2, node4
> * node2 committed entry
> * node4 performed clustered get before write, got a value from node2 and will not commit the entry because this node thinks it's not changed/created
> * node1 committed entry
> * node2 invalidates the entry because it's no longer an owner
> Result owners(7c29bccb) are only node1 and node4 is missing. This entry may be completely lost by further rebalances when node4 is donor for this segment.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 4 months
[JBoss JIRA] (ISPN-3390) SingleFileCacheStore.start() fails if cache name contains File.separator
by Paul Ferraro (JIRA)
Paul Ferraro created ISPN-3390:
----------------------------------
Summary: SingleFileCacheStore.start() fails if cache name contains File.separator
Key: ISPN-3390
URL: https://issues.jboss.org/browse/ISPN-3390
Project: Infinispan
Issue Type: Bug
Components: Loaders and Stores
Affects Versions: 6.0.0.Alpha1
Reporter: Paul Ferraro
Assignee: Paul Ferraro
Using the old FileCacheStore, if the cache name contained File.separator, then the location of the file cache store included corresponding subdirectories.
e.g.
If my cache name is "myhost/mydeployment.war", then the file cache store includes a subdirectory "myhost".
In the new SingleFileCacheStore implementation, however, start() fails because only the directories contained within the location are created. When the File object is constructed using the cache name, if the cache name contains a File.separator, those subdirectories are *not* created, and start() fails with the following message:
"Directory ... does not exist and cannot be created!"
As I see it, this can be fixed in 1 of 2 ways:
1. Escape any file separators in the cache name when constructing the file name
2. Create the necessary subdirectories on the File object created using the cache name
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 4 months
[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-3366:
-----------------------------------
I have a test-case in library mode (ispn 5.2.4) showing this bug - 4 nodes up, 3 of them loading data (100000 entries in total) via normal PUT operations. During the loading one of the nodes is restarted. If the put operation on some of the loading nodes gets exception, the put is repeated, but even with that in the end I end up with few keys returning null on one of the nodes.
[~dan.berindei]: please let me know as you'll have some fix for this issue, I'll try to test it.
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 4 months
[JBoss JIRA] (ISPN-3322) Use UNICAST3
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-3322?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-3322:
-------------------------------
Issue Type: Task (was: Enhancement)
Assignee: Mircea Markus
Fix Version/s: 6.0.0.Alpha3
Affects Version/s: 6.0.0.Alpha1
5.3.0.Final
One more reason for the upgrade: JGRP-1658 is already fixed for UNICAST3, but the fix won't be backported to UNICAST2.
> Use UNICAST3
> ------------
>
> Key: ISPN-3322
> URL: https://issues.jboss.org/browse/ISPN-3322
> Project: Infinispan
> Issue Type: Task
> Components: RPC
> Affects Versions: 5.3.0.Final, 6.0.0.Alpha1
> Reporter: Sanne Grinovero
> Assignee: Mircea Markus
> Fix For: 6.0.0.Alpha3
>
>
> As suggested by Bela: it seems UNICAST3 is generally better than previous versions.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 4 months
[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant commented on ISPN-3366:
---------------------------------------
[~nekop], you can get strictPeerToPeer working modifying the infinispan-defaults.xml within the infinispan subsystem jar (org.jboss.as.clustering.infinispan)
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 4 months
[JBoss JIRA] (ISPN-3366) Data loss when entry forwarding to primary owner and primary owner shutdown
by Tristan Tarrant (JIRA)
[ https://issues.jboss.org/browse/ISPN-3366?page=com.atlassian.jira.plugin.... ]
Tristan Tarrant updated ISPN-3366:
----------------------------------
Description:
Looks like a problem in entry forwarding.
Here is test scenario:
* DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
* HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
After the test run, the numberOfEntries on each node are:
* node1: 26608
* node2: 26622
* node3: 26746
* node4: 0
Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
Current CH: owners(574ff563) are [node4, node1]
The events sequence is:
* hotrod -> node1
* node1 forwarding it to primary owner node4
* node4 doesn't process the forwarded entry, shutdown
Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
was:
Looks like a problem in entry forwarding.
Here is test scenario:
* DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
* HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
After the test run, the numberOfEntries on each node are:
* node1: 26608
* node2: 26622
* node3: 26746
* node4: 0
Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
Current CH: owners(574ff563) are [node4, node1]
The events sequence is:
* hotrod -> node1
* node1 forwarding it to primary owner node4
* node4 doesn't process the forwarded entry, shutdown
Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
> Data loss when entry forwarding to primary owner and primary owner shutdown
> ---------------------------------------------------------------------------
>
> Key: ISPN-3366
> URL: https://issues.jboss.org/browse/ISPN-3366
> Project: Infinispan
> Issue Type: Bug
> Components: Distributed Cache
> Affects Versions: 5.2.4.Final, 6.0.0.Alpha1
> Reporter: Takayoshi Kimura
> Assignee: Dan Berindei
> Priority: Critical
> Attachments: ISPN-3366-logs.zip
>
>
> Looks like a problem in entry forwarding.
> Here is test scenario:
> * DIST numOwners=2, start with 4 nodes cluster then normal shutdown 1 node during load
> * HotRod putIfAbsent accesses from 40 threads (1 process, 1 remote cache instance), 40000 entries total
> After the test run, the numberOfEntries on each node are:
> * node1: 26608
> * node2: 26622
> * node3: 26746
> * node4: 0
> Total is 79976 and HotRod client received 11 errors, so 79976 + (11 * 2) = 79998. It means 1 entry is completely missing.
> Let's take a look at the missing entry, hash(thread16key59) = 574ff563.
> Current CH: owners(574ff563) are [node4, node1]
> The events sequence is:
> * hotrod -> node1
> * node1 forwarding it to primary owner node4
> * node4 doesn't process the forwarded entry, shutdown
> Result owners(7c29bccb) is [] empty. This entry is completely lost without any errors.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
12 years, 4 months