[JBoss JIRA] (ISPN-2550) NoSuchElementException in Hot Rod Encoder
by RH Bugzilla Integration (JIRA)
[ https://issues.jboss.org/browse/ISPN-2550?page=com.atlassian.jira.plugin.... ]
RH Bugzilla Integration commented on ISPN-2550:
-----------------------------------------------
Tristan Tarrant <ttarrant(a)redhat.com> changed the Status of [bug 886565|https://bugzilla.redhat.com/show_bug.cgi?id=886565] from MODIFIED to ON_QA
> NoSuchElementException in Hot Rod Encoder
> -----------------------------------------
>
> Key: ISPN-2550
> URL: https://issues.jboss.org/browse/ISPN-2550
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta4
> Reporter: Michal Linhard
> Assignee: Dan Berindei
> Priority: Blocker
> Fix For: 5.2.0.CR1
>
>
> Tomas noticed this a while ago in a specific functional test:
> https://bugzilla.redhat.com/show_bug.cgi?id=875151
> I'm creating a more general JIRA, cause I'm having this in resilience test.
> What I found by quick debug, is that here:
> https://github.com/infinispan/infinispan/blob/master/server/hotrod/src/ma...
> {code}
> for (segmentIdx <- 0 until numSegments) {
> val denormalizedSegmentHashIds = allDenormalizedHashIds(segmentIdx)
> val segmentOwners = ch.locateOwnersForSegment(segmentIdx)
> for (ownerIdx <- 0 until segmentOwners.length) {
> val address = segmentOwners(ownerIdx % segmentOwners.size)
> val serverAddress = members(address)
> val hashId = denormalizedSegmentHashIds(ownerIdx)
> log.tracef("Writing hash id %d for %s:%s", hashId, serverAddress.host, serverAddress.port)
> writeString(serverAddress.host, buf)
> writeUnsignedShort(serverAddress.port, buf)
> buf.writeInt(hashId)
> }
> }
> {code}
> we're trying to obtain serverAddress for nonexistent address and NoSuchElementException is not handled properly.
> It hapens after I kill a node in a resilience test and the exception appears when querying for the node in the members cache.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 3 months
[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]
Mircea Markus commented on ISPN-2697:
-------------------------------------
[~an1310] Seems like we use RSVP flag for the operations aggregated by CacheTopologyControlCommand, including REBALANCE_START (see CommandAwareRpcDispatcher.processCalls). Might be a different reason though.
> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Galder Zamarreño
> Priority: Critical
> Fix For: 5.2.0.CR2
>
>
> When the HotRodServer starts it inserts its record to __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail - as the command is broadcasted using NAKACK2 protocol, if the message gets lost and there's no following broadcasted message, the message will be not retransmitted and the put operation times out (Replication timeout), which fails the whole HotRodServer startup, all because of one lost UDP message.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 3 months
[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2697:
--------------------------------
Priority: Critical (was: Major)
> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Galder Zamarreño
> Priority: Critical
> Fix For: 5.2.0.CR2
>
>
> When the HotRodServer starts it inserts its record to __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail - as the command is broadcasted using NAKACK2 protocol, if the message gets lost and there's no following broadcasted message, the message will be not retransmitted and the put operation times out (Replication timeout), which fails the whole HotRodServer startup, all because of one lost UDP message.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 3 months
[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]
Mircea Markus commented on ISPN-2697:
-------------------------------------
Comment from [~an1310] on IRC
{quote} I see something very similar in embedded mode with the rebalance failing because a node didn't get responses for the REBALANCE_START operation. {quote}
> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Galder Zamarreño
> Priority: Critical
> Fix For: 5.2.0.CR2
>
>
> When the HotRodServer starts it inserts its record to __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail - as the command is broadcasted using NAKACK2 protocol, if the message gets lost and there's no following broadcasted message, the message will be not retransmitted and the put operation times out (Replication timeout), which fails the whole HotRodServer startup, all because of one lost UDP message.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 3 months
[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache
by Mircea Markus (JIRA)
[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]
Mircea Markus updated ISPN-2697:
--------------------------------
Fix Version/s: 5.2.0.CR2
(was: 5.3.0.Final)
> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Galder Zamarreño
> Fix For: 5.2.0.CR2
>
>
> When the HotRodServer starts it inserts its record to __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail - as the command is broadcasted using NAKACK2 protocol, if the message gets lost and there's no following broadcasted message, the message will be not retransmitted and the put operation times out (Replication timeout), which fails the whole HotRodServer startup, all because of one lost UDP message.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 3 months
[JBoss JIRA] (ISPN-2697) HotRodServer startup fails when its record cannot be inserted into topology cache
by Radim Vansa (JIRA)
[ https://issues.jboss.org/browse/ISPN-2697?page=com.atlassian.jira.plugin.... ]
Radim Vansa commented on ISPN-2697:
-----------------------------------
OK, I have revised the code and you are right (of course ;-)). Maybe the doc (I mean both in-code doc and design doc which does not mention retransmission at all) should be more verbose about retransmission and not only garbage collection.
Do I understand it correctly now, that the stability message which triggers retransmission is only sent after all gossips are collected. If the whole cluster sends in average one of these gossips every X seconds (where default X is 20, we use 5), the stability message will be sent only every (clusterSize * X) seconds? (in our 64 node cluster this is 5 minutes 20 seconds)? That seems like a long time to me.
> HotRodServer startup fails when its record cannot be inserted into topology cache
> ---------------------------------------------------------------------------------
>
> Key: ISPN-2697
> URL: https://issues.jboss.org/browse/ISPN-2697
> Project: Infinispan
> Issue Type: Bug
> Components: Remote protocols
> Affects Versions: 5.2.0.Beta6
> Reporter: Radim Vansa
> Assignee: Galder Zamarreño
> Fix For: 5.3.0.Final
>
>
> When the HotRodServer starts it inserts its record to __hotRodTopologyCache ({{HotRodServer.addSelfToTopologyView(...)}}).
> However, this put may very easily fail - as the command is broadcasted using NAKACK2 protocol, if the message gets lost and there's no following broadcasted message, the message will be not retransmitted and the put operation times out (Replication timeout), which fails the whole HotRodServer startup, all because of one lost UDP message.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 3 months
[JBoss JIRA] (ISPN-2574) Segment transfer not restarted if the owner fails
by Michal Linhard (JIRA)
[ https://issues.jboss.org/browse/ISPN-2574?page=com.atlassian.jira.plugin.... ]
Michal Linhard commented on ISPN-2574:
--------------------------------------
Just checked, also fails for 5.2.0.CR1
> Segment transfer not restarted if the owner fails
> -------------------------------------------------
>
> Key: ISPN-2574
> URL: https://issues.jboss.org/browse/ISPN-2574
> Project: Infinispan
> Issue Type: Bug
> Components: State transfer
> Affects Versions: 5.2.0.Beta4
> Reporter: Radim Vansa
> Assignee: Adrian Nistor
> Priority: Critical
> Fix For: 5.2.0.CR1
>
>
> Imagine this situation in distributed cache with 3 owners:
> 1) The segment X is owned by nodes A, B, C
> 2) Node B fails -> CH_UPDATE and then REBALANCE_START are broadcasted
> 3) Node D starts transfer of segment X from C
> 4) Node C fails -> another CH_UPDATE is broadcasted
> 5) D handes the CH_UPDATE and removes the transfer of segment X from C, but does not start another transfer from A
> The {{addedSegments}} does not contain the restarted transfer, because all transfers from write consistent hash are removed from it in the beginning - the segment is considered received here although the transfer is still in progress.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
13 years, 3 months