No, I don't have FD_SOCK next to FD. I saw the doubled node when the whole cluster was
under high load with lots of garbage collection runs. That is why I thought that
using FD_SOCK would lessen the likelihood of the problem, but it can still happen,
especially when a GC run on a node takes about 5 to 8 seconds. OK, the nodes were less
powerful machines back then; now they are much stronger :).
I have a UDP-based config, and some of the timing parameters are not very well chosen. The
result is that a node is sometimes suspected before it has had the chance to send its
first message. But changing these parameters would again only reduce the likelihood of the
doubled node; it would not rule it out.
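
To make the timing argument concrete, here is a rough sketch in Java. It assumes a simplified heartbeat-style failure detector that waits a fixed timeout per try and suspects a node after a fixed number of missed tries. The names (SuspicionWindow, timeoutMs, maxTries) and the values are made up for illustration and are not taken from my actual config.

```java
public class SuspicionWindow {
    // Approximate time (ms) a silent node survives before being suspected,
    // under the simplified model: one timeout per try, maxTries tries.
    public static long windowMs(long timeoutMs, int maxTries) {
        return timeoutMs * maxTries;
    }

    // True if a pause of pauseMs (e.g. a long GC run) is at least as long
    // as the suspicion window, so the node would be suspected.
    public static boolean wouldBeSuspected(long pauseMs, long timeoutMs, int maxTries) {
        return pauseMs >= windowMs(timeoutMs, maxTries);
    }

    public static void main(String[] args) {
        long timeoutMs = 2000; // hypothetical value
        int maxTries = 2;      // hypothetical value
        for (long pauseMs : new long[] {3000, 5000, 8000}) {
            System.out.println(pauseMs + " ms pause -> suspected: "
                    + wouldBeSuspected(pauseMs, timeoutMs, maxTries));
        }
    }
}
```

Under this model, larger timing values only widen the window. A long enough pause still exceeds it, which is why tuning reduces the likelihood without removing the problem.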
I agree, the underlying problem is that the doubled node is not fully recognised. I thought
it was easier to make a node name unique than to detect that a node is in the list twice and
that one of these names belongs to a dead member. But it may be that the unique name is not a
clean design and that it only works because of some side effects ;).
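
The two approaches can be compared in a small Java sketch. It treats the member list as a plain list of names, which is an assumption for illustration only. The class NodeNames and its methods are hypothetical and are not part of JGroups or JBoss.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.UUID;

public class NodeNames {
    // Approach 1: find names that occur more than once in the member list.
    // This only detects the duplicate; it cannot tell which entry is dead.
    public static Set<String> duplicates(List<String> memberNames) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new TreeSet<>();
        for (String name : memberNames) {
            if (!seen.add(name)) {
                dups.add(name);
            }
        }
        return dups;
    }

    // Approach 2: make the name unique per start by appending a random
    // suffix, so a restarted node never reuses the name of a stale entry.
    public static String uniqueName(String base) {
        return base + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        System.out.println(duplicates(List.of("node1", "node2", "node1")));
        System.out.println(uniqueName("node1"));
    }
}
```

The first approach still needs a separate liveness check to decide which of the two entries to drop. The second avoids the clash but leaves the stale entry in the list until it is removed some other way.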