On Fri, 2009-03-27 at 09:15 -0500, Brian Stansberry wrote:
 jean-frederic clere wrote:
 > Paul Ferraro wrote:
 >> Currently, the HAModClusterService (where httpd communication is
 >> coordinated by an HA singleton) does not react to crashed/hung members.
 >> Specifically, when the HA singleton gets a callback that the group
 >> membership changes, it does not send any REMOVE-APP messages to httpd on
 >> behalf of the member that just left.  Currently, httpd will detect the
 >> failure (via a disconnected socket) on its own and set its internal
 >> state accordingly, e.g. a STATUS message will return NOTOK.
 >>
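For reference, a STATUS exchange looks roughly like this (MCMP over
HTTP; I'm writing the parameter names from memory, so treat it as a
sketch rather than the exact wire format):

  STATUS / HTTP/1.1
  Host: httpd-proxy
  Content-Length: 22

  JVMRoute=node1&Load=50

  HTTP/1.1 200 OK
  Content-Type: text/plain

  Type=STATUS-RSP&JVMRoute=node1&State=OK&id=1

For a node httpd considers broken, the response carries State=NOTOK
instead.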
 >> The non-handling of dropped members is actually a good thing in the
 >> event of a network partition, where communication between nodes is lost,
 >> but communication between httpd and the nodes is unaffected.  If we were
 >> handling dropped members, we would have to handle the ugly scenario
 >> described here:
 >> 
https://jira.jboss.org/jira/browse/MODCLUSTER-66
 >>
 >> Jean-Frederic: a few questions...
 >> 1. Is it up to the AS to drive the recovery of a NOTOK node when it
 >> becomes functional again?
 > 
 > Yes.
 > 
 >>  In the case of a crashed member, fresh
 >> CONFIG/ENABLE-APP messages will be sent upon node restart.  In the case
 >> of a re-merged network partition, no additional messages are sent.  Is
 >> the subsequent STATUS message (with a non-zero lbfactor) enough to
 >> trigger the recovery of this node?
 > 
 > Yes. 
Good to know.
 
 >> 2. Can httpd detect hung nodes?  A hung node will not affect the
 >> connected state of the AJP/HTTP/S connector - it could only detect this
 >> by sending data to the connector and timing out on the response.
 > 
 > The hung node will be detected and marked as broken but the 
 > corresponding request(s) may be delayed or lost due to time-out.
 > 
 
 How long does this take, say in a typical case where the hung node was 
 up and running with a pool of AJP connections open? Is it the 10 secs, 
 the default value of the "ping" property listed at 
 
https://www.jboss.org/mod_cluster/java/properties.html#proxy ? 
I think he's talking about "nodeTimeout".
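Both knobs end up in the CONFIG message the node sends to httpd. If
memory serves it looks something like this (values are illustrative,
and I may be misremembering the exact parameter names):

  CONFIG / HTTP/1.1
  Host: httpd-proxy

  JVMRoute=node1&Host=192.168.1.10&Port=8009&Type=ajp&Timeout=20&Ping=10

where Ping is how long httpd waits for a pong before marking the node
in error, and Timeout (the nodeTimeout above) bounds how long httpd
waits on the back-end before declaring an error.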
 Also, if a request is being handled by a hung node and the 
 HAModClusterService tells httpd to stop that node, the request will 
 fail, yes? It shouldn't just fail over, as it may have already caused 
 the transfer of my $1,000,000 to my secret account at UBS. Failing over 
 would cause transfer of a second $1,000,000 and sadly I don't have that 
 much. 
Not unlike those damn double-clickers...
 >>
 >> And some questions for open discussion:
 >> What does HAModClusterService really buy us over the normal
 >> ModClusterService?  Do the benefits outweigh the complexity?
 >>  * Maintains a uniform view of proxy status across each AS node
 >>  * Can detect and send STOP-APP/REMOVE-APP messages on behalf of
 >> hung/crashed nodes (if httpd cannot already do this) (not yet
 >> implemented)
 >>    + Requires special handling of network partitions
 >>  * Potentially improve scalability by minimizing network traffic for
 >> very large clusters.
 
 Assume a near-term goal is to run a 150 node cluster with say 10 httpd 
 servers. Assume the background thread runs every 10 seconds. That comes 
 to 150 connections per second across the cluster being opened/closed to 
 handle STATUS. Each httpd server handles 15 connections per second.
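(To spell out the arithmetic: each node sends STATUS to each httpd,
so 150 nodes x 10 httpds = 1500 connections per 10-second interval,
i.e. 150/sec cluster-wide and 15/sec per httpd.)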
 
 With HAModClusterService the way it is now, you get the same, because 
 besides STATUS each node also checks its ability to communicate w/ each 
 httpd in order to validate its ability to become master. But let's 
 assume we add some complexity to allow that health check to become much 
 more infrequent, so ignore those ping checks. Then, w/ HAModClusterService 
 you get 1 connection/sec being opened/closed across the cluster for 
 status, 0.1 connection/sec per httpd.  But the STATUS request sent 
 across each connection has a much bigger payload. 
True, the STATUS request carries a body while the INFO request has none,
but the resulting STATUS-RSP is significantly smaller than the
corresponding INFO-RSP.
 How significant is the cost of opening/closing all those connections?
 
 >>    e.g. non-masters ping httpd less often
 >>  * Anything else?
 > 
 
 1) Management?? You don't want to have to interact with every node to do 
 management tasks (e.g. disable app X on domain A to drain sessions so we 
 can shut down the domain.) A node having a complete view might allow 
 more sophisticated management operations.  This takes more thought, 
 though. 
Good point.
 2) The more sophisticated case discussed on 
 
https://jira.jboss.org/jira/browse/MODCLUSTER-66, where a primary 
 partition approach is appropriate rather than letting minority 
 subpartitions continue to live. But TBH mod_cluster might not be the 
 right place to handle this. Probably more appropriate is to have 
 something associated with the webapp itself determine it is in a 
 minority partition and undeploy the webapp if so. Whether being in a 
 minority partition is inappropriate for a particular webapp is beyond 
 scope for mod_cluster.
 
 I'd originally thought the HA version would add benefit to the load 
 balance factor calculation, but that was wrong-headed. 
I would argue that there is some (albeit small) value to the load being
requested on each node at the same time.  I would expect this to result
in slightly less load swinging than if individual nodes calculated their
load at different times, scattered across the status interval.
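For context, the dynamic load provider computes the factor from a
decay-weighted history of readings, so when a sample is taken does
shift the result a little. A rough sketch of that style of
calculation (illustrative only, not our actual classes):

  import java.util.ArrayDeque;
  import java.util.Deque;

  public class DecayingLoadAverage {
      private final Deque<Float> history = new ArrayDeque<Float>();
      private final int maxHistory; // number of past readings to keep
      private final float decay;    // weight divisor per interval, e.g. 2

      public DecayingLoadAverage(int maxHistory, float decay) {
          this.maxHistory = maxHistory;
          this.decay = decay;
      }

      // Record the newest load reading (0..1) and return the weighted
      // average; older readings count for progressively less.
      public synchronized float record(float load) {
          history.addFirst(load);
          if (history.size() > maxHistory) {
              history.removeLast();
          }
          float total = 0, totalWeight = 0, weight = 1;
          for (float reading : history) {
              total += reading * weight;
              totalWeight += weight;
              weight /= decay;
          }
          return total / totalWeight;
      }
  }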
 > Well I prefer the Java code deciding if a node is broken than
 > httpd. I really want to keep the complexity in httpd to a minimum,
 > and the talks I have had so far at ApacheCon suggest that is
 > probably the best way to go.
 > 
 
 Agreed that httpd should be as simple as possible. But to handle the 
 non-HAModClusterService case it will need to at least detect broken 
 connections and basic response timeouts, right? So depending on how long 
 it takes to detect hung nodes, httpd might be detecting them before 
 HAModClusterService. I'm thinking of 3 scenarios:
 
 1) Node completely crashes. HAModClusterService will detect this almost 
 immediately; I'd think httpd would as well unless it just happened to 
 not have connections open.
 
 2) Some condition that causes the channel used by HAModClusterService to 
 not process messages. This will lead to the node being suspected after 
 31.5 seconds with the default channel config. But httpd might detect a 
 timeout faster than that?
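(For anyone wondering where 31.5 comes from: I assume it is FD's
timeout x max_tries, 30s total, plus VERIFY_SUSPECT's 1.5s. In the
protocol stack config that would be something like the following;
the attribute values here are my guess at the stock settings, so
check your actual stack:

  <FD timeout="6000" max_tries="5"/>   <!-- 6s x 5 = 30s -->
  <VERIFY_SUSPECT timeout="1500"/>     <!-- + 1.5s = 31.5s -->
)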
 
 3) Some condition that causes all the JBoss Web threads to block but 
 doesn't impact the HAModClusterService channel. (QE's Radoslav Husar, 
 Bela and I are trying to diagnose such a case right now.)  Only httpd 
 will detect this; JGroups will not. We could add some logic in JBoss Web 
 that would allow it to detect such a situation and then let 
 (HA)ModClusterService disable the node. But non-HA ModClusterService 
 could do that just as well as the HA version. 
I imagine this is not uncommon, e.g. overloaded/deadlocked database
causing all application threads to wait.
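If we did add that logic to JBoss Web, I'd imagine something along
the lines of a periodic ThreadMXBean check; purely a sketch, all
names invented:

  import java.lang.management.ManagementFactory;
  import java.lang.management.ThreadInfo;
  import java.lang.management.ThreadMXBean;

  public class StuckWorkerDetector {
      private final ThreadMXBean threads =
              ManagementFactory.getThreadMXBean();

      // Fraction of live threads currently BLOCKED on a monitor. A
      // background task could poll this and have (HA)ModClusterService
      // report an error load factor (or send DISABLE-APP) past some
      // threshold. Real detection would need more nuance: threads
      // stuck in socket I/O show up as RUNNABLE, not BLOCKED.
      public double blockedFraction() {
          ThreadInfo[] infos =
                  threads.getThreadInfo(threads.getAllThreadIds());
          int blocked = 0, live = 0;
          for (ThreadInfo info : infos) {
              if (info == null) continue; // thread exited since sampling
              live++;
              if (info.getThreadState() == Thread.State.BLOCKED) {
                  blocked++;
              }
          }
          return live == 0 ? 0.0 : (double) blocked / live;
      }
  }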
 Out of these 3 cases, HAModClusterService does a better job than httpd
 itself only in the #2 case, and there only if it takes httpd > 31.5 secs 
 to detect a hung response. 
Although, for case #2, using the plain non-HA ModClusterService avoids
the problem entirely.
 > Cheers
 > 
 > Jean-Frederic
 > 
 >>
 >> Paul
 >>