[mod_cluster-dev] Handling crashed/hung AS nodes
Paul Ferraro
paul.ferraro at redhat.com
Fri Mar 27 15:13:35 EDT 2009
On Fri, 2009-03-27 at 09:15 -0500, Brian Stansberry wrote:
> jean-frederic clere wrote:
> > Paul Ferraro wrote:
> >> Currently, the HAModClusterService (where httpd communication is
> >> coordinated by an HA singleton) does not react to crashed/hung members.
> >> Specifically, when the HA singleton gets a callback that the group
> >> membership changes, it does not send any REMOVE-APP messages to httpd on
> >> behalf of the member that just left. Currently, httpd will detect the
> >> failure (via a disconnected socket) on its own and sets its internal
> >> state accordingly, e.g. a STATUS message will return NOTOK.
> >>
> >> The non-handling of dropped members is actually a good thing in the
> >> event of a network partition, where communication between nodes is lost,
> >> but communication between httpd and the nodes is unaffected. If we were
> >> handling dropped members, we would have to handle the ugly scenario
> >> described here:
> >> https://jira.jboss.org/jira/browse/MODCLUSTER-66
> >>
> >> Jean-Frederic: a few questions...
> >> 1. Is it up to the AS to drive the recovery of a NOTOK node when it
> >> becomes functional again?
> >
> > Yes.
> >
> >> In the case of a crashed member, fresh
> >> CONFIG/ENABLE-APP messages will be sent upon node restart. In the case
> >> of a re-merged network partition, no additional messages are sent. Is
> >> the subsequent STATUS message (with a non-zero lbfactor) enough to
> >> trigger the recovery of this node?
> >
> > Yes.
Good to know.
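For anyone skimming: as I understand the MCMP exchange (sketching from
memory, so treat the exact syntax as illustrative rather than
authoritative), recovery rides on the periodic STATUS a node already
sends, e.g. something like:

```
STATUS / HTTP/1.1
Host: httpd-proxy:6666

JVMRoute=node1&Load=50
```

(node1, the proxy host/port, and the load value are made up here.) The
point being that a positive Load in the next STATUS is enough to mark
the node usable again; no separate recovery command is needed.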
> >> 2. Can httpd detect hung nodes? A hung node will not affect the
> >> connected state of the AJP/HTTP/S connector - it could only detect this
> >> by sending data to the connector and timing out on the response.
> >
> > The hung node will be detected and marked as broken but the
> > corresponding request(s) may be delayed or lost due to time-out.
> >
>
> How long does this take, say in a typical case where the hung node was
> up and running with a pool of AJP connections open? Is it the 10 secs,
> the default value of the "ping" property listed at
> https://www.jboss.org/mod_cluster/java/properties.html#proxy ?
I think he's talking about "nodeTimeout".
> Also, if a request is being handled by a hung node and the
> HAModClusterService tells httpd to stop that node, the request will
> fail, yes? It shouldn't just fail over, as it may have already caused
> the transfer of my $1,000,000 to my secret account at UBS. Failing over
> would cause transfer of a second $1,000,000 and sadly I don't have that
> much.
Not unlike those damn double-clickers...
> >>
> >> And some questions for open discussion:
> >> What does HAModClusterService really buy us over the normal
> >> ModClusterService? Do the benefits outweigh the complexity?
> >> * Maintains a uniform view of proxy status across each AS node
> >> * Can detect and send STOP-APP/REMOVE-APP messages on behalf of
> >> hung/crashed nodes (if httpd cannot already do this) (not yet
> >> implemented)
> >> + Requires special handling of network partitions
> >> * Potentially improve scalability by minimizing network traffic for
> >> very large clusters.
>
> Assume a near-term goal is to run a 150 node cluster with say 10 httpd
> servers. Assume the background thread runs every 10 seconds. That comes
> to 150 connections per second across the cluster being opened/closed to
> handle STATUS. Each httpd server handles 15 connections per second.
>
> With HAModClusterService the way it is now, you get the same, because
> besides STATUS each node also checks its ability to communicate w/ each
> httpd in order to validate its ability to become master. But let's
> assume we add some complexity to allow that health check to become much
> more infrequent. So ignore those ping checks. So, w/ HAModClusterService
> you get 1 connection/sec being opened/closed across the cluster for
> status, 0.1 connection/sec per httpd. But the STATUS request sent
> across each connection has a much bigger payload.
True, the STATUS request body is larger than the INFO request (no body),
but the resulting STATUS-RSP is significantly smaller than the
corresponding INFO-RSP.
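To make the arithmetic above concrete, here is a back-of-the-envelope
sketch using the figures from this thread (150 nodes, 10 httpd proxies,
a 10-second status interval). The class and variable names are mine,
not mod_cluster API; this is an estimate, not a measurement:

```java
// Back-of-the-envelope STATUS connection-rate estimate.
// Figures taken from the discussion: 150 nodes, 10 proxies, 10s interval.
public class StatusConnectionRate {
    public static void main(String[] args) {
        int nodes = 150;
        int proxies = 10;
        double intervalSecs = 10.0;

        // Non-HA: every node sends STATUS to every proxy each interval.
        double totalPerSec = nodes * proxies / intervalSecs;
        double perProxyPerSec = totalPerSec / proxies;

        // HA singleton: only the master sends STATUS to each proxy,
        // carrying the whole cluster's load factors in one larger payload.
        double haTotalPerSec = 1 * proxies / intervalSecs;
        double haPerProxyPerSec = haTotalPerSec / proxies;

        System.out.println(totalPerSec + " conn/sec total, "
                + perProxyPerSec + " per proxy (non-HA)");
        System.out.println(haTotalPerSec + " conn/sec total, "
                + haPerProxyPerSec + " per proxy (HA)");
    }
}
```

So the HA mode trades 150 small connections/sec for 1 large one, which
only matters if connection setup/teardown dominates payload size.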
> How significant is the cost of opening/closing all those connections?
>
> >> e.g. non-masters ping httpd less often
> >> * Anything else?
> >
>
> 1) Management?? You don't want to have to interact with every node to do
> management tasks (e.g. disable app X on domain A to drain sessions so we
> can shut down the domain.) A node having a complete view might allow
> more sophisticated management operations. This takes more thought,
> though.
Good point.
> 2) The more sophisticated case discussed on
> https://jira.jboss.org/jira/browse/MODCLUSTER-66, where a primary
> partition approach is appropriate rather than letting minority
> subpartitions continue to live. But TBH mod_cluster might not be the
> right place to handle this. Probably more appropriate is to have
> something associated with the webapp itself determine it is in a
> minority partition and undeploy the webapp if so. Whether being in a
> minority partition is inappropriate for a particular webapp is beyond
> scope for mod_cluster.
>
> I'd originally thought the HA version would add benefit to the load
> balance factor calculation, but that was wrong-headed.
I would argue that there is some (albeit small) value in having the load
sampled on each node at the same time. I would expect this to result in
slightly less load swinging than if individual nodes calculated their
load at different times, scattered across the status interval.
> > Well I prefer the Java code deciding if a node is broken rather than
> > httpd. I really want to keep the complexity in httpd to a minimum, and
> > the talks I have had so far at ApacheCon seem to show that is probably
> > the best way to go.
> >
>
> Agreed that httpd should be as simple as possible. But to handle the
> non-HAModClusterService case it will need to at least detect broken
> connections and basic response timeouts, right? So depending on how long
> it takes to detect hung nodes, httpd might be detecting them before
> HAModClusterService. I'm thinking of 3 scenarios:
>
> 1) Node completely crashes. HAModClusterService will detect this almost
> immediately; I'd think httpd would as well unless it just happened to
> not have connections open.
>
> 2) Some condition that causes the channel used by HAModClusterService to
> not process messages. This will lead to the node being suspected after
> 31.5 seconds with the default channel config. But httpd might detect a
> timeout faster than that?
>
> 3) Some condition that causes all the JBoss Web threads to block but
> doesn't impact the HAModClusterService channel. (QE's Radoslav Husar,
> Bela and I are trying to diagnose such a case right now.) Only httpd
> will detect this; JGroups will not. We could add some logic in JBoss Web
> that would allow it to detect such a situation and then let
> (HA)ModClusterService disable the node. But non-HA ModClusterService
> could do that just as well as the HA version.
I imagine this is not uncommon, e.g. overloaded/deadlocked database
causing all application threads to wait.
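The kind of check we could add to JBoss Web might look something like
the sketch below: if every worker thread has been busy for longer than
some threshold, report the node as hung so (HA)ModClusterService can
disable it. All names and thresholds here are hypothetical; the real
connector thread pool API differs:

```java
// Hypothetical saturation detector for a connector thread pool.
// Names and thresholds are illustrative, not the JBoss Web API.
public class ThreadPoolSaturationDetector {
    private final long thresholdMillis;
    private long saturatedSince = -1; // -1 means "not currently saturated"

    public ThreadPoolSaturationDetector(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /** Call periodically with the pool's busy/max thread counts. */
    public boolean check(int busyThreads, int maxThreads, long nowMillis) {
        if (busyThreads < maxThreads) {
            saturatedSince = -1;        // pool has headroom; reset the clock
            return false;
        }
        if (saturatedSince < 0) {
            saturatedSince = nowMillis; // saturation just started
        }
        // Consider the node hung only after sustained full saturation.
        return nowMillis - saturatedSince >= thresholdMillis;
    }

    public static void main(String[] args) {
        ThreadPoolSaturationDetector d =
                new ThreadPoolSaturationDetector(30_000);
        System.out.println(d.check(200, 200, 0));      // just saturated
        System.out.println(d.check(200, 200, 31_000)); // saturated > 30s
        System.out.println(d.check(150, 200, 32_000)); // recovered
    }
}
```

The sustained-saturation window is the important part: a briefly full
pool under load spikes is normal and should not get the node disabled.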
> Out of these 3 cases, HAModClusterService does a better job than httpd
> itself only in the #2 case, and there only if it takes httpd > 31.5 secs
> to detect a hung response.
Although, for case #2, using the plain non-HA ModClusterService avoids
the problem entirely.
> > Cheers
> >
> > Jean-Frederic
> >
> >>
> >> Paul
> >>
> >> _______________________________________________
> >> mod_cluster-dev mailing list
> >> mod_cluster-dev at lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/mod_cluster-dev
> >>
> >
>
>