[jboss-dev-forums] [Design of Messaging on JBoss (Messaging/JBoss)] - Re: Changes on the JBM stack on JBoss5

bstansberry@jboss.com do-not-reply at jboss.com
Wed Nov 5 23:15:33 EST 2008


The differences in the control channel config:

UDP.singleton_name.  In the AS this instance of the UDP transport protocol is shared across numerous channels. Giving the transport a name is what allows this to work.
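
To illustrate the sharing, the wiring looks roughly like this (a sketch; the name "shared-udp" and the multicast values are examples, not necessarily the exact AS shipping config):

    <!-- Every channel whose stack declares the same singleton_name is
         multiplexed over one shared UDP transport instance, so the JBM
         control channel and e.g. the session replication channel all
         ride on the same socket and thread pools. -->
    <UDP singleton_name="shared-udp"
         mcast_addr="228.11.11.11"
         mcast_port="45688"/>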

UDP.mcast_addr and mcast_port. These are just the values for the shared transport rather than the old JBM-specific ones. I don't see what difference this would make.

UDP.loopback=true.  We were seeing inscrutable startup failures in the AS when this was false on machines with improperly configured multicast. With true you still get startup failures (nodes can't cluster because multicast doesn't work), but they aren't inscrutable. I'd consider changing this back if we could somehow establish it's a cause of whatever your problem is.
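
For reference, the mechanism as I understand the loopback flag (a sketch, not a quote from the JGroups docs):

    <!-- loopback="true": the transport hands a copy of each outgoing
         multicast straight back up this node's own stack instead of
         relying on the OS/NIC to reflect the packet. A node with broken
         multicast therefore still sees its own traffic, and the failure
         shows up plainly as nodes not finding each other rather than as
         baffling local errors. -->
    <UDP loopback="true"/>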

UDP.enable_bundling=false just states explicitly in the config file the default you had before. No change.

UDP.ip_ttl=2. Longstanding AS default value to limit multicast propagation. In most testsuites, multicast doesn't even need to propagate off the test machine, so I doubt this is your problem.

UDP.timer.num_threads=12.  Your old config left this unspecified, so it defaulted to 4.  The increase is because the transport is meant to be shared between different services in the AS, so more threads are needed to run their timer tasks.

UDP.thread_pool.min_threads="20". Old value = 1. With thread_pool.queue_enabled="true" and thread_pool.queue_max_size="1000", once that single min thread was carrying a message up the stack or handling it at the application level, 1000 more messages would have to arrive and fill the queue before a 2nd thread was created in the pool to take messages off the queue.  With a shared transport, those messages may belong to completely unrelated services: while the one thread is busy, say, in the session replication cache, 1000 JBM messages pile up in the queue. You need a larger number of min threads to ensure threads are available to read the queue.  Testing with just one thread showed very poor performance in multi-node clusters.  I can't see why having more threads available in a pool would cause a problem.
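
Concretely, the difference between the two pool configs (a sketch using the values discussed above, not the verbatim stack files):

    <!-- Old JBM stack: one min thread, so under contention up to 1000
         messages queue up behind it before a 2nd pool thread appears. -->
    <UDP thread_pool.min_threads="1"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="1000"/>

    <!-- New shared-transport stack: enough min threads to keep draining
         the queue even while some threads block inside one service. -->
    <UDP thread_pool.min_threads="20"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="1000"/>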

UDP.thread_pool.rejection_policy="discard". Was "run". You can hang the entire cluster with "run", since it allows the single thread that reads messages off the wire to end up going into code that blocks in NAKACK or UNICAST or even into arbitrary application code.  With multi-node clusters in tests under load, we found it was quite easy to hang the cluster with "run".
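
My understanding is that these policy names map onto the standard java.util.concurrent rejection handlers, which is exactly why "run" is dangerous here:

    <!-- rejection_policy="run" ~ ThreadPoolExecutor.CallerRunsPolicy:
         when the pool's queue is full, the submitting thread (the one
         that just read the packet off the wire) executes the task
         itself, and can block in NAKACK/UNICAST or application code.
         rejection_policy="discard" ~ DiscardPolicy: the message is
         dropped and the reader thread moves on; the retransmission
         protocols recover the dropped message later. -->
    <UDP thread_pool.rejection_policy="discard"/>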

UDP.oob_thread_pool.max_threads="20". See UDP.thread_pool.min_threads="20" above.

UDP.oob_thread_pool.rejection_policy="run". Was "Run". This is just consistency in capitalization.

FD.timeout and max_tries. With the old values, it would take 50 secs to detect a hung node. That's a long time. It was reasonable in AS 4, where the single-threaded channel and lack of an OOB thread pool made it quite possible for FD heartbeats to go unacknowledged for a long time while the single thread was busy doing something else.  With the thread pool and OOB messages, there's no reason FD heartbeats should go unacknowledged for so long, so we reduced the detection period to 30 secs.
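
The arithmetic, with values that match the numbers above (the old JBM values are from memory, so treat them as illustrative):

    <!-- Worst-case detection time for a hung node ~ timeout * max_tries.
         Old:  timeout="10000" max_tries="5"  ->  50 secs
         New:  timeout="6000"  max_tries="5"  ->  30 secs -->
    <FD timeout="6000" max_tries="5"/>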

GMS.shun="true". Previous value of false makes no logical sense in conjunction with FD.shun="true".  This was discussed on a JBM forum thread a while back.


TBH, I don't see why any of these would cause the error you reported, but until I see more details I don't really know what the error was.

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4187165#4187165

Reply to the post : http://www.jboss.com/index.html?module=bb&op=posting&mode=reply&p=4187165


