[jboss-user] [JBoss Cache: Core Edition] - jbosscache/jgroups replicated cache not working
joachimm
do-not-reply at jboss.com
Mon Jun 22 11:44:24 EDT 2009
Using jbosscache-core 3.1.0.GA on jdk 1.6.
We have a shared map of users that are logged in to our site. This is used by our back end services to send alerts to users if they are _not_ logged in. Each front end web app has a node id, and each user can have multiple sessions per node:
/loggedInUsers/userId/nodeId/sessionId
We hook into the servlet session listener lifecycle to remove sessions from the cache when they time out or when users log off. If a /userId/ node has no remaining children, it is removed and the user is considered logged out of the site.
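For context, here is a minimal sketch of what that listener looks like on our side. The class and attribute names (LoggedInUserListener, "userId", NODE_ID) are illustrative, not our exact code; the cache calls are the standard JBoss Cache 3.x API:

```java
import javax.servlet.http.HttpSessionEvent;
import javax.servlet.http.HttpSessionListener;
import org.jboss.cache.Cache;
import org.jboss.cache.Fqn;
import org.jboss.cache.Node;

// Illustrative sketch: registers a session under
// /loggedInUsers/userId/nodeId/sessionId on login and prunes the
// /userId node when its last session goes away.
public class LoggedInUserListener implements HttpSessionListener {

    private static final String NODE_ID = "web-1"; // this web app's node id (example value)
    private final Cache<String, Object> cache;

    public LoggedInUserListener(Cache<String, Object> cache) {
        this.cache = cache;
    }

    public void sessionCreated(HttpSessionEvent se) {
        String userId = (String) se.getSession().getAttribute("userId");
        if (userId == null) return; // user not logged in yet
        Fqn fqn = Fqn.fromElements("loggedInUsers", userId, NODE_ID, se.getSession().getId());
        cache.put(fqn, "loginTime", System.currentTimeMillis());
    }

    public void sessionDestroyed(HttpSessionEvent se) {
        String userId = (String) se.getSession().getAttribute("userId");
        if (userId == null) return;
        // Remove this session's node...
        cache.removeNode(Fqn.fromElements("loggedInUsers", userId, NODE_ID, se.getSession().getId()));
        // ...and prune the /userId node if no node ids (hence no sessions) remain.
        Fqn userFqn = Fqn.fromElements("loggedInUsers", userId);
        Node<String, Object> userNode = cache.getNode(userFqn);
        if (userNode != null && userNode.getChildren().isEmpty()) {
            cache.removeNode(userFqn);
        }
    }
}
```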
We are currently deployed with a single front end and a single back end, both on the same linux machine.
We have a number of problems:
| * the front-end and back-end cache states differ; e.g., the front end often has fewer entries than the back end
|
| * the cache does not always reflect the correct state: sessions persist after they have expired in the servlet container, and /userId/ nodes with no remaining children are not cleaned up.
|
| * the eviction policy does not clean up stale entries: we've defined a time-to-live of 30 minutes to match our front-end session timeout, yet we see entries in the cache that are many hours old.
|
| * we periodically see membership warnings in the logs even though all processes are still running and everything is on a single machine.
|
In short, it is not working. I'm hoping someone can spot a configuration issue, or suggest a different way of configuring JGroups that might help. I've included the config at the end.
I also have a couple of questions:
| * is there a way to get the last modified/accessed time of an entry through the API without touching the entry and thereby increasing its TTL?
|
| * is there a way to remove an Fqn via JMX?
|
| * is there work that the application has to do with @NodeEvicted or other annotations to ensure that the cache state is the same on all nodes? (I sort of thought this was the point of JBoss Cache.)
|
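On the last question, for reference, the 3.x notification wiring I've been experimenting with looks roughly like this (a sketch only; the class name and logging are mine, and I'm not sure whether anything beyond observing the event is required):

```java
import org.jboss.cache.notifications.annotation.CacheListener;
import org.jboss.cache.notifications.annotation.NodeEvicted;
import org.jboss.cache.notifications.event.NodeEvictedEvent;

// Sketch of a cache listener observing evictions via the JBoss Cache 3.x
// annotation-based notification API. It only logs; it does not attempt to
// propagate the eviction to other cache instances.
@CacheListener
public class EvictionLogger {

    @NodeEvicted
    public void nodeEvicted(NodeEvictedEvent event) {
        // Each notification fires twice: before (isPre() == true) and after.
        if (!event.isPre()) {
            System.out.println("Evicted: " + event.getFqn());
        }
    }
}
```

It would be registered with cache.addCacheListener(new EvictionLogger()).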
Thanks for any help you can provide. --Joachim
Here are the errors we are seeing periodically:
anonymous wrote : server.log.2009-06-18:2009-06-18 22:44:32,279 WARN [org.jgroups.protocols.FD] - <I was suspected by 127.0.0.1:59480; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK>
| server.log.2009-06-18:2009-06-18 22:44:32,279 WARN [org.jgroups.protocols.FD] - <I was suspected by 127.0.0.1:48930; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK>
| server.log.2009-06-18:2009-06-18 22:44:32,795 WARN [org.jgroups.protocols.pbcast.GMS] - <I (127.0.0.1:35383) am not a member of view [127.0.0.1:51683|6] [127.0.0.1:51683, 127.0.0.1:57843, 127.0.0.1:43144, 127.0.0.1:33294, 127.0.0.1:48930]; discarding view>
| server.log.2009-06-18:2009-06-18 22:44:32,971 WARN [org.jgroups.protocols.pbcast.GMS] - <I (127.0.0.1:51031) am not a member of view [127.0.0.1:59480|16] [127.0.0.1:59480]; discarding view>
| server.log.2009-06-18:2009-06-18 22:44:53,056 WARN [org.jgroups.protocols.pbcast.GMS] - <Merge aborted. Merge leader did not get MergeData from all subgroup coordinators [127.0.0.1:51683, 127.0.0.1:35383]>
| server.log.2009-06-18:2009-06-18 22:45:03,648 WARN [org.jgroups.protocols.pbcast.GMS] - <Merge aborted. Merge leader did not get MergeData from all subgroup coordinators [127.0.0.1:59480, 127.0.0.1:51031]>
| server.log.2009-06-18:2009-06-18 22:45:03,648 WARN [org.jgroups.protocols.pbcast.GMS] - <merge was supposed to be cancelled at merge participant 127.0.0.1:51031 (merge_id=[127.0.0.1:51031|1245365098648]), but it is not since merge ids do not match>
| server.log.2009-06-18:2009-06-18 22:45:08,605 WARN [org.jgroups.protocols.pbcast.GMS] - <Merge aborted. Merge leader did not get MergeData from all subgroup coordinators [127.0.0.1:51683, 127.0.0.1:35383]>
| server.log.2009-06-18:2009-06-18 22:45:08,609 WARN [org.jgroups.protocols.pbcast.GMS] - <merge was supposed to be cancelled at merge participant 127.0.0.1:35383 (merge_id=[127.0.0.1:35383|1245365103604]), but it is not since merge ids do not match>
| server.log.2009-06-18:2009-06-18 22:45:11,269 WARN [org.jgroups.protocols.pbcast.GMS] - <GMS flush by coordinator at 127.0.0.1:35383 failed>
| server.log.2009-06-18:2009-06-18 22:45:11,269 WARN [org.jgroups.protocols.pbcast.GMS] - <Since flush failed at 127.0.0.1:35383 rejected merge to 127.0.0.1:35383, merge_id=[127.0.0.1:35383|1245365088055]>
| server.log.2009-06-18:2009-06-18 22:45:11,269 ERROR [org.jgroups.protocols.pbcast.GMS] - <merge_id ([127.0.0.1:35383|1245365088055]) or this.merge_id (null) is null (sender=127.0.0.1:35383).>
| server.log.2009-06-18:2009-06-18 22:45:23,157 WARN [org.jgroups.protocols.pbcast.GMS] - <GMS flush by coordinator at 127.0.0.1:51031 failed>
| server.log.2009-06-18:2009-06-18 22:45:23,157 WARN [org.jgroups.protocols.pbcast.GMS] - <resume([127.0.0.1:51031|1245365098648]) does not match [127.0.0.1:51031|1245365122421]>
| server.log.2009-06-18:2009-06-18 22:45:23,157 WARN [org.jgroups.protocols.pbcast.GMS] - <Since flush failed at 127.0.0.1:51031 rejected merge to 127.0.0.1:51031, merge_id=[127.0.0.1:51031|1245365098648]>
| server.log.2009-06-18:2009-06-18 22:45:23,169 ERROR [org.jgroups.protocols.pbcast.GMS] - <this.merge_id ([127.0.0.1:51031|1245365122421]) is different from merge_id ([127.0.0.1:51031|1245365098648])>
| server.log.2009-06-18:2009-06-18 22:45:24,101 WARN [org.jgroups.protocols.pbcast.GMS] - <Merge aborted. Merge leader did not get MergeData from all subgroup coordinators [127.0.0.1:51683, 127.0.0.1:35383]>
| server.log.2009-06-18:2009-06-18 22:45:26,006 WARN [org.jgroups.protocols.pbcast.GMS] - <GMS flush by coordinator at 127.0.0.1:35383 failed>
| server.log.2009-06-18:2009-06-18 22:45:26,006 WARN [org.jgroups.protocols.pbcast.GMS] - <Since flush failed at 127.0.0.1:35383 rejected merge to 127.0.0.1:35383, merge_id=[127.0.0.1:35383|1245365103604]>
| server.log.2009-06-18:2009-06-18 22:45:26,006 ERROR [org.jgroups.protocols.pbcast.GMS] - <merge_id ([127.0.0.1:35383|1245365103604]) or this.merge_id (null) is null (sender=127.0.0.1:35383).>
| server.log.2009-06-18:2009-06-18 22:45:27,422 WARN [org.jgroups.protocols.pbcast.GMS] - <Merge aborted. Merge leader did not get MergeData from all subgroup coordinators [127.0.0.1:59480, 127.0.0.1:51031]>
| server.log.2009-06-18:2009-06-18 22:45:27,422 WARN [org.jgroups.protocols.pbcast.GMS] - <merge was supposed to be cancelled at merge participant 127.0.0.1:51031 (merge_id=[127.0.0.1:51031|1245365122421]), but it is not since merge ids do not match>
| server.log.2009-06-18:2009-06-18 22:45:33,806 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <sender 127.0.0.1:51031 not found in xmit_table>
| server.log.2009-06-18:2009-06-18 22:45:33,806 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <range is null>
| server.log.2009-06-18:2009-06-18 22:45:33,806 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <sender 127.0.0.1:59480 not found in xmit_table>
| server.log.2009-06-18:2009-06-18 22:45:33,806 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <range is null>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <sender 127.0.0.1:35383 not found in xmit_table>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <range is null>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <sender 127.0.0.1:51683 not found in xmit_table>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <range is null>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <sender 127.0.0.1:57843 not found in xmit_table>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <range is null>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <sender 127.0.0.1:43144 not found in xmit_table>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <range is null>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <sender 127.0.0.1:33294 not found in xmit_table>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <range is null>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <sender 127.0.0.1:48930 not found in xmit_table>
| server.log.2009-06-18:2009-06-18 22:45:36,062 ERROR [org.jgroups.protocols.pbcast.NAKACK] - <range is null>
| server.log.2009-06-19:2009-06-19 03:47:12,927 WARN [org.jgroups.protocols.pbcast.GMS] - <I (127.0.0.1:40908) am not a member of view [127.0.0.1:59480|18] [127.0.0.1:59480]; discarding view>
| server.log.2009-06-19:2009-06-19 03:47:12,931 WARN [org.jgroups.protocols.FD] - <I was suspected by 127.0.0.1:59480; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK>
| server.log.2009-06-19:2009-06-19 03:47:12,951 WARN [org.jgroups.protocols.pbcast.GMS] - <I (127.0.0.1:44101) am not a member of view [127.0.0.1:51683|8] [127.0.0.1:51683, 127.0.0.1:57843, 127.0.0.1:43144, 127.0.0.1:33294, 127.0.0.1:48930]; discarding view>
| server.log.2009-06-19:2009-06-19 03:47:12,995 WARN [org.jgroups.protocols.FD] - <I was suspected by 127.0.0.1:48930; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK>
Here's the config:
| <jbosscache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:jboss:jbosscache-core:config:3.0">
|
| <!-- Configure the TransactionManager -->
| <transaction transactionManagerLookupClass="org.jboss.cache.transaction.GenericTransactionManagerLookup"/>
|
| <clustering mode="replication" clusterName="loggedInUsers">
| <!-- JGroups protocol stack properties. -->
| <!-- changed addr and port (to isolate channels) -->
| <jgroupsConfig>
| <UDP discard_incompatible_packets="true" enable_bundling="false" enable_diagnostics="false" ip_ttl="2"
| loopback="false" max_bundle_size="64000" max_bundle_timeout="30" mcast_addr="228.10.10.11"
| mcast_port="45589" mcast_recv_buf_size="25000000" mcast_send_buf_size="640000"
| oob_thread_pool.enabled="true" oob_thread_pool.keep_alive_time="10000" oob_thread_pool.max_threads="4"
| oob_thread_pool.min_threads="1" oob_thread_pool.queue_enabled="true" oob_thread_pool.queue_max_size="10"
| oob_thread_pool.rejection_policy="Run" thread_naming_pattern="pl" thread_pool.enabled="true"
| thread_pool.keep_alive_time="30000" thread_pool.max_threads="25" thread_pool.min_threads="1"
| thread_pool.queue_enabled="true" thread_pool.queue_max_size="10" thread_pool.rejection_policy="Run"
| tos="8" ucast_recv_buf_size="20000000" ucast_send_buf_size="640000" use_concurrent_stack="true"
| use_incoming_packet_handler="true"/>
| <PING num_initial_members="3" timeout="2000"/>
| <MERGE2 max_interval="30000" min_interval="10000"/>
| <FD_SOCK/>
| <FD max_tries="5" shun="true" timeout="10000"/>
| <VERIFY_SUSPECT timeout="1500"/>
| <pbcast.NAKACK discard_delivered_msgs="true" gc_lag="0" retransmit_timeout="300,600,1200,2400,4800"
| use_mcast_xmit="false"/>
| <UNICAST timeout="300,600,1200,2400,3600"/>
| <pbcast.STABLE desired_avg_gossip="50000" max_bytes="400000" stability_delay="1000"/>
| <pbcast.GMS join_timeout="5000" print_local_addr="true" shun="false" view_ack_collection_timeout="5000"
| view_bundling="true"/>
| <FRAG2 frag_size="60000"/>
| <pbcast.STREAMING_STATE_TRANSFER/>
| <pbcast.FLUSH timeout="0"/>
| </jgroupsConfig>
|
| <sync replTimeout="20000"/>
| <!-- Alternatively, to use async replication, comment out the element above and uncomment the element below. -->
| <!--<async />-->
|
| </clustering>
|
| <eviction wakeUpInterval="60000">
| <region name="/loggedInUsers" algorithmClass="org.jboss.cache.eviction.LRUAlgorithm" eventQueueSize="200000">
| <property name="timeToLive" value="1800001" />
| </region>
| </eviction>
|
| </jbosscache>
Thanks!
View the original post : http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4239139#4239139
Reply to the post : http://www.jboss.org/index.html?module=bb&op=posting&mode=reply&p=4239139
More information about the jboss-user mailing list