[JBoss JIRA] (WFLY-7075) proposal (Extension): allow checking of mixed persistence context synchronization types to be skipped or to check if unsync context is already joined to JTA TX
by Scott Marlow (JIRA)
[ https://issues.jboss.org/browse/WFLY-7075?page=com.atlassian.jira.plugin.... ]
Scott Marlow commented on WFLY-7075:
------------------------------------
Good to hear from you again. By "trying the suggested solution", I assume you're building with the changes from [https://github.com/scottmarlow/wildfly/tree/WFLY-7075_unsyncXPCExtension]?
I agree that applications should be able to set the wildfly.jpa.allowjoinedunsync=true property, which I added a unit test for. If that isn't working as expected, we can look to see if I missed something in my code change.
How do you inject the entity manager in your application? Via @Inject or @PersistenceContext?
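For reference, the two styles look like this - a minimal sketch, where the unit name "primary" and the class names are assumptions, not code from the reporter's application:
{code:java}
import javax.ejb.Stateful;
import javax.enterprise.inject.Produces;
import javax.inject.Inject;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.PersistenceContextType;
import javax.persistence.SynchronizationType;

// Container-managed style: the synchronization type is declared on
// @PersistenceContext itself (JPA 2.1). An EXTENDED persistence
// context requires a stateful session bean.
@Stateful
class DirectInjectionStyle {
    @PersistenceContext(unitName = "primary",   // unit name is an assumption
                        type = PersistenceContextType.EXTENDED,
                        synchronization = SynchronizationType.UNSYNCHRONIZED)
    private EntityManager em;
}

// CDI style: @Inject resolves only if a producer exposes an entity
// manager somewhere in the application, e.g. as below.
class CdiInjectionStyle {
    @Inject
    private EntityManager em;
}

class Resources {
    @Produces
    @PersistenceContext(unitName = "primary")
    EntityManager em;
}
{code}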
> proposal (Extension): allow checking of mixed persistence context synchronization types to be skipped or to check if unsync context is already joined to JTA TX
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: WFLY-7075
> URL: https://issues.jboss.org/browse/WFLY-7075
> Project: WildFly
> Issue Type: Feature Request
> Components: JPA / Hibernate
> Affects Versions: 10.1.0.Final
> Environment: *AppServer:* WildFly 10.1.0.Final
> *WildFly-jpa:* wildfly-jpa-10.1.0.Final.jar
> Reporter: Viacheslav Astapkovich
> Assignee: Scott Marlow
> Attachments: kitchensink-ear.rar, Screenshot_10.png, Screenshot_7.png, Screenshot_8.png, Screenshot_9.png
>
>
> As we mentioned in https://issues.jboss.org/browse/WFLY-6127, we found a bug and implemented our own solution.
>
> *The obtained defect:*
> A defect in the synchronization-type check (performed to satisfy JPA 2.1 spec, section 7.6.4.1) was found in WildFly 10.1.0.Final.
> The method getSynchronizationType of class ExtendedEntityManager ALWAYS returns the synchronization type SYNCHRONIZED (jar file: wildfly-jpa-10.1.0.Final.jar).
> *FIX:*
> We made a fork of WildFly-jpa project at: https://github.com/argustelecom/wildfly/tree/WFLY-6127
> Our Fix commit: https://github.com/wildfly/wildfly/commit/3bff5fde3cfc23f3999dc75c320029e...
> _Changes_: The method getSynchronizationType now returns the declared synchronization type.
> [WFLY-7108] is now the tracking jira for the fix.
> *Consequences:*
> We use our own implementation of Martin Fowler's "Unit of Work" pattern. We initialize an UNSYNCHRONIZED extended persistence context, and our UnitOfWork implementation manages its synchronization with the transaction.
> Our beans can be controlled by the UnitOfWork, but they can also be used as part of a web-service call, which requires a SYNCHRONIZED persistence context to be declared.
> Now that the synchronization-type-determination defect is fixed, we get an IllegalStateException, because we initialize an UNSYNCHRONIZED persistence context but declare a SYNCHRONIZED persistence context in our beans.
> However, our UnitOfWork implementation controls the synchronization of the persistence context, so we can join the context to the transaction before the synchronization type check runs.
> *Our actions:*
> We added the ability for the method testForMixedSynchronizationTypes of class TransactionScopedEntityManager to check the synchronization type via isJoinedToTransaction (i.e. the actual synchronization state, rather than the declared one).
> This behaviour is controlled by the property "jboss.as.jpa.syncasjoin" in the persistence.xml file: only if this property is set to true does testForMixedSynchronizationTypes use isJoinedToTransaction; if the property is undefined or set to false, the check works as before. A sketch of the proposed gate follows the commit link below.
> _Commit_: https://github.com/wildfly/wildfly/commit/195a8a65a9fae006ad603e425f6a16d...
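A rough sketch of the proposed gate (this is not the actual WildFly code; the class shape, parameter names and exception message are assumptions based on the description above):
{code:java}
import javax.persistence.EntityManager;
import javax.persistence.SynchronizationType;

// Sketch only: illustrates the proposed "jboss.as.jpa.syncasjoin" gate,
// not the real TransactionScopedEntityManager implementation.
class MixedSyncCheckSketch {

    // true when "jboss.as.jpa.syncasjoin" is set in persistence.xml
    private final boolean syncAsJoin;

    MixedSyncCheckSketch(boolean syncAsJoin) {
        this.syncAsJoin = syncAsJoin;
    }

    // Called when a component that declared SYNCHRONIZED sees a
    // propagated UNSYNCHRONIZED persistence context.
    void testForMixedSynchronizationTypes(EntityManager propagated,
                                          SynchronizationType propagatedType) {
        if (propagatedType == SynchronizationType.UNSYNCHRONIZED) {
            // Proposal: if the unsynchronized context has already joined
            // the JTA transaction, treat it as synchronized and allow it.
            if (syncAsJoin && propagated.isJoinedToTransaction()) {
                return;
            }
            // Default behaviour, per JPA 2.1 section 7.6.4.1.
            throw new IllegalStateException(
                "mixed synchronization types are not allowed");
        }
    }
}
{code}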
> *Example for reproduction:*
> I modified the quickstart example: [^kitchensink-ear.rar]
> MemberRepository begins an extended UNSYNCHRONIZED persistence context.
> MemberFinder.find is called from MemberRepository. MemberFinder declares "SYNCHRONIZED", but MemberRepository declares UNSYNCHRONIZED.
> MemberRepository also joins the transaction.
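In outline, the reproducer's bean layout looks roughly like this (a sketch; the entity and method bodies are assumptions, only the bean names and synchronization declarations come from the description above):
{code:java}
import javax.ejb.Stateful;
import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;
import javax.persistence.PersistenceContextType;
import javax.persistence.SynchronizationType;

@Entity
class Member {
    @Id
    Long id;
}

// Declares the default SYNCHRONIZED persistence context, so the
// propagated UNSYNCHRONIZED context trips the mixed-type check.
@Stateless
class MemberFinder {
    @PersistenceContext
    private EntityManager em;

    public Member find(long id) {
        return em.find(Member.class, id);
    }
}

// Begins an extended UNSYNCHRONIZED persistence context and joins it
// to the JTA transaction before delegating to MemberFinder.
@Stateful
public class MemberRepository {
    @PersistenceContext(type = PersistenceContextType.EXTENDED,
                        synchronization = SynchronizationType.UNSYNCHRONIZED)
    private EntityManager em;

    @Inject
    private MemberFinder memberFinder;

    public Member find(long id) {
        em.joinTransaction();          // context is now joined to the JTA tx
        return memberFinder.find(id);  // propagates the UNSYNCHRONIZED context
    }
}
{code}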
> Questions from [~smarlow]:
> - whether you could instead use an application-managed entity manager (which is similar to an extended persistence context, except that the application controls it).
> We decided to use a container-managed EntityManager because:
> - Application-managed entity managers don't automatically propagate the JTA transaction context; with application-managed entity managers the persistence context is not propagated to application components.
> - The container does this work for us.
> - We want to use the existing mechanism.
> *In Addition:*
> Formally, this is outside the JPA spec, but we have the following reasons:
> - During the development of the JPA specification, the question was raised: "Should we relax this if the PC has been joined to the transaction?".
> Unfortunately, Linda DeMichiel decided to keep the current behavior because no feedback was given.
> ( https://java.net/projects/jpa-spec/lists/jsr338-experts/archive/2011-08/m... )
> - We found JIRA task https://java.net/jira/browse/JPA_SPEC-6, but it was closed because of "No feedback in favor of changing current approach".
[JBoss JIRA] (JGRP-2092) MERGE3: merge never happens
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2092?page=com.atlassian.jira.plugin.... ]
Bela Ban edited comment on JGRP-2092 at 9/19/16 12:35 PM:
----------------------------------------------------------
OK, the issue has been resolved. The main changes are (a configuration sketch follows the list):
* MERGE3 now has all members (not just the coords) check for view inconsistencies if {{only_coords_run_consistency_checker}} is false (default)
** (This is done every {{check_interval}} ms)
* To determine a merge leader, all actual coordinators are used. If this returns null (because there are no coords), _all_ members will be used
* Participants (not just coordinators) are now also allowed to handle merges and install merge views
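For anyone wanting to experiment with the knobs mentioned above, a minimal programmatic sketch (assuming the JGroups 4.x {{JChannel(Protocol...)}} constructor and the reflective {{setValue}} helper; the stack below is deliberately abbreviated and not production-ready):
{code:java}
import org.jgroups.JChannel;
import org.jgroups.protocols.MERGE3;
import org.jgroups.protocols.PING;
import org.jgroups.protocols.UDP;
import org.jgroups.protocols.UNICAST3;
import org.jgroups.protocols.pbcast.GMS;
import org.jgroups.protocols.pbcast.NAKACK2;

public class Merge3Demo {
    public static void main(String[] args) throws Exception {
        MERGE3 merge = new MERGE3();
        // Let all members, not only coordinators, run the view
        // consistency checker (the new default described above).
        merge.setValue("only_coords_run_consistency_checker", false);
        // Run the consistency check every check_interval ms.
        merge.setValue("check_interval", 30_000L);

        // Abbreviated stack, for illustration only; a real stack
        // needs failure detection (FD_ALL/FD_SOCK) and more.
        JChannel ch = new JChannel(
                new UDP(), new PING(), merge,
                new NAKACK2(), new UNICAST3(), new GMS());
        ch.connect("demo-cluster");
        ch.close();
    }
}
{code}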
> MERGE3: merge never happens
> ---------------------------
>
> Key: JGRP-2092
> URL: https://issues.jboss.org/browse/JGRP-2092
> Project: JGroups
> Issue Type: Bug
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 3.6.11, 4.0
>
> Attachments: jgroups.txt
>
>
> (Reported by Neal Dillman)
> In the case below, a merge doesn't seem to happen (no host's view lists that host itself as coordinator, so no member takes on the merge-leader role). Write a unit test to reproduce this.
> {noformat}
> Host A view: B, X, Y, Z, A (where B should be coordinator)
> Host B view: C, Q, R, S, B (where C should be coordinator)
> Host C view: A, M, N, O, C (where A should be coordinator)
> {noformat}
[JBoss JIRA] (JGRP-2092) MERGE3: merge never happens
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2092?page=com.atlassian.jira.plugin.... ]
Bela Ban edited comment on JGRP-2092 at 9/19/16 12:32 PM:
----------------------------------------------------------
Hi [~ndillman]
{quote}
We no longer see the issue, as it was worked around by staggering responses for all of the "bulk response" messages (i.e. view change, etc.).
{quote}
This may have reduced the chances of such a scenario happening, but I don't think it eliminated it completely. In my next comment, I'll show how such a scenario can be reproduced.
Yes, a solution will involve a non-coordinator becoming merge leader, but this requires code changes in 3 places: the merge detection, the merge leader determination and the actual merge.
[JBoss JIRA] (WFLY-7140) Injection with @EJB is not working as expected with CDI (REST) beans
by Wolf-Dieter Fink (JIRA)
Wolf-Dieter Fink created WFLY-7140:
--------------------------------------
Summary: Injection with @EJB is not working as expected with CDI (REST) beans
Key: WFLY-7140
URL: https://issues.jboss.org/browse/WFLY-7140
Project: WildFly
Issue Type: Bug
Reporter: Wolf-Dieter Fink
Assignee: Jason Greene
Injection with @EJB should work the same way in a REST service (CDI bean) as it does in a servlet.
@EJB(lookup = "ejb:jboss-ejb-multi-server-app-one/ejb/AppOneBean!org.jboss.as.quickstarts.ejb.multi.server.app.AppOne")
does not work correctly when used in a CDI bean in the reproducer example (see the sketch after the reproducer pointer below).
Reproducer can be found here:
git@github.com:wfink/jboss-eap-quickstarts.git
BRANCH: 6.4.x_ejb-multi-server_reproducerEJB-injection
SubProject: ejb-multi-server (only a part of it is used, to have a web app and an EJB app)
see ejb-multi-server/README-reproducerEJB-injection
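For illustration, the failing shape is roughly the following (a sketch only; the resource class, path and the AppOne method name are assumptions, not code from the reproducer):
{code:java}
import javax.ejb.EJB;
import javax.ws.rs.GET;
import javax.ws.rs.Path;

import org.jboss.as.quickstarts.ejb.multi.server.app.AppOne;

// JAX-RS resource, i.e. a CDI bean. The same @EJB lookup works in a
// plain Servlet, but here the injection does not behave as expected.
@Path("/app-one")
public class AppOneResource {

    @EJB(lookup = "ejb:jboss-ejb-multi-server-app-one/ejb/AppOneBean!org.jboss.as.quickstarts.ejb.multi.server.app.AppOne")
    private AppOne appOne;

    @GET
    public String invoke() {
        // Method name on the AppOne view is assumed for this sketch.
        return appOne.invoke("test");
    }
}
{code}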
[JBoss JIRA] (JGRP-2086) FD_SOCK keeps trying to create a new socket to the killed server
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2086?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2086.
----------------------------
Resolution: Done
> FD_SOCK keeps trying to create a new socket to the killed server
> -----------------------------------------------------------------
>
> Key: JGRP-2086
> URL: https://issues.jboss.org/browse/JGRP-2086
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.3
> Environment: JDG 6.6.0 (jgroups-3.6.3.Final-redhat-4.jar)
> Reporter: Osamu Nagano
> Assignee: Bela Ban
> Fix For: 3.6.11, 4.0
>
>
> In most cases FD_SOCK can detect a killed server immediately, but for an unknown reason FD_SOCK keeps trying to create a new socket to the killed server. As a consequence, installing a new cluster view is delayed until FD_ALL triggers.
> m04_n007_server.log shows the behaviour. There are 28 nodes (4 machines (m03, ..., m06) with 7 nodes (n001, ..., n007) each), and all nodes on m03 are killed at the same time at 15:07:34,543. FD_SOCK keeps trying to connect to a killed node, saying "socket address for m03_n001/clustered could not be fetched, retrying".
> {noformat}
> [n007] 15:07:39,543 TRACE [org.jgroups.protocols.FD_SOCK] (Timer-8,shared=udp) m04_n007/clustered: broadcasting SUSPECT message (suspected_mbrs=[m03_n005/clustered, m03_n007/clustered])
> [n007] 15:07:39,544 TRACE [org.jgroups.protocols.FD_SOCK] (INT-20,shared=udp) m04_n007/clustered: received SUSPECT message from m04_n007/clustered: suspects=[m03_n005/clustered, m03_n007/clustered]
> [n007] 15:07:39,546 TRACE [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: socket address for m03_n001/clustered could not be fetched, retrying
> [n007] 15:07:40,546 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: ping_dest is m03_n001/clustered, pingable_mbrs=[m03_n001/clustered, m03_n002/clustered, m03_n003/clustered, m03_n004/clustered, m03_n006/clustered, m06_n001/clustered, m06_n002/clustered, m06_n003/clustered, m06_n004/clustered, m06_n005/clustered, m06_n006/clustered, m06_n007/clustered, m05_n001/clustered, m05_n002/clustered, m05_n003/clustered, m05_n004/clustered, m05_n005/clustered, m05_n006/clustered, m05_n007/clustered, m04_n001/clustered, m04_n002/clustered, m04_n003/clustered, m04_n004/clustered, m04_n005/clustered, m04_n006/clustered, m04_n007/clustered]
> [n007] 15:07:41,546 TRACE [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: socket address for m03_n001/clustered could not be fetched, retrying
> [n007] 15:07:42,546 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: ping_dest is m03_n001/clustered, pingable_mbrs=[m03_n001/clustered, m03_n002/clustered, m03_n003/clustered, m03_n004/clustered, m03_n006/clustered, m06_n001/clustered, m06_n002/clustered, m06_n003/clustered, m06_n004/clustered, m06_n005/clustered, m06_n006/clustered, m06_n007/clustered, m05_n001/clustered, m05_n002/clustered, m05_n003/clustered, m05_n004/clustered, m05_n005/clustered, m05_n006/clustered, m05_n007/clustered, m04_n001/clustered, m04_n002/clustered, m04_n003/clustered, m04_n004/clustered, m04_n005/clustered, m04_n006/clustered, m04_n007/clustered]
> [n007] 15:07:43,547 TRACE [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: socket address for m03_n001/clustered could not be fetched, retrying
> ...
> [n007] 15:10:53,700 DEBUG [org.jgroups.protocols.FD_ALL] (Timer-26,shared=udp) haven't received a heartbeat from m03_n005/clustered for 200059 ms, adding it to suspect list
> {noformat}
> From the TRACE log, you can see that the address cache of FD_SOCK has only 23 members.
> {noformat}
> [n007] 14:40:50,471 TRACE [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: got cache from m03_n005/clustered: cache is {
> m04_n006/clustered=172.20.66.34:9945,
> m05_n005/clustered=172.20.66.35:9938,
> m06_n004/clustered=172.20.66.36:9931,
> m03_n007/clustered=172.20.66.33:9952,
> m05_n001/clustered=172.20.66.35:9910,
> m06_n005/clustered=172.20.66.36:9938,
> m05_n006/clustered=172.20.66.35:9945,
> m03_n005/clustered=172.20.66.33:9938,
> m05_n004/clustered=172.20.66.35:9931,
> m04_n003/clustered=172.20.66.34:9924,
> m04_n007/clustered=172.20.66.34:9952,
> m05_n002/clustered=172.20.66.35:9917,
> m05_n003/clustered=172.20.66.35:9924,
> m04_n004/clustered=172.20.66.34:9931,
> m06_n001/clustered=172.20.66.36:9910,
> m06_n007/clustered=172.20.66.36:9952,
> m04_n005/clustered=172.20.66.34:9938,
> m04_n001/clustered=172.20.66.34:9910,
> m05_n007/clustered=172.20.66.35:9952,
> m06_n002/clustered=172.20.66.36:9917,
> m06_n006/clustered=172.20.66.36:9945,
> m04_n002/clustered=172.20.66.34:9917,
> m06_n003/clustered=172.20.66.36:9924}
> {noformat}
> Meanwhile, pingable_mbrs has all 28 members, taken from the currently available cluster view.
> {noformat}
> [n007] 14:40:50,472 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: ping_dest is m03_n005/clustered, pingable_mbrs=[
> m03_n005/clustered,
> m03_n007/clustered,
> m03_n001/clustered,
> m03_n002/clustered,
> m03_n003/clustered,
> m03_n004/clustered,
> m03_n006/clustered,
> m06_n001/clustered,
> m06_n002/clustered,
> m06_n003/clustered,
> m06_n004/clustered,
> m06_n005/clustered,
> m06_n006/clustered,
> m06_n007/clustered,
> m05_n001/clustered,
> m05_n002/clustered,
> m05_n003/clustered,
> m05_n004/clustered,
> m05_n005/clustered,
> m05_n006/clustered,
> m05_n007/clustered,
> m04_n001/clustered,
> m04_n002/clustered,
> m04_n003/clustered,
> m04_n004/clustered,
> m04_n005/clustered,
> m04_n006/clustered,
> m04_n007/clustered]
> {noformat}
[JBoss JIRA] (WFLY-7126) No log messages coming from Elytron
by Darran Lofthouse (JIRA)
[ https://issues.jboss.org/browse/WFLY-7126?page=com.atlassian.jira.plugin.... ]
Darran Lofthouse commented on WFLY-7126:
----------------------------------------
We don't have anyone working on this so far, but I would suggest starting small: make some changes to a small area, send in a PR, and then get some feedback.
This is one of those issues where there is a fine line between providing enough information for users to diagnose problems and not adding a bottleneck to something that will be invoked a lot within the application server.
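As an illustration of that trade-off, diagnostic logging is usually guarded so that it stays cheap when disabled. A sketch with JBoss Logging; the logger category and messages are assumptions, not actual Elytron code:
{code:java}
import org.jboss.logging.Logger;

// Sketch only: the kind of guarded TRACE logging being requested.
final class AuthnTraceSketch {

    private static final Logger log = Logger.getLogger("org.wildfly.security");

    boolean authenticate(String user, boolean passwordValid, boolean permitted) {
        if (log.isTraceEnabled()) {
            // Guard so that argument formatting costs nothing when
            // TRACE is disabled - the "bottleneck" concern above.
            log.tracef("starting authentication for user '%s'", user);
        }
        if (!passwordValid) {
            log.tracef("authentication failed for '%s': invalid credential", user);
            return false;
        }
        if (!permitted) {
            log.tracef("authorization failed for '%s': permission mapper denied access", user);
            return false;
        }
        log.tracef("authentication succeeded for '%s'", user);
        return true;
    }
}
{code}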
> No log messages coming from Elytron
> -----------------------------------
>
> Key: WFLY-7126
> URL: https://issues.jboss.org/browse/WFLY-7126
> Project: WildFly
> Issue Type: Bug
> Components: Security
> Reporter: Josef Cacek
> Assignee: Darran Lofthouse
> Priority: Critical
>
> Elytron functionality is not (sufficiently) covered by log messages.
> Log messages are a cornerstone for customers when they're investigating configuration or functional issues.
> Even with the {{TRACE}} log level enabled, I saw no log messages coming from Elytron while configuring web authentication. When authentication fails, it's not clear what's wrong - whether the password is invalid, the permission mapper doesn't work, or something else happened.
[JBoss JIRA] (WFLY-7126) No log messages coming from Elytron
by Ingo Weiss (JIRA)
[ https://issues.jboss.org/browse/WFLY-7126?page=com.atlassian.jira.plugin.... ]
Ingo Weiss commented on WFLY-7126:
----------------------------------
Hi [~dlofthouse], do you mind if I work on this one?
[JBoss JIRA] (WFCORE-301) Configuration of individual contexts for http management interface.
by Brian Stansberry (JIRA)
[ https://issues.jboss.org/browse/WFCORE-301?page=com.atlassian.jira.plugin... ]
Brian Stansberry resolved WFCORE-301.
-------------------------------------
Resolution: Duplicate Issue
> Configuration of individual contexts for http management interface.
> -------------------------------------------------------------------
>
> Key: WFCORE-301
> URL: https://issues.jboss.org/browse/WFCORE-301
> Project: WildFly Core
> Issue Type: Sub-task
> Components: Domain Management
> Reporter: Darran Lofthouse
> Assignee: Darran Lofthouse
> Labels: affects_elytron
> Fix For: 3.0.0.Alpha9
>
>
> At the moment all management requests are handled over the '/management' context; we also have a '/console' context to serve up the files for the admin console.
> The '/management' context is secured using standard HTTP mechanisms; this decision was taken so that clients could be written in different languages, needing to know only how to use standard authentication mechanisms. Because web browsers could run malicious scripts, cross-origin resource sharing is completely disabled for this context.
> We need to start opening up the handling of cross-origin requests for a couple of reasons:
> - Enabling Keycloak SSO support.
> - Alternative console distribution options
> The '/management' context is going to be retained as-is for legacy clients, possibly even switched off by default.
> A new context can then be added using non-browser-based authentication; this could be Keycloak SSO or a form of Digest authentication where the response is handled by the console rather than the web browser. Either way, as the browser is bypassed, it is no longer at risk of sending malicious cross-origin requests.
[JBoss JIRA] (JGRP-2086) FD_SOCK keeps trying to create a new socket to the killed server
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2086?page=com.atlassian.jira.plugin.... ]
Bela Ban edited comment on JGRP-2086 at 9/19/16 10:37 AM:
----------------------------------------------------------
OK, replacing the HashMap with a LazyRemovalMap fixed this issue. Backporting to 3.6.x ...
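For readers unfamiliar with the idea: a lazy-removal map keeps entries readable after their owner leaves, only purging them once a size cap is exceeded, so a suspected member's address can still be resolved while failure detection needs it. A concept sketch follows (the actual JGroups class appears to be {{org.jgroups.util.LazyRemovalCache}}; everything below is illustrative, not the real implementation):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Concept sketch of a lazy-removal map: entries are marked removable
// instead of being deleted, and stay readable until a size cap is hit.
public class LazyRemovalMapSketch<K, V> {

    private static final class Entry<V> {
        final V value;
        volatile boolean removable;
        Entry(V value) { this.value = value; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final int maxSize;

    public LazyRemovalMapSketch(int maxSize) { this.maxSize = maxSize; }

    public void put(K key, V value) { map.put(key, new Entry<>(value)); }

    public V get(K key) {
        Entry<V> e = map.get(key);
        return e == null ? null : e.value; // readable even if marked removable
    }

    // Mark instead of delete; purge marked entries only once the cap
    // is exceeded.
    public void remove(K key) {
        Entry<V> e = map.get(key);
        if (e != null) {
            e.removable = true;
        }
        if (map.size() > maxSize) {
            map.entrySet().removeIf(en -> en.getValue().removable);
        }
    }
}
{code}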
[JBoss JIRA] (JGRP-2086) FD_SOCK keeps trying to create a new socket to the killed server
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2086?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2086:
--------------------------------
OK, replacing the HashMap with a LazyRemovalMap fixed this issue.