[JBoss JIRA] (JGRP-2183) (7.0.z) DELIVERY_TIME: protocol to measure delivery times
by Bartosz Spyrko-Śmietanko (JIRA)
Bartosz Spyrko-Śmietanko created JGRP-2183:
----------------------------------------------
Summary: (7.0.z) DELIVERY_TIME: protocol to measure delivery times
Key: JGRP-2183
URL: https://issues.jboss.org/browse/JGRP-2183
Project: JGroups
Issue Type: Feature Request
Reporter: Bartosz Spyrko-Śmietanko
Assignee: Bela Ban
Priority: Minor
Fix For: 3.6.12, 4.0
This protocol should be placed at the top of the stack. It measures delivery times:
* Average times for single messages to get delivered. The measurement ends when {{receive()}} returns
* Average times for message batches: the delivery time is computed as time to deliver the batch divided by batch size
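The two averages described above can be sketched as follows. This is only an illustration of the arithmetic, not the DELIVERY_TIME protocol itself: the class {{DeliveryTimeStats}} and its method names are hypothetical, and the caller is assumed to time the {{receive()}} callback with {{System.nanoTime()}}.

```java
// Hypothetical helper illustrating the averages; not part of JGroups.
final class DeliveryTimeStats {
    private long singleCount;
    private long singleTotalNanos;
    private long batchCount;
    private double batchTotalPerMsgNanos;

    /** Records the time a single message took until receive() returned. */
    synchronized void addSingle(long elapsedNanos) {
        singleCount++;
        singleTotalNanos += elapsedNanos;
    }

    /** Records a batch: its delivery time is the elapsed time divided by the batch size. */
    synchronized void addBatch(long elapsedNanos, int batchSize) {
        if (batchSize <= 0)
            return; // ignore empty batches
        batchCount++;
        batchTotalPerMsgNanos += (double) elapsedNanos / batchSize;
    }

    synchronized double avgSingleNanos() {
        return singleCount == 0 ? 0.0 : (double) singleTotalNanos / singleCount;
    }

    synchronized double avgBatchNanos() {
        return batchCount == 0 ? 0.0 : batchTotalPerMsgNanos / batchCount;
    }

    public static void main(String[] args) {
        DeliveryTimeStats stats = new DeliveryTimeStats();
        long start = System.nanoTime();
        // ... the application's receive() callback would run here ...
        stats.addSingle(System.nanoTime() - start);
        stats.addBatch(1_000, 10); // a batch of 10 messages delivered in 1000 ns
        System.out.println(stats.avgSingleNanos() + " ns, " + stats.avgBatchNanos() + " ns");
    }
}
```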
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years, 11 months
[JBoss JIRA] (JGRP-2182) (7.0.z) FD_SOCK is keep trying to create a new socket to the killed server
by Bartosz Spyrko-Śmietanko (JIRA)
Bartosz Spyrko-Śmietanko created JGRP-2182:
----------------------------------------------
Summary: (7.0.z) FD_SOCK is keep trying to create a new socket to the killed server
Key: JGRP-2182
URL: https://issues.jboss.org/browse/JGRP-2182
Project: JGroups
Issue Type: Bug
Affects Versions: 3.6.3
Environment: JDG 6.6.0 (jgroups-3.6.3.Final-redhat-4.jar)
Reporter: Bartosz Spyrko-Śmietanko
Assignee: Bela Ban
Fix For: 3.6.11, 4.0
In most cases FD_SOCK can detect a killed server immediately. But for an unknown reason, FD_SOCK keeps trying to create a new socket to the killed server. As a consequence, installing a new cluster view is delayed until FD_ALL is triggered.
m04_n007_server.log shows the behaviour. There are 28 nodes (4 machines (m03, ..., m06) with 7 nodes (n001, ..., n007) on each) and all nodes on m03 are killed at the same time, at 15:07:34,543. FD_SOCK keeps trying to connect to a killed node, logging "socket address for m03_n001/clustered could not be fetched, retrying".
{noformat}
[n007] 15:07:39,543 TRACE [org.jgroups.protocols.FD_SOCK] (Timer-8,shared=udp) m04_n007/clustered: broadcasting SUSPECT message (suspected_mbrs=[m03_n005/clustered, m03_n007/clustered])
[n007] 15:07:39,544 TRACE [org.jgroups.protocols.FD_SOCK] (INT-20,shared=udp) m04_n007/clustered: received SUSPECT message from m04_n007/clustered: suspects=[m03_n005/clustered, m03_n007/clustered]
[n007] 15:07:39,546 TRACE [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: socket address for m03_n001/clustered could not be fetched, retrying
[n007] 15:07:40,546 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: ping_dest is m03_n001/clustered, pingable_mbrs=[m03_n001/clustered, m03_n002/clustered, m03_n003/clustered, m03_n004/clustered, m03_n006/clustered, m06_n001/clustered, m06_n002/clustered, m06_n003/clustered, m06_n004/clustered, m06_n005/clustered, m06_n006/clustered, m06_n007/clustered, m05_n001/clustered, m05_n002/clustered, m05_n003/clustered, m05_n004/clustered, m05_n005/clustered, m05_n006/clustered, m05_n007/clustered, m04_n001/clustered, m04_n002/clustered, m04_n003/clustered, m04_n004/clustered, m04_n005/clustered, m04_n006/clustered, m04_n007/clustered]
[n007] 15:07:41,546 TRACE [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: socket address for m03_n001/clustered could not be fetched, retrying
[n007] 15:07:42,546 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: ping_dest is m03_n001/clustered, pingable_mbrs=[m03_n001/clustered, m03_n002/clustered, m03_n003/clustered, m03_n004/clustered, m03_n006/clustered, m06_n001/clustered, m06_n002/clustered, m06_n003/clustered, m06_n004/clustered, m06_n005/clustered, m06_n006/clustered, m06_n007/clustered, m05_n001/clustered, m05_n002/clustered, m05_n003/clustered, m05_n004/clustered, m05_n005/clustered, m05_n006/clustered, m05_n007/clustered, m04_n001/clustered, m04_n002/clustered, m04_n003/clustered, m04_n004/clustered, m04_n005/clustered, m04_n006/clustered, m04_n007/clustered]
[n007] 15:07:43,547 TRACE [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: socket address for m03_n001/clustered could not be fetched, retrying
...
[n007] 15:10:53,700 DEBUG [org.jgroups.protocols.FD_ALL] (Timer-26,shared=udp) haven't received a heartbeat from m03_n005/clustered for 200059 ms, adding it to suspect list
{noformat}
From the TRACE log, you can see that the address cache of FD_SOCK has only 23 members.
{noformat}
[n007] 14:40:50,471 TRACE [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: got cache from m03_n005/clustered: cache is {
m04_n006/clustered=172.20.66.34:9945,
m05_n005/clustered=172.20.66.35:9938,
m06_n004/clustered=172.20.66.36:9931,
m03_n007/clustered=172.20.66.33:9952,
m05_n001/clustered=172.20.66.35:9910,
m06_n005/clustered=172.20.66.36:9938,
m05_n006/clustered=172.20.66.35:9945,
m03_n005/clustered=172.20.66.33:9938,
m05_n004/clustered=172.20.66.35:9931,
m04_n003/clustered=172.20.66.34:9924,
m04_n007/clustered=172.20.66.34:9952,
m05_n002/clustered=172.20.66.35:9917,
m05_n003/clustered=172.20.66.35:9924,
m04_n004/clustered=172.20.66.34:9931,
m06_n001/clustered=172.20.66.36:9910,
m06_n007/clustered=172.20.66.36:9952,
m04_n005/clustered=172.20.66.34:9938,
m04_n001/clustered=172.20.66.34:9910,
m05_n007/clustered=172.20.66.35:9952,
m06_n002/clustered=172.20.66.36:9917,
m06_n006/clustered=172.20.66.36:9945,
m04_n002/clustered=172.20.66.34:9917,
m06_n003/clustered=172.20.66.36:9924}
{noformat}
Meanwhile, pingable_mbrs has all 28 members, taken from the current cluster view.
{noformat}
[n007] 14:40:50,472 DEBUG [org.jgroups.protocols.FD_SOCK] (FD_SOCK pinger,m04_n007/clustered) m04_n007/clustered: ping_dest is m03_n005/clustered, pingable_mbrs=[
m03_n005/clustered,
m03_n007/clustered,
m03_n001/clustered,
m03_n002/clustered,
m03_n003/clustered,
m03_n004/clustered,
m03_n006/clustered,
m06_n001/clustered,
m06_n002/clustered,
m06_n003/clustered,
m06_n004/clustered,
m06_n005/clustered,
m06_n006/clustered,
m06_n007/clustered,
m05_n001/clustered,
m05_n002/clustered,
m05_n003/clustered,
m05_n004/clustered,
m05_n005/clustered,
m05_n006/clustered,
m05_n007/clustered,
m04_n001/clustered,
m04_n002/clustered,
m04_n003/clustered,
m04_n004/clustered,
m04_n005/clustered,
m04_n006/clustered,
m04_n007/clustered]
{noformat}
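Comparing the two dumps, the five members missing from the cache are m03_n001, m03_n002, m03_n003, m03_n004 and m03_n006, which includes the ping_dest that could not be resolved. A check like this can be sketched as a set difference. The class and method names are hypothetical, and plain strings stand in for JGroups addresses.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic helper; not part of FD_SOCK.
final class CacheGap {
    /** Returns the members of the view that have no entry in the address cache. */
    static Set<String> missingFromCache(Collection<String> pingableMbrs, Map<String, String> cache) {
        Set<String> missing = new TreeSet<>(pingableMbrs);
        missing.removeAll(cache.keySet());
        return missing;
    }

    public static void main(String[] args) {
        // Abbreviated subset of the members and cache entries from the logs above
        List<String> view = Arrays.asList("m03_n001", "m03_n005", "m03_n007", "m04_n007");
        Map<String, String> cache = new HashMap<>();
        cache.put("m03_n005", "172.20.66.33:9938");
        cache.put("m03_n007", "172.20.66.33:9952");
        cache.put("m04_n007", "172.20.66.34:9952");
        System.out.println(missingFromCache(view, cache));
    }
}
```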
[JBoss JIRA] (JGRP-2181) (7.0.z) MERGE3: merge never happens
by Bartosz Spyrko-Śmietanko (JIRA)
Bartosz Spyrko-Śmietanko created JGRP-2181:
----------------------------------------------
Summary: (7.0.z) MERGE3: merge never happens
Key: JGRP-2181
URL: https://issues.jboss.org/browse/JGRP-2181
Project: JGroups
Issue Type: Bug
Reporter: Bartosz Spyrko-Śmietanko
Assignee: Bela Ban
Fix For: 3.6.11, 4.0
(Reported by Neal Dillman)
In the case below, a merge doesn't seem to happen. Write a unit test to reproduce this.
{noformat}
Host A view: B, X, Y, Z, A (where B should be coordinator)
Host B view: C, Q, R, S, B (where C should be coordinator)
Host C view: A, M, N, O, C (where A should be coordinator)
{noformat}
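One way to set up such a scenario in a test is to model each host's view as a list whose first member is the coordinator, and to check that the coordinators point at each other in a cycle. The sketch below is only an assumption about how the test data could be built. The class {{CoordinatorCycle}} is hypothetical and does not touch MERGE3.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical test helper; it models views as plain lists of member names.
final class CoordinatorCycle {
    /** The first member of a view is its coordinator. */
    static String coordinator(List<String> view) {
        return view.get(0);
    }

    /** Follows host -> coordinator links and reports whether a host is visited twice. */
    static boolean hasCycle(Map<String, List<String>> views, String start) {
        Set<String> seen = new HashSet<>();
        String current = start;
        while (views.containsKey(current)) {
            if (!seen.add(current))
                return true; // back at a host we already visited
            String coord = coordinator(views.get(current));
            if (coord.equals(current))
                return false; // a host that is its own coordinator ends the chain
            current = coord;
        }
        return false; // the chain ended at a host whose view is unknown
    }

    public static void main(String[] args) {
        Map<String, List<String>> views = new HashMap<>();
        views.put("A", Arrays.asList("B", "X", "Y", "Z", "A"));
        views.put("B", Arrays.asList("C", "Q", "R", "S", "B"));
        views.put("C", Arrays.asList("A", "M", "N", "O", "C"));
        System.out.println(hasCycle(views, "A"));
    }
}
```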
[JBoss JIRA] (JGRP-2180) (7.0.z) UNICAST3: bypass or remove when running over TCP
by Bartosz Spyrko-Śmietanko (JIRA)
Bartosz Spyrko-Śmietanko created JGRP-2180:
----------------------------------------------
Summary: (7.0.z) UNICAST3: bypass or remove when running over TCP
Key: JGRP-2180
URL: https://issues.jboss.org/browse/JGRP-2180
Project: JGroups
Issue Type: Enhancement
Reporter: Bartosz Spyrko-Śmietanko
Assignee: Bela Ban
Priority: Minor
Fix For: 3.6.11, 4.0
When running over TCP as transport, UNICAST3 is still required: while TCP/IP retransmits messages reliably and also provides sender-FIFO ordering, the receiver's thread pool might be exhausted and thus the message might get rejected.
However, *if* the regular and OOB thread pools are disabled, we could actually bypass (or completely remove) UNICAST3. If messages get dropped by a protocol further up the stack, however, there will be no retransmission in this case.
SOLUTION:
* Document this behavior
* Emit an INFO message (or automatically bypass UNICAST3) when run over a TCP transport and both OOB and regular pools are disabled
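The condition in the second bullet could look roughly like this. It is a sketch of the decision only. The class {{Unicast3Check}} and its three boolean parameters are hypothetical and do not correspond to actual JGroups configuration attributes.

```java
import java.util.logging.Logger;

// Hypothetical check; the flags would have to be read from the real transport configuration.
final class Unicast3Check {
    private static final Logger LOG = Logger.getLogger(Unicast3Check.class.getName());

    /** UNICAST3 is only a bypass candidate over TCP with both thread pools disabled. */
    static boolean canBypass(boolean tcpTransport, boolean regularPoolEnabled, boolean oobPoolEnabled) {
        return tcpTransport && !regularPoolEnabled && !oobPoolEnabled;
    }

    /** Emits the INFO message suggested above when the condition holds. */
    static void report(boolean tcpTransport, boolean regularPoolEnabled, boolean oobPoolEnabled) {
        if (canBypass(tcpTransport, regularPoolEnabled, oobPoolEnabled))
            LOG.info("TCP transport with both thread pools disabled: UNICAST3 could be bypassed; "
                     + "messages dropped further up the stack will not be retransmitted");
    }

    public static void main(String[] args) {
        report(true, false, false);
    }
}
```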
[JBoss JIRA] (JGRP-2179) (7.0.z) SYM/ASYM_ENCRYPT: don't use WeakHashMap for old ciphers
by Bartosz Spyrko-Śmietanko (JIRA)
Bartosz Spyrko-Śmietanko created JGRP-2179:
----------------------------------------------
Summary: (7.0.z) SYM/ASYM_ENCRYPT: don't use WeakHashMap for old ciphers
Key: JGRP-2179
URL: https://issues.jboss.org/browse/JGRP-2179
Project: JGroups
Issue Type: Task
Reporter: Bartosz Spyrko-Śmietanko
Assignee: Bela Ban
Priority: Minor
Fix For: 3.6.11, 4.0
Currently we use WeakHashMap but should not, for the reasons outlined below. We could replace it with a LazyRemovalCache. Andrew's email refers to SecretKeys, but this probably also applies to Ciphers.
Andrew Haley's email:
{quote}
TL/DR: Please don't use WeakReferences, SoftReferences, etc. to cache
any data which might point to native memory. In particular, never do
this with instances of java.security.Key. Instead, implement either
some kind of ageing strategy or a fixed-size cache.
...
This is a warning to anybody who might cache crypto keys.
A customer has been having problems with the exhaustion of native
memory before the Java heap is full. It was fun trying to track down
the cause, but it's now happened several times to several customers,
and it's a serious problem for real-world usage in app servers.
PKCS#11 is a standard way to communicate between applications and
crypto libraries. There is a Java crypto provider which supports
PKCS#11. Some of our customers must use this provider in order to get
FIPS certification.
The problem is this:
A crypto key is a buffer in memory, allocated by the PKCS#11 native
library. It's accessed via a handle which is stored as an integer
field in a Java object. This Java object is a PhantomReference, so
when the garbage collector detects that a crypto key is no longer
reachable it is closed and the associated native memory is freed.
Modern garbage collectors don't much bother to process objects in the
old generation because it's not usually worthwhile. Thus, crypto keys
don't get recycled very quickly. They can pile up in the old
generation. This isn't a problem for the Java heap because the
objects containing the references to crypto keys are very small.
Unfortunately, the native side of a crypto key is much bigger, maybe
up to a thousand times bigger. So if we have 4000 stale crypto keys
in the heap that's not a problem, a few kbytes. But the native memory
may be a megabyte.
This problem is made even worse by Tomcat because it uses
SoftReferences to cache crypto keys. SoftReferences are processed
lazily, and maybe not at all until the Java heap runs out of memory.
Unfortunately it doesn't, but the machine runs out of native memory
instead.
We could solve this simply by making instances of PKCS#11 keys really
big Java objects by padding with dummy fields. Then, the GC would
collect them quickly. This does work but it seriously impacts
performance. Also, we could tweak the garbage collectors to clear out
stale references more enthusiastically, but this impacts performance
even more. There are some controls with the G1 collector which
process SoftReferences more aggressively and these help, but again at
the cost of performance.
Finally: the Shenandoah collector we're working on handles this
problem much better than the older collectors, but it's some
way off.
{quote}
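As an illustration of the fixed-size cache the email recommends, here is a minimal sketch built on {{java.util.LinkedHashMap}}. It is not the LazyRemovalCache mentioned above, and the class name {{BoundedCache}} is hypothetical. Entries are held by strong references and the least recently used one is evicted once the limit is exceeded, so the number of cached keys or ciphers stays bounded regardless of GC behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical fixed-size LRU cache; not thread-safe on its own.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // access order: the least recently used entry is the eldest
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("v1", "cipher-1");
        cache.put("v2", "cipher-2");
        cache.put("v3", "cipher-3"); // evicts v1
        System.out.println(cache.keySet());
    }
}
```

For concurrent use, the map would need to be wrapped, for example with {{Collections.synchronizedMap()}}.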
[JBoss JIRA] (JGRP-2178) (7.0.z) Add convenience method Rsp.readIn
by Bartosz Spyrko-Śmietanko (JIRA)
Bartosz Spyrko-Śmietanko created JGRP-2178:
----------------------------------------------
Summary: (7.0.z) Add convenience method Rsp.readIn
Key: JGRP-2178
URL: https://issues.jboss.org/browse/JGRP-2178
Project: JGroups
Issue Type: Enhancement
Affects Versions: 3.6.10, 4.0
Reporter: Bartosz Spyrko-Śmietanko
Assignee: Radim Vansa
Priority: Minor
Fix For: 3.6.11, 4.0
In Infinispan, during a staggered get we prepare several {{Rsp}} s in {{RspList}} and then for each {{Rsp}} we send one message. As the {{RspList}} can be accessed by multiple threads but we don't want to synchronize the access, we just get the {{Rsp}} and fill it from the (other) received {{Rsp}}. However, the fill requires several ifs:
{code}
if (rsp.hasException()) {
    futureRsp.setException(rsp.getException());
} else if (rsp.wasSuspected()) {
    futureRsp.setSuspected();
} else if (rsp.wasUnreachable()) {
    futureRsp.setUnreachable();
} else {
    futureRsp.setValue(rsp.getValue());
}
{code}
Let's add a convenience method that will just read in the flags and value.
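A rough sketch of what such a method could look like follows. {{SimpleRsp}} is a simplified stand-in written for this example, not the real {{org.jgroups.util.Rsp}}, and its fields are assumptions. The point is only that {{readIn}} copies flags and value in one call.

```java
// Simplified, hypothetical stand-in for Rsp; field layout is an assumption.
final class SimpleRsp<T> {
    private T value;
    private Throwable exception;
    private boolean suspected;
    private boolean unreachable;

    boolean hasException()   { return exception != null; }
    Throwable getException() { return exception; }
    boolean wasSuspected()   { return suspected; }
    boolean wasUnreachable() { return unreachable; }
    T getValue()             { return value; }

    void setValue(T value)         { this.value = value; }
    void setException(Throwable t) { this.exception = t; }
    void setSuspected()            { this.suspected = true; }
    void setUnreachable()          { this.unreachable = true; }

    /** Copies the flags and the value (or exception) from another response. */
    void readIn(SimpleRsp<T> other) {
        this.value = other.value;
        this.exception = other.exception;
        this.suspected = other.suspected;
        this.unreachable = other.unreachable;
    }

    public static void main(String[] args) {
        SimpleRsp<String> received = new SimpleRsp<>();
        received.setValue("result");
        SimpleRsp<String> futureRsp = new SimpleRsp<>();
        futureRsp.readIn(received); // replaces the if/else chain
        System.out.println(futureRsp.getValue());
    }
}
```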
[JBoss JIRA] (JGRP-2177) (7.0.z) TYPE_STRING does not handle unicode
by Bartosz Spyrko-Śmietanko (JIRA)
Bartosz Spyrko-Śmietanko created JGRP-2177:
----------------------------------------------
Summary: (7.0.z) TYPE_STRING does not handle unicode
Key: JGRP-2177
URL: https://issues.jboss.org/browse/JGRP-2177
Project: JGroups
Issue Type: Bug
Reporter: Bartosz Spyrko-Śmietanko
Assignee: Bela Ban
Priority: Minor
Fix For: 3.6.11, 4.0
In several places throughout the org.jgroups.util.Util class, it is assumed that Strings are one byte per character.
For example, see objectToByteBuffer lines 561-567:
https://github.com/belaban/JGroups/blob/master/src/org/jgroups/util/Util....
{code:java}
case TYPE_STRING:
    String str=(String)obj;
    int len=str.length();
    ByteBuffer retval=ByteBuffer.allocate(Global.BYTE_SIZE + len).put(TYPE_STRING);
    for(int i=0; i < len; i++)
        retval.put((byte)str.charAt(i));
    return retval.array();
{code}
This code incorrectly encodes any String that contains non-ASCII characters, because the cast to byte discards the upper bits of each char.
There are several ways to fix this. You could use str.getBytes(StandardCharsets.UTF_8) to get a proper byte encoding, or you could use the existing TYPE_SERIALIZABLE code path.
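The difference can be sketched in a few lines. This is a standalone illustration, not a patch for {{Util}}: the class {{StringEncodingDemo}} is hypothetical and the value of its {{TYPE_STRING}} tag is a placeholder. With the narrowing cast, the character Ś (U+015A) is stored as the single byte 0x5A, which is the ASCII letter Z.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; TYPE_STRING is a placeholder tag, not the JGroups constant.
final class StringEncodingDemo {
    static final byte TYPE_STRING = 1;

    /** Lossy variant mirroring the snippet above: one byte per char. */
    static byte[] encodeNarrow(String s) {
        ByteBuffer buf = ByteBuffer.allocate(1 + s.length()).put(TYPE_STRING);
        for (int i = 0; i < s.length(); i++)
            buf.put((byte) s.charAt(i)); // drops the upper 8 bits of each char
        return buf.array();
    }

    /** UTF-8 variant: the buffer is sized from the encoded byte length, not the char count. */
    static byte[] encodeUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + bytes.length).put(TYPE_STRING).put(bytes).array();
    }

    /** Decodes a buffer produced by encodeUtf8, skipping the type tag. */
    static String decodeUtf8(byte[] data) {
        return new String(data, 1, data.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u015Amietanko"; // "Śmietanko"
        System.out.println((char) encodeNarrow(name)[1]);                // first char after the tag
        System.out.println(decodeUtf8(encodeUtf8(name)).equals(name));   // round trip check
    }
}
```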
[JBoss JIRA] (JGRP-2175) (7.0.z) IndexOutOfBoundsException when trace logging
by Bartosz Spyrko-Śmietanko (JIRA)
Bartosz Spyrko-Śmietanko created JGRP-2175:
----------------------------------------------
Summary: (7.0.z) IndexOutOfBoundsException when trace logging
Key: JGRP-2175
URL: https://issues.jboss.org/browse/JGRP-2175
Project: JGroups
Issue Type: Bug
Affects Versions: 3.6.9
Reporter: Bartosz Spyrko-Śmietanko
Assignee: Bela Ban
Priority: Minor
Fix For: 3.6.11
When running with trace logging, I got a couple of these stack traces:
{code}
Exception in thread "OOB-1,test-NodeE-13479" java.lang.IndexOutOfBoundsException: Index: 4, Size: 2
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at org.jgroups.protocols.pbcast.NAKACK2.handleMessages(NAKACK2.java:868)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:705)
at org.jgroups.stack.Protocol.up(Protocol.java:425)
at org.jgroups.protocols.TP.passBatchUp(TP.java:1600)
at org.jgroups.protocols.TP$BatchHandler.run(TP.java:1820)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
It seems that part of the list of received messages is removed at handleMessages:864, in
{code}
boolean added=loopback || buf.add(msgs, oob, oob? DUMMY_OOB_MSG : null);
{code}
But the {{size}} is not recomputed afterwards.
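The suspected pattern can be reproduced in isolation. The sketch below is not the NAKACK2 code: {{StaleSizeDemo}} is hypothetical, and clearing a sub-list stands in for the call that removes messages from the list. It only shows why a size cached before the removal leads to an {{IndexOutOfBoundsException}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reproduction of the stale-size pattern; not taken from NAKACK2.
final class StaleSizeDemo {
    /** Buggy pattern: the size is read before the list shrinks. */
    static int visitWithStaleSize(List<String> msgs, int toRemove) {
        int size = msgs.size();
        msgs.subList(0, toRemove).clear(); // stands in for the call that removes messages
        int visited = 0;
        for (int i = 0; i < size; i++) {   // throws once i reaches the new, smaller size
            msgs.get(i);
            visited++;
        }
        return visited;
    }

    /** Fixed pattern: the size is recomputed after the removal. */
    static int visitWithFreshSize(List<String> msgs, int toRemove) {
        msgs.subList(0, toRemove).clear();
        int size = msgs.size();
        int visited = 0;
        for (int i = 0; i < size; i++) {
            msgs.get(i);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String> msgs = new ArrayList<>(Arrays.asList("m1", "m2", "m3", "m4", "m5"));
        System.out.println(visitWithFreshSize(msgs, 3));
    }
}
```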
[JBoss JIRA] (WFCORE-2766) Application server must be reloaded when is updated credential reference of credential store. There isn't any information that it needs reload.
by Yeray Borges (JIRA)
[ https://issues.jboss.org/browse/WFCORE-2766?page=com.atlassian.jira.plugi... ]
Yeray Borges commented on WFCORE-2766:
--------------------------------------
This was discussed with [~pskopek], arriving at the following conclusions:
* There might be different implementations of the CS API which could be changed dynamically from outside, so a reload would not be required from the WF point of view
* Marking resources that refer to other CSs as reload-required would create a mess between resources (e.g. resources referred to by other CSs, which are in turn referred to by other CSs ...), even if they are fine because they contain the same passwords.
* Reloading on each alias update/remove of any CS, even if it is not referred to, is not an ideal situation
For these reasons, it is left to the user to decide whether a reload is needed after updating an alias.
This issue will be resolved once these two issues are merged: WFCORE-2426 and WFCORE-2867
Once those issues are merged, at least if the user updates the credential-reference of one CS, a reload will be required.
> Application server must be reloaded when is updated credential reference of credential store. There isn't any information that it needs reload.
> -----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: WFCORE-2766
> URL: https://issues.jboss.org/browse/WFCORE-2766
> Project: WildFly Core
> Issue Type: Bug
> Components: Security
> Reporter: Hynek Švábek
> Assignee: Yeray Borges
>
> The application server must be reloaded when the credential reference of a credential store is updated. There isn't any information that it needs a reload.
> In the model, "restart-required" => "no-services", and the credential-reference update operation ends with a success message without any information about a reload.
> {code:collapse}
> "credential-reference" => {
>     "type" => OBJECT,
>     "description" => "Credential reference to be used to create protection parameter.",
>     "expressions-allowed" => false,
>     "required" => true,
>     "nillable" => false,
>     "access-constraints" => {"sensitive" => {"credential" => {"type" => "core"}}},
>     "value-type" => {
>         "store" => {
>             "type" => STRING,
>             "description" => "The name of the credential store holding the alias to credential.",
>             "expressions-allowed" => false,
>             "required" => false,
>             "nillable" => true,
>             "capability-reference" => "org.wildfly.security.credential-store",
>             "min-length" => 1L,
>             "max-length" => 2147483647L
>         },
>         "alias" => {
>             "type" => STRING,
>             "description" => "The alias which denotes stored secret or credential in the store.",
>             "expressions-allowed" => true,
>             "required" => false,
>             "nillable" => true,
>             "min-length" => 1L,
>             "max-length" => 2147483647L
>         },
>         "type" => {
>             "type" => STRING,
>             "description" => "The type of credential this reference is denoting.",
>             "expressions-allowed" => true,
>             "required" => false,
>             "nillable" => true,
>             "min-length" => 1L,
>             "max-length" => 2147483647L
>         },
>         "clear-text" => {
>             "type" => STRING,
>             "description" => "Secret specified using clear text. Check credential store way of supplying credential/secrets to services.",
>             "expressions-allowed" => true,
>             "required" => false,
>             "nillable" => true,
>             "min-length" => 1L,
>             "max-length" => 2147483647L
>         }
>     },
>     "access-type" => "read-write",
>     "storage" => "configuration",
>     "restart-required" => "no-services"
> },
> {code}