[JBoss JIRA] (JGRP-2135) OOM with JGroups 3.6.11.
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2135?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2135:
--------------------------------
I thought about adding a max_msg_size attribute to TCP, but didn't do it because this is a band-aid over some root cause that needs to be discovered.
I'd rather like to find out what the root cause is.
> OOM with JGroups 3.6.11.
> ------------------------
>
> Key: JGRP-2135
> URL: https://issues.jboss.org/browse/JGRP-2135
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.11
> Reporter: Zoltan Farkas
> Assignee: Bela Ban
> Fix For: 3.6.12
>
>
> We are running our JVMs with : -XX:OnOutOfMemoryError="kill -9 %p"
> we have been experiencing OOMs fairly often, and the OOMs happen at:
> {code}
> Object / Stack Frame |Name | Shallow Heap | Retained Heap |Context Class Loader |Is Daemon
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> java.lang.Thread @ 0x81bdf838 |Connection.Receiver [144.77.77.53:50363 - 144.77.77.53:50363],sis-cluster.service,prodpmwsv5-6461| 120 | 456 |sun.misc.Launcher$AppClassLoader @ 0x800175a8|false
> |- at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48) | | | | |
> |- at org.jgroups.blocks.cs.TcpConnection$Receiver.run()V (TcpConnection.java:310)| | | | |
> |- at java.lang.Thread.run()V (Thread.java:745) | | | | |
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> {code}
> the Code where it happens is in TcpConnection.java:
> {code}
> while(canRun()) {
> try {
> int len=in.readInt();
> if(buffer == null || buffer.length < len)
> buffer=new byte[len];
> in.readFully(buffer, 0, len);
> updateLastAccessed();
> server.receive(peer_addr, buffer, 0, len);
> }
> catch(OutOfMemoryError mem_ex) {
> t=mem_ex;
> break; // continue;
> }
> catch(IOException io_ex) {
> t=io_ex;
> break;
> }
> catch(Throwable e) {
> }
> }
> {code}
> when allocating: buffer=new byte[len];
> it looks to me that some invalid large value is received and the process OOMs when allocating a huge byte array
> Running JVMs without kill on OOM would make this issue "dissapear" in the sense that it is swallowed by:
> {code}
> catch(OutOfMemoryError mem_ex) {
> t=mem_ex;
> break; // continue;
> }
> {code}
> Handling OutOfMemoryError is a strange implementation choice...
> instead a size limit should be employed to protect from receiving invalid sizes...
> My heap limit is 1GB and my heap dumps are 50Mb so the attempted allocation size is huge...
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years
[JBoss JIRA] (JGRP-2135) OOM with JGroups 3.6.11.
by Zoltan Farkas (JIRA)
[ https://issues.jboss.org/browse/JGRP-2135?page=com.atlassian.jira.plugin.... ]
Zoltan Farkas commented on JGRP-2135:
-------------------------------------
Hi Bela, I patched the lib like:
{code}
private static final int MAX_MESSAGE_SIZE = Integer.getInteger("jgroups.maxMessageSize", 1048576);
public void run() {
Throwable t=null;
while(canRun()) {
try {
int len=in.readInt();
if (len > MAX_MESSAGE_SIZE || len < 0) {
t = new IOException("Received message size invalid " + len);
in.close();
break;
}
if(buffer == null || buffer.length < len)
buffer=new byte[len];
in.readFully(buffer, 0, len);
updateLastAccessed();
server.receive(peer_addr, buffer, 0, len);
} catch(IOException io_ex) {
t=io_ex;
break;
}
}
server.notifyConnectionClosed(TcpConnection.this, String.format(Locale.US, "%s: %s", getClass().getSimpleName(),
t != null? t.toString() : "n/a"));
}
{code}
I am not sure where the invalid message sizes were coming, but this stopped my processes from crashing...
I find this a better solution than catching OOM...
Reliably recovering from OOM is impossible... which is why we run our processes with "kill on OOM"
> OOM with JGroups 3.6.11.
> ------------------------
>
> Key: JGRP-2135
> URL: https://issues.jboss.org/browse/JGRP-2135
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.11
> Reporter: Zoltan Farkas
> Assignee: Bela Ban
> Fix For: 3.6.12
>
>
> We are running our JVMs with : -XX:OnOutOfMemoryError="kill -9 %p"
> we have been experiencing OOMs fairly often, and the OOMs happen at:
> {code}
> Object / Stack Frame |Name | Shallow Heap | Retained Heap |Context Class Loader |Is Daemon
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> java.lang.Thread @ 0x81bdf838 |Connection.Receiver [144.77.77.53:50363 - 144.77.77.53:50363],sis-cluster.service,prodpmwsv5-6461| 120 | 456 |sun.misc.Launcher$AppClassLoader @ 0x800175a8|false
> |- at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48) | | | | |
> |- at org.jgroups.blocks.cs.TcpConnection$Receiver.run()V (TcpConnection.java:310)| | | | |
> |- at java.lang.Thread.run()V (Thread.java:745) | | | | |
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> {code}
> the Code where it happens is in TcpConnection.java:
> {code}
> while(canRun()) {
> try {
> int len=in.readInt();
> if(buffer == null || buffer.length < len)
> buffer=new byte[len];
> in.readFully(buffer, 0, len);
> updateLastAccessed();
> server.receive(peer_addr, buffer, 0, len);
> }
> catch(OutOfMemoryError mem_ex) {
> t=mem_ex;
> break; // continue;
> }
> catch(IOException io_ex) {
> t=io_ex;
> break;
> }
> catch(Throwable e) {
> }
> }
> {code}
> when allocating: buffer=new byte[len];
> it looks to me that some invalid large value is received and the process OOMs when allocating a huge byte array
> Running JVMs without kill on OOM would make this issue "dissapear" in the sense that it is swallowed by:
> {code}
> catch(OutOfMemoryError mem_ex) {
> t=mem_ex;
> break; // continue;
> }
> {code}
> Handling OutOfMemoryError is a strange implementation choice...
> instead a size limit should be employed to protect from receiving invalid sizes...
> My heap limit is 1GB and my heap dumps are 50Mb so the attempted allocation size is huge...
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years
[JBoss JIRA] (JGRP-2135) OOM with JGroups 3.6.11.
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2135?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2135:
---------------------------
Fix Version/s: (was: 4.0)
> OOM with JGroups 3.6.11.
> ------------------------
>
> Key: JGRP-2135
> URL: https://issues.jboss.org/browse/JGRP-2135
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.11
> Reporter: Zoltan Farkas
> Assignee: Bela Ban
> Fix For: 3.6.12
>
>
> We are running our JVMs with : -XX:OnOutOfMemoryError="kill -9 %p"
> we have been experiencing OOMs fairly often, and the OOMs happen at:
> {code}
> Object / Stack Frame |Name | Shallow Heap | Retained Heap |Context Class Loader |Is Daemon
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> java.lang.Thread @ 0x81bdf838 |Connection.Receiver [144.77.77.53:50363 - 144.77.77.53:50363],sis-cluster.service,prodpmwsv5-6461| 120 | 456 |sun.misc.Launcher$AppClassLoader @ 0x800175a8|false
> |- at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48) | | | | |
> |- at org.jgroups.blocks.cs.TcpConnection$Receiver.run()V (TcpConnection.java:310)| | | | |
> |- at java.lang.Thread.run()V (Thread.java:745) | | | | |
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> {code}
> the Code where it happens is in TcpConnection.java:
> {code}
> while(canRun()) {
> try {
> int len=in.readInt();
> if(buffer == null || buffer.length < len)
> buffer=new byte[len];
> in.readFully(buffer, 0, len);
> updateLastAccessed();
> server.receive(peer_addr, buffer, 0, len);
> }
> catch(OutOfMemoryError mem_ex) {
> t=mem_ex;
> break; // continue;
> }
> catch(IOException io_ex) {
> t=io_ex;
> break;
> }
> catch(Throwable e) {
> }
> }
> {code}
> when allocating: buffer=new byte[len];
> it looks to me that some invalid large value is received and the process OOMs when allocating a huge byte array
> Running JVMs without kill on OOM would make this issue "dissapear" in the sense that it is swallowed by:
> {code}
> catch(OutOfMemoryError mem_ex) {
> t=mem_ex;
> break; // continue;
> }
> {code}
> Handling OutOfMemoryError is a strange implementation choice...
> instead a size limit should be employed to protect from receiving invalid sizes...
> My heap limit is 1GB and my heap dumps are 50Mb so the attempted allocation size is huge...
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years
[JBoss JIRA] (JGRP-2143) TP: use only one thread per member to pass up regular messages
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2143?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2143.
----------------------------
Resolution: Done
> TP: use only one thread per member to pass up regular messages
> --------------------------------------------------------------
>
> Key: JGRP-2143
> URL: https://issues.jboss.org/browse/JGRP-2143
> Project: JGroups
> Issue Type: Enhancement
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 4.0
>
>
> This applies only to _regular_ messages; OOB and internal messages are processed by passing them to the thread pool directly when they they're received.
> The processing of a message received from B is as follows:
> * A regular message (or message batch) is assigned a thread from the thread pool and passed up to the reliability protocol, e.g. NAKACK2 or UNICAST3.
> * There is is added to the table for B.
> * The thread sees if another thread is already delivering messages from B to the application. If not, it grabs as many consecutive (ordered) messages from the table as it can and delivers them to the application. Otherwise, it returns and can be assigned other tasks.
> The problem here is that more than one thread may be passing up messages from a given sender B; only at the NAKACK2 or UNICAST3 level will a single thread be selected to deliver the messages to the application.
> This causes higher thread pool usage than required, with all of its drawbacks, e.g. more context switching, higher contention on adding messages to the table for B, and possibly exhaustion of the thread pool.
> An example of where service is denied or delayed:
> * We have a cluster of \{A,B,C,D\}
> * A receives 10 messages from B, 4 from C and 1 from D
> * The thread pool's max size is 20
> * The 10 messages from B are processed; all 10 threads add their messages to the table, but only 1 delivers them to the application and the other 9 return to the pool
> * 4 messages from C are added to C's table, 1 thread delivers them and 3 return
> * The 1 message from D is added to D's table and the same thread is used to deliver the message up the stack to the application
> So while we receive 15 messages, effectively only 3 threads are needed to deliver them to the application: as these are regular messages, they need to be delivered in _sender order_.
> The 9 threads which process messages from B are only adding them to B's table and then return immediately. This causes increased context switching, plus more contention on B's table (which is synchronized), and possibly exhaustion of the thread pool. For example, if the pool's max size was only 10, then processing the first 10 messages from B would exhaust the table, and the other messages from C and D would be processed in newly spawned threads.
> SOLUTION
> * (Only applicable to _regular_ messages)
> * When a message (or batch) from sender P is received, we check if another thread is already passing up messages from B. If not, we pass the message up by grabbing a thread from the thread pool. This will add the message to P's table and deliver as many messages (from from the table) as possible to the application.
> * If there's currently a thread delivering P's message, we simply add the message (or batch) to a queue for P and return.
> * When the delivery thread returns, it checks the queue for P and delivers all queued messages, or returns if the queue is empty.
> * (The queue is actually a MessageBatch, and new messages are simply appended to it. On delivery, the batch is cleared)
> The effects of this for regular messages are
> * Fewer threads: the thread pool only has a max of <cluster-members> threads for regular messages where <cluster-members> is the number of members in the cluster from whom we are concurrently receiving messages. E.g. for a cluster \{A,B,C,D\}, if we're receiving messages at the same time from all members, then the max size is 4.
> ** Of course, OOB and internal messages, plus timer tasks will add to this number.
> * Less contention on the table for a given member: instead of 10 threads all adding their messages to B's table (contention on the table lock) and then CASing a boolean, only 1 thread ever adds and removes messages to/from the table. This means uncontended (= fast) lock acquisition for regular messages (of course, if we use OOB messages, then we do have contention).
> * Appending to a batch is much faster then adding to a table
> * The downside is that we're storing messages actually twice: once in the batch for P and once in P's table. But these are arrays of pointers, so not a lot of memory required.
> Example: the 10 threads for messages from B above, will create a batch of 9 messages in B's queue and grab 1 thread from the pool to deliver its message. When the thread is done, it will grab the message batch of 9 and also add it to the table and deliver it.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years
[JBoss JIRA] (WFLY-7575) Elytron mechanism-provider-filtering-sasl-server-factory cannot be added without filters attribute in CLI
by Ilia Vassilev (JIRA)
[ https://issues.jboss.org/browse/WFLY-7575?page=com.atlassian.jira.plugin.... ]
Ilia Vassilev commented on WFLY-7575:
-------------------------------------
Implemented the option:
* {{filters}} should not be required, then {{mechanism-provider-filtering-sasl-server-factory}} must be able to be added without {{filters}} attribute - no exception should occur and {{mechanism-provider-filtering-sasl-server-factory}} should be stored in configuration
> Elytron mechanism-provider-filtering-sasl-server-factory cannot be added without filters attribute in CLI
> ---------------------------------------------------------------------------------------------------------
>
> Key: WFLY-7575
> URL: https://issues.jboss.org/browse/WFLY-7575
> Project: WildFly
> Issue Type: Bug
> Components: Security
> Affects Versions: 11.0.0.Alpha1
> Reporter: Ondrej Lukas
> Assignee: Ilia Vassilev
>
> Adding {{mechanism-provider-filtering-sasl-server-factory}} without {{filters}} attribute through CLI causes IllegalArgumentException. Exception is not thrown when {{filters}} attribute is used.
> It has to be decided whether:
> * {{filters}} should be required, then its nillable should be changed to false and related part of XSD should not include {{minOccurs="0"}}
> * {{filters}} should not be required, then {{mechanism-provider-filtering-sasl-server-factory}} must be able to be added without {{filters}} attribute - no exception should occur and {{mechanism-provider-filtering-sasl-server-factory}} should be stored in configuration
> Exception in server log:
> {code}
> ERROR [org.jboss.as.controller.management-operation] (management-handler-thread - 6) WFLYCTL0013: Operation ("add") failed - address: ([
> ("subsystem" => "elytron"),
> ("mechanism-provider-filtering-sasl-server-factory" => "factory")
> ]): java.lang.IllegalArgumentException
> at org.jboss.dmr.ModelValue.asList(ModelValue.java:143)
> at org.jboss.dmr.ModelNode.asList(ModelNode.java:1389)
> at org.wildfly.extension.elytron.SaslServerDefinitions$4.installService(SaslServerDefinitions.java:362)
> at org.wildfly.extension.elytron.SaslServerDefinitions$SaslServerAddHandler.performRuntime(SaslServerDefinitions.java:470)
> at org.jboss.as.controller.AbstractAddStepHandler.performRuntime(AbstractAddStepHandler.java:337)
> at org.jboss.as.controller.AbstractAddStepHandler$1.execute(AbstractAddStepHandler.java:151)
> at org.jboss.as.controller.AbstractOperationContext.executeStep(AbstractOperationContext.java:921)
> at org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:664)
> at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:383)
> at org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1364)
> at org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelControllerImpl.java:416)
> at org.jboss.as.controller.ModelControllerImpl.lambda$execute$1(ModelControllerImpl.java:237)
> at org.wildfly.security.auth.client.PeerIdentity.runAsAll(PeerIdentity.java:431)
> at org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:206)
> at org.jboss.as.controller.ModelControllerImpl.execute(ModelControllerImpl.java:237)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler.doExecute(ModelControllerClientOperationHandler.java:217)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler.access$400(ModelControllerClientOperationHandler.java:137)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1$1.run(ModelControllerClientOperationHandler.java:161)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1$1.run(ModelControllerClientOperationHandler.java:157)
> at org.wildfly.security.auth.client.PeerIdentity.runAsAll(PeerIdentity.java:464)
> at org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:225)
> at org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:185)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1.execute(ModelControllerClientOperationHandler.java:157)
> at org.jboss.as.protocol.mgmt.ManagementRequestContextImpl$1.doExecute(ManagementRequestContextImpl.java:70)
> at org.jboss.as.protocol.mgmt.ManagementRequestContextImpl$AsyncTaskRunner.run(ManagementRequestContextImpl.java:160)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> at org.jboss.threads.JBossThread.run(JBossThread.java:320)
> {code}
> This issue is caused by fix of JBEAP-6381.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years
[JBoss JIRA] (DROOLS-470) Decision Table (XLS) should support fixed conditions, such as SeatDesignation(isNeighborOf($guest))
by Michael Biarnes Kiefer (JIRA)
[ https://issues.jboss.org/browse/DROOLS-470?page=com.atlassian.jira.plugin... ]
Michael Biarnes Kiefer updated DROOLS-470:
------------------------------------------
Fix Version/s: 7.0.0.CR1
(was: 7.0.0.Beta5)
> Decision Table (XLS) should support fixed conditions, such as SeatDesignation(isNeighborOf($guest))
> ---------------------------------------------------------------------------------------------------
>
> Key: DROOLS-470
> URL: https://issues.jboss.org/browse/DROOLS-470
> Project: Drools
> Issue Type: Enhancement
> Components: decision tables
> Affects Versions: 6.1.0.Beta2
> Reporter: Geoffrey De Smet
> Assignee: Michael Anstis
> Labels: optaplanner-request-for-drools
> Fix For: 7.0.0.CR1
>
>
> This DT should work:
> ||CONDITION||CONDITION||ACTION|
> |$guest : SeatDesignation()|$neighbor : SeatDesignation(isNeighborOf($guest))||
> |guestName == "$param"|guestName == "$param"|doSomething();|
> It crashes because of the "SeatDesignation(isNeighborOf($guest))". Only empty parenthesis are allowed.
> Failing workaround 1: This workaround (as specified by the docs), does NOT work well, because it adds the same condition (isNeighborOf($guest)) multiple times in the same rule:
> ||CONDITION||CONDITION||CONDITION||ACTION|
> |$guest : SeatDesignation()|$neighbor : SeatDesignation()|||
> |guestName == "$param"|isNeighborOf($guest), guestName == "$param"|isNeighborOf($guest), guestAge == "$param"|doSomething();|
> Failing workaround 2: Adding an extra, hidden column with that condition does not work when new rows are added because condition columns with an empty cell are ignored.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years
[JBoss JIRA] (DROOLS-669) Create "behavesAs" operator to support traiting
by Michael Biarnes Kiefer (JIRA)
[ https://issues.jboss.org/browse/DROOLS-669?page=com.atlassian.jira.plugin... ]
Michael Biarnes Kiefer updated DROOLS-669:
------------------------------------------
Fix Version/s: 7.0.0.CR1
(was: 7.0.0.Beta5)
> Create "behavesAs" operator to support traiting
> -----------------------------------------------
>
> Key: DROOLS-669
> URL: https://issues.jboss.org/browse/DROOLS-669
> Project: Drools
> Issue Type: Feature Request
> Affects Versions: 6.2.0.CR3
> Reporter: Davide Sottara
> Assignee: Davide Sottara
> Fix For: 7.0.0.CR1
>
>
> Define an operator that would check if an object can provide *natively* all the methods required by an interface. This would ensure that, upon donning the interface as a trait, no soft field would be required, and that all methods in the interface would have a concrete implementation.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
8 years