[JBoss JIRA] (WFLY-7835) StatefulSessionSynchronizationInterceptor can't see that the current tx was replaced by Arjuna's tx reaper and can't acquire its own component lock
by kostd kostd (JIRA)
[ https://issues.jboss.org/browse/WFLY-7835?page=com.atlassian.jira.plugin.... ]
kostd kostd updated WFLY-7835:
------------------------------
Attachment: log-fragment-7835
> StatefulSessionSynchronizationInterceptor can't see that the current tx was replaced by Arjuna's tx reaper and can't acquire its own component lock
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: WFLY-7835
> URL: https://issues.jboss.org/browse/WFLY-7835
> Project: WildFly
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 10.0.0.Final
> Environment: wildfly 10.0.0.Final
> java 8u101
> narayana-jts-integration-5.2.12.Final
> SFSB with CMT
> Reporter: kostd kostd
> Assignee: Tom Jenkinson
> Priority: Minor
> Attachments: log-fragment-7835
>
>
> Full log is attached; short chronology:
> 1. ~02:20: default task-186 starts request processing. During the request there was a call to ExtendedPersistenceContainer (an SFSB with CMT),
> so StatefulSessionSynchronizationInterceptor#processInvocation was called and the component lock was acquired (owner = current tx, see StatefulSessionSynchronizationInterceptor#getLockOwner).
> 2. 02:39:26: tx timeout reached (20 min), so Arjuna's reaper thread tries to cancel the tx. Now the current tx == null (?):
> {quote}
> 02:39:26,882 WARN [arjuna] (Transaction Reaper Worker 73) ARJUNA012381: Action id 0:ffff0aa010c9:22fba49e:5848e8ad:446f10c completed with multiple threads - thread default task-186 was in progress with java.net.SocketInputStream.socketRead0(Native Method)
> {quote}
> 3. 02:39:32: query timeout reached in the task thread:
> 02:39:32,806 ERROR [sql] (default task-186) SQL query error: 2c344b99 (executed in 1 205 917 ms): java.sql.SQLTimeoutException: ORA-01013: user requested cancel of current operation
> 4. 02:39:32: Arjuna's reaper thread tries to release the component lock but fails because it does not own it:
> {quote}
> 02:39:32,808 WARN [txn] (Transaction Reaper Worker 73) WFLYTX0027: The pre-jca synchronization org.jboss.as.ejb3.component.stateful.StatefulSessionSynchronizationInterceptor$StatefulSessionSynchronization@58e72087 associated with tx TransactionImple < ac, BasicAction: 0:ffff0aa010c9:22fba49e:5848e8ad:446f10c status: ActionStatus.ABORTED > failed during after completion: java.lang.IllegalMonitorStateException
> {quote}
> 5. 02:39:32: the task thread continues request processing; Hibernate notices that the current tx was replaced (removed) by Arjuna's reaper:
> {quote}
> 02:39:32,822 WARN [lifecycle] (default task-186) #{viewModel.save}: javax.persistence.PersistenceException: org.hibernate.HibernateException: Transaction was rolled back in a different thread!: javax.faces.FacesException: #{viewModel.save}: javax.persistence.PersistenceException: org.hibernate.HibernateException: Transaction was rolled back in a different thread!
> {quote}
> 6. But StatefulSessionSynchronizationInterceptor does not notice; it tries to acquire the same lock, but the owner has changed: it was the current tx and is now the current thread (?):
> {quote}
> 02:39:37,827 ERROR [invocation] (default task-186) WFLYEJB0034: EJB Invocation failed on component ExtendedPersistenceContainer for method public javax.persistence.EntityManager ru.argustelecom.system.inf.transaction.ExtendedPersistenceContainer.getEntityManager(): javax.ejb.ConcurrentAccessTimeoutException: WFLYEJB0228: EJB 3.1 FR 4.3.14.1 concurrent access timeout on ExtendedPersistenceContainer - could not obtain lock within 5000 MILLISECONDS
> {quote}
> So, are the following two behaviours bugs or by design?
> c1. Arjuna's reaper thread tries to release the component lock through StatefulSessionSynchronizationInterceptor, but does not own this lock.
> c2. StatefulSessionSynchronizationInterceptor cannot see that the current tx was replaced and tries to acquire its own component lock with a different owner; on the second try the lock cannot be acquired within any timeout.
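For reference, a minimal sketch of the lock-ownership logic named in step 1 and clause c2. This paraphrases the WildFly 10 sources; the method body is approximate, not verbatim:
{code}
// Sketch (not the verbatim WildFly source) of how
// StatefulSessionSynchronizationInterceptor picks the component lock owner.
// While a transaction is associated with the invocation, the transaction key
// owns the lock; otherwise the current thread does. After the reaper rolls the
// tx back, getTransactionKey() no longer yields the original key, so the next
// acquire attempt computes a different owner and times out, as seen in step 6.
// (registry is a javax.transaction.TransactionSynchronizationRegistry)
private static Object getLockOwner(TransactionSynchronizationRegistry registry) {
    Object txKey = registry.getTransactionKey();
    return txKey != null ? txKey : Thread.currentThread();
}
{code}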
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (WFLY-7835) StatefulSessionSynchronizationInterceptor can't see that the current tx was replaced by Arjuna's tx reaper and can't acquire its own component lock
by kostd kostd (JIRA)
[ https://issues.jboss.org/browse/WFLY-7835?page=com.atlassian.jira.plugin.... ]
kostd kostd updated WFLY-7835:
------------------------------
Description:
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (WFLY-7835) StatefulSessionSynchronizationInterceptor can't see that the current tx was replaced by Arjuna's tx reaper and can't acquire its own component lock
by kostd kostd (JIRA)
kostd kostd created WFLY-7835:
---------------------------------
Summary: StatefulSessionSynchronizationInterceptor can't see that the current tx was replaced by Arjuna's tx reaper and can't acquire its own component lock
Key: WFLY-7835
URL: https://issues.jboss.org/browse/WFLY-7835
Project: WildFly
Issue Type: Bug
Components: Transactions
Affects Versions: 10.0.0.Final
Environment: wildfly 10.0.0.Final
java 8u101
narayana-jts-integration-5.2.12.Final
SFSB with CMT
Reporter: kostd kostd
Assignee: Tom Jenkinson
Priority: Minor
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (JGRP-2135) OOM with JGroups 3.6.11.
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2135?page=com.atlassian.jira.plugin.... ]
Bela Ban commented on JGRP-2135:
--------------------------------
I thought about adding a max_msg_size attribute to TCP, but didn't do it because it would be a band-aid over some root cause that still needs to be discovered.
I'd rather find out what the root cause is.
> OOM with JGroups 3.6.11.
> ------------------------
>
> Key: JGRP-2135
> URL: https://issues.jboss.org/browse/JGRP-2135
> Project: JGroups
> Issue Type: Bug
> Affects Versions: 3.6.11
> Reporter: Zoltan Farkas
> Assignee: Bela Ban
> Fix For: 3.6.12
>
>
> We are running our JVMs with -XX:OnOutOfMemoryError="kill -9 %p".
> We have been experiencing OOMs fairly often, and they happen at:
> {code}
> Object / Stack Frame |Name | Shallow Heap | Retained Heap |Context Class Loader |Is Daemon
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> java.lang.Thread @ 0x81bdf838 |Connection.Receiver [144.77.77.53:50363 - 144.77.77.53:50363],sis-cluster.service,prodpmwsv5-6461| 120 | 456 |sun.misc.Launcher$AppClassLoader @ 0x800175a8|false
> |- at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48) | | | | |
> |- at org.jgroups.blocks.cs.TcpConnection$Receiver.run()V (TcpConnection.java:310)| | | | |
> |- at java.lang.Thread.run()V (Thread.java:745) | | | | |
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> {code}
> The code where it happens is in TcpConnection.java:
> {code}
> while(canRun()) {
> try {
> int len=in.readInt();
> if(buffer == null || buffer.length < len)
> buffer=new byte[len];
> in.readFully(buffer, 0, len);
> updateLastAccessed();
> server.receive(peer_addr, buffer, 0, len);
> }
> catch(OutOfMemoryError mem_ex) {
> t=mem_ex;
> break; // continue;
> }
> catch(IOException io_ex) {
> t=io_ex;
> break;
> }
> catch(Throwable e) {
> }
> }
> {code}
> when allocating {{buffer=new byte[len]}}.
> It looks to me like some invalid large value is received, and the process OOMs when allocating a huge byte array.
> Running JVMs without kill-on-OOM would make this issue "disappear" in the sense that it is swallowed by:
> {code}
> catch(OutOfMemoryError mem_ex) {
> t=mem_ex;
> break; // continue;
> }
> {code}
> Handling OutOfMemoryError is a strange implementation choice; instead, a size limit should be employed to protect against receiving invalid sizes.
> My heap limit is 1 GB and my heap dumps are 50 MB, so the attempted allocation size is huge.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (JGRP-2135) OOM with JGroups 3.6.11.
by Zoltan Farkas (JIRA)
[ https://issues.jboss.org/browse/JGRP-2135?page=com.atlassian.jira.plugin.... ]
Zoltan Farkas commented on JGRP-2135:
-------------------------------------
Hi Bela, I patched the lib like this:
{code}
// cap on the size of a single received message; configurable via system property (patch, not upstream)
private static final int MAX_MESSAGE_SIZE = Integer.getInteger("jgroups.maxMessageSize", 1048576);

public void run() {
    Throwable t=null;
    while(canRun()) {
        try {
            int len=in.readInt();
            // reject bogus lengths before allocating a buffer for them
            if(len > MAX_MESSAGE_SIZE || len < 0) {
                t=new IOException("Received message size invalid " + len);
                in.close();
                break;
            }
            if(buffer == null || buffer.length < len)
                buffer=new byte[len];
            in.readFully(buffer, 0, len);
            updateLastAccessed();
            server.receive(peer_addr, buffer, 0, len);
        }
        catch(IOException io_ex) {
            t=io_ex;
            break;
        }
    }
    server.notifyConnectionClosed(TcpConnection.this, String.format(Locale.US, "%s: %s",
                                  getClass().getSimpleName(), t != null? t.toString() : "n/a"));
}
{code}
I am not sure where the invalid message sizes were coming from, but this stopped my processes from crashing.
I find this a better solution than catching OOM; reliably recovering from OOM is impossible, which is why we run our processes with "kill on OOM".
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (JGRP-2135) OOM with JGroups 3.6.11.
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2135?page=com.atlassian.jira.plugin.... ]
Bela Ban updated JGRP-2135:
---------------------------
Fix Version/s: (was: 4.0)
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (JGRP-2143) TP: use only one thread per member to pass up regular messages
by Bela Ban (JIRA)
[ https://issues.jboss.org/browse/JGRP-2143?page=com.atlassian.jira.plugin.... ]
Bela Ban resolved JGRP-2143.
----------------------------
Resolution: Done
> TP: use only one thread per member to pass up regular messages
> --------------------------------------------------------------
>
> Key: JGRP-2143
> URL: https://issues.jboss.org/browse/JGRP-2143
> Project: JGroups
> Issue Type: Enhancement
> Reporter: Bela Ban
> Assignee: Bela Ban
> Fix For: 4.0
>
>
> This applies only to _regular_ messages; OOB and internal messages are processed by passing them to the thread pool directly when they're received.
> The processing of a message received from B is as follows:
> * A regular message (or message batch) is assigned a thread from the thread pool and passed up to the reliability protocol, e.g. NAKACK2 or UNICAST3.
> * There it is added to the table for B.
> * The thread then checks whether another thread is already delivering messages from B to the application. If not, it grabs as many consecutive (ordered) messages from the table as it can and delivers them to the application. Otherwise, it returns and can be assigned other tasks.
> The problem here is that more than one thread may be passing up messages from a given sender B; only at the NAKACK2 or UNICAST3 level will a single thread be selected to deliver the messages to the application.
> This causes higher thread pool usage than required, with all of its drawbacks, e.g. more context switching, higher contention on adding messages to the table for B, and possibly exhaustion of the thread pool.
> An example of where service is denied or delayed:
> * We have a cluster of \{A,B,C,D\}
> * A receives 10 messages from B, 4 from C and 1 from D
> * The thread pool's max size is 20
> * The 10 messages from B are processed; all 10 threads add their messages to the table, but only 1 delivers them to the application and the other 9 return to the pool
> * 4 messages from C are added to C's table, 1 thread delivers them and 3 return
> * The 1 message from D is added to D's table and the same thread is used to deliver the message up the stack to the application
> So while we receive 15 messages, effectively only 3 threads are needed to deliver them to the application: as these are regular messages, they need to be delivered in _sender order_.
> The 9 threads which process messages from B only add them to B's table and then return immediately. This causes increased context switching, plus more contention on B's table (which is synchronized), and possibly exhaustion of the thread pool. For example, if the pool's max size were only 10, then processing the first 10 messages from B would exhaust the pool, and the other messages from C and D would be processed in newly spawned threads.
> SOLUTION
> * (Only applicable to _regular_ messages)
> * When a message (or batch) from sender P is received, we check if another thread is already passing up messages from P. If not, we pass the message up by grabbing a thread from the thread pool. This will add the message to P's table and deliver as many messages (from the table) as possible to the application.
> * If there's currently a thread delivering P's messages, we simply add the message (or batch) to a queue for P and return.
> * When the delivery thread returns, it checks the queue for P and delivers all queued messages, or returns if the queue is empty.
> * (The queue is actually a MessageBatch, and new messages are simply appended to it. On delivery, the batch is cleared.)
> The effects of this for regular messages are
> * Fewer threads: the thread pool only has a max of <cluster-members> threads for regular messages where <cluster-members> is the number of members in the cluster from whom we are concurrently receiving messages. E.g. for a cluster \{A,B,C,D\}, if we're receiving messages at the same time from all members, then the max size is 4.
> ** Of course, OOB and internal messages, plus timer tasks will add to this number.
> * Less contention on the table for a given member: instead of 10 threads all adding their messages to B's table (contention on the table lock) and then CASing a boolean, only 1 thread ever adds and removes messages to/from the table. This means uncontended (= fast) lock acquisition for regular messages (of course, if we use OOB messages, then we do have contention).
> * Appending to a batch is much faster than adding to a table.
> * The downside is that we're storing messages twice: once in the batch for P and once in P's table. But these are arrays of pointers, so not a lot of memory is required.
> Example: of the 10 messages from B above, the first grabs a thread from the pool and is delivered; the other 9 are appended to B's queue as a batch. When the delivery thread is done, it grabs the batch of 9, adds it to the table and delivers it as well.
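For illustration, a minimal sketch of the per-sender delivery scheme described above. Class and method names are invented for this sketch and do not match the actual JGroups implementation (which uses a MessageBatch rather than a list):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// One instance per sender P. A regular message from P either grabs a pool
// thread (if no thread is delivering for P yet) or is appended to P's batch.
final class SenderDeliveryQueue {
    private final List<Object> batch = new ArrayList<>(); // pending regular messages from P
    private boolean delivering;                           // true while a pool thread delivers for P

    void onMessage(Object msg, Executor pool, Consumer<List<Object>> deliverUp) {
        synchronized(this) {
            batch.add(msg);
            if(delivering)      // a thread is already delivering for P:
                return;         // just enqueue; that thread will pick this message up
            delivering = true;  // otherwise become the (single) deliverer for P
        }
        pool.execute(() -> {
            for(;;) {
                List<Object> toDeliver;
                synchronized(this) {
                    if(batch.isEmpty()) { // nothing pending: give up the delivery role
                        delivering = false;
                        return;
                    }
                    toDeliver = new ArrayList<>(batch); // grab the whole pending batch
                    batch.clear();
                }
                deliverUp.accept(toDeliver); // deliver in sender order, outside the lock
            }
        });
    }
}
{code}
At most one pool thread per sender is ever delivering, which is exactly the reduction in thread usage and table contention argued for above.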
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)
[JBoss JIRA] (WFLY-7575) Elytron mechanism-provider-filtering-sasl-server-factory cannot be added without filters attribute in CLI
by Ilia Vassilev (JIRA)
[ https://issues.jboss.org/browse/WFLY-7575?page=com.atlassian.jira.plugin.... ]
Ilia Vassilev commented on WFLY-7575:
-------------------------------------
Implemented this option:
* {{filters}} is not required: {{mechanism-provider-filtering-sasl-server-factory}} can now be added without the {{filters}} attribute - no exception occurs and the {{mechanism-provider-filtering-sasl-server-factory}} is stored in the configuration.
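A sketch of the CLI invocation this allows. The factory name {{factory}} comes from the issue; the {{sasl-server-factory}} value {{global}} is a placeholder for whatever configured SASL server factory exists in your installation:
{code}
/subsystem=elytron/mechanism-provider-filtering-sasl-server-factory=factory:add(sasl-server-factory=global)
{code}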
> Elytron mechanism-provider-filtering-sasl-server-factory cannot be added without filters attribute in CLI
> ---------------------------------------------------------------------------------------------------------
>
> Key: WFLY-7575
> URL: https://issues.jboss.org/browse/WFLY-7575
> Project: WildFly
> Issue Type: Bug
> Components: Security
> Affects Versions: 11.0.0.Alpha1
> Reporter: Ondrej Lukas
> Assignee: Ilia Vassilev
>
> Adding {{mechanism-provider-filtering-sasl-server-factory}} without the {{filters}} attribute through the CLI causes an IllegalArgumentException. The exception is not thrown when the {{filters}} attribute is used.
> It has to be decided whether:
> * {{filters}} should be required - then its nillable flag should be changed to false and the related part of the XSD should not include {{minOccurs="0"}}
> * {{filters}} should not be required - then it must be possible to add {{mechanism-provider-filtering-sasl-server-factory}} without the {{filters}} attribute: no exception should occur and the {{mechanism-provider-filtering-sasl-server-factory}} should be stored in the configuration
> Exception in server log:
> {code}
> ERROR [org.jboss.as.controller.management-operation] (management-handler-thread - 6) WFLYCTL0013: Operation ("add") failed - address: ([
> ("subsystem" => "elytron"),
> ("mechanism-provider-filtering-sasl-server-factory" => "factory")
> ]): java.lang.IllegalArgumentException
> at org.jboss.dmr.ModelValue.asList(ModelValue.java:143)
> at org.jboss.dmr.ModelNode.asList(ModelNode.java:1389)
> at org.wildfly.extension.elytron.SaslServerDefinitions$4.installService(SaslServerDefinitions.java:362)
> at org.wildfly.extension.elytron.SaslServerDefinitions$SaslServerAddHandler.performRuntime(SaslServerDefinitions.java:470)
> at org.jboss.as.controller.AbstractAddStepHandler.performRuntime(AbstractAddStepHandler.java:337)
> at org.jboss.as.controller.AbstractAddStepHandler$1.execute(AbstractAddStepHandler.java:151)
> at org.jboss.as.controller.AbstractOperationContext.executeStep(AbstractOperationContext.java:921)
> at org.jboss.as.controller.AbstractOperationContext.processStages(AbstractOperationContext.java:664)
> at org.jboss.as.controller.AbstractOperationContext.executeOperation(AbstractOperationContext.java:383)
> at org.jboss.as.controller.OperationContextImpl.executeOperation(OperationContextImpl.java:1364)
> at org.jboss.as.controller.ModelControllerImpl.internalExecute(ModelControllerImpl.java:416)
> at org.jboss.as.controller.ModelControllerImpl.lambda$execute$1(ModelControllerImpl.java:237)
> at org.wildfly.security.auth.client.PeerIdentity.runAsAll(PeerIdentity.java:431)
> at org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:206)
> at org.jboss.as.controller.ModelControllerImpl.execute(ModelControllerImpl.java:237)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler.doExecute(ModelControllerClientOperationHandler.java:217)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler.access$400(ModelControllerClientOperationHandler.java:137)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1$1.run(ModelControllerClientOperationHandler.java:161)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1$1.run(ModelControllerClientOperationHandler.java:157)
> at org.wildfly.security.auth.client.PeerIdentity.runAsAll(PeerIdentity.java:464)
> at org.wildfly.security.auth.server.SecurityIdentity.runAs(SecurityIdentity.java:225)
> at org.jboss.as.controller.AccessAuditContext.doAs(AccessAuditContext.java:185)
> at org.jboss.as.controller.remote.ModelControllerClientOperationHandler$ExecuteRequestHandler$1.execute(ModelControllerClientOperationHandler.java:157)
> at org.jboss.as.protocol.mgmt.ManagementRequestContextImpl$1.doExecute(ManagementRequestContextImpl.java:70)
> at org.jboss.as.protocol.mgmt.ManagementRequestContextImpl$AsyncTaskRunner.run(ManagementRequestContextImpl.java:160)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> at org.jboss.threads.JBossThread.run(JBossThread.java:320)
> {code}
> This issue is caused by the fix for JBEAP-6381.
--
This message was sent by Atlassian JIRA
(v7.2.3#72005)