[JBoss JIRA] (ISPN-8554) Automatically interrupt hung tests
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8554?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-8554:
-------------------------------
Summary: Automatically interrupt hung tests (was: Automatically interrupt hanged tests)
> Automatically interrupt hung tests
> ----------------------------------
>
> Key: ISPN-8554
> URL: https://issues.jboss.org/browse/ISPN-8554
> Project: Infinispan
> Issue Type: Task
> Components: Test Suite - Core
> Affects Versions: 9.2.0.Beta1
> Reporter: Dan Berindei
> Assignee: Dan Berindei
> Fix For: 9.2.0.Beta2
>
>
> Tests should always use timed waits, e.g. {{future.get(10, SECONDS)}} instead of {{future.get()}}. However, it's really easy to use the wrong method, and if a deadlock happens, the test suite will never finish.
> Jenkins builds do have a timeout, but the timeout kill procedure is not at all investigation-friendly: only the console output survives; the logs and test results don't.
> We can add a listener ({{IHookable}} in TestNG, because regular listeners are not always invoked on the same thread as the test method) to detect when a test has been running for too long and interrupt the test thread. We can even save a thread dump to make the investigation easier.
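A minimal, stdlib-only sketch of the watchdog idea, assuming a shared scheduled executor as the timer. The test body runs on the calling thread, and if it is still running after the timeout, the watchdog takes a thread dump with {{Thread.getAllStackTraces()}} and interrupts the test thread. The names {{HungTestWatchdog}} and {{runWithTimeout}} are hypothetical; this is not the actual Infinispan implementation.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the idea; not Infinispan's actual test listener.
final class HungTestWatchdog {
   // One daemon timer thread shared by all tests; it must never keep the JVM alive.
   private static final ScheduledExecutorService TIMER =
         Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "hung-test-watchdog");
            t.setDaemon(true);
            return t;
         });

   private final Object lock = new Object();
   private boolean finished;  // guarded by lock
   private String threadDump; // guarded by lock; set only if the watchdog fired

   private HungTestWatchdog() {
   }

   /**
    * Runs the test body on the calling thread. If it is still running after the timeout,
    * takes a thread dump and interrupts the calling thread.
    *
    * @return the thread dump if the watchdog fired, or null if the body finished in time
    */
   static String runWithTimeout(Runnable testBody, long timeout, TimeUnit unit) {
      HungTestWatchdog watchdog = new HungTestWatchdog();
      Thread testThread = Thread.currentThread();
      ScheduledFuture<?> alarm = TIMER.schedule(() -> {
         synchronized (watchdog.lock) {
            if (watchdog.finished) {
               return; // the test completed in time
            }
            watchdog.threadDump = dumpAllThreads();
            testThread.interrupt();
         }
      }, timeout, unit);
      try {
         testBody.run();
      } finally {
         alarm.cancel(false);
         synchronized (watchdog.lock) {
            // After this point the watchdog can no longer interrupt the test thread
            watchdog.finished = true;
         }
         if (watchdog.threadDump != null) {
            // Clear any leftover interrupt so the next test starts with a clean thread
            Thread.interrupted();
         }
      }
      return watchdog.threadDump;
   }

   private static String dumpAllThreads() {
      StringBuilder sb = new StringBuilder();
      for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
         Thread thread = entry.getKey();
         sb.append('"').append(thread.getName()).append("\" ").append(thread.getState()).append('\n');
         for (StackTraceElement frame : entry.getValue()) {
            sb.append("\tat ").append(frame).append('\n');
         }
         sb.append('\n');
      }
      return sb.toString();
   }
}
```

The lock guarantees the watchdog cannot interrupt the thread after the test has finished, which would otherwise leak an interrupt into the next test. In TestNG, an {{IHookable}} implementation could wrap {{callBack.runTestMethod(testResult)}} in {{runWithTimeout}} from its {{run(IHookCallBack callBack, ITestResult testResult)}} method, which per the description above runs on the same thread as the test method, and write the returned dump next to the test logs.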
--
This message was sent by Atlassian JIRA
(v7.5.0#75005)
[JBoss JIRA] (ISPN-8550) Try to estimate malloc overhead and add to memory based eviction
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8550?page=com.atlassian.jira.plugin.... ]
Dan Berindei commented on ISPN-8550:
------------------------------------
[~william.burns] how did you use Valgrind? I wanted to reproduce your results and I tried the massif tool with {{--pages-as-heap=yes}} to capture the real allocation size, but the output was kind of hard to parse.
Eventually I found that {{$(which time) -f "%E %M"}} prints the elapsed time and the peak RSS (in kbytes), and I've been testing with that. After modifying the test to skip the warmup and take the size and count of allocations as parameters, this is what I got on my machine:
||alloc size (bytes)||alloc count||peak RSS from time %M (kbytes)||est. entries (kbytes)||est. kbytes, overhead 8 aligned at 16||diff (kbytes)||est. kbytes, overhead 0 aligned at 32||diff (kbytes)||est. baseline (kbytes)||kbyte size (bytes)||
|100000000|1|170336|97656.25|97656.265625|-0.015625|97656.25|0|72679.75|1024|
|1000000|1000|1053108|976562.5|976578.125|3850.125|976562.5|3865.75|
|1|1000000|104200|976.5625|15625|15895.25|31250|270.25|
|10|1000000|104160|9765.625|31250|230.25|31250|230.25|
|100|1000000|182524|97656.25|109375|469.25|125000|-15155.75|
|104|1000000|182180|101562.5|109375|125.25|125000|-15499.75|
|108|1000000|198268|105468.75|125000|588.25|125000|588.25|
|116|1000000|198060|113281.25|125000|380.25|125000|380.25|
|1000|1000000|1057348|976562.5|984375|293.25|1000000|-15331.75|
|1004|1000000|1073024|980468.75|1000000|344.25|1000000|344.25|
|1008|1000000|1073068|984375|1000000|388.25|1000000|388.25|
|1012|1000000|1073128|988281.25|1000000|448.25|1000000|448.25|
|1016|1000000|1072636|992187.5|1000000|-43.75|1000000|-43.75|
|1020|1000000|1088316|996093.75|1015625|11.25|1000000|15636.25|
|1024|1000000|1088564|1000000|1015625|259.25|1000000|15884.25|
So it looks like on my machine malloc adds 8 bytes and then rounds up the allocation to a multiple of 16, except the minimum size is 32 bytes.
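A small sketch of the allocation-size model described above, useful for reproducing the "overhead 8 aligned at 16" column and its "diff". The class and method names are hypothetical, and the constants come from these measurements rather than from malloc's source; the 72679.75-kbyte baseline is the one estimated from the first row.

```java
// Hypothetical helper; constants come from the measurements above, not from malloc's source.
final class MallocEstimate {
   static final long OVERHEAD_BYTES = 8;   // extra bytes malloc seems to add per allocation
   static final long ALIGNMENT = 16;       // allocations are rounded up to a multiple of this
   static final long MIN_CHUNK_BYTES = 32; // smallest footprint observed (1- and 10-byte rows)

   private MallocEstimate() {
   }

   /** Estimated bytes consumed by a single allocation of the requested size. */
   static long chunkSize(long requestedBytes) {
      long withOverhead = requestedBytes + OVERHEAD_BYTES;
      long aligned = (withOverhead + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
      return Math.max(MIN_CHUNK_BYTES, aligned);
   }

   /** Estimated kbytes (1 kbyte = 1024 bytes) for count allocations of the requested size. */
   static double estimatedKbytes(long requestedBytes, long count) {
      return chunkSize(requestedBytes) * (double) count / 1024;
   }

   /** The table's "diff" column: measured peak RSS minus the baseline minus the estimate. */
   static double diffKbytes(double measuredKbytes, double baselineKbytes, double estimatedKbytes) {
      return measuredKbytes - baselineKbytes - estimatedKbytes;
   }
}
```

For example, {{chunkSize(100)}} is 112 bytes, so 1000000 allocations are estimated at 109375 kbytes, and {{diffKbytes(182524, 72679.75, 109375)}} gives the 469.25 kbytes shown in the table. The model only covers small allocations: the 1000000-byte row still leaves about 3850 kbytes unexplained.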
This is my modified test:
{code:java}
public static void main(String[] args) throws InterruptedException {
   // args[0] = size of each allocation in bytes, args[1] = number of allocations
   int allocationSize = Integer.parseInt(args[0]);
   int allocationCount = Integer.parseInt(args[1]);
   System.out.println(allocationSize + " " + allocationCount);
   for (int i = 0; i < allocationCount; ++i) {
      // Never freed: the peak RSS should include every allocation
      long address = OffHeapMemory.INSTANCE.allocate(allocationSize);
      // Write a byte every 1024 bytes so every page is actually touched and counted in the RSS
      for (int j = 0; j < allocationSize; j += 1024) {
         OffHeapMemory.INSTANCE.putByte(address, j, (byte) 0);
      }
   }
   // Make sure Valgrind/Massif sees the peak here
   long address = OffHeapMemory.INSTANCE.allocate(1);
   OffHeapMemory.INSTANCE.free(address);
   Thread.sleep(200);
}
{code}
> Try to estimate malloc overhead and add to memory based eviction
> ----------------------------------------------------------------
>
> Key: ISPN-8550
> URL: https://issues.jboss.org/browse/ISPN-8550
> Project: Infinispan
> Issue Type: Sub-task
> Reporter: William Burns
> Assignee: William Burns
> Fix For: 9.2.0.Beta2, 9.1.4.Final
>
>
> We should also try to estimate malloc overhead. We could do something like what Dan mentioned at https://github.com/infinispan/infinispan/pull/5590#pullrequestreview-7805...
--
[JBoss JIRA] (ISPN-8550) Try to estimate malloc overhead and add to memory based eviction
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8550?page=com.atlassian.jira.plugin.... ]
Dan Berindei edited comment on ISPN-8550 at 11/24/17 3:46 AM:
--------------------------------------------------------------
[~william.burns] how did you use Valgrind? I wanted to reproduce your results and I tried the massif tool with {{--pages-as-heap=yes}} to capture the real allocation size, but the output was kind of hard to parse.
Eventually I found that {{$(which time) -f "%E %M"}} prints the elapsed time and the peak RSS (in kbytes), and I've been testing with that. After modifying the test to skip the warmup and take the size and count of allocations as parameters, this is what I got on my machine:
||alloc size (bytes)||alloc count||peak RSS from time %M (kbytes)||est. entries (kbytes)||est. kbytes, overhead 8 aligned at 16||diff (kbytes)||est. kbytes, overhead 0 aligned at 32||diff (kbytes)||est. baseline (kbytes)||kbyte size (bytes)||
|100000000|1|170336|97656.25|97656.265625|-0.015625|97656.25|0|72679.75|1024|
|1000000|1000|1053108|976562.5|976578.125|3850.125|976562.5|3865.75|
|1|1000000|104200|976.5625|15625|15895.25|31250|270.25|
|10|1000000|104160|9765.625|31250|230.25|31250|230.25|
|100|1000000|182524|97656.25|109375|469.25|125000|-15155.75|
|104|1000000|182180|101562.5|109375|125.25|125000|-15499.75|
|108|1000000|198268|105468.75|125000|588.25|125000|588.25|
|116|1000000|198060|113281.25|125000|380.25|125000|380.25|
|1000|1000000|1057348|976562.5|984375|293.25|1000000|-15331.75|
|1004|1000000|1073024|980468.75|1000000|344.25|1000000|344.25|
|1008|1000000|1073068|984375|1000000|388.25|1000000|388.25|
|1012|1000000|1073128|988281.25|1000000|448.25|1000000|448.25|
|1016|1000000|1072636|992187.5|1000000|-43.75|1000000|-43.75|
|1020|1000000|1088316|996093.75|1015625|11.25|1000000|15636.25|
|1024|1000000|1088564|1000000|1015625|259.25|1000000|15884.25|
So it looks like on my machine malloc adds 8 bytes and then rounds up the allocation to a multiple of 16, except the minimum size is 32 bytes, and really big allocations are treated differently.
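One possible way to model the "really big allocations" case is sketched below. It is an assumption, not something measured here: it assumes glibc defaults on 64-bit Linux, where requests above the default {{M_MMAP_THRESHOLD}} of 128 KiB get their own mapping, sized as the normalized chunk plus one more 8-byte word, rounded up to whole 4 KiB pages. The class name is hypothetical.

```java
// Hypothetical helper; the mmap threshold and page size are assumed glibc/Linux x86-64 defaults.
final class MallocEstimateWithMmap {
   static final long MMAP_THRESHOLD = 128 * 1024; // assumed default M_MMAP_THRESHOLD
   static final long PAGE_SIZE = 4096;            // assumed page size

   private MallocEstimateWithMmap() {
   }

   static long roundUp(long value, long multiple) {
      return (value + multiple - 1) / multiple * multiple;
   }

   /** Estimated bytes consumed by a single allocation of the requested size. */
   static long chunkSize(long requestedBytes) {
      // Small allocations: 8 bytes of overhead, rounded up to 16, at least 32 bytes
      long normalized = Math.max(32, roundUp(requestedBytes + 8, 16));
      if (normalized < MMAP_THRESHOLD) {
         return normalized;
      }
      // Big allocations (assumed): a dedicated mapping, one more 8-byte word, whole pages
      return roundUp(normalized + 8, PAGE_SIZE);
   }
}
```

Under these assumptions, {{chunkSize(1000000)}} is 1003520 bytes (245 pages), so the 1000 allocations come to 980000 kbytes. Subtracting that and the 72679.75-kbyte baseline from the measured 1053108 kbytes leaves 428.25 kbytes, close to the few hundred kbytes left over in the small-allocation rows. That is consistent with page rounding but doesn't prove it, and the single 100000000-byte allocation can't be checked this way because it was used to compute the baseline.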
This is my modified test:
{code:java}
public static void main(String[] args) throws InterruptedException {
   // args[0] = size of each allocation in bytes, args[1] = number of allocations
   int allocationSize = Integer.parseInt(args[0]);
   int allocationCount = Integer.parseInt(args[1]);
   System.out.println(allocationSize + " " + allocationCount);
   for (int i = 0; i < allocationCount; ++i) {
      // Never freed: the peak RSS should include every allocation
      long address = OffHeapMemory.INSTANCE.allocate(allocationSize);
      // Write a byte every 1024 bytes so every page is actually touched and counted in the RSS
      for (int j = 0; j < allocationSize; j += 1024) {
         OffHeapMemory.INSTANCE.putByte(address, j, (byte) 0);
      }
   }
   // Make sure Valgrind/Massif sees the peak here
   long address = OffHeapMemory.INSTANCE.allocate(1);
   OffHeapMemory.INSTANCE.free(address);
   Thread.sleep(200);
}
{code}
--
[JBoss JIRA] (ISPN-8554) Automatically interrupt hanged tests
by Dan Berindei (JIRA)
[ https://issues.jboss.org/browse/ISPN-8554?page=com.atlassian.jira.plugin.... ]
Dan Berindei updated ISPN-8554:
-------------------------------
Status: Open (was: New)
--
[JBoss JIRA] (ISPN-8554) Automatically interrupt hanged tests
by Dan Berindei (JIRA)
Dan Berindei created ISPN-8554:
----------------------------------
Summary: Automatically interrupt hanged tests
Key: ISPN-8554
URL: https://issues.jboss.org/browse/ISPN-8554
Project: Infinispan
Issue Type: Task
Components: Test Suite - Core
Affects Versions: 9.2.0.Beta1
Reporter: Dan Berindei
Assignee: Dan Berindei
Fix For: 9.2.0.Beta2
Tests should always use timed waits, e.g. {{future.get(10, SECONDS)}} instead of {{future.get()}}. However, it's really easy to use the wrong method, and if a deadlock happens, the test suite will never finish.
Jenkins builds do have a timeout, but the timeout kill procedure is not at all investigation-friendly: only the console output survives; the logs and test results don't.
We can add a listener ({{IHookable}} in TestNG, because regular listeners are not always invoked on the same thread as the test method) to detect when a test has been running for too long and interrupt the test thread. We can even save a thread dump to make the investigation easier.
--
[JBoss JIRA] (ISPN-8452) KUBE_PING doesn't work with JDG 7.2 OpenShift image
by Ryan Emerson (JIRA)
[ https://issues.jboss.org/browse/ISPN-8452?page=com.atlassian.jira.plugin.... ]
Ryan Emerson resolved ISPN-8452.
--------------------------------
Resolution: Done
> KUBE_PING doesn't work with JDG 7.2 OpenShift image
> ---------------------------------------------------
>
> Key: ISPN-8452
> URL: https://issues.jboss.org/browse/ISPN-8452
> Project: Infinispan
> Issue Type: Bug
> Components: Cloud Integrations
> Reporter: Sebastian Łaskawiec
> Assignee: Sebastian Łaskawiec
> Priority: Blocker
>
> It seems that KUBE_PING doesn't work with the latest version of the JDG 7.2 OpenShift image. The reason is that JDG 7.2 ships with the upstream KUBE_PING, which conflicts with the one provided in the openshift layer.
> The easiest solution is to reverse the layer ordering in the {{layers.conf}} file:
> {code}
> echo "layers=base,openshift" > /opt/datagrid/modules/layers.conf
> {code}
> This solution has an additional advantage: we can fall back to the previous KUBE_PING version (the one provided by the CE Team) by changing just this one line.
> However, if the upstream KUBE_PING works fine and we are happy with it, we should remove the old version from the server. I think that's the approach we should take for JDG 8, or maybe sooner, in 7.3.
--