[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by Krzysztof Sobolewski (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
Krzysztof Sobolewski commented on ISPN-6706:
--------------------------------------------
Well, an index alone is not sufficient: the query needs to be replaced with a DELETE or SELECT that picks just the expired nodes. But I'll check it out :)
> Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
> --------------------------------------------------------------------------------------------
>
> Key: ISPN-6706
> URL: https://issues.jboss.org/browse/ISPN-6706
> Project: Infinispan
> Issue Type: Enhancement
> Components: Loaders and Stores
> Affects Versions: 8.2.2.Final
> Reporter: Krzysztof Sobolewski
>
> This issue arose when I was testing a cluster with about 16 million entries. Our configuration is that all the data is also kept in memory, so eviction is disabled in this cache. But expiration is enabled. During the test I noticed pauses that started small but increased while the test was progressing, reaching more than 20 seconds at one point. After ruling out maintenance tasks in MySQL that could interfere, I discovered that the pause is caused by the expiration thread purging the database for expired entries. This was a huge and unnecessary drag so I hacked Infinispan to skip the purge of persistent state in cases when it's likely to be redundant with purging the transient state. I say "likely" because entries evicted manually via the evict() call poke a huge hole in the underlying assumptions :) Anyway, our cluster no longer regularly pauses for half a minute, so here's something for your consideration.
--
This message was sent by Atlassian JIRA
(v6.4.11#64026)
8 years
[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
William Burns edited comment on ISPN-6706 at 5/25/16 9:51 AM:
--------------------------------------------------------------
Yeah, that looks like the underlying issue. Can you try adding the index to the column and see if it fixes your issue? Either way, it looks like we need a JIRA to add the index when creating the table.
[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-6706:
-------------------------------------
Yeah, that looks like the underlying issue. Can you try adding the index to the column and see if it fixes your issue?
[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by Krzysztof Sobolewski (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
Krzysztof Sobolewski commented on ISPN-6706:
--------------------------------------------
Right. An index on the "timestamp" column would help a lot, I think. Looks like low-hanging fruit, too.
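To illustrate why an index on the expiry column matters, here is a minimal in-memory sketch in plain Java. It is not Infinispan or JDBC code: the class `ExpiryIndexSketch`, its method names, and the convention that a timestamp of -1 means "never expires" are all assumptions made for the example. A `TreeMap` plays the role of the index, so the purge only touches the expired range instead of inspecting every row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for a store table with an indexed expiry column.
final class ExpiryIndexSketch {
    // "Table": key -> expiry timestamp in ms; -1 means "never expires" (assumed convention).
    private final Map<String, Long> rows = new HashMap<>();
    // "Index": expiry timestamp -> keys expiring at that instant, kept sorted by timestamp.
    private final TreeMap<Long, Set<String>> byExpiry = new TreeMap<>();

    void put(String key, long expiresAt) {
        remove(key); // drop any previous row and its index entry
        rows.put(key, expiresAt);
        if (expiresAt > 0) {
            byExpiry.computeIfAbsent(expiresAt, t -> new HashSet<>()).add(key);
        }
    }

    void remove(String key) {
        Long old = rows.remove(key);
        if (old != null && old > 0) {
            Set<String> keys = byExpiry.get(old);
            keys.remove(key);
            if (keys.isEmpty()) {
                byExpiry.remove(old);
            }
        }
    }

    // No index: every row is inspected, which is what a full table scan does.
    List<String> findExpiredByFullScan(long now) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> row : rows.entrySet()) {
            long expiresAt = row.getValue();
            if (expiresAt > 0 && expiresAt < now) {
                expired.add(row.getKey());
            }
        }
        return expired;
    }

    // With the index: only entries whose timestamp is strictly before 'now' are visited.
    List<String> purgeByIndex(long now) {
        List<String> purged = new ArrayList<>();
        Map<Long, Set<String>> expiredRange = byExpiry.headMap(now, false);
        for (Set<String> keys : expiredRange.values()) {
            purged.addAll(keys);
        }
        for (String key : purged) {
            rows.remove(key);
        }
        expiredRange.clear(); // clearing the view also removes the entries from the index
        return purged;
    }

    int size() {
        return rows.size();
    }
}
```

With 16 million rows and only a handful expired, the full scan still walks all of them, while the indexed purge does work proportional to the number of expired entries.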
[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
William Burns edited comment on ISPN-6706 at 5/25/16 9:42 AM:
--------------------------------------------------------------
If you are running a cluster: I just noticed we don't have a very simple optimization of only running purge on the coordinator node when a shared store is used. Without it, all nodes may be hitting the DB server at approximately the same time, which is probably why it is killing your DB response times. Another thing I would suggest is to limit how often the cache writer purge is performed relative to the in-memory purge (it could be run every 5th memory purge, for example).
And looking closer at the JPAStore and Jdbc*Store implementations, they should be using a simple select that only returns expired entries. I am not sure whether we have indexes on these fields though (a missing index may be causing a full table scan); I haven't looked too closely at these classes.
Also, the optimization you have right now could still be used, but we would have to limit it as follows:
# Only supported with 1 cache loader/writer (note we can have multiple configured)
# Store state in the persistence manager to guarantee evict was never called
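The suggestions above can be sketched as a single decision in plain Java. This is only an illustration under stated assumptions: `StorePurgePolicy` and all of its members are hypothetical names, not Infinispan API, and the "memory mirrors the store" check simply follows the reporter's reasoning (eviction disabled, preload enabled) plus the two limits listed above.

```java
// Hypothetical policy object; none of these names exist in Infinispan.
final class StorePurgePolicy {
    private final boolean sharedStore;
    private final boolean evictionEnabled;
    private final boolean preload;
    private final int storeCount;
    private final int purgeEveryNthRun;

    private boolean evictCalled;   // becomes true once evict() has been used on the cache
    private long memoryPurgeRuns;  // number of in-memory expiration runs seen so far

    StorePurgePolicy(boolean sharedStore, boolean evictionEnabled, boolean preload,
                     int storeCount, int purgeEveryNthRun) {
        if (purgeEveryNthRun < 1) {
            throw new IllegalArgumentException("purgeEveryNthRun must be >= 1");
        }
        this.sharedStore = sharedStore;
        this.evictionEnabled = evictionEnabled;
        this.preload = preload;
        this.storeCount = storeCount;
        this.purgeEveryNthRun = purgeEveryNthRun;
    }

    // Record that an entry was evicted manually; the store may now hold entries memory does not.
    void recordEvict() {
        evictCalled = true;
    }

    // True while everything in the single store is assumed to be in memory as well.
    boolean memoryMirrorsStore() {
        return !evictionEnabled && preload && storeCount == 1 && !evictCalled;
    }

    // Call once per in-memory expiration run; returns true if the store should be purged too.
    boolean shouldPurgeStore(boolean isCoordinator) {
        memoryPurgeRuns++;
        // With a shared store, one node purging is enough.
        if (sharedStore && !isCoordinator) {
            return false;
        }
        // Treat the store purge as redundant while memory mirrors the store.
        if (memoryMirrorsStore()) {
            return false;
        }
        // Otherwise purge the store only on every Nth in-memory run.
        return memoryPurgeRuns % purgeEveryNthRun == 0;
    }
}
```

Keeping the `evictCalled` flag sticky is a deliberately conservative choice: once evict() has been used, the sketch never again assumes memory and store hold the same entries.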
[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-6706:
-------------------------------------
If you are running a cluster: I just noticed we don't have a very simple optimization of only running purge on the coordinator node when a shared store is used. Without it, all nodes may be hitting the DB server at approximately the same time, which is probably why it is killing your DB response times. Another thing I would suggest is to limit how often the cache writer purge is performed relative to the in-memory purge (it could be run every 5th memory purge, for example).
And looking closer at the JPAStore and Jdbc*Store implementations, they should be using a simple select that only returns expired entries. I am not sure whether we have indexes on these fields though (a missing index may be causing your full table scan); I haven't looked too closely at these classes.
Also, the optimization you have right now could still be used, but we would have to limit it as follows:
# Only supported with 1 cache loader/writer (note we can have multiple configured)
# Store state in the persistence manager to guarantee evict was never called
[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by Krzysztof Sobolewski (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
Krzysztof Sobolewski commented on ISPN-6706:
--------------------------------------------
OK, the configuration...
{code:java}
ConfigurationBuilder builder = new ConfigurationBuilder();
// Cache basics: non-transactional, asynchronously distributed, two owners per entry
builder
    .transaction().transactionMode(TransactionMode.NON_TRANSACTIONAL)
    .clustering()
        .cacheMode(CacheMode.DIST_ASYNC)
        .hash()
            .numOwners(2)
    .unsafe().unreliableReturnValues(true)
    .storeAsBinary().disable();
// JDBC string-keyed store on MySQL, preloaded into memory at startup
JdbcStringBasedStoreConfigurationBuilder storeBuilder = builder
    .persistence()
        .addStore(JdbcStringBasedStoreConfigurationBuilder.class)
            .preload(true)
            .dialect(DatabaseType.MYSQL)
            .key2StringMapper(FooToStringMapper.class);
// Table layout; "stamp" is the timestamp column the purge query filters on
storeBuilder
    .table()
        .tableNamePrefix("foo")
        .idColumnName("id")
        .idColumnType("VARCHAR(255)")
        .dataColumnName("value")
        .dataColumnType("BLOB")
        .timestampColumnName("stamp")
        .timestampColumnType("BIGINT");
// Connection pool; note the 1000 ms socket timeout in the URL
storeBuilder
    .connectionPool()
        .driverClass(com.mysql.jdbc.Driver.class.getName())
        .connectionUrl("jdbc:mysql://localhost/bar?connectTimeout=4000&socketTimeout=1000&autoReconnectForPools=true")
        .username("baz")
        .password("quux");
{code}
There is only one node :) After the cluster is initialized the test is supposed to add more. But one's enough.
[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-6706:
-------------------------------------
I am not trying to imply there isn't something we can do here; I think there is :)
If you could provide the configuration that might help me think of some things we could do here in a general way. Also how many nodes do you have in the cluster?
[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by Krzysztof Sobolewski (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
Krzysztof Sobolewski commented on ISPN-6706:
--------------------------------------------
Well, I think that's the thing: the purge is IO-bound, and the test is write-heavy (basically I'm trying to initialize the cluster with a large amount of data), so it's IO-bound as well. The result is that the write operations wait on the cache writer until the database connection throws a socket timeout exception.
Now that I think of it, I should give MySQL some bigger buffers so that the database fits into memory. It's not terribly large (~2.5 GB) so this could do it. It's a test machine and I'm not a DBA so it didn't occur to me ;)
[JBoss JIRA] (ISPN-6706) Purging cache writers is [mostly] redundant when eviction is disabled and preload is enabled
by William Burns (JIRA)
[ https://issues.jboss.org/browse/ISPN-6706?page=com.atlassian.jira.plugin.... ]
William Burns commented on ISPN-6706:
-------------------------------------
Rereading what you wrote, I understand your case better. I would like to ask for some more information, though.
You said the cluster is pausing due to the purge; this seems problematic in itself. The purge shouldn't stop other concurrent operations, and in most cases it will be very heavily IO-bound, so it shouldn't be taking up CPU. How did the pause adversely affect your test? Can you also provide your configuration?