[infinispan-dev] Infinispan and change data capture

Thu Dec 15 12:15:49 EST 2016

> On 15 Dec 2016, at 15:59, Gustavo Fernandes <gustavo at infinispan.org> wrote:
> 
> 
> 
> On Thu, Dec 15, 2016 at 2:53 PM, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
> 
>> On 15 Dec 2016, at 11:18, Gustavo Fernandes <gustavo at infinispan.org> wrote:
>> 
>> On Thu, Dec 15, 2016 at 9:54 AM, Emmanuel Bernard <emmanuel at hibernate.org> wrote:
>> The goal is as follows: collect all changes so they can be pushed to Debezium and thus to Kafka.
>> 
>> This does not require Infinispan to remember all changes since the beginning of time. Just enough to:
>> - let Kafka catch up, assuming it is the bottleneck
>> - ensure a change that happened in Infinispan is not lost before it reaches Kafka (coordinator, owner, or replicas dying)
>> 
>> The ability to read back history would then be handled by the Debezium / Kafka tail, not infinispan itself.
>> 
>> 
>> Having an embedded Debezium connector pushing everything to Kafka sounds cool, but what impact would it have on the other stream consumers:
>> 
>> * Remote listeners, which are supported in several clients apart from Java
>> * Continuous Queries (the same)
>> * Spark Stream
>> * Other potential third-party stream processors: Apache Flink, Storm, etc.
>> 
>>  
> 
> Impact as in perf impact? Potential redesign impact? Or are you thinking of another question?
> 
> 
> You mentioned that "The ability to read back history would then be handled by the Debezium / Kafka tail, not infinispan itself". My question
> was how the other consumers would get access to that history.

Yes, that’s an interesting point.

First off, here we are describing an ad-hoc model where we push changes to Debezium and then Kafka.
But the underlying temp queue mechanism I described in the Dec 9th email might be used to harden the code pushing changes to the consumers you describe, and I suppose it could even improve the continuous queries engine and the Spark DStream integration.
Maybe we want a more generic mechanism, relying on that temp queue system, into which a list of consumers can be plugged. We could then focus on Spark Stream, Continuous Queries and Debezium as a first set of “clients”.
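
To make the idea a bit more concrete, here is a rough sketch in Java of what such a generic mechanism could look like. Everything in it is an assumption for illustration only: TempChangeQueue, register, publish and drain are hypothetical names, not an existing Infinispan API, and the sketch is single-node and in-memory, so it ignores the coordinator/owner failover that the real design has to handle. The point is only that changes stay in the temporary queue until every plugged-in consumer has acknowledged them, and that a slow consumer causes back-pressure rather than a lost change.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: a bounded temporary queue of changes shared by several consumers. */
final class TempChangeQueue {

    /** A single cache change; the fields are illustrative only. */
    static final class Change {
        public final long seq;
        public final String key;
        public final String value;

        Change(long seq, String key, String value) {
            this.seq = seq;
            this.key = key;
            this.value = value;
        }
    }

    private final Deque<Change> pending = new ArrayDeque<>();
    private final Map<String, Consumer<Change>> consumers = new HashMap<>();
    // Last sequence number acknowledged by each consumer.
    private final Map<String, Long> acked = new HashMap<>();
    private final int capacity;
    private long nextSeq = 0;

    TempChangeQueue(int capacity) {
        this.capacity = capacity;
    }

    /** Plug in a consumer (for example a Debezium, Spark or continuous query adapter). */
    synchronized void register(String name, Consumer<Change> consumer) {
        consumers.put(name, consumer);
        acked.put(name, nextSeq - 1); // only sees changes published from now on
    }

    /** Record a change; returns false when the queue is full (back-pressure, no silent loss). */
    synchronized boolean publish(String key, String value) {
        if (pending.size() >= capacity) {
            return false;
        }
        pending.addLast(new Change(nextSeq++, key, value));
        return true;
    }

    /** Deliver every change the named consumer has not acknowledged yet, then trim the queue. */
    synchronized int drain(String name) {
        Consumer<Change> consumer = consumers.get(name);
        if (consumer == null) {
            throw new IllegalArgumentException("unknown consumer: " + name);
        }
        long last = acked.get(name);
        int delivered = 0;
        for (Change c : pending) {
            if (c.seq > last) {
                consumer.accept(c); // if this throws, nothing is acknowledged and all is redelivered
                last = c.seq;
                delivered++;
            }
        }
        acked.put(name, last);
        trim();
        return delivered;
    }

    /** Drop changes that every registered consumer has acknowledged. */
    private void trim() {
        long min = acked.values().stream().mapToLong(Long::longValue).min().orElse(-1L);
        while (!pending.isEmpty() && pending.peekFirst().seq <= min) {
            pending.removeFirst();
        }
    }

    synchronized int size() {
        return pending.size();
    }
}
```

In this sketch the queue only needs to hold as many changes as the slowest consumer is behind, which matches the "just enough to let Kafka catch up" requirement rather than a full history.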

For the ability to read back in history, I am happy to force consumers to go through a Kafka queue. As others pointed out, if we make Infinispan a durable queue system, we are making a different Infinispan than what it is today and this is probably undesirable.

Emmanuel
