On 15 Dec 2016, at 15:59, Gustavo Fernandes
<gustavo(a)infinispan.org> wrote:
On Thu, Dec 15, 2016 at 2:53 PM, Emmanuel Bernard <emmanuel(a)hibernate.org>
wrote:
> On 15 Dec 2016, at 11:18, Gustavo Fernandes <gustavo(a)infinispan.org> wrote:
>
> On Thu, Dec 15, 2016 at 9:54 AM, Emmanuel Bernard <emmanuel(a)hibernate.org>
wrote:
> The goal is as follows: collect all changes so they can be pushed to Debezium and
thus Kafka.
>
> This does not require Infinispan to remember all changes since the beginning of
time. Just enough to:
> - let Kafka catch up, assuming it is the bottleneck
> - make sure a change that happened in Infinispan is not lost before it reaches Kafka
(coordinator, owner, or replicas dying)
>
> The ability to read back history would then be handled by the Debezium / Kafka tail,
not Infinispan itself.
>
>
> Having an embedded Debezium connector pushing everything to Kafka sounds cool, but
what impact would it have on the other stream consumers:
>
> * Remote listeners, which are supported in several clients besides Java
> * Continuous Queries (the same)
> * Spark Stream
> * Other potential third-party stream processors: Apache Flink, Storm, etc.
>
>
Impact as in perf impact? Potential redesign impact? Or are you thinking of another
question?
You mentioned that "The ability to read back history would then be handled by the
Debezium / Kafka tail, not infinispan itself". My question
was how the other consumers would get access to that history.
Yes, that’s an interesting point.
First off here we are describing an ad-hoc model where we push changes to Debezium and
then Kafka.
But the underlying temp queue mechanism I described in the Dec 9th email might be used to
harden the code pushing changes to the consumers you describe, and might even improve the
continuous query engine and the Spark DStream integration, I suppose.
Maybe we want a more generic mechanism, relying on that temp queue system, into which a
list of consumers can be plugged, focusing on Spark Stream, continuous queries and
Debezium as a first set of “clients”.
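
To make the idea a bit more concrete, here is a rough, purely illustrative sketch of a
temporary queue with pluggable consumers. It is not Infinispan code: the class and method
names (TempChangeQueue, register, publish, ack, unacked) are invented for this example,
delivery is synchronous for simplicity, and it ignores clustering and failover entirely.
The one property it tries to show is that a change is kept only until every registered
consumer has acknowledged it, so the queue stays short-lived rather than becoming a
durable log.

```python
from collections import OrderedDict


class QueueFull(Exception):
    """Raised when the slowest consumer has fallen too far behind."""


class TempChangeQueue:
    """Hypothetical bounded buffer: a change is retained only until every
    registered consumer has acknowledged it."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.consumers = {}           # name -> callback(seq, key, value)
        self.acked = {}               # name -> highest acknowledged sequence number
        self.pending = OrderedDict()  # seq -> (key, value), oldest first
        self.next_seq = 0

    def register(self, name, callback):
        # A new consumer only sees changes published after it registers.
        self.consumers[name] = callback
        self.acked[name] = self.next_seq - 1

    def publish(self, key, value):
        if len(self.pending) >= self.capacity:
            raise QueueFull("slowest consumer is too far behind")
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = (key, value)
        for callback in self.consumers.values():
            callback(seq, key, value)
        return seq

    def ack(self, name, seq):
        self.acked[name] = max(self.acked[name], seq)
        # Drop every change that all consumers have acknowledged.
        low_water = min(self.acked.values())
        while self.pending and next(iter(self.pending)) <= low_water:
            self.pending.popitem(last=False)

    def unacked(self, name):
        # Changes that would have to be redelivered to this consumer.
        return [(s, k, v) for s, (k, v) in self.pending.items()
                if s > self.acked[name]]


queue = TempChangeQueue(capacity=3)
seen_by_debezium = []
seen_by_cq = []
queue.register("debezium", lambda s, k, v: seen_by_debezium.append((s, k, v)))
queue.register("continuous-query", lambda s, k, v: seen_by_cq.append((s, k, v)))

first = queue.publish("user:1", "alice")
second = queue.publish("user:2", "bob")
queue.ack("continuous-query", second)  # the fast consumer is fully caught up
queue.ack("debezium", first)           # the Kafka side lags by one change
```

With these assumptions, the change for "user:2" stays in the queue only because the
Debezium consumer has not acknowledged it yet. Once it does, nothing is retained, and
anything older has to be read back from Kafka.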
As for the ability to read back history, I am happy to force consumers to go through a
Kafka queue. As others pointed out, if we make Infinispan a durable queue system, we are
making a different Infinispan than what it is today, and this is probably undesirable.
Emmanuel