Reliability of return values

Re: [infinispan-dev] CD datastore...

stabilize test suite

Radim Vansa

Monday, 12 May 2014 Mon, 12 May '14

2:21 a.m.

Hi, recently I've stumbled upon one already expected behaviour (one instance is [1]), but which did not got much attention. In non-tx cache, when the primary owner fails after the request has been replicated to backup owner, the request is retried in the new topology. Then, the operation is executed on the new primary (the previous backup). The outcome has been already fixed in [2], but the return value may be wrong. For example, when we do a put, the return value for the second attempt will be the currently inserted value (although the entry was just created). Same situation may happen for other operations. Currently, it's not possible to return the correct value (because it has already been overwritten and we don't keep a history of values), but shouldn't we rather throw an exception if we were not able to fulfil the API contract? Radim [1] https://issues.jboss.org/browse/ISPN-2956 [2] https://issues.jboss.org/browse/ISPN-3422 -- Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA

Show replies by date

Dan Berindei

Monday, 12 May Mon, 12 May

3:37 a.m.

...

Sanne Grinovero

5:02 a.m.

...

Radim, I would contend that the first and foremost guarantee that put() makes is to leave the cache in a consistent state. So we can't just throw an exception and give up, leaving k=v on one owner and k=null on another. Secondly, put(k, v) being atomic means that it either succeeds, it writes k=v in the cache, and it returns the previous value, or it doesn't succeed, and it doesn't write k=v in the cache. Returning the wrong previous value is bad, but leaving k=v in the cache is just as bad, even if the all the owners have the same value. And last, we can't have one node seeing k=null, then k=v, then k=null again, when the only write we did on the cache was a put(k, v). So trying to undo the write would not help. In the end, we have to make a compromise, and I think returning the wrong value in some of the cases is a reasonable compromise. Of course, we should document that :) I also believe ISPN-2956 could be fixed so that HotRod behaves just like embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod protocol and to the cache itself. Incidentally, transactional caches have a similar problem when the originator leaves the cluster: ISPN-3421 [1] And we can't handle transactional caches any better than non-transactional caches until we expose transactions to the HotRod client. [1] https://issues.jboss.org/browse/ISPN-2956 Cheers Dan On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> wrote: > > Hi, > > recently I've stumbled upon one already expected behaviour (one instance > is [1]), but which did not got much attention. > > In non-tx cache, when the primary owner fails after the request has been > replicated to backup owner, the request is retried in the new topology. > Then, the operation is executed on the new primary (the previous > backup). The outcome has been already fixed in [2], but the return value > may be wrong. For example, when we do a put, the return value for the > second attempt will be the currently inserted value (although the entry > was just created). Same situation may happen for other operations. > > Currently, it's not possible to return the correct value (because it has > already been overwritten and we don't keep a history of values), but > shouldn't we rather throw an exception if we were not able to fulfil the > API contract? > > Radim > > [1] https://issues.jboss.org/browse/ISPN-2956 > [2] https://issues.jboss.org/browse/ISPN-3422 > > -- > Radim Vansa <rvansa(a)redhat.com> > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

Radim Vansa

5:54 a.m.

@Dan: It's absolutely correct to do the further writes in order to make the cache consistent, I am not arguing against that. You've fixed the outcome (state of cache) well. My point was that we should let the user know that the value he gets is not 100% correct when we already know that - and given the API, the only option to do that seems to me as throwing an exception. @Sanne: I was not suggesting that for now - sure, value versioning is (I hope) on the roadmap. But that's more complicated, I though just about making an adjustment to the current implementation. Radim On 05/12/2014 12:02 PM, Sanne Grinovero wrote:

...

I don't think we are in a position to decide what is a reasonable compromise; we can do better. For example - as Radim suggested - it might seem reasonable to have the older value around for a little while. We'll need a little bit of history of values and tombstones anyway for many other reasons. Sanne On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: > Radim, I would contend that the first and foremost guarantee that put() > makes is to leave the cache in a consistent state. So we can't just throw an > exception and give up, leaving k=v on one owner and k=null on another. > > Secondly, put(k, v) being atomic means that it either succeeds, it writes > k=v in the cache, and it returns the previous value, or it doesn't succeed, > and it doesn't write k=v in the cache. Returning the wrong previous value is > bad, but leaving k=v in the cache is just as bad, even if the all the owners > have the same value. > > And last, we can't have one node seeing k=null, then k=v, then k=null again, > when the only write we did on the cache was a put(k, v). So trying to undo > the write would not help. > > In the end, we have to make a compromise, and I think returning the wrong > value in some of the cases is a reasonable compromise. Of course, we should > document that :) > > I also believe ISPN-2956 could be fixed so that HotRod behaves just like > embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod > protocol and to the cache itself. > > Incidentally, transactional caches have a similar problem when the > originator leaves the cluster: ISPN-3421 [1] > And we can't handle transactional caches any better than non-transactional > caches until we expose transactions to the HotRod client. > > [1] https://issues.jboss.org/browse/ISPN-2956 > > Cheers > Dan > > > > > On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> wrote: >> Hi, >> >> recently I've stumbled upon one already expected behaviour (one instance >> is [1]), but which did not got much attention. >> >> In non-tx cache, when the primary owner fails after the request has been >> replicated to backup owner, the request is retried in the new topology. >> Then, the operation is executed on the new primary (the previous >> backup). The outcome has been already fixed in [2], but the return value >> may be wrong. For example, when we do a put, the return value for the >> second attempt will be the currently inserted value (although the entry >> was just created). Same situation may happen for other operations. >> >> Currently, it's not possible to return the correct value (because it has >> already been overwritten and we don't keep a history of values), but >> shouldn't we rather throw an exception if we were not able to fulfil the >> API contract? >> >> Radim >> >> [1] https://issues.jboss.org/browse/ISPN-2956 >> [2] https://issues.jboss.org/browse/ISPN-3422 >> >> -- >> Radim Vansa <rvansa(a)redhat.com> >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA

Dan Berindei

Tuesday, 13 May Tue, 13 May

8:58 a.m.

On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote:

...

The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. So we would break some of the users' expectations anyway.

...

@Sanne: I was not suggesting that for now - sure, value versioning is (I hope) on the roadmap. But that's more complicated, I though just about making an adjustment to the current implementation.

Actually, just keeping a history of values would not fix the the return value in all cases. When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this: A is the originator, B is the primary owner, k = v0 A -> B: put(k, v1) B dies before writing v, C is now primary owner D -> C: put(k, v1) // another put operation from D, with the same value C -> D: null A -> C: retry_put(k, v1) C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;) And to fix the HotRod use case, the HotRod client would have to be the one generating the version. Cheers Dan

...

Radim On 05/12/2014 12:02 PM, Sanne Grinovero wrote: > I don't think we are in a position to decide what is a reasonable > compromise; we can do better. > For example - as Radim suggested - it might seem reasonable to have > the older value around for a little while. We'll need a little bit of > history of values and tombstones anyway for many other reasons. > > > Sanne > > On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: >> Radim, I would contend that the first and foremost guarantee that put() >> makes is to leave the cache in a consistent state. So we can't just throw an >> exception and give up, leaving k=v on one owner and k=null on another. >> >> Secondly, put(k, v) being atomic means that it either succeeds, it writes >> k=v in the cache, and it returns the previous value, or it doesn't succeed, >> and it doesn't write k=v in the cache. Returning the wrong previous value is >> bad, but leaving k=v in the cache is just as bad, even if the all the owners >> have the same value. >> >> And last, we can't have one node seeing k=null, then k=v, then k=null again, >> when the only write we did on the cache was a put(k, v). So trying to undo >> the write would not help. >> >> In the end, we have to make a compromise, and I think returning the wrong >> value in some of the cases is a reasonable compromise. Of course, we should >> document that :) >> >> I also believe ISPN-2956 could be fixed so that HotRod behaves just like >> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod >> protocol and to the cache itself. >> >> Incidentally, transactional caches have a similar problem when the >> originator leaves the cluster: ISPN-3421 [1] >> And we can't handle transactional caches any better than non-transactional >> caches until we expose transactions to the HotRod client. >> >> [1] https://issues.jboss.org/browse/ISPN-2956 >> >> Cheers >> Dan >> >> >> >> >> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> wrote: >>> Hi, >>> >>> recently I've stumbled upon one already expected behaviour (one instance >>> is [1]), but which did not got much attention. >>> >>> In non-tx cache, when the primary owner fails after the request has been >>> replicated to backup owner, the request is retried in the new topology. >>> Then, the operation is executed on the new primary (the previous >>> backup). The outcome has been already fixed in [2], but the return value >>> may be wrong. For example, when we do a put, the return value for the >>> second attempt will be the currently inserted value (although the entry >>> was just created). Same situation may happen for other operations. >>> >>> Currently, it's not possible to return the correct value (because it has >>> already been overwritten and we don't keep a history of values), but >>> shouldn't we rather throw an exception if we were not able to fulfil the >>> API contract? >>> >>> Radim >>> >>> [1] https://issues.jboss.org/browse/ISPN-2956 >>> [2] https://issues.jboss.org/browse/ISPN-3422 >>> >>> -- >>> Radim Vansa <rvansa(a)redhat.com> >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev(a)lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

Radim Vansa

10:40 a.m.

On 05/13/2014 03:58 PM, Dan Berindei wrote:

...

On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com <mailto:rvansa@redhat.com>> wrote: @Dan: It's absolutely correct to do the further writes in order to make the cache consistent, I am not arguing against that. You've fixed the outcome (state of cache) well. My point was that we should let the user know that the value he gets is not 100% correct when we already know that - and given the API, the only option to do that seems to me as throwing an exception. The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. So we would break some of the users' expectations anyway.

When the response from primary owner does not arrive soon, we throw timeout exception and the cache is modified anyway, isn't it? If we throw ~ReturnValueUnreliableException, the user has at least some chance to react. Currently, for code requiring 100% reliable value, you can't do anything but ignore the return value, even for CAS operations.

...

Is it such a problem to associate unique ID with each write? History implementation seems to me like the more complicated part.

...

And to fix the HotRod use case, the HotRod client would have to be the one generating the version.

I agree. Radim

...

Cheers Dan Radim On 05/12/2014 12:02 PM, Sanne Grinovero wrote: > I don't think we are in a position to decide what is a reasonable > compromise; we can do better. > For example - as Radim suggested - it might seem reasonable to have > the older value around for a little while. We'll need a little bit of > history of values and tombstones anyway for many other reasons. > > > Sanne > > On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com <mailto:dan.berindei@gmail.com>> wrote: >> Radim, I would contend that the first and foremost guarantee that put() >> makes is to leave the cache in a consistent state. So we can't just throw an >> exception and give up, leaving k=v on one owner and k=null on another. >> >> Secondly, put(k, v) being atomic means that it either succeeds, it writes >> k=v in the cache, and it returns the previous value, or it doesn't succeed, >> and it doesn't write k=v in the cache. Returning the wrong previous value is >> bad, but leaving k=v in the cache is just as bad, even if the all the owners >> have the same value. >> >> And last, we can't have one node seeing k=null, then k=v, then k=null again, >> when the only write we did on the cache was a put(k, v). So trying to undo >> the write would not help. >> >> In the end, we have to make a compromise, and I think returning the wrong >> value in some of the cases is a reasonable compromise. Of course, we should >> document that :) >> >> I also believe ISPN-2956 could be fixed so that HotRod behaves just like >> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod >> protocol and to the cache itself. >> >> Incidentally, transactional caches have a similar problem when the >> originator leaves the cluster: ISPN-3421 [1] >> And we can't handle transactional caches any better than non-transactional >> caches until we expose transactions to the HotRod client. >> >> [1] https://issues.jboss.org/browse/ISPN-2956 >> >> Cheers >> Dan >> >> >> >> >> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com <mailto:rvansa@redhat.com>> wrote: >>> Hi, >>> >>> recently I've stumbled upon one already expected behaviour (one instance >>> is [1]), but which did not got much attention. >>> >>> In non-tx cache, when the primary owner fails after the request has been >>> replicated to backup owner, the request is retried in the new topology. >>> Then, the operation is executed on the new primary (the previous >>> backup). The outcome has been already fixed in [2], but the return value >>> may be wrong. For example, when we do a put, the return value for the >>> second attempt will be the currently inserted value (although the entry >>> was just created). Same situation may happen for other operations. >>> >>> Currently, it's not possible to return the correct value (because it has >>> already been overwritten and we don't keep a history of values), but >>> shouldn't we rather throw an exception if we were not able to fulfil the >>> API contract? >>> >>> Radim >>> >>> [1] https://issues.jboss.org/browse/ISPN-2956 >>> [2] https://issues.jboss.org/browse/ISPN-3422 >>> >>> -- >>> Radim Vansa <rvansa(a)redhat.com <mailto:rvansa@redhat.com>> >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev(a)lists.jboss.org <mailto:infinispan-dev@lists.jboss.org> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org <mailto:infinispan-dev@lists.jboss.org> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org <mailto:infinispan-dev@lists.jboss.org> > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa <rvansa(a)redhat.com <mailto:rvansa@redhat.com>> JBoss DataGrid QA _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org <mailto:infinispan-dev@lists.jboss.org> https://lists.jboss.org/mailman/listinfo/infinispan-dev _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA

Dan Berindei

Wednesday, 14 May Wed, 14 May

2:36 a.m.

On Tue, May 13, 2014 at 6:40 PM, Radim Vansa <rvansa(a)redhat.com> wrote:

...

On 05/13/2014 03:58 PM, Dan Berindei wrote: On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote: > @Dan: It's absolutely correct to do the further writes in order to make > the cache consistent, I am not arguing against that. You've fixed the > outcome (state of cache) well. My point was that we should let the user > know that the value he gets is not 100% correct when we already know > that - and given the API, the only option to do that seems to me as > throwing an exception. > The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. So we would break some of the users' expectations anyway. When the response from primary owner does not arrive soon, we throw timeout exception and the cache is modified anyway, isn't it? If we throw ~ReturnValueUnreliableException, the user has at least some chance to react. Currently, for code requiring 100% reliable value, you can't do anything but ignore the return value, even for CAS operations.

Yes, but we don't expect the user to handle a TimeoutException in any meaningful way. Instead, we expect the user to choose his hardware and configuration to avoid timeouts, if he cares about consistency. How could you handle an exception that tells you "I may have written the value you asked me to in the cache, or maybe not. Either way, you will never know what the previous value was. Muahahaha!" in an application that cares about consistency? But the proposed ReturnValueUnreliableException can't be avoided by the user, it has to be handled every time the cluster membership changes. So it would be more like WriteSkewException than TimeoutException. And when we throw a WriteSkewException, we don't write anything to the cache. Remember, most users do not care about the previous value at all - that's the reason why JCache and our HotRod client don't return the previous value by default. Those that do care about the previous value, use the conditional write operations, and those already work (well, except for the scenario below). So you would force everyone to handle an exception that they don't care about. It would make sense to throw an exception if we didn't return the previous value by default, and the user requested the return value explicitly. But we do return the value by default, so I don't think it would be a good idea for us.

...

> > @Sanne: I was not suggesting that for now - sure, value versioning is (I > hope) on the roadmap. But that's more complicated, I though just about > making an adjustment to the current implementation. > Actually, just keeping a history of values would not fix the the return value in all cases. When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this: A is the originator, B is the primary owner, k = v0 A -> B: put(k, v1) B dies before writing v, C is now primary owner D -> C: put(k, v1) // another put operation from D, with the same value C -> D: null A -> C: retry_put(k, v1) C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;) Is it such a problem to associate unique ID with each write? History implementation seems to me like the more complicated part.

I also think maintaining a version history would be quite complicated, and it also would make it harder for users to estimate their cache's memory usage. That's why I was trying to show that it's not a panacea.

...

And to fix the HotRod use case, the HotRod client would have to be the one generating the version. I agree. Radim Cheers Dan > Radim > > On 05/12/2014 12:02 PM, Sanne Grinovero wrote: > > I don't think we are in a position to decide what is a reasonable > > compromise; we can do better. > > For example - as Radim suggested - it might seem reasonable to have > > the older value around for a little while. We'll need a little bit of > > history of values and tombstones anyway for many other reasons. > > > > > > Sanne > > > > On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: > >> Radim, I would contend that the first and foremost guarantee that put() > >> makes is to leave the cache in a consistent state. So we can't just > throw an > >> exception and give up, leaving k=v on one owner and k=null on another. > >> > >> Secondly, put(k, v) being atomic means that it either succeeds, it > writes > >> k=v in the cache, and it returns the previous value, or it doesn't > succeed, > >> and it doesn't write k=v in the cache. Returning the wrong previous > value is > >> bad, but leaving k=v in the cache is just as bad, even if the all the > owners > >> have the same value. > >> > >> And last, we can't have one node seeing k=null, then k=v, then k=null > again, > >> when the only write we did on the cache was a put(k, v). So trying to > undo > >> the write would not help. > >> > >> In the end, we have to make a compromise, and I think returning the > wrong > >> value in some of the cases is a reasonable compromise. Of course, we > should > >> document that :) > >> > >> I also believe ISPN-2956 could be fixed so that HotRod behaves just > like > >> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the > HotRod > >> protocol and to the cache itself. > >> > >> Incidentally, transactional caches have a similar problem when the > >> originator leaves the cluster: ISPN-3421 [1] > >> And we can't handle transactional caches any better than > non-transactional > >> caches until we expose transactions to the HotRod client. > >> > >> [1] https://issues.jboss.org/browse/ISPN-2956 > >> > >> Cheers > >> Dan > >> > >> > >> > >> > >> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> > wrote: > >>> Hi, > >>> > >>> recently I've stumbled upon one already expected behaviour (one > instance > >>> is [1]), but which did not got much attention. > >>> > >>> In non-tx cache, when the primary owner fails after the request has > been > >>> replicated to backup owner, the request is retried in the new > topology. > >>> Then, the operation is executed on the new primary (the previous > >>> backup). The outcome has been already fixed in [2], but the return > value > >>> may be wrong. For example, when we do a put, the return value for the > >>> second attempt will be the currently inserted value (although the > entry > >>> was just created). Same situation may happen for other operations. > >>> > >>> Currently, it's not possible to return the correct value (because it > has > >>> already been overwritten and we don't keep a history of values), but > >>> shouldn't we rather throw an exception if we were not able to fulfil > the > >>> API contract? > >>> > >>> Radim > >>> > >>> [1] https://issues.jboss.org/browse/ISPN-2956 > >>> [2] https://issues.jboss.org/browse/ISPN-3422 > >>> > >>> -- > >>> Radim Vansa <rvansa(a)redhat.com> > >>> JBoss DataGrid QA > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev(a)lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev(a)lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev(a)lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa <rvansa(a)redhat.com> > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ infinispan-dev mailing listinfinispan-dev@lists.jboss.orghttps://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa <rvansa(a)redhat.com> <rvansa(a)redhat.com> JBoss DataGrid QA _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

Galder Zamarreño

Monday, 26 May Mon, 26 May

11:11 a.m.

...

On Tue, May 13, 2014 at 6:40 PM, Radim Vansa <rvansa(a)redhat.com> wrote: On 05/13/2014 03:58 PM, Dan Berindei wrote: > > > On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote: > @Dan: It's absolutely correct to do the further writes in order to make > the cache consistent, I am not arguing against that. You've fixed the > outcome (state of cache) well. My point was that we should let the user > know that the value he gets is not 100% correct when we already know > that - and given the API, the only option to do that seems to me as > throwing an exception. > > The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. > So we would break some of the users' expectations anyway. When the response from primary owner does not arrive soon, we throw timeout exception and the cache is modified anyway, isn't it? If we throw ~ReturnValueUnreliableException, the user has at least some chance to react. Currently, for code requiring 100% reliable value, you can't do anything but ignore the return value, even for CAS operations. Yes, but we don't expect the user to handle a TimeoutException in any meaningful way. Instead, we expect the user to choose his hardware and configuration to avoid timeouts, if he cares about consistency. How could you handle an exception that tells you "I may have written the value you asked me to in the cache, or maybe not. Either way, you will never know what the previous value was. Muahahaha!" in an application that cares about consistency? But the proposed ReturnValueUnreliableException can't be avoided by the user, it has to be handled every time the cluster membership changes. So it would be more like WriteSkewException than TimeoutException. And when we throw a WriteSkewException, we don't write anything to the cache. Remember, most users do not care about the previous value at all - that's the reason why JCache and our HotRod client don't return the previous value by default. Those that do care about the previous value, use the conditional write operations, and those already work (well, except for the scenario below). So you would force everyone to handle an exception that they don't care about. It would make sense to throw an exception if we didn't return the previous value by default, and the user requested the return value explicitly. But we do return the value by default, so I don't think it would be a good idea for us. > > > @Sanne: I was not suggesting that for now - sure, value versioning is (I > hope) on the roadmap. But that's more complicated, I though just about > making an adjustment to the current implementation. > > > Actually, just keeping a history of values would not fix the the return value in all cases. > > When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this: > > A is the originator, B is the primary owner, k = v0 > A -> B: put(k, v1) > B dies before writing v, C is now primary owner > D -> C: put(k, v1) // another put operation from D, with the same value > C -> D: null > A -> C: retry_put(k, v1) > C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one > > To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;) Is it such a problem to associate unique ID with each write? History implementation seems to me like the more complicated part. I also think maintaining a version history would be quite complicated, and it also would make it harder for users to estimate their cache's memory usage. That's why I was trying to show that it's not a panacea. > And to fix the HotRod use case, the HotRod client would have to be the one generating the version. I agree. Radim > > Cheers > Dan > > > > Radim > > On 05/12/2014 12:02 PM, Sanne Grinovero wrote: > > I don't think we are in a position to decide what is a reasonable > > compromise; we can do better. > > For example - as Radim suggested - it might seem reasonable to have > > the older value around for a little while. We'll need a little bit of > > history of values and tombstones anyway for many other reasons. > > > > > > Sanne > > > > On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: > >> Radim, I would contend that the first and foremost guarantee that put() > >> makes is to leave the cache in a consistent state. So we can't just throw an > >> exception and give up, leaving k=v on one owner and k=null on another. > >> > >> Secondly, put(k, v) being atomic means that it either succeeds, it writes > >> k=v in the cache, and it returns the previous value, or it doesn't succeed, > >> and it doesn't write k=v in the cache. Returning the wrong previous value is > >> bad, but leaving k=v in the cache is just as bad, even if the all the owners > >> have the same value. > >> > >> And last, we can't have one node seeing k=null, then k=v, then k=null again, > >> when the only write we did on the cache was a put(k, v). So trying to undo > >> the write would not help. > >> > >> In the end, we have to make a compromise, and I think returning the wrong > >> value in some of the cases is a reasonable compromise. Of course, we should > >> document that :) > >> > >> I also believe ISPN-2956 could be fixed so that HotRod behaves just like > >> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod > >> protocol and to the cache itself. > >> > >> Incidentally, transactional caches have a similar problem when the > >> originator leaves the cluster: ISPN-3421 [1] > >> And we can't handle transactional caches any better than non-transactional > >> caches until we expose transactions to the HotRod client. > >> > >> [1] https://issues.jboss.org/browse/ISPN-2956 > >> > >> Cheers > >> Dan > >> > >> > >> > >> > >> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> wrote: > >>> Hi, > >>> > >>> recently I've stumbled upon one already expected behaviour (one instance > >>> is [1]), but which did not got much attention. > >>> > >>> In non-tx cache, when the primary owner fails after the request has been > >>> replicated to backup owner, the request is retried in the new topology. > >>> Then, the operation is executed on the new primary (the previous > >>> backup). The outcome has been already fixed in [2], but the return value > >>> may be wrong. For example, when we do a put, the return value for the > >>> second attempt will be the currently inserted value (although the entry > >>> was just created). Same situation may happen for other operations. > >>> > >>> Currently, it's not possible to return the correct value (because it has > >>> already been overwritten and we don't keep a history of values), but > >>> shouldn't we rather throw an exception if we were not able to fulfil the > >>> API contract? > >>> > >>> Radim > >>> > >>> [1] https://issues.jboss.org/browse/ISPN-2956 > >>> [2] https://issues.jboss.org/browse/ISPN-3422 > >>> > >>> -- > >>> Radim Vansa <rvansa(a)redhat.com> > >>> JBoss DataGrid QA > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev(a)lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev(a)lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev(a)lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa <rvansa(a)redhat.com> > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > > _______________________________________________ > infinispan-dev mailing list > > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- Galder Zamarreño galder(a)redhat.com twitter.com/galderz

Galder Zamarreño

Thursday, 12 June Thu, 12 Jun

8:54 a.m.

Hi all, I’m working on the implementation of this, and the solution noted in the JIRA does not work for situations where you have to return a previous value that might have been overriden due to partial operation application. Example (assuming 1 owner only): 1. remote client does a put(“key”, 1) in Node-A 2. remote client does a replace(“key”, 2) on Node-A but the operation fails to replicate and gets partially applieed in Node-A only. 3. remote client, seeing the replace failed, retries the replace(“key”, 2) in Node-B. replace underneath finds that the previous value of “key” is 2 (since it got partially applied in Node-A), so it succeeds but the previous value returned is 2, which is wrong. The previous value should have been 1, but this value is gone… In my view, there is two ways to solve this issue: 1. Make Hot Rod caches transactional. By doign so, operations are not partially applied. They’re done fully cluster wide or they’re rolled back. I’ve verified this and the test passes once you make the cache transactional. The downside is of course performance. IOW, anyone using conditional operations, or relying on previous values, would need transactional caches. This should work well with the retry mechanism being implemented for ISPN-2956, which is still needed. A big question here is whether Hot Rod caches should be transactional by default or viceversa. If they’re transactional, our performance will go down for sure but we won’t have this type of issues (with retry in place). If you’re not transactional, you are faster but you’re exposed to these edge cases, and you need to consider them when deploying your app, something people might miss, although we could provide WARN messages when conditional operations or Flag.FORCE_RETURN_VALUE is used with non-transactional caches. 2. Get rid of returning previous value in the Hot Rod protocol for modifying operations. For conditional operations, returning true/false is at least enough to see if the condition was applied. So, replaceIfUnmodified/replace/remove(conditional), they would only return true/false. This would be complicated due to reliance on Map/ConcurrentMap APIs. Maybe something to consider for when we stop relying on JDK APIs. I also considered applying corrective actions but that’s very messy and prone to concurrency issues, so I quickly discarded that. Any other options? Thoughts on the options above? Cheers, On 26 May 2014, at 18:11, Galder Zamarreño <galder(a)redhat.com> wrote:

...

Hi all, I’ve been looking into ISPN-2956 last week and I think we have a solution for it which requires a protocol change [1] Since we’re in the middle of the Hot Rod 2.0 development, this is a good opportunity to implement it. Cheers, [1] https://issues.jboss.org/browse/ISPN-2956?focusedCommentId=12970541&p... On 14 May 2014, at 09:36, Dan Berindei <dan.berindei(a)gmail.com> wrote: > > > > On Tue, May 13, 2014 at 6:40 PM, Radim Vansa <rvansa(a)redhat.com> wrote: > On 05/13/2014 03:58 PM, Dan Berindei wrote: >> >> >> On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote: >> @Dan: It's absolutely correct to do the further writes in order to make >> the cache consistent, I am not arguing against that. You've fixed the >> outcome (state of cache) well. My point was that we should let the user >> know that the value he gets is not 100% correct when we already know >> that - and given the API, the only option to do that seems to me as >> throwing an exception. >> >> The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. >> So we would break some of the users' expectations anyway. > > When the response from primary owner does not arrive soon, we throw timeout exception and the cache is modified anyway, isn't it? > If we throw ~ReturnValueUnreliableException, the user has at least some chance to react. Currently, for code requiring 100% reliable value, you can't do anything but ignore the return value, even for CAS operations. > > > Yes, but we don't expect the user to handle a TimeoutException in any meaningful way. Instead, we expect the user to choose his hardware and configuration to avoid timeouts, if he cares about consistency. How could you handle an exception that tells you "I may have written the value you asked me to in the cache, or maybe not. Either way, you will never know what the previous value was. Muahahaha!" in an application that cares about consistency? > > But the proposed ReturnValueUnreliableException can't be avoided by the user, it has to be handled every time the cluster membership changes. So it would be more like WriteSkewException than TimeoutException. And when we throw a WriteSkewException, we don't write anything to the cache. > > Remember, most users do not care about the previous value at all - that's the reason why JCache and our HotRod client don't return the previous value by default. Those that do care about the previous value, use the conditional write operations, and those already work (well, except for the scenario below). So you would force everyone to handle an exception that they don't care about. > > It would make sense to throw an exception if we didn't return the previous value by default, and the user requested the return value explicitly. But we do return the value by default, so I don't think it would be a good idea for us. > >> >> >> @Sanne: I was not suggesting that for now - sure, value versioning is (I >> hope) on the roadmap. But that's more complicated, I though just about >> making an adjustment to the current implementation. >> >> >> Actually, just keeping a history of values would not fix the the return value in all cases. >> >> When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this: >> >> A is the originator, B is the primary owner, k = v0 >> A -> B: put(k, v1) >> B dies before writing v, C is now primary owner >> D -> C: put(k, v1) // another put operation from D, with the same value >> C -> D: null >> A -> C: retry_put(k, v1) >> C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one >> >> To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;) > > Is it such a problem to associate unique ID with each write? History implementation seems to me like the more complicated part. > > I also think maintaining a version history would be quite complicated, and it also would make it harder for users to estimate their cache's memory usage. That's why I was trying to show that it's not a panacea. > > > >> And to fix the HotRod use case, the HotRod client would have to be the one generating the version. > > I agree. > > Radim > > >> >> Cheers >> Dan >> >> >> >> Radim >> >> On 05/12/2014 12:02 PM, Sanne Grinovero wrote: >>> I don't think we are in a position to decide what is a reasonable >>> compromise; we can do better. >>> For example - as Radim suggested - it might seem reasonable to have >>> the older value around for a little while. We'll need a little bit of >>> history of values and tombstones anyway for many other reasons. >>> >>> >>> Sanne >>> >>> On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: >>>> Radim, I would contend that the first and foremost guarantee that put() >>>> makes is to leave the cache in a consistent state. So we can't just throw an >>>> exception and give up, leaving k=v on one owner and k=null on another. >>>> >>>> Secondly, put(k, v) being atomic means that it either succeeds, it writes >>>> k=v in the cache, and it returns the previous value, or it doesn't succeed, >>>> and it doesn't write k=v in the cache. Returning the wrong previous value is >>>> bad, but leaving k=v in the cache is just as bad, even if the all the owners >>>> have the same value. >>>> >>>> And last, we can't have one node seeing k=null, then k=v, then k=null again, >>>> when the only write we did on the cache was a put(k, v). So trying to undo >>>> the write would not help. >>>> >>>> In the end, we have to make a compromise, and I think returning the wrong >>>> value in some of the cases is a reasonable compromise. Of course, we should >>>> document that :) >>>> >>>> I also believe ISPN-2956 could be fixed so that HotRod behaves just like >>>> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod >>>> protocol and to the cache itself. >>>> >>>> Incidentally, transactional caches have a similar problem when the >>>> originator leaves the cluster: ISPN-3421 [1] >>>> And we can't handle transactional caches any better than non-transactional >>>> caches until we expose transactions to the HotRod client. >>>> >>>> [1] https://issues.jboss.org/browse/ISPN-2956 >>>> >>>> Cheers >>>> Dan >>>> >>>> >>>> >>>> >>>> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> wrote: >>>>> Hi, >>>>> >>>>> recently I've stumbled upon one already expected behaviour (one instance >>>>> is [1]), but which did not got much attention. >>>>> >>>>> In non-tx cache, when the primary owner fails after the request has been >>>>> replicated to backup owner, the request is retried in the new topology. >>>>> Then, the operation is executed on the new primary (the previous >>>>> backup). The outcome has been already fixed in [2], but the return value >>>>> may be wrong. For example, when we do a put, the return value for the >>>>> second attempt will be the currently inserted value (although the entry >>>>> was just created). Same situation may happen for other operations. >>>>> >>>>> Currently, it's not possible to return the correct value (because it has >>>>> already been overwritten and we don't keep a history of values), but >>>>> shouldn't we rather throw an exception if we were not able to fulfil the >>>>> API contract? >>>>> >>>>> Radim >>>>> >>>>> [1] https://issues.jboss.org/browse/ISPN-2956 >>>>> [2] https://issues.jboss.org/browse/ISPN-3422 >>>>> >>>>> -- >>>>> Radim Vansa <rvansa(a)redhat.com> >>>>> JBoss DataGrid QA >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev(a)lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev(a)lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev(a)lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Radim Vansa <rvansa(a)redhat.com> >> JBoss DataGrid QA >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> >> _______________________________________________ >> infinispan-dev mailing list >> >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa > <rvansa(a)redhat.com> > > JBoss DataGrid QA > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder(a)redhat.com twitter.com/galderz _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- Galder Zamarreño galder(a)redhat.com twitter.com/galderz

Dan Berindei

Monday, 16 June Mon, 16 Jun

9:57 a.m.

On Thu, Jun 12, 2014 at 4:54 PM, Galder Zamarreño <galder(a)redhat.com> wrote:

...

The same problem [1] can appear in embedded caches. So if there's an argument to make HotRod caches transactional by default, it should apply to embedded caches as well. I like the idea of logging a warning when the FORCE_RETURN_VALUE or the non-versioned conditional operations are used from HotRod. But we have to be careful with the wording, because inconsistencies can appear in transactional mode as well [2]. [1] https://issues.jboss.org/browse/ISPN-4286 [2] https://issues.jboss.org/browse/ISPN-3421

...

2. Get rid of returning previous value in the Hot Rod protocol for modifying operations. For conditional operations, returning true/false is at least enough to see if the condition was applied. So, replaceIfUnmodified/replace/remove(conditional), they would only return true/false. This would be complicated due to reliance on Map/ConcurrentMap APIs. Maybe something to consider for when we stop relying on JDK APIs.

I'm not sure we can completely get rid of the return values, even though JCache doesn't extend Map it still has a getAndPut method.

...

I also considered applying corrective actions but that’s very messy and prone to concurrency issues, so I quickly discarded that.

Yeah, rolling back partial modifications is definitely not an option.

...

Any other options? Thoughts on the options above?

I was waiting for Sanne to say this, but how about keeping a version history? If we had the chain of values/versions for each key, we could look up the version of the current operation in the chain when retrying. If it's there, we could infer that the operation was already applied, and return the previous value in the chain as the previous version. Of course, there are the usual downsides: 1. Maintaining the history might be tricky, especially around state transfers. 2. Performance will also be affected, maybe getting closer to the tx performance.

...

Cheers, On 26 May 2014, at 18:11, Galder Zamarreño <galder(a)redhat.com> wrote: > Hi all, > > I’ve been looking into ISPN-2956 last week and I think we have a solution for it which requires a protocol change [1] > > Since we’re in the middle of the Hot Rod 2.0 development, this is a good opportunity to implement it. > > Cheers, > > [1] https://issues.jboss.org/browse/ISPN-2956?focusedCommentId=12970541&p... > > On 14 May 2014, at 09:36, Dan Berindei <dan.berindei(a)gmail.com> wrote: > >> >> >> >> On Tue, May 13, 2014 at 6:40 PM, Radim Vansa <rvansa(a)redhat.com> wrote: >> On 05/13/2014 03:58 PM, Dan Berindei wrote: >>> >>> >>> On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote: >>> @Dan: It's absolutely correct to do the further writes in order to make >>> the cache consistent, I am not arguing against that. You've fixed the >>> outcome (state of cache) well. My point was that we should let the user >>> know that the value he gets is not 100% correct when we already know >>> that - and given the API, the only option to do that seems to me as >>> throwing an exception. >>> >>> The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. >>> So we would break some of the users' expectations anyway. >> >> When the response from primary owner does not arrive soon, we throw timeout exception and the cache is modified anyway, isn't it? >> If we throw ~ReturnValueUnreliableException, the user has at least some chance to react. Currently, for code requiring 100% reliable value, you can't do anything but ignore the return value, even for CAS operations. >> >> >> Yes, but we don't expect the user to handle a TimeoutException in any meaningful way. Instead, we expect the user to choose his hardware and configuration to avoid timeouts, if he cares about consistency. How could you handle an exception that tells you "I may have written the value you asked me to in the cache, or maybe not. Either way, you will never know what the previous value was. Muahahaha!" in an application that cares about consistency? >> >> But the proposed ReturnValueUnreliableException can't be avoided by the user, it has to be handled every time the cluster membership changes. So it would be more like WriteSkewException than TimeoutException. And when we throw a WriteSkewException, we don't write anything to the cache. >> >> Remember, most users do not care about the previous value at all - that's the reason why JCache and our HotRod client don't return the previous value by default. Those that do care about the previous value, use the conditional write operations, and those already work (well, except for the scenario below). So you would force everyone to handle an exception that they don't care about. >> >> It would make sense to throw an exception if we didn't return the previous value by default, and the user requested the return value explicitly. But we do return the value by default, so I don't think it would be a good idea for us. >> >>> >>> >>> @Sanne: I was not suggesting that for now - sure, value versioning is (I >>> hope) on the roadmap. But that's more complicated, I though just about >>> making an adjustment to the current implementation. >>> >>> >>> Actually, just keeping a history of values would not fix the the return value in all cases. >>> >>> When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this: >>> >>> A is the originator, B is the primary owner, k = v0 >>> A -> B: put(k, v1) >>> B dies before writing v, C is now primary owner >>> D -> C: put(k, v1) // another put operation from D, with the same value >>> C -> D: null >>> A -> C: retry_put(k, v1) >>> C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one >>> >>> To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;) >> >> Is it such a problem to associate unique ID with each write? History implementation seems to me like the more complicated part. >> >> I also think maintaining a version history would be quite complicated, and it also would make it harder for users to estimate their cache's memory usage. That's why I was trying to show that it's not a panacea. >> >> >> >>> And to fix the HotRod use case, the HotRod client would have to be the one generating the version. >> >> I agree. >> >> Radim >> >> >>> >>> Cheers >>> Dan >>> >>> >>> >>> Radim >>> >>> On 05/12/2014 12:02 PM, Sanne Grinovero wrote: >>>> I don't think we are in a position to decide what is a reasonable >>>> compromise; we can do better. >>>> For example - as Radim suggested - it might seem reasonable to have >>>> the older value around for a little while. We'll need a little bit of >>>> history of values and tombstones anyway for many other reasons. >>>> >>>> >>>> Sanne >>>> >>>> On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: >>>>> Radim, I would contend that the first and foremost guarantee that put() >>>>> makes is to leave the cache in a consistent state. So we can't just throw an >>>>> exception and give up, leaving k=v on one owner and k=null on another. >>>>> >>>>> Secondly, put(k, v) being atomic means that it either succeeds, it writes >>>>> k=v in the cache, and it returns the previous value, or it doesn't succeed, >>>>> and it doesn't write k=v in the cache. Returning the wrong previous value is >>>>> bad, but leaving k=v in the cache is just as bad, even if the all the owners >>>>> have the same value. >>>>> >>>>> And last, we can't have one node seeing k=null, then k=v, then k=null again, >>>>> when the only write we did on the cache was a put(k, v). So trying to undo >>>>> the write would not help. >>>>> >>>>> In the end, we have to make a compromise, and I think returning the wrong >>>>> value in some of the cases is a reasonable compromise. Of course, we should >>>>> document that :) >>>>> >>>>> I also believe ISPN-2956 could be fixed so that HotRod behaves just like >>>>> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod >>>>> protocol and to the cache itself. >>>>> >>>>> Incidentally, transactional caches have a similar problem when the >>>>> originator leaves the cluster: ISPN-3421 [1] >>>>> And we can't handle transactional caches any better than non-transactional >>>>> caches until we expose transactions to the HotRod client. >>>>> >>>>> [1] https://issues.jboss.org/browse/ISPN-2956 >>>>> >>>>> Cheers >>>>> Dan >>>>> >>>>> >>>>> >>>>> >>>>> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> wrote: >>>>>> Hi, >>>>>> >>>>>> recently I've stumbled upon one already expected behaviour (one instance >>>>>> is [1]), but which did not got much attention. >>>>>> >>>>>> In non-tx cache, when the primary owner fails after the request has been >>>>>> replicated to backup owner, the request is retried in the new topology. >>>>>> Then, the operation is executed on the new primary (the previous >>>>>> backup). The outcome has been already fixed in [2], but the return value >>>>>> may be wrong. For example, when we do a put, the return value for the >>>>>> second attempt will be the currently inserted value (although the entry >>>>>> was just created). Same situation may happen for other operations. >>>>>> >>>>>> Currently, it's not possible to return the correct value (because it has >>>>>> already been overwritten and we don't keep a history of values), but >>>>>> shouldn't we rather throw an exception if we were not able to fulfil the >>>>>> API contract? >>>>>> >>>>>> Radim >>>>>> >>>>>> [1] https://issues.jboss.org/browse/ISPN-2956 >>>>>> [2] https://issues.jboss.org/browse/ISPN-3422 >>>>>> >>>>>> -- >>>>>> Radim Vansa <rvansa(a)redhat.com> >>>>>> JBoss DataGrid QA >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev(a)lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> >>>>> >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev(a)lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev(a)lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> -- >>> Radim Vansa <rvansa(a)redhat.com> >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev(a)lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> >>> infinispan-dev(a)lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Radim Vansa >> <rvansa(a)redhat.com> >> >> JBoss DataGrid QA >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder(a)redhat.com > twitter.com/galderz > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder(a)redhat.com twitter.com/galderz _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

Galder Zamarreño

Tuesday, 17 June Tue, 17 Jun

4:53 a.m.

On 16 Jun 2014, at 16:57, Dan Berindei <dan.berindei(a)gmail.com> wrote:

...

On Thu, Jun 12, 2014 at 4:54 PM, Galder Zamarreño <galder(a)redhat.com> wrote: Hi all, I’m working on the implementation of this, and the solution noted in the JIRA does not work for situations where you have to return a previous value that might have been overriden due to partial operation application. Example (assuming 1 owner only): 1. remote client does a put(“key”, 1) in Node-A 2. remote client does a replace(“key”, 2) on Node-A but the operation fails to replicate and gets partially applieed in Node-A only. 3. remote client, seeing the replace failed, retries the replace(“key”, 2) in Node-B. replace underneath finds that the previous value of “key” is 2 (since it got partially applied in Node-A), so it succeeds but the previous value returned is 2, which is wrong. The previous value should have been 1, but this value is gone… In my view, there is two ways to solve this issue: 1. Make Hot Rod caches transactional. By doign so, operations are not partially applied. They’re done fully cluster wide or they’re rolled back. I’ve verified this and the test passes once you make the cache transactional. The downside is of course performance. IOW, anyone using conditional operations, or relying on previous values, would need transactional caches. This should work well with the retry mechanism being implemented for ISPN-2956, which is still needed. A big question here is whether Hot Rod caches should be transactional by default or viceversa. If they’re transactional, our performance will go down for sure but we won’t have this type of issues (with retry in place). If you’re not transactional, you are faster but you’re exposed to these edge cases, and you need to consider them when deploying your app, something people might miss, although we could provide WARN messages when conditional operations or Flag.FORCE_RETURN_VALUE is used with non-transactional caches. The same problem [1] can appear in embedded caches. So if there's an argument to make HotRod caches transactional by default, it should apply to embedded caches as well. I like the idea of logging a warning when the FORCE_RETURN_VALUE or the non-versioned conditional operations are used from HotRod. But we have to be careful with the wording, because inconsistencies can appear in transactional mode as well [2].

I think this (default non-transaction, and WARN if using such methods with non-transactional cache) might be the best stop gap solution.

...

[1] https://issues.jboss.org/browse/ISPN-4286 [2] https://issues.jboss.org/browse/ISPN-3421

Have you considered other fixes for [2]?

...

It does have such method but we could potentially make it throw an exception saying that is not supported.

...

I also considered applying corrective actions but that’s very messy and prone to concurrency issues, so I quickly discarded that. Yeah, rolling back partial modifications is definitely not an option. Any other options? Thoughts on the options above? I was waiting for Sanne to say this, but how about keeping a version history? If we had the chain of values/versions for each key, we could look up the version of the current operation in the chain when retrying. If it's there, we could infer that the operation was already applied, and return the previous value in the chain as the previous version. Of course, there are the usual downsides: 1. Maintaining the history might be tricky, especially around state transfers. 2. Performance will also be affected, maybe getting closer to the tx performance.

Yeah, keeping version history could help, but it’d be quite a beast to implement for 7.0, including garbage collection of old versions...etc.

...

-- Galder Zamarreño galder(a)redhat.com twitter.com/galderz

Mircea Markus

Wednesday, 18 June Wed, 18 Jun

6:34 a.m.

On Jun 17, 2014, at 10:53, Galder Zamarreño <galder(a)redhat.com> wrote:

...

On 16 Jun 2014, at 16:57, Dan Berindei <dan.berindei(a)gmail.com> wrote: > On Thu, Jun 12, 2014 at 4:54 PM, Galder Zamarreño <galder(a)redhat.com> wrote: > Hi all, > > I’m working on the implementation of this, and the solution noted in the JIRA does not work for situations where you have to return a previous value that might have been overriden due to partial operation application. Example (assuming 1 owner only): > > 1. remote client does a put(“key”, 1) in Node-A > 2. remote client does a replace(“key”, 2) on Node-A but the operation fails to replicate and gets partially applieed in Node-A only. > 3. remote client, seeing the replace failed, retries the replace(“key”, 2) in Node-B. replace underneath finds that the previous value of “key” is 2 (since it got partially applied in Node-A), so it succeeds but the previous value returned is 2, which is wrong. The previous value should have been 1, but this value is gone… > > In my view, there is two ways to solve this issue: > > 1. Make Hot Rod caches transactional. By doign so, operations are not partially applied. They’re done fully cluster wide or they’re rolled back. I’ve verified this and the test passes once you make the cache transactional. The downside is of course performance. IOW, anyone using conditional operations, or relying on previous values, would need transactional caches. This should work well with the retry mechanism being implemented for ISPN-2956, which is still needed. A big question here is whether Hot Rod caches should be transactional by default or viceversa. If they’re transactional, our performance will go down for sure but we won’t have this type of issues (with retry in place). If you’re not transactional, you are faster but you’re exposed to these edge cases, and you need to consider them when deploying your app, something people might miss, although we could provide WARN messages when conditional operations or Flag.FORCE_RETURN_VALUE is used with non-transactional caches. > > The same problem [1] can appear in embedded caches. So if there's an argument to make HotRod caches transactional by default, it should apply to embedded caches as well. > > I like the idea of logging a warning when the FORCE_RETURN_VALUE or the non-versioned conditional operations are used from HotRod. But we have to be careful with the wording, because inconsistencies can appear in transactional mode as well [2]. I think this (default non-transaction, and WARN if using such methods with non-transactional cache) might be the best stop gap solution.

+1. Would be good to have [2] fixed as well, in case people want a workable workaround.

...

> [1] https://issues.jboss.org/browse/ISPN-4286 > [2] https://issues.jboss.org/browse/ISPN-3421 Have you considered other fixes for [2]? > > > 2. Get rid of returning previous value in the Hot Rod protocol for modifying operations. For conditional operations, returning true/false is at least enough to see if the condition was applied. So, replaceIfUnmodified/replace/remove(conditional), they would only return true/false. This would be complicated due to reliance on Map/ConcurrentMap APIs. Maybe something to consider for when we stop relying on JDK APIs. > > > I'm not sure we can completely get rid of the return values, even though JCache doesn't extend Map it still has a getAndPut method. It does have such method but we could potentially make it throw an exception saying that is not supported. > > > I also considered applying corrective actions but that’s very messy and prone to concurrency issues, so I quickly discarded that. > > Yeah, rolling back partial modifications is definitely not an option. > > > Any other options? Thoughts on the options above? > > I was waiting for Sanne to say this, but how about keeping a version history? > > If we had the chain of values/versions for each key, we could look up the version of the current operation in the chain when retrying. If it's there, we could infer that the operation was already applied, and return the previous value in the chain as the previous version. > > Of course, there are the usual downsides: > 1. Maintaining the history might be tricky, especially around state transfers. > 2. Performance will also be affected, maybe getting closer to the tx performance. Yeah, keeping version history could help, but it’d be quite a beast to implement for 7.0, including garbage collection of old versions...etc.

Yes, also option 1 seems to make things work and is almost ready. Also with versioning, I think the performance is worse, as you always need to communicate when you no longer need a version(RPC), so that it can be garbage collected.

...

> > > Cheers, > > On 26 May 2014, at 18:11, Galder Zamarreño <galder(a)redhat.com> wrote: > >> Hi all, >> >> I’ve been looking into ISPN-2956 last week and I think we have a solution for it which requires a protocol change [1] >> >> Since we’re in the middle of the Hot Rod 2.0 development, this is a good opportunity to implement it. >> >> Cheers, >> >> [1] https://issues.jboss.org/browse/ISPN-2956?focusedCommentId=12970541&p... >> >> On 14 May 2014, at 09:36, Dan Berindei <dan.berindei(a)gmail.com> wrote: >> >>> >>> >>> >>> On Tue, May 13, 2014 at 6:40 PM, Radim Vansa <rvansa(a)redhat.com> wrote: >>> On 05/13/2014 03:58 PM, Dan Berindei wrote: >>>> >>>> >>>> On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote: >>>> @Dan: It's absolutely correct to do the further writes in order to make >>>> the cache consistent, I am not arguing against that. You've fixed the >>>> outcome (state of cache) well. My point was that we should let the user >>>> know that the value he gets is not 100% correct when we already know >>>> that - and given the API, the only option to do that seems to me as >>>> throwing an exception. >>>> >>>> The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. >>>> So we would break some of the users' expectations anyway. >>> >>> When the response from primary owner does not arrive soon, we throw timeout exception and the cache is modified anyway, isn't it? >>> If we throw ~ReturnValueUnreliableException, the user has at least some chance to react. Currently, for code requiring 100% reliable value, you can't do anything but ignore the return value, even for CAS operations. >>> >>> >>> Yes, but we don't expect the user to handle a TimeoutException in any meaningful way. Instead, we expect the user to choose his hardware and configuration to avoid timeouts, if he cares about consistency. How could you handle an exception that tells you "I may have written the value you asked me to in the cache, or maybe not. Either way, you will never know what the previous value was. Muahahaha!" in an application that cares about consistency? >>> >>> But the proposed ReturnValueUnreliableException can't be avoided by the user, it has to be handled every time the cluster membership changes. So it would be more like WriteSkewException than TimeoutException. And when we throw a WriteSkewException, we don't write anything to the cache. >>> >>> Remember, most users do not care about the previous value at all - that's the reason why JCache and our HotRod client don't return the previous value by default. Those that do care about the previous value, use the conditional write operations, and those already work (well, except for the scenario below). So you would force everyone to handle an exception that they don't care about. >>> >>> It would make sense to throw an exception if we didn't return the previous value by default, and the user requested the return value explicitly. But we do return the value by default, so I don't think it would be a good idea for us. >>> >>>> >>>> >>>> @Sanne: I was not suggesting that for now - sure, value versioning is (I >>>> hope) on the roadmap. But that's more complicated, I though just about >>>> making an adjustment to the current implementation. >>>> >>>> >>>> Actually, just keeping a history of values would not fix the the return value in all cases. >>>> >>>> When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this: >>>> >>>> A is the originator, B is the primary owner, k = v0 >>>> A -> B: put(k, v1) >>>> B dies before writing v, C is now primary owner >>>> D -> C: put(k, v1) // another put operation from D, with the same value >>>> C -> D: null >>>> A -> C: retry_put(k, v1) >>>> C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one >>>> >>>> To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;) >>> >>> Is it such a problem to associate unique ID with each write? History implementation seems to me like the more complicated part. >>> >>> I also think maintaining a version history would be quite complicated, and it also would make it harder for users to estimate their cache's memory usage. That's why I was trying to show that it's not a panacea. >>> >>> >>> >>>> And to fix the HotRod use case, the HotRod client would have to be the one generating the version. >>> >>> I agree. >>> >>> Radim >>> >>> >>>> >>>> Cheers >>>> Dan >>>> >>>> >>>> >>>> Radim >>>> >>>> On 05/12/2014 12:02 PM, Sanne Grinovero wrote: >>>>> I don't think we are in a position to decide what is a reasonable >>>>> compromise; we can do better. >>>>> For example - as Radim suggested - it might seem reasonable to have >>>>> the older value around for a little while. We'll need a little bit of >>>>> history of values and tombstones anyway for many other reasons. >>>>> >>>>> >>>>> Sanne >>>>> >>>>> On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: >>>>>> Radim, I would contend that the first and foremost guarantee that put() >>>>>> makes is to leave the cache in a consistent state. So we can't just throw an >>>>>> exception and give up, leaving k=v on one owner and k=null on another. >>>>>> >>>>>> Secondly, put(k, v) being atomic means that it either succeeds, it writes >>>>>> k=v in the cache, and it returns the previous value, or it doesn't succeed, >>>>>> and it doesn't write k=v in the cache. Returning the wrong previous value is >>>>>> bad, but leaving k=v in the cache is just as bad, even if the all the owners >>>>>> have the same value. >>>>>> >>>>>> And last, we can't have one node seeing k=null, then k=v, then k=null again, >>>>>> when the only write we did on the cache was a put(k, v). So trying to undo >>>>>> the write would not help. >>>>>> >>>>>> In the end, we have to make a compromise, and I think returning the wrong >>>>>> value in some of the cases is a reasonable compromise. Of course, we should >>>>>> document that :) >>>>>> >>>>>> I also believe ISPN-2956 could be fixed so that HotRod behaves just like >>>>>> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod >>>>>> protocol and to the cache itself. >>>>>> >>>>>> Incidentally, transactional caches have a similar problem when the >>>>>> originator leaves the cluster: ISPN-3421 [1] >>>>>> And we can't handle transactional caches any better than non-transactional >>>>>> caches until we expose transactions to the HotRod client. >>>>>> >>>>>> [1] https://issues.jboss.org/browse/ISPN-2956 >>>>>> >>>>>> Cheers >>>>>> Dan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> recently I've stumbled upon one already expected behaviour (one instance >>>>>>> is [1]), but which did not got much attention. >>>>>>> >>>>>>> In non-tx cache, when the primary owner fails after the request has been >>>>>>> replicated to backup owner, the request is retried in the new topology. >>>>>>> Then, the operation is executed on the new primary (the previous >>>>>>> backup). The outcome has been already fixed in [2], but the return value >>>>>>> may be wrong. For example, when we do a put, the return value for the >>>>>>> second attempt will be the currently inserted value (although the entry >>>>>>> was just created). Same situation may happen for other operations. >>>>>>> >>>>>>> Currently, it's not possible to return the correct value (because it has >>>>>>> already been overwritten and we don't keep a history of values), but >>>>>>> shouldn't we rather throw an exception if we were not able to fulfil the >>>>>>> API contract? >>>>>>> >>>>>>> Radim >>>>>>> >>>>>>> [1] https://issues.jboss.org/browse/ISPN-2956 >>>>>>> [2] https://issues.jboss.org/browse/ISPN-3422 >>>>>>> >>>>>>> -- >>>>>>> Radim Vansa <rvansa(a)redhat.com> >>>>>>> JBoss DataGrid QA >>>>>>> >>>>>>> _______________________________________________ >>>>>>> infinispan-dev mailing list >>>>>>> infinispan-dev(a)lists.jboss.org >>>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> infinispan-dev mailing list >>>>>> infinispan-dev(a)lists.jboss.org >>>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>>> _______________________________________________ >>>>> infinispan-dev mailing list >>>>> infinispan-dev(a)lists.jboss.org >>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> -- >>>> Radim Vansa <rvansa(a)redhat.com> >>>> JBoss DataGrid QA >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev(a)lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>>> >>>> >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> >>>> infinispan-dev(a)lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> -- >>> Radim Vansa >>> <rvansa(a)redhat.com> >>> >>> JBoss DataGrid QA >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev(a)lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev(a)lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> -- >> Galder Zamarreño >> galder(a)redhat.com >> twitter.com/galderz >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Galder Zamarreño > galder(a)redhat.com > twitter.com/galderz > > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño galder(a)redhat.com twitter.com/galderz _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)

Sanne Grinovero

Tuesday, 13 May Tue, 13 May

10:44 a.m.

I didn't mean to suggest any solution, just that it should be fixed .. anyway you like :) If versions are needed, so be it.. but I didn't think it through. Cheers, Sanne On 13 May 2014 14:58, Dan Berindei <dan.berindei(a)gmail.com> wrote:

...

On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote: > > @Dan: It's absolutely correct to do the further writes in order to make > the cache consistent, I am not arguing against that. You've fixed the > outcome (state of cache) well. My point was that we should let the user > know that the value he gets is not 100% correct when we already know > that - and given the API, the only option to do that seems to me as > throwing an exception. The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. So we would break some of the users' expectations anyway. > > > @Sanne: I was not suggesting that for now - sure, value versioning is (I > hope) on the roadmap. But that's more complicated, I though just about > making an adjustment to the current implementation. Actually, just keeping a history of values would not fix the the return value in all cases. When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this: A is the originator, B is the primary owner, k = v0 A -> B: put(k, v1) B dies before writing v, C is now primary owner D -> C: put(k, v1) // another put operation from D, with the same value C -> D: null A -> C: retry_put(k, v1) C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;) And to fix the HotRod use case, the HotRod client would have to be the one generating the version. Cheers Dan > > Radim > > On 05/12/2014 12:02 PM, Sanne Grinovero wrote: > > I don't think we are in a position to decide what is a reasonable > > compromise; we can do better. > > For example - as Radim suggested - it might seem reasonable to have > > the older value around for a little while. We'll need a little bit of > > history of values and tombstones anyway for many other reasons. > > > > > > Sanne > > > > On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: > >> Radim, I would contend that the first and foremost guarantee that put() > >> makes is to leave the cache in a consistent state. So we can't just > >> throw an > >> exception and give up, leaving k=v on one owner and k=null on another. > >> > >> Secondly, put(k, v) being atomic means that it either succeeds, it > >> writes > >> k=v in the cache, and it returns the previous value, or it doesn't > >> succeed, > >> and it doesn't write k=v in the cache. Returning the wrong previous > >> value is > >> bad, but leaving k=v in the cache is just as bad, even if the all the > >> owners > >> have the same value. > >> > >> And last, we can't have one node seeing k=null, then k=v, then k=null > >> again, > >> when the only write we did on the cache was a put(k, v). So trying to > >> undo > >> the write would not help. > >> > >> In the end, we have to make a compromise, and I think returning the > >> wrong > >> value in some of the cases is a reasonable compromise. Of course, we > >> should > >> document that :) > >> > >> I also believe ISPN-2956 could be fixed so that HotRod behaves just > >> like > >> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the > >> HotRod > >> protocol and to the cache itself. > >> > >> Incidentally, transactional caches have a similar problem when the > >> originator leaves the cluster: ISPN-3421 [1] > >> And we can't handle transactional caches any better than > >> non-transactional > >> caches until we expose transactions to the HotRod client. > >> > >> [1] https://issues.jboss.org/browse/ISPN-2956 > >> > >> Cheers > >> Dan > >> > >> > >> > >> > >> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> > >> wrote: > >>> Hi, > >>> > >>> recently I've stumbled upon one already expected behaviour (one > >>> instance > >>> is [1]), but which did not got much attention. > >>> > >>> In non-tx cache, when the primary owner fails after the request has > >>> been > >>> replicated to backup owner, the request is retried in the new > >>> topology. > >>> Then, the operation is executed on the new primary (the previous > >>> backup). The outcome has been already fixed in [2], but the return > >>> value > >>> may be wrong. For example, when we do a put, the return value for the > >>> second attempt will be the currently inserted value (although the > >>> entry > >>> was just created). Same situation may happen for other operations. > >>> > >>> Currently, it's not possible to return the correct value (because it > >>> has > >>> already been overwritten and we don't keep a history of values), but > >>> shouldn't we rather throw an exception if we were not able to fulfil > >>> the > >>> API contract? > >>> > >>> Radim > >>> > >>> [1] https://issues.jboss.org/browse/ISPN-2956 > >>> [2] https://issues.jboss.org/browse/ISPN-3422 > >>> > >>> -- > >>> Radim Vansa <rvansa(a)redhat.com> > >>> JBoss DataGrid QA > >>> > >>> _______________________________________________ > >>> infinispan-dev mailing list > >>> infinispan-dev(a)lists.jboss.org > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev > >> > >> > >> _______________________________________________ > >> infinispan-dev mailing list > >> infinispan-dev(a)lists.jboss.org > >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > > infinispan-dev mailing list > > infinispan-dev(a)lists.jboss.org > > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa <rvansa(a)redhat.com> > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

Mircea Markus

Tuesday, 27 May Tue, 27 May

11:47 a.m.

On May 13, 2014, at 14:58, Dan Berindei <dan.berindei(a)gmail.com> wrote:

...

On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote: @Dan: It's absolutely correct to do the further writes in order to make the cache consistent, I am not arguing against that. You've fixed the outcome (state of cache) well. My point was that we should let the user know that the value he gets is not 100% correct when we already know that - and given the API, the only option to do that seems to me as throwing an exception. The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. So we would break some of the users' expectations anyway.

I don't see how we can guarantee that if a method throws an exception nothing has been applies without a 2PC/TX. I think this should be a expectation for non-tx caches. i.e. if an operation throws an exception, then the state of the data is inconsistent.

...

@Sanne: I was not suggesting that for now - sure, value versioning is (I hope) on the roadmap. But that's more complicated, I though just about making an adjustment to the current implementation. Actually, just keeping a history of values would not fix the the return value in all cases. When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this: A is the originator, B is the primary owner, k = v0 A -> B: put(k, v1) B dies before writing v, C is now primary owner D -> C: put(k, v1) // another put operation from D, with the same value C -> D: null A -> C: retry_put(k, v1) C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;) And to fix the HotRod use case, the HotRod client would have to be the one generating the version. Cheers Dan Radim On 05/12/2014 12:02 PM, Sanne Grinovero wrote: > I don't think we are in a position to decide what is a reasonable > compromise; we can do better. > For example - as Radim suggested - it might seem reasonable to have > the older value around for a little while. We'll need a little bit of > history of values and tombstones anyway for many other reasons. > > > Sanne > > On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: >> Radim, I would contend that the first and foremost guarantee that put() >> makes is to leave the cache in a consistent state. So we can't just throw an >> exception and give up, leaving k=v on one owner and k=null on another. >> >> Secondly, put(k, v) being atomic means that it either succeeds, it writes >> k=v in the cache, and it returns the previous value, or it doesn't succeed, >> and it doesn't write k=v in the cache. Returning the wrong previous value is >> bad, but leaving k=v in the cache is just as bad, even if the all the owners >> have the same value. >> >> And last, we can't have one node seeing k=null, then k=v, then k=null again, >> when the only write we did on the cache was a put(k, v). So trying to undo >> the write would not help. >> >> In the end, we have to make a compromise, and I think returning the wrong >> value in some of the cases is a reasonable compromise. Of course, we should >> document that :) >> >> I also believe ISPN-2956 could be fixed so that HotRod behaves just like >> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod >> protocol and to the cache itself. >> >> Incidentally, transactional caches have a similar problem when the >> originator leaves the cluster: ISPN-3421 [1] >> And we can't handle transactional caches any better than non-transactional >> caches until we expose transactions to the HotRod client. >> >> [1] https://issues.jboss.org/browse/ISPN-2956 >> >> Cheers >> Dan >> >> >> >> >> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> wrote: >>> Hi, >>> >>> recently I've stumbled upon one already expected behaviour (one instance >>> is [1]), but which did not got much attention. >>> >>> In non-tx cache, when the primary owner fails after the request has been >>> replicated to backup owner, the request is retried in the new topology. >>> Then, the operation is executed on the new primary (the previous >>> backup). The outcome has been already fixed in [2], but the return value >>> may be wrong. For example, when we do a put, the return value for the >>> second attempt will be the currently inserted value (although the entry >>> was just created). Same situation may happen for other operations. >>> >>> Currently, it's not possible to return the correct value (because it has >>> already been overwritten and we don't keep a history of values), but >>> shouldn't we rather throw an exception if we were not able to fulfil the >>> API contract? >>> >>> Radim >>> >>> [1] https://issues.jboss.org/browse/ISPN-2956 >>> [2] https://issues.jboss.org/browse/ISPN-3422 >>> >>> -- >>> Radim Vansa <rvansa(a)redhat.com> >>> JBoss DataGrid QA >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev(a)lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> >> >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa <rvansa(a)redhat.com> JBoss DataGrid QA _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)

Galder Zamarreño

Wednesday, 28 May Wed, 28 May

5:48 a.m.

On 27 May 2014, at 18:47, Mircea Markus <mmarkus(a)redhat.com> wrote:

...

On May 13, 2014, at 14:58, Dan Berindei <dan.berindei(a)gmail.com> wrote: > > > > On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote: > @Dan: It's absolutely correct to do the further writes in order to make > the cache consistent, I am not arguing against that. You've fixed the > outcome (state of cache) well. My point was that we should let the user > know that the value he gets is not 100% correct when we already know > that - and given the API, the only option to do that seems to me as > throwing an exception. > > The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. > So we would break some of the users' expectations anyway. I don't see how we can guarantee that if a method throws an exception nothing has been applies without a 2PC/TX. I think this should be a expectation for non-tx caches. i.e. if an operation throws an exception, then the state of the data is inconsistent.

If we did that, our retry logic would remain badly broken for situations like the one mentioned in ISPN-2956. Unless you want to get rid of the retry logic altogether and let the client users decide what to do, I think we should improve the retry logic to better deal with such situations, and we have ways to do so [1] [1] https://issues.jboss.org/browse/ISPN-2956?focusedCommentId=12970541&p...

...

> > > @Sanne: I was not suggesting that for now - sure, value versioning is (I > hope) on the roadmap. But that's more complicated, I though just about > making an adjustment to the current implementation. > > > Actually, just keeping a history of values would not fix the the return value in all cases. > > When retrying a put on the new primary owner, the primary owner would still have to compare our value with the latest value, and return the previous value if they are equal. So we could have something like this: > > A is the originator, B is the primary owner, k = v0 > A -> B: put(k, v1) > B dies before writing v, C is now primary owner > D -> C: put(k, v1) // another put operation from D, with the same value > C -> D: null > A -> C: retry_put(k, v1) > C -> A: v0 // C assumes A is overwriting its own value, so it's returning the previous one > > To fix that, we'd need a unique version generated by the originator - kind of like a transaction id ;) > And to fix the HotRod use case, the HotRod client would have to be the one generating the version. > > Cheers > Dan > > > > Radim > > On 05/12/2014 12:02 PM, Sanne Grinovero wrote: >> I don't think we are in a position to decide what is a reasonable >> compromise; we can do better. >> For example - as Radim suggested - it might seem reasonable to have >> the older value around for a little while. We'll need a little bit of >> history of values and tombstones anyway for many other reasons. >> >> >> Sanne >> >> On 12 May 2014 09:37, Dan Berindei <dan.berindei(a)gmail.com> wrote: >>> Radim, I would contend that the first and foremost guarantee that put() >>> makes is to leave the cache in a consistent state. So we can't just throw an >>> exception and give up, leaving k=v on one owner and k=null on another. >>> >>> Secondly, put(k, v) being atomic means that it either succeeds, it writes >>> k=v in the cache, and it returns the previous value, or it doesn't succeed, >>> and it doesn't write k=v in the cache. Returning the wrong previous value is >>> bad, but leaving k=v in the cache is just as bad, even if the all the owners >>> have the same value. >>> >>> And last, we can't have one node seeing k=null, then k=v, then k=null again, >>> when the only write we did on the cache was a put(k, v). So trying to undo >>> the write would not help. >>> >>> In the end, we have to make a compromise, and I think returning the wrong >>> value in some of the cases is a reasonable compromise. Of course, we should >>> document that :) >>> >>> I also believe ISPN-2956 could be fixed so that HotRod behaves just like >>> embedded mode after the ISPN-3422 fix, by adding a RETRY flag to the HotRod >>> protocol and to the cache itself. >>> >>> Incidentally, transactional caches have a similar problem when the >>> originator leaves the cluster: ISPN-3421 [1] >>> And we can't handle transactional caches any better than non-transactional >>> caches until we expose transactions to the HotRod client. >>> >>> [1] https://issues.jboss.org/browse/ISPN-2956 >>> >>> Cheers >>> Dan >>> >>> >>> >>> >>> On Mon, May 12, 2014 at 10:21 AM, Radim Vansa <rvansa(a)redhat.com> wrote: >>>> Hi, >>>> >>>> recently I've stumbled upon one already expected behaviour (one instance >>>> is [1]), but which did not got much attention. >>>> >>>> In non-tx cache, when the primary owner fails after the request has been >>>> replicated to backup owner, the request is retried in the new topology. >>>> Then, the operation is executed on the new primary (the previous >>>> backup). The outcome has been already fixed in [2], but the return value >>>> may be wrong. For example, when we do a put, the return value for the >>>> second attempt will be the currently inserted value (although the entry >>>> was just created). Same situation may happen for other operations. >>>> >>>> Currently, it's not possible to return the correct value (because it has >>>> already been overwritten and we don't keep a history of values), but >>>> shouldn't we rather throw an exception if we were not able to fulfil the >>>> API contract? >>>> >>>> Radim >>>> >>>> [1] https://issues.jboss.org/browse/ISPN-2956 >>>> [2] https://issues.jboss.org/browse/ISPN-3422 >>>> >>>> -- >>>> Radim Vansa <rvansa(a)redhat.com> >>>> JBoss DataGrid QA >>>> >>>> _______________________________________________ >>>> infinispan-dev mailing list >>>> infinispan-dev(a)lists.jboss.org >>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >>> >>> >>> _______________________________________________ >>> infinispan-dev mailing list >>> infinispan-dev(a)lists.jboss.org >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev >> _______________________________________________ >> infinispan-dev mailing list >> infinispan-dev(a)lists.jboss.org >> https://lists.jboss.org/mailman/listinfo/infinispan-dev > > > -- > Radim Vansa <rvansa(a)redhat.com> > JBoss DataGrid QA > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev > > _______________________________________________ > infinispan-dev mailing list > infinispan-dev(a)lists.jboss.org > https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) _______________________________________________ infinispan-dev mailing list infinispan-dev(a)lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- Galder Zamarreño galder(a)redhat.com twitter.com/galderz

Mircea Markus

12:12 p.m.

On May 28, 2014, at 11:48, Galder Zamarreño <galder(a)redhat.com> wrote:

...

On 27 May 2014, at 18:47, Mircea Markus <mmarkus(a)redhat.com> wrote: > > On May 13, 2014, at 14:58, Dan Berindei <dan.berindei(a)gmail.com> wrote: > >> >> >> >> On Mon, May 12, 2014 at 1:54 PM, Radim Vansa <rvansa(a)redhat.com> wrote: >> @Dan: It's absolutely correct to do the further writes in order to make >> the cache consistent, I am not arguing against that. You've fixed the >> outcome (state of cache) well. My point was that we should let the user >> know that the value he gets is not 100% correct when we already know >> that - and given the API, the only option to do that seems to me as >> throwing an exception. >> >> The problem, as I see it, is that users also expect methods that throw an exception to *not* modify the cache. >> So we would break some of the users' expectations anyway. > > I don't see how we can guarantee that if a method throws an exception nothing has been applies without a 2PC/TX. I think this should be a expectation for non-tx caches. i.e. if an operation throws an exception, then the state of the data is inconsistent. If we did that, our retry logic would remain badly broken for situations like the one mentioned in ISPN-2956. Unless you want to get rid of the retry logic altogether and let the client users decide what to do, I think we should improve the retry logic to better deal with such situations, and we have ways to do so [1] [1] https://issues.jboss.org/browse/ISPN-2956?focusedCommentId=12970541&p...

You are right, retrying is better client experience than failing blindly. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)

3788

days inactive

3825

days old

infinispan-dev@lists.jboss.org

Manage subscription

15 comments

5 participants

tags (0)

participants (5)

Dan Berindei
Galder Zamarreño
Mircea Markus
Radim Vansa
Sanne Grinovero

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

Reliability of return values