[JBoss JIRA] (ISPN-2037) Map/Reduce tasks should process entries in the CacheLoader as well

Sanne Grinovero (JIRA)

Thursday, 24 May

11:41 p.m.

[ https://issues.jboss.org/browse/ISPN-2037?page=com.atlassian.jira.plugin.... ]

Sanne Grinovero commented on ISPN-2037:
---------------------------------------

http://lists.jboss.org/pipermail/infinispan-dev/2012-May/010634.html


Map/Reduce tasks should process entries in the CacheLoader as well
------------------------------------------------------------------

Key: ISPN-2037
URL: https://issues.jboss.org/browse/ISPN-2037
Project: Infinispan
Issue Type: Feature Request
Reporter: Sanne Grinovero
Assignee: Vladimir Blagojevic
Fix For: 5.2.0.FINAL

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.jboss.org/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira


Sanne Grinovero (JIRA)

Friday, 29 June

6:52 p.m.

[ https://issues.jboss.org/browse/ISPN-2037?page=com.atlassian.jira.plugin.... ]

Sanne Grinovero updated ISPN-2037:
----------------------------------
Priority: Critical (was: Major)


Map/Reduce tasks should process entries in the CacheLoader as well
------------------------------------------------------------------

Key: ISPN-2037
URL: https://issues.jboss.org/browse/ISPN-2037
Project: Infinispan
Issue Type: Feature Request
Reporter: Sanne Grinovero
Assignee: Vladimir Blagojevic
Priority: Critical
Fix For: 5.2.0.FINAL


Sanne Grinovero (JIRA)

12:54 a.m.

[ https://issues.jboss.org/browse/ISPN-2037?page=com.atlassian.jira.plugin.... ]

Sanne Grinovero commented on ISPN-2037:
---------------------------------------

{quote}
<vblagoje> I just pinged you sannegrinovero regarding cache loaders in map reduce. Once you get a chance provide as much detail as you can so I can finish that one off quickly
<sannegrinovero> vblagoje, just read your ISPN-2037 .. good timing ! let's talk about it?
<jbossbot> jira [ISPN-2037] Map/Reduce tasks should process entries in the CacheLoader as well [Open (Unresolved) Feature Request, Critical, Vladimir Blagojevic] https://issues.jboss.org/browse/ISPN-2037
<vblagoje> ok sannegrinovero
<vblagoje> sure, have time now
<vblagoje> ?
<sannegrinovero> yes
<sannegrinovero> so the problem is that you have to load all entries from the cacheloader
<sannegrinovero> but you don't actually know which keys exist
<vblagoje> aha i see
<vblagoje> side question, does keySet return stuff in cache loader?
<sannegrinovero> so you have to load "Select * from database", which will very likely kill you by OOM
<sannegrinovero> I don't know that
<vblagoje> hmmm, how are these caches configured to begin with? any eviction, passivation, shared/single etc?
<vblagoje> shared/single cache loader
<vblagoje> sannegrinovero, who is most familiar with cacheloaders on our team?
<sannegrinovero> mmarkus is the leader I think.. ttarrant accumulated some experience too.
<sannegrinovero> sorry was away, back now.
<sannegrinovero> Does it matter how caches are configured?
<sannegrinovero> for sure if it's a single shared cacheloader, you'll have the problem that you only want to load entries for the current node, skipping all the rest.
<sannegrinovero> otherwise you definitely kill the purpose of map reduce.. loading the whole database from a single slow store N times where N is the number of nodes in the cluster... very bad :D
<sannegrinovero> vblagoje, ^
<vblagoje> so basically the problem is when map command arrives to node where it is executed and it needs to load all local keys, right now it does not load keys in cache loader - you say, is that right?
<mmarkus> vblagoje: anything I can help with?
<vblagoje> this is the case of map/reduce when input keys are not specified - basically do map/reduce on all keys
<sannegrinovero> vblagoje, I'm not sure. maybe it works but I don't think so. we definitely need to have tests with Map/Reduce && cacheloader interactions.
<sannegrinovero> yes, and for Query I never have the keys. I think even for map/reduce it's the most common case (to not have the set of keys)
<sannegrinovero> on the other hand, with non-shared cacheloaders it's even trickier.
<mmarkus> vblagoje: keySet ignores the cache loader interceptor
<vblagoje> yeah ok
<vblagoje> aha mmarkus, thanks dude; how do we get them?
<mmarkus> vblagoje: only returns what's in memory
<mmarkus> you need a way to get all the keys from a cache loader?
<vblagoje> exactly
<sannegrinovero> this is strongly related guys: https://community.jboss.org/wiki/CacheLoaderAndCacheStoreSPIRedesign#comm...
<mmarkus> vblagoje: that's a hard one man
<sannegrinovero> I had sketched an API proposal on the mailing list
<sannegrinovero> which manik had +1'ed .. searching for a link
<sannegrinovero> vblagoje: http://lists.jboss.org/pipermail/infinispan-dev/2012-May/010760.html
<vblagoje> mmarkus, sanne , maybe we can use loadAllKeys on CacheLoader interface and then slowly load values for each of those keys
<mmarkus> vblagoje: thinking about it from a jdbc perspective
<sannegrinovero> vblagoje, you are the map/reduce API master :) the proposal I described has a similar API in concept, that the visitor "demands" collection of entries from the CacheLoader implementation, in blocks.
<sannegrinovero> so the details of how the blocks of data are loaded are delegated to the implementation, which is important
<vblagoje> you mean this Processor API sanne?
<sannegrinovero> but the flow control is handled by your consumer
<mmarkus> sannegrinovero: +1. I was about to say the same thing :)
<sannegrinovero> mmarkus, nice :)
<sannegrinovero> vblagoje, yes
<mmarkus> sannegrinovero: indeed, it was ur idea :)
<sannegrinovero> guys am I the only one wasting my dev days by writing on design on the ML :D ?
<vblagoje> No, we all are hahaha
<sannegrinovero> admittedly this was assigned to Manik ;)
<mmarkus> sannegrinovero: you do have a fair share though :)
<vblagoje> there are so many things to do no one has time to review anyone else's design
<vblagoje> so Manik was going to do these Processor callbacks?
<sannegrinovero> yea I realize that, I admit I was hooked to this as I need it.
<sannegrinovero> no vblagoje I think I moved it to your plate, but as you can see from [1] he was driving the proposal of a new API
<sannegrinovero> https://community.jboss.org/wiki/CacheLoaderAndCacheStoreSPIRedesign
<vblagoje> ok, sanne, but how can we do this if it is planned for 6.0 and you need this yesterday?
<sannegrinovero> 6?? I'm not sure.
<sannegrinovero> let me read Manik's email again
<vblagoje> that is what the document says
<vblagoje> at the top of https://community.jboss.org/wiki/CacheLoaderAndCacheStoreSPIRedesign
<sannegrinovero> right, and he doesn't mention it in the todo mails. Not sure that's a mistake?
<sannegrinovero> Or maybe he means that you should make it work on the current CacheLoader SPI
<sannegrinovero> which basically means, very dumbly load it all in memory, and improve later on.
<sannegrinovero> vblagoje, proposal: focus on creating some good tests which cover map/reduce examples on non-trivial data, both with shared/non shared cacheloaders, passivation/no passivation, etc. and make it work without bothering too much about OOM and efficiency at the CacheLoader level.
<sannegrinovero> So that will help define exactly what is best to have at SPI later for 6.0
<vblagoje> ok sannegrinovero, I looked at Manik's proposal, this is all very very rough sketches
<sannegrinovero> and in terms of efficiency/performance you focus on what you have done so far (not considering cacheloaders), but add cacheloaders only as *functional* tests.
<vblagoje> but
<vblagoje> we can make this work without some new API redesign - I think
<sannegrinovero> just forwarded Manik's last comment by email ;)
<vblagoje> yes, sannegrinovero, we should not make API changes now
<vblagoje> but lets make it work somehow
<sannegrinovero> right. Just make sure to document it, it's better to warn people than to disappoint.
<sannegrinovero> I mean in terms of maturity, I guess you're going to advertise your new Map/Reduce as it's maturing quickly, but the CacheLoader integration can't be considered usable until that design is fixed.
<vblagoje> i have to think about how this can be done; and could use some help there
<vblagoje> for example: i think we need to use cacheLoader.loadAllKeys and then use that to load values and pass them to map reduce
<vblagoje> use keys, to load
<vblagoje> load values
<sannegrinovero> yes, seems the only way we can do it with the current SPI.
<vblagoje> but can we use raw reference to cache loader outside of cache loader interceptor
<vblagoje> these are some of the questions I have
<vblagoje> should it be done this way? If not, then how?
<vblagoje> the only person I think might help here is mmarkus; but I am afraid to ask him for any help as my tab in his pub is very very long
<vblagoje> hahaha
<sannegrinovero> :)
<vblagoje> nothing; I'll play with it until we figure out something sannegrinovero
<sannegrinovero> vblagoje, one could think of some hacks here and there, but the fact remains that with this API it's too limited to do it properly. Then let's not do it properly, just correct and document the limitation. I wouldn't bother too much, unless you get a genius intuition.
<vblagoje> ok let me see sannegrinovero
<vblagoje> i thought this is life and death critical to you?
<vblagoje> i mean this impl
<sannegrinovero> I mean, let's keep it clean. just load all keys, and iterate on them. Maybe you can filter on the keys: for DIST, you keep only the keys locally owned and ignore the others.
<sannegrinovero> vblagoje, I'll explain the use case to you.
<vblagoje> yeah something like that
<sannegrinovero> the index containing all data is corrupted, or no longer valid because of an upgrade, or the disks containing it are on fire.
<sannegrinovero> so indexes are lost.
<sannegrinovero> you have to re-index ALL data stored in the grid, so as to rebuild the indexes and be able to find your objects again.
<sannegrinovero> Imagine you stored your items for sale,
<vblagoje> ok
<sannegrinovero> at this point if you don't load the stuff from the cacheloaders (even those for which the keys are not in memory)
<sannegrinovero> those items for sale are lost forever :(
<sannegrinovero> But you can take out indexing from the example, and think of a Map/Reduce task on all your items.
<sannegrinovero> It's definitely no fun if Infinispan "forgets" to process 90% of the data you have.
<sannegrinovero> there are two main use cases:
<sannegrinovero> 1) memory is not enough - very likely you need to offload not-so-hot elements to disks/cassandra/whatever
<sannegrinovero> 2) you powered down some nodes, and reboot them. data is in the cacheloaders, but you don't have the keys.
<vblagoje> ok got it; this is pretty crucial then
<sannegrinovero> so the M/R api is of no use in real world if it doesn't process passivated entries as well.
<sannegrinovero> which is why I thought of you as the best person to think about it ;-)
<sannegrinovero> yea simply but I don't think nor M/R nor indexing are of any use without this.
<vblagoje> what if we can get M/R to load and process keys from cache loaders - as a first target of this task and then once a nice new API is in place we'll just adjust M/R?
<sannegrinovero> sounds like the best plan.
<vblagoje> just a sec
<vblagoje> ok, so lets do that sannegrinovero; when do you need this by? working and tested?
<sannegrinovero> vblagoje, it's not me needing it, but you ;) as I said, M/R is not going to be used on real world applications until you have it. Same for Query, we need it to make Query good enough to be ready.
<vblagoje> hahaha, italian school of diplomacy
<vblagoje> sure
<sannegrinovero> so Manik listed priorities. Query and Map/Reduce are both highly requested, this is blocking both..
<vblagoje> yeah makes sense; I'll work on this full force then and after it is done back to M/R
<sannegrinovero> so this is a priority, unless it interferes with cross-data center or NBST which are even more important.
<vblagoje> ok sannegrinovero, nuff for today, i am fried and I cannot believe you are not asleep yet :-(
<vblagoje> lets talk soon
<sannegrinovero> vblagoje, cool :) I'll paste this on the JIRA.
<vblagoje> ok, deal
{quote}
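
For readers following the plan agreed in the chat (load all keys from the store, keep only the locally owned ones, then load and process each value), here is a minimal Java sketch of that flow. It is an illustration under stated assumptions, not the code from the pull request: `SimpleStore`, `MapStore`, `countWords` and the `locallyOwned` predicate are hypothetical names invented for this example, and `SimpleStore` is only a stand-in for a cache store, not the Infinispan CacheLoader SPI. A word count plays the role of the map/reduce task.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class LoaderBackedWordCount {

    /** Hypothetical stand-in for a cache store; NOT the Infinispan CacheLoader SPI. */
    public interface SimpleStore<K, V> {
        Set<K> loadAllKeys();   // every key held by the store
        V load(K key);          // value for one key, or null if absent
    }

    /** Map-backed store, present only so the sketch can run. */
    public static final class MapStore<K, V> implements SimpleStore<K, V> {
        private final Map<K, V> data;

        public MapStore(Map<K, V> data) {
            this.data = new HashMap<>(data);
        }

        @Override
        public Set<K> loadAllKeys() {
            return data.keySet();
        }

        @Override
        public V load(K key) {
            return data.get(key);
        }
    }

    /**
     * Counts words over the in-memory entries plus the entries that exist only
     * in the store. Keys not owned by this node are skipped, so a shared store
     * is not processed once per node.
     */
    public static <K> Map<String, Integer> countWords(Map<K, String> inMemory,
                                                      SimpleStore<K, String> store,
                                                      Predicate<K> locallyOwned) {
        Map<String, Integer> counts = new HashMap<>();

        // Entries already in memory.
        for (Map.Entry<K, String> entry : inMemory.entrySet()) {
            if (locallyOwned.test(entry.getKey())) {
                mapAndReduce(entry.getValue(), counts);
            }
        }

        // Entries only in the store: load keys first, then one value at a time.
        for (K key : store.loadAllKeys()) {
            if (inMemory.containsKey(key) || !locallyOwned.test(key)) {
                continue; // already processed above, or owned by another node
            }
            String value = store.load(key);
            if (value != null) {
                mapAndReduce(value, counts);
            }
        }
        return counts;
    }

    /** "Map" splits the text into words; "reduce" sums the occurrences. */
    private static void mapAndReduce(String text, Map<String, Integer> counts) {
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> inMemory = new HashMap<>();
        inMemory.put("item:1", "red bike");

        Map<String, String> persisted = new HashMap<>();
        persisted.put("item:1", "red bike");
        persisted.put("item:2", "blue bike");
        persisted.put("item:3", "green kite");

        SimpleStore<String, String> store = new MapStore<>(persisted);
        // Pretend item:3 hashes to a different node.
        Predicate<String> ownedHere = key -> !key.equals("item:3");

        System.out.println(countWords(inMemory, store, ownedHere));
    }
}
```

As noted in the chat, this approach still materialises the full key set in memory, so it trades efficiency for correctness until a block-oriented store API is available.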


Map/Reduce tasks should process entries in the CacheLoader as well
------------------------------------------------------------------

Key: ISPN-2037
URL: https://issues.jboss.org/browse/ISPN-2037
Project: Infinispan
Issue Type: Feature Request
Reporter: Sanne Grinovero
Assignee: Vladimir Blagojevic
Priority: Critical
Fix For: 5.2.0.FINAL


Galder Zamarreño (JIRA)

7:16 p.m.

[ https://issues.jboss.org/browse/ISPN-2037?page=com.atlassian.jira.plugin.... ]

Galder Zamarreño resolved ISPN-2037.
------------------------------------
Fix Version/s: 5.2.0.ALPHA2
Resolution: Done


Map/Reduce tasks should process entries in the CacheLoader as well
------------------------------------------------------------------

Key: ISPN-2037
URL: https://issues.jboss.org/browse/ISPN-2037
Project: Infinispan
Issue Type: Feature Request
Reporter: Sanne Grinovero
Assignee: Vladimir Blagojevic
Priority: Blocker
Fix For: 5.2.0.ALPHA2, 5.2.0.FINAL


Vladimir Blagojevic (JIRA)

Tuesday, 11 September

12:36 p.m.

[ https://issues.jboss.org/browse/ISPN-2037?page=com.atlassian.jira.plugin.... ]

Vladimir Blagojevic updated ISPN-2037:
--------------------------------------
Git Pull Request: https://github.com/infinispan/infinispan/pull/1218 (was: https://github.com/infinispan/infinispan/pull/1218)
Component/s: Distributed Execution and Map/Reduce


Map/Reduce tasks should process entries in the CacheLoader as well
------------------------------------------------------------------

Key: ISPN-2037
URL: https://issues.jboss.org/browse/ISPN-2037
Project: Infinispan
Issue Type: Feature Request
Components: Distributed Execution and Map/Reduce
Reporter: Sanne Grinovero
Assignee: Vladimir Blagojevic
Priority: Blocker
Fix For: 5.2.0.ALPHA2, 5.2.0.Final
