Steve Ebersole commented on Improvement HHH-7841

Great discussion with Gail Badner that I wanted to capture for posterity:

[10:20] <sebersole> ok, thanks
[10:20] <sebersole> tieing in jpa entity graphs will be fun
[10:21] <sebersole> one thought was to use a different visitor
[10:21] <gbadner> yeah, I was thinking that
[10:21] <sebersole> one that drives on the graph, not the model
[10:22] <gbadner> iiuc, it needs to do both
[10:22] <sebersole> you can only drive on one
[10:22] <sebersole> drive means the main driver
[10:22] <sebersole> you have to look at both sure
[10:22] <sebersole> but thats not the same
[10:22] <gbadner> I need to look at what you did for entity graphs in master
[10:23] <sebersole> its the same split we have today in hql processing and normal walk/loaders
[10:23] <gbadner> wimi, you need what's specified in the graph plus get the defaults from the model
[10:23] <sebersole> i did nothing that effects this
[10:23] <sebersole> sure
[10:23] <sebersole> but there is a different between driving based on somethign and looking at something
[10:24] <sebersole> i use driving as in the sql sense
[10:24] <sebersole> the root thing
[10:24] <sebersole> there is a reason hql, for example, does not honor mapping define dfetches
[10:24] <sebersole> a practical reason i mean
[10:24] <sebersole> aside from any philosophical reasons
[10:25] <gbadner> hmm, didn't realize that
[10:25] <sebersole> because you can only drivbe that on one set of data
[10:25] <sebersole> in hql that is the explicit join fetches in the query
[10:25] <sebersole> we possibly have the same condition here
[10:25] <sebersole> but
[10:26] <sebersole> there is a difference
[10:26] <sebersole> we could (i think) still incorporate entity graphs using the model driven visitor
[10:26] <sebersole> the eplicit fetches from the entity graph would be additional
[10:27] <gbadner> yeah, I was thinking something like that
[10:27] <sebersole> this has the added benefit that the main parts of the base loadplan could still get reused
[10:28] <sebersole> we'd say that the entity graph can only add non-join fetches
[10:28] <sebersole> which is actually more in-line with their intent anyway
[10:28] <gbadner> oh, I see; they'd be follow on fetches
[10:28] <sebersole> true the spec does not say how the fetches need to happen...
[10:29] <sebersole> but the whole graph stuff cam about because of people not being happy with hql/jpql join fetches
[10:29] <sebersole> right
[10:29] <sebersole> they'd be batch or ...
[10:29] <sebersole> still done IMMEDIATELY
[10:29] <sebersole> but subsequent
[10:29] <gbadner> right
[10:30] --> galderz has joined this channel (~galder@redhat/jboss/galderz).
[10:30] <sebersole> have you had a chance to peek at the resultset processor stuff?
[10:30] <sebersole> thats the crux
[10:30] <gbadner> no, that's next
[10:30] <sebersole> ok
[10:30] <gbadner> is there a test for that?
[10:30] <sebersole> no, not yet
[10:31] <sebersole> because its really still in dev
[10:31] <gbadner> where's a good place to start with that?
[10:31] <sebersole> whatever the main impl is
[10:31] <sebersole> ResultSetProcessorImpl i think its called
[10:31] <sebersole> the ScrollableResultSetProcessorImpl stuff is conceptual atm
[10:32] <sebersole> another option is to move the building of those to LoadPlan
[10:32] <gbadner> ok, I see it
[10:32] <sebersole> LoadPlan.buildResultSetProcessor()
[10:32] <sebersole> LoadPlan.buildScrollableResultSetProcessor()
[10:32] <sebersole> see how those "feel" while you look at it
[10:32] <gbadner> yeah, that would make sense
[10:33] <sebersole> in terms of the "reuse" stuff we discussed last night for loadplans...
[10:34] <sebersole> we might want to consider ways to allow LoadPlans to be loosely equal
[10:35] <sebersole> what i mean by that is to have a way to say that 2 loadplans, for example, are "equal" is they define the same set of join fetches
[10:35] <sebersole> and then to offload the subsequent fetches somehere, since thats the part that is likely to vary
[10:35] <sebersole> maybe chained loadplans?
[10:36] <gbadner> yeah
[10:36] <sebersole> dunno exactly
[10:36] <sebersole> but something to think through
[10:36] <gbadner> well, maybe not chained, just sequential
[10:36] <sebersole> ideall the more often we can reuse the loadplans the better
[10:36] <gbadner> yeah, the more granular, the better
[10:36] <sebersole> +1000
[10:37] <sebersole> heres maybe another way to look at it
[10:37] <gbadner> I can see the follow on load plans being simple building blocks
[10:37] <gbadner> more a matter of assembling them
[10:37] <sebersole> well
[10:38] <sebersole> the difficulty there is still this notion of paths
[10:38] <sebersole> the loadplan, as is, maintains that because the fetches (all kinds) are still "in path"
[10:39] <sebersole> the difference really all comes down to ResultSetProcessor
[10:39] <sebersole> and the fact that we might have to perform additional selects here, versus there
[10:40] <gbadner> is "fetch depth" defined by jpa?
[10:40] <sebersole> still thinking about the entity graph case here
[10:40] <sebersole> not sure what you mean by jpa and "fetch depth"
[10:40] <sebersole> anyway
[10:40] <gbadner> oh, I guess that would be more of an implementation detail
[10:41] <sebersole> what i am trying to plan through is that it sure would be nice to have entity graphs work such that...
[10:41] <gbadner> anything that exceeds it would be follow-on
[10:41] <sebersole> we use the same loadplan whether there is a entity graph in play or not
[10:42] <sebersole> the difference is that the graph simply defines additonal subsequent fetches
[10:42] <sebersole> which really effects just the ResultSetProcessor
[10:42] <sebersole> this is part of that code I have not fleshed out yet
[10:43] <gbadner> ResultSetProcessor isn't 1:1 w/ LoadPlan?
[10:43] <sebersole> but the basic idea is that ResultSetProcessor would need a way to let each subsequent select know about the matching key found for each row
[10:44] <sebersole> thats to be decided
[10:44] <sebersole> if we go the LoadPlan.buildResultSetProcessor() route
[10:44] <sebersole> the decision point is whether the ResultSetProcessor is stateful
[10:45] <sebersole> but
[10:45] <sebersole> the important point in either case
[10:45] <sebersole> is that given a LoadPlan I can get a delegate capable of reading the results from a result set with the shape described by that plan
[10:46] <sebersole> i dont care really whether there are multiple physical instances
[10:46] <sebersole> or one per loadplan
[10:46] <sebersole> thats not important to me
[10:46] <sebersole> li think having them statefukl could be useful
[10:47] <sebersole> we could get rid of the LoadContexts stuff
[10:47] <gbadner> that would be nice
[10:47] <sebersole> since the entity and collection load context reall would be the stateful resultsetprocessor
[10:48] <sebersole> we'd need to be able to chain those though
[10:48] <-- DavideD has left this server (Ping timeout: 272 seconds).
[10:48] <sebersole> hopefully that makes sense
[10:48] --> hardy has joined this channel (~hardy@redhat/jboss/hardy).
[10:49] <gbadner> I'm thinking about what "chaining" means
[10:49] <sebersole> well subsequent selects...
[10:49] <sebersole> so you are processing a row from the result set that has Employee#1
[10:49] <gbadner> I can see that a ResultSetProcessor would get some input (e.g., ID) for loading
[10:49] <sebersole> and Employee's Address was not join fetched
[10:50] <sebersole> so you need to get their Address
[10:50] <gbadner> yep, got it
[10:50] <sebersole> ok
[10:51] <sebersole> now if only sannegrinovero would hurry and get done with that ast work...
[10:51] <sebersole>
[10:51] <gbadner> but still, a LoadPlan/ResultSetProcessor could be very granular; just needs some input
[10:51] <sebersole> yep
[10:51] <sebersole> i think the real decision at the moment
[10:51] <sebersole> is
[10:51] <sebersole> how to tie entity graphs in
[10:52] <sebersole> is that a different loadplan?
[10:52] <sebersole> or
[10:52] <sebersole> is the ResultSetProcessor simply sensitive to defined graphs
[10:53] <sebersole> what i envision inside ResultSetProcessor for subsequent fetches is important there i guess
[10:53] <sebersole> since i dont think that code is in place yet
[10:53] <sebersole> but
[10:53] <sebersole> going back to the Employee/Address example
[10:54] <-- hardy has left this server (Ping timeout: 252 seconds).
[10:54] <sebersole> the general idea was that ResultSetProcessor would use the EntityReturn to read the Employee data
[10:55] <sebersole> EntityReturn sees that it has a Fetch for Address
[10:55] <sebersole> so it would either (a) register that "address key" with the EntityFetch<Address>
[10:55] <sebersole> or
[10:55] <sebersole> (b) ask the EntityFetch<Address> to go ahead and load itself
[10:56] --> hardy has joined this channel (~hardy@redhat/jboss/hardy).
[10:56] <sebersole> (b) is closer to what we do today
[10:56] <sebersole> (a) would allow better use of batch loading for handling select fetches
[10:57] <sebersole> well not even better...
[10:57] <sebersole> it would allow use of it period
[10:57] <gbadner> digesting...
[10:57] <-- emmanuel-iThing has left this server (Quit: emmanuel-iThing).
[10:57] <sebersole> its a bit tyo follow
[10:58] <gbadner> "address key" would be the employee ID?
[10:58] <gbadner> or whatever to identify the employee's address(es)
[10:59] <gbadner> are you saying (b) would be a join fetch?
[10:59] <sebersole> "address key" would depend on the fk defined
[10:59] <gbadner> ok
[11:00] <sebersole> assumign a normal many-to-one, no it would be the addr_id value
[11:00] <sebersole> no, (b) is a subsequent select
[11:00] <sebersole> select * from employee where ...
[11:01] <sebersole> select * from address where id = ?
[11:01] <sebersole> again, thats depends on the definition for Address
[11:02] <sebersole> but asssuming a non-join, non-lazy association thats what it qwould mean above
[11:02] <gbadner> yes, the address fetch would be follow-on
[11:03] <sebersole> but
[11:03] <sebersole> what could be a nice improvement here...
[11:03] <sebersole> is to delay loading that Address until after all rows have been processed
[11:04] <sebersole> and use "batch fetch" approach
[11:04] <sebersole> i say "batch fetch" approach because its not technically a batch fetch
[11:04] <gbadner> yes, that would definitely simplify the processing
[11:04] <sebersole> we are just using the same sql and ResultSetProcessor we would have used for a batch fetch of Address
[11:05] <sebersole> wel not just simply
[11:05] <sebersole> its woud be a huge perf gain
[11:05] <gbadner> right, that's a whole lot simpler
[11:06] <sebersole> anyway, things to keep in mind
[11:06] <gbadner> I've spent a lot of time in that loader code
[11:06] <sebersole> as thats not there
[11:06] <sebersole> yes its confusing
[11:06] <gbadner> very
[11:06] <sebersole> it has all the stuff i dont like
[11:07] <sebersole> parallel arrays, subclassing out the whazoo
[11:07] <gbadner> have you thought about caching results/loadplans?
[11:07] <sebersole> wdym?
[11:07] <sebersole> you mean the query cache?
[11:07] <gbadner> yeah
[11:07] <sebersole> a little
[11:08] <sebersole> would be nice
[11:08] <sebersole> but i dont consider it a priority
[11:08] <sebersole> having a loadplan makes it more doable
[11:09] <sebersole> since you have a singular repsentation of the "shape"

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira