I need to dive a bit into the JPA 2.1 PaaS proposal, but here are a few comments inline
in the meantime.
On 30 March 2012, at 17:57, Steve Ebersole wrote:
The scope of multi-tenancy in JPA 2.1 is pretty much set, it seems;
it's just details at this point. And that scope is decidedly different from
what I did for Hibernate. What we have in Hibernate is actually more
encompassing.
The terms being used to describe the differences are PaaS versus SaaS.
What we have in Hibernate falls into what the spec committee is calling
the SaaS category, essentially the ability to run one instance of the
application (and therefore one instance of Hibernate) simultaneously
handling multiple tenant requests. This is stated to be outside the
scope of multi-tenancy for JPA 2.1. Instead JPA 2.1 will support what
the committee is calling the PaaS category, essentially multiple instances
of the application (and therefore multiple instances of Hibernate) would
be needed to handle multiple client requests.
Essentially, you could largely achieve what is planned for multi-tenancy
in JPA 2.1 using JPA 2.0 or earlier. But there will be some caveats to
multi-tenancy in JPA that we will need to deal with, which I wanted to
start discussing.
First, I think we should not use the existing multi-tenancy support at
all in our support for JPA 2.1 multi-tenancy. Hibernate
multi-tenancy, as it exists already, has some "overheads" that simply
are not needed for the JPA use-case. For example, in Hibernate
multi-tenancy (or "SaaS multi-tenancy"), there is a need to know which
of multiple available pools to use for getting Connections for use with
particular tenants. But that is simply not needed for JPA multi-tenancy
(or "PaaS multi-tenancy") because that running JPA provider instance can
handle only one tenant at a time. Similar deal with second level cache
keys: no need to encode tenant id into the key. That app will hold only
one tenant's data (unless multiple apps share the same cache instances,
but that's handle-able by prefixing the regions per app/tenant).
I am not sure what the PaaS multi-tenant config will look like exactly, but if we can
automatically prefix the 2LC regions without adding an explicit mandatory property
that would be nice.
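To make the automatic-prefix idea concrete, here is a minimal sketch in plain Java. It assumes the provider knows the tenant id at bootstrap time; the class and method names (TenantCacheConfig, forTenant) are made up for illustration. The only real setting used is the existing hibernate.cache.region_prefix property, and the prefix is only filled in when the deployer has not set one explicitly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hibernate or the JPA 2.1 proposal.
final class TenantCacheConfig {
    // Existing Hibernate setting that prefixes second level cache region names.
    static final String REGION_PREFIX = "hibernate.cache.region_prefix";

    // Returns a copy of the base settings with a per-tenant region prefix,
    // unless the deployer already chose a prefix.
    static Map<String, String> forTenant(String tenantId, Map<String, String> base) {
        Map<String, String> props = new HashMap<>(base);
        props.putIfAbsent(REGION_PREFIX, tenantId);
        return props;
    }
}
```

With something like this, two application instances sharing one cache instance would end up with distinct region names without any extra mandatory property.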
If we do end up leveraging the existing multi-tenancy support, we will
additionally need to know the type of multi-tenancy in effect. Here I
mean this question of SaaS versus PaaS rather than the question of
SEPARATE_DATABASE versus SEPARATE_SCHEMA versus DISCRIMINATED.
JPA is proposing some limitations to available features based on the
type of multi-tenancy (if any) used in the application that we should
go back and look at, even for our SaaS stuff. For example, in the
SEPARATE_SCHEMA case the current proposal is to disallow the deployment from
referencing schemas/catalogs at all, which is a good idea. There are
also proposed limitations on native-sql queries such that:
1) they would not be portably available if using DISCRIMINATED, although
persistence providers could choose to support it if they can handle
injecting any needed discriminator fragments
2) they could not reference schemas/catalogs in the SEPARATE_SCHEMA case, which is
totally in line with the limitation on naming schemas/catalogs in general
in SEPARATE_SCHEMA.
Personally, I find all of the proposed limitations outlined above
reasonable. There is a question of whether we would want to support
native-sql queries in the DISCRIMINATED case. Take an example where the
user supplies a query like [select ... from CUST_TBL c]. Obviously we
need to limit that to return just their data, so we would need to
instead pass a query along to the database like [select ... from
CUST_TBL c where c.tenant = 'acme'] (assuming 'acme' is the current
tenant identifier). It is a lot to bite off to support this
unilaterally, because often folks are resorting to native-sql queries
because they need to leverage some db-specific SQL syntax/feature and
that is essentially impossible to properly parse and interpret. One
option is that we could instead allow a placeholder: [select ... from
CUST_TBL c where c.tenant = ${tenant}]. But even here we still have
difficulty with being able to guarantee that we catch all cases. So in
the end, I am not sure this is something we should be doing.
There is also a case to be made for never allowing different tenants to see each other's data,
so unless we can catch all malicious SQL queries, it's better disabled by default,
though I could see an app developer knowing his environment and enabling the
option if needed.
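As a rough sketch of the placeholder option combined with "disabled by default", something like the following could work. Everything here is hypothetical (the TenantPlaceholder class, the allowNativeSql flag, the exact ${tenant} token handling); it is not an existing Hibernate or JPA API. The token is turned into a JDBC bind marker so the tenant id is bound as a parameter rather than concatenated into the SQL.

```java
// Hypothetical helper illustrating the ${tenant} placeholder idea.
final class TenantPlaceholder {
    private static final String TOKEN = "${tenant}";

    // Counts how many bind parameters the caller must set to the tenant id.
    static int occurrences(String sql) {
        int n = 0;
        for (int i = sql.indexOf(TOKEN); i >= 0; i = sql.indexOf(TOKEN, i + TOKEN.length())) {
            n++;
        }
        return n;
    }

    // Replaces each ${tenant} token with a '?' bind marker.
    // Native SQL is rejected unless the deployer explicitly opted in,
    // and also when the query never mentions the tenant at all.
    static String bind(String sql, boolean allowNativeSql) {
        if (!allowNativeSql) {
            throw new IllegalStateException("native SQL is disabled for discriminator-based multi-tenancy");
        }
        if (occurrences(sql) == 0) {
            throw new IllegalArgumentException("native SQL must reference " + TOKEN);
        }
        return sql.replace(TOKEN, "?");
    }
}
```

Note that this only checks that the token appears somewhere. It cannot prove that every table in the query is actually filtered by tenant, which is exactly the "catch all cases" problem described above.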