[hibernate-dev] multi-tenancy in Hibernate and JPA 2.1

Mon Apr 2 05:07:56 EDT 2012

I need to dive a bit into the JPA 2.1 PaaS proposal, but here are a few comments inline in the meantime.

On 30 March 2012, at 17:57, Steve Ebersole wrote:

> The scope of multi-tenancy in JPA 2.1 is pretty much set, it seems; it's
> just details at this point.  And that scope is decidedly different from
> what I did for Hibernate.  What we have in Hibernate is actually more 
> encompassing.
> 
> The terms being used to describe the differences are PaaS versus SaaS. 
> What we have in Hibernate falls into what the spec committee is calling 
> the SaaS category, essentially the ability to run one instance of the 
> application (and therefore one instance of Hibernate) simultaneously 
> handling multiple tenant requests.  This is stated to be outside the 
> scope of multi-tenancy for JPA 2.1.  Instead, JPA 2.1 will support what
> the committee is calling the PaaS category: essentially, multiple instances
> of the application (and therefore multiple instances of Hibernate) would
> be needed to handle requests for multiple tenants.
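
To make the SaaS/PaaS distinction concrete, here is a minimal Java sketch. The class names are hypothetical stand-ins, not Hibernate or JPA API: in the SaaS shape a single factory serves every tenant and the tenant id arrives with each request, while in the PaaS shape the tenant is fixed when the factory is bootstrapped, so a second tenant means a second deployment.

```java
import java.util.Objects;

/** Hypothetical stand-in for a unit of work bound to exactly one tenant. */
final class TenantSession {
    private final String tenantId;

    TenantSession(String tenantId) {
        this.tenantId = Objects.requireNonNull(tenantId);
    }

    String tenantId() {
        return tenantId;
    }
}

/** SaaS shape: one instance, the tenant is chosen per request. */
final class SaasSessionFactory {
    TenantSession openSession(String tenantId) {
        return new TenantSession(tenantId);
    }
}

/** PaaS shape: the tenant is fixed at bootstrap; one instance per tenant. */
final class PaasSessionFactory {
    private final String tenantId;

    PaasSessionFactory(String tenantId) {
        this.tenantId = Objects.requireNonNull(tenantId);
    }

    TenantSession openSession() {
        // No tenant argument: this instance can only ever serve one tenant.
        return new TenantSession(tenantId);
    }
}
```

For comparison, Hibernate 4's existing SaaS support selects the tenant per session, along the lines of `sessionFactory.withOptions().tenantIdentifier("acme").openSession()`.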
> 
> Essentially, you could largely achieve what is planned for multi-tenancy 
> in JPA 2.1 using JPA 2.0 or earlier.  But there will be some caveats to 
> multi-tenancy in JPA that we will need to deal with, which I wanted to 
> start discussing.
> 
> First, I think we should not use the existing multi-tenancy
> support at all for JPA 2.1 multi-tenancy.  Hibernate
> multi-tenancy, as it exists already, has some "overheads" that simply
> are not needed for the JPA use case.  For example, in Hibernate
> multi-tenancy (or "SaaS multi-tenancy"), there is a need to know which
> of multiple available pools to use for getting Connections for
> particular tenants.  But that is simply not needed for JPA multi-tenancy
> (or "PaaS multi-tenancy") because a running JPA provider instance can
> handle only one tenant at a time.  The same goes for second-level cache
> keys: there is no need to encode the tenant id into the key.  That app
> will hold only one tenant's data (unless multiple apps share the same
> cache instances, but that can be handled by prefixing the regions per app/tenant).
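
The overheads mentioned above can be sketched as follows, assuming (for illustration only) that pools are identified by name and cache keys are plain strings; the class and method names are hypothetical. SaaS has to route each tenant to its own pool and fold the tenant id into every second-level cache key, while PaaS needs neither and can isolate a shared cache simply by prefixing region names.

```java
import java.util.Map;

/** Hypothetical sketch: per-tenant bookkeeping SaaS needs and PaaS can skip. */
final class TenancyOverheads {

    /** SaaS: pick one of several pools by tenant (pools modeled as names). */
    static String poolFor(Map<String, String> poolsByTenant, String tenantId) {
        String pool = poolsByTenant.get(tenantId);
        if (pool == null) {
            throw new IllegalStateException("No connection pool for tenant " + tenantId);
        }
        return pool;
    }

    /** SaaS: the tenant id has to be part of every second-level cache key. */
    static String saasCacheKey(String tenantId, String entityName, Object id) {
        return tenantId + "#" + entityName + "#" + id;
    }

    /** PaaS: the instance holds one tenant's data, so no tenant id in the key. */
    static String paasCacheKey(String entityName, Object id) {
        return entityName + "#" + id;
    }

    /** PaaS with a shared cache: isolate apps/tenants by prefixing region names. */
    static String prefixedRegion(String appOrTenantPrefix, String regionName) {
        return appOrTenantPrefix + "." + regionName;
    }
}
```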

I am not sure what the PaaS multi-tenant config will look like exactly, but if we can
automatically prefix the 2LC regions without adding an explicit mandatory property,
that would be nice.
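
One possible resolution order, sketched under the assumption that the PaaS configuration exposes a tenant identifier (its actual shape is not known yet): honor Hibernate's existing `hibernate.cache.region_prefix` setting when the user sets it, otherwise derive the prefix from the tenant id, otherwise leave region names unprefixed. `RegionPrefixResolver` and its methods are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical: choose a 2LC region prefix without a mandatory property. */
final class RegionPrefixResolver {
    static final String EXPLICIT_PREFIX = "hibernate.cache.region_prefix";

    /**
     * @param settings configuration properties
     * @param tenantId tenant id from the (assumed) PaaS deployment config
     * @return the prefix to use, or empty for no prefix
     */
    static Optional<String> resolve(Map<String, String> settings, Optional<String> tenantId) {
        String explicit = settings.get(EXPLICIT_PREFIX);
        if (explicit != null && !explicit.isBlank()) {
            return Optional.of(explicit); // an explicit user choice always wins
        }
        return tenantId.filter(t -> !t.isBlank()); // otherwise derived automatically
    }

    static String regionName(Optional<String> prefix, String region) {
        return prefix.map(p -> p + "." + region).orElse(region);
    }
}
```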

> 
> If we do end up leveraging the existing multi-tenancy support, we will 
> additionally need to know the type of multi-tenancy in effect.  Here I 
> mean this question of SaaS versus PaaS rather than the question of 
> SEPARATE_DATABASE versus SEPARATE_SCHEMA versus DISCRIMINATED.
> 
> JPA is proposing some limitations on available features based on the
> type of multi-tenancy (if any) used in the application, which we should
> go back and look at, even for our SaaS stuff.  For example, in the
> SEPARATE_SCHEMA case the current proposal is to disallow the deployment
> from referencing schemas/catalogs at all, which is a good idea.  There
> are also proposed limitations on native-sql queries such that:
> 1) they would not be portably available if using DISCRIMINATED, although
> persistence providers could choose to support them if they can handle
> injecting any needed discriminator fragments
> 2) they could not reference schemas/catalogs in the SEPARATE_SCHEMA case,
> which is totally in line with the limitation on naming schemas/catalogs
> in general in SEPARATE_SCHEMA.
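
The proposed limitations could be enforced as a deployment-time check. The sketch below is hypothetical (the `TableMapping` model and `TenancyValidator` are invented for illustration); it covers the mapping-level schema/catalog rule and the DISCRIMINATED native-sql rule, and deliberately leaves out spotting schema references inside native SQL text, which would require parsing the SQL.

```java
import java.util.ArrayList;
import java.util.List;

/** Strategy names as used in the JPA 2.1 proposal discussed here. */
enum TenancyStrategy { SEPARATE_DATABASE, SEPARATE_SCHEMA, DISCRIMINATED }

/** Hypothetical view of a mapped table; schema/catalog are null when not given. */
final class TableMapping {
    final String name;
    final String schema;
    final String catalog;

    TableMapping(String name, String schema, String catalog) {
        this.name = name;
        this.schema = schema;
        this.catalog = catalog;
    }
}

/** Hypothetical deployment-time check for the proposed limitations. */
final class TenancyValidator {
    static List<String> validate(TenancyStrategy strategy,
                                 List<TableMapping> tables,
                                 int nativeQueryCount) {
        List<String> problems = new ArrayList<>();
        if (strategy == TenancyStrategy.SEPARATE_SCHEMA) {
            // The schema is chosen per tenant, so mappings must not name one.
            for (TableMapping t : tables) {
                if (t.schema != null || t.catalog != null) {
                    problems.add("Table " + t.name + " names a schema/catalog");
                }
            }
        }
        if (strategy == TenancyStrategy.DISCRIMINATED && nativeQueryCount > 0) {
            // Portable JPA cannot inject the discriminator into arbitrary SQL.
            problems.add("Native SQL is not portable with DISCRIMINATED");
        }
        return problems;
    }
}
```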
> 
> Personally, I find all of the proposed limitations outlined above 
> reasonable.  There is a question of whether we would want to support 
> native-sql queries in the DISCRIMINATED case.  Take an example where the 
> user supplies a query like [select ... from CUST_TBL c].  Obviously we 
> need to limit that to return just their data, so we would need to 
> instead pass a query along to the database like [select ... from 
> CUST_TBL c where c.tenant = 'acme'] (assuming 'acme' is the current 
> tenant identifier).  It is a lot to bite off to support this
> across the board, because folks often resort to native-sql queries
> precisely because they need to leverage some db-specific SQL
> syntax/feature, and that is essentially impossible to parse and
> interpret properly.  One option is that we could instead allow a
> placeholder: [select ... from CUST_TBL c where c.tenant = ${tenant}].
> But even here we would still have difficulty guaranteeing that we
> catch all cases.  So in the end, I am not sure this is something we
> should be doing.
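
A sketch of the placeholder option, assuming the hypothetical `${tenant}` token syntax from the example above and a query with no other positional parameters: each token becomes a JDBC `?` bound to the current tenant id, so the identifier is never spliced into the SQL text. Note what it does not do: it only checks that the token appears somewhere, not that every table in the query is actually filtered, which is exactly the guarantee the paragraph above doubts we can make.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical expansion of a ${tenant} placeholder in native SQL. */
final class TenantPlaceholder {
    static final String TOKEN = "${tenant}";

    /** SQL with JDBC '?' markers plus the values to bind, in order. */
    static final class Expanded {
        final String sql;
        final List<String> params;

        Expanded(String sql, List<String> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    /** Assumes the query has no other positional ('?') parameters. */
    static Expanded expand(String sql, String tenantId) {
        if (!sql.contains(TOKEN)) {
            // We cannot tell whether this query leaks other tenants' rows.
            throw new IllegalArgumentException("Native query lacks " + TOKEN);
        }
        List<String> params = new ArrayList<>();
        StringBuilder out = new StringBuilder();
        int from = 0;
        int at;
        while ((at = sql.indexOf(TOKEN, from)) >= 0) {
            out.append(sql, from, at).append('?');
            params.add(tenantId); // bound as a parameter, never concatenated
            from = at + TOKEN.length();
        }
        out.append(sql.substring(from));
        return new Expanded(out.toString(), params);
    }
}
```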

There is also a case to be made for not allowing different tenants to see each other's data,
so unless we can catch all malicious SQL queries, it's better disabled by default,
though I could see an app developer who knows his environment enabling the
option if needed.
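
A minimal sketch of "off by default, opt-in per app", assuming a hypothetical property name (`hibernate.multi_tenancy.allow_native_sql` is invented here and is not an existing Hibernate setting): under DISCRIMINATED, native SQL is rejected unless the developer explicitly enables it and thereby accepts responsibility for filtering by tenant.

```java
import java.util.Map;

/** Hypothetical guard: native SQL under DISCRIMINATED is off unless enabled. */
final class NativeQueryGuard {
    // Hypothetical property name, not an existing Hibernate setting.
    static final String ALLOW_NATIVE = "hibernate.multi_tenancy.allow_native_sql";

    static void check(boolean discriminated, Map<String, String> settings) {
        // Absent or anything other than "true" means disabled.
        boolean allowed = Boolean.parseBoolean(settings.getOrDefault(ALLOW_NATIVE, "false"));
        if (discriminated && !allowed) {
            throw new IllegalStateException(
                "Native SQL is disabled with DISCRIMINATED multi-tenancy; set "
                + ALLOW_NATIVE + "=true to take responsibility for tenant filtering");
        }
    }
}
```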