On 12/16/2016 09:30 PM, Bill Burke wrote:
Not every key is a simple UUID. Specifically any table that
references
a user: UserFederatedStorageProvider tables and offline tokens. THis
is because of the user storage spi. These tables must be searchable by
user id which can either be a UUID (keycloak database) or "f:" +
componentUUID + ":" + opaque_external_identifier (External User Storage
Provider database). So breaking up a User Storage SPI USER_ID isn't
going to help as you'll still need to create an index based on these
columns.
It is, though you're right that the composite index will be in the end introduced on
these columns. In the first step, the column will be on string fields (due to time
constraints) but constrained by the format you describe here. That will be optimized to
store uuid in binary form in 3.x (it is possible without JSON import and export, see
below). Index on the UUID part would be then operating on just 16 bytes. That is more
efficient than 36 or 2*36 bytes, as more a single database index page can fit in 2-4 times
as much records.
So the way I suggest is to have an index on field with UUID part (be it user or component)
and a field with the opaque part. No foreign key would be defined on these fields.
Also, there's been some talk of using this ID format for other
model types like Roles and Groups.
Also, Everything has to remain backward compatible. We will need to
support existing databases. So, can you support changing to a
completely different id format and type without screwing up existing
databases??? I don't think doing a full export/import to/from JSON is
going to work with large deployments.
Yes, I can. I've checked that conversion of UUID columns works in database-only mode
for PgSQL, Oracle, MySQL, MSSQL, DB2 being checked now. Result is that no JSON
import/export is needed for these DBs.
Will this @Nationalize Hibernate
annotation allow us to turn off the SQl Server implicit unicode
conversion and send everything via ascii by default?
This is only a partial solution as it depends not only on our end but also on database
settings that is out of KC control (how database has been created, what charset is the
default one for VARCHAR fields etc.). The approach where UUID is represented via character
representation (with character set conversions on the way even for 8-bit encodings)
suffers from performance penalty. Using raw 16 bytes for UUID storage does not.
Finally, I'm going to freak if I have to do a lot of refactoring
to the
User Storage SPI. A solution that keeps existing user storage and SPIs
backward compatible with very little work on the backend should be made
a priority instead of completely rearchitecting something so
fundamental to our datamodel.
I don't think that there would be a lot of refactoring though it won't be for
free. It will work with existing interfaces implemented by custom providers.
For this to be possible, it only needs to be stressed that the key format is strictly
"either UUID pointing of a user entity or the "f:" format" - as you
mentioned in the e-mail. I've sent a PR [1] to make this visible for custom User
Storage SPIs not only in documentation but also in server log. Please consider it for
merging.
--Hynek
[1]
https://github.com/keycloak/keycloak/pull/3666
On 12/16/16 2:12 PM, Hynek Mlnarik wrote:
> Hi All,
>
> Apologies for a long e-mail.
>
> TLDR: We need to define format of primary keys (UUID) so that it is possible to
transform the primary and foreign keys from VARCHAR(36) columns into database-native
binary format. This is in particular important to document in 2.5 in Storage Ids section
of the new User Storage SPI [1]
>
> Long version:
>
> I have looked at current database model and while generally it looks well, there is
an interesting issue with primary / foreign keys that causes performance degradation on
both Java and - most importantly - database side, causing even deadlocks for some
databases.
>
> The issue comes from database handling of IDs. IDs are in fact UUIDs, i.e. series of
bytes that are represented by Strings in KC JPA classes. Why this causes performance
degradation is due to various representations conversions (byte array vs String in Java)
and - most importantly for database - character set conversions. In the worst case, The
conversions occur both in JDBC driver and the database. The consequences are demonstrated
by Jira issue KEYCLOAK-3210 [2] when several simultaneous requests lead to deadlocking the
database.
>
> When JDBC driver obtains a string, it converts its representation into a character
set understood by database. Database might need to convert the string to a character set
specified for the column. This is nicely illustrated in MSSQL which makes distinction
between VARCHAR (8-bit codepages) and NVARCHAR (UCS-2 Unicode charset). IDs are VARCHARs
which is indeed an efficient way to use strings that consist of ASCII-only characters
(though not optimal for UUID, read below). However, if Unicode characters are to be
supported, MSSQL JDBC driver sends all character parameters as Unicode Strings [3].
Database then performs a conversion from Unicode to 8-bit charset which generally loses
some data. To account for this loss, instead of performing an index scan that directly
points to a requested row, it returns a range where the requested record should be. This
has fatal impact on performance. For more detailed analysis of the resulting plans, see
comment in [2].
>
> Clearly, the scan by id should be fast and the format of IDs in database matters. It
should avoid conversions as much as possible. Hence the following plan came:
>
> * In the result, all primary keys and corresponding keys have to be represented by
binary UUID data type (where supported, some databases represent UUID as e.g.
VARBINARY(16)), i.e. 16 bytes instead of 36 bytes
> * All keys in the JPA classes should be of type UUID, not String
>
> As a result, database indices get smaller (16 bytes of indexed data per record vs 36
bytes as it is now in case of 8-bit storage of characters), and no character conversions
are in place, hence the overall performance increases.
>
> This task is a slighty big one so it won't fit into KC 2.5 timeframe, but we
should definitely aim for 3.0.
>
> This has several preconditions:
> * The String keys in keycloak JPA classes, wherever used, are restricted to UUID
format
> * This format is documented and respected by all custom implementations, namely User
Storage implementations.
> * There exists conversion from String to native UUID for used databases (this is
certainly possible for PostgreSQL MSSQL, DB2, and MySQL, most likely others)
>
> Similarly to JPA, Infinispan classes should be revisited and optimized to save some
bytes that might be important for cluster replication by replacing String with UUIDs
>
> Thoughts?
>
> --Hynek
>
> [1]
https://github.com/keycloak/server_development_guide/blob/6b82f0868c0d7a1...
> [2]
https://issues.jboss.org/browse/KEYCLOAK-3210
> [3]
https://msdn.microsoft.com/en-us/library/ms378857(v=sql.110).aspx
>
> _______________________________________________
> keycloak-dev mailing list
> keycloak-dev(a)lists.jboss.org
>
https://lists.jboss.org/mailman/listinfo/keycloak-dev
_______________________________________________
keycloak-dev mailing list
keycloak-dev(a)lists.jboss.org
https://lists.jboss.org/mailman/listinfo/keycloak-dev