...
The BufferManager is responsible for tracking both memory and disk usage by Teiid. Configuring the BufferManager properly is one of the most important steps in ensuring high performance. Execute the following command on the CLI to list all of the available BufferManager settings:
/subsystem=teiid:read-resource
All of the properties that start with "buffer-service" configure the BufferManager. Some of them are described below. The Teiid engine uses batching to reduce the number of rows processed in memory at a given time. The batch sizes may be adjusted to larger values when more clients will be accessing the Teiid server simultaneously.
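An individual buffer-service property can be changed through the same CLI interface using the standard write-attribute operation. The sketch below is illustrative; verify the exact attribute name against the read-resource output for your Teiid version:

```
/subsystem=teiid:write-attribute(name=buffer-service-processor-batch-size, value=512)
:reload
```

A server reload is typically required before the new value takes effect.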
max-reserve-kb (default -1) - This setting determines the total size in kilobytes of batches that can be held in memory by the BufferManager. This number does not account for persistent batches held by soft (such as index pages) or weak references. The default value of -1 will auto-calculate a typical maximum based upon the max heap available to the VM. The auto-calculated value assumes a 64-bit architecture and will limit buffer usage to 50% of the first gigabyte of memory beyond the first 300 megabytes (which are assumed for use by the AS and other Teiid purposes) and 75% of the memory beyond that.
With default settings and an 8GB VM size, max-reserve-kb will use at most: (((1024-300) * 0.5) + (7 * 1024 * 0.75)) = 5738 MB or 5875712 KB.
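The auto-calculation above can be sketched as follows. This is a simplified model of the documented heuristic; Teiid's actual implementation may round differently:

```python
def auto_max_reserve_kb(max_heap_mb):
    """Approximate the auto-calculated max-reserve-kb for a given max heap (MB)."""
    # 50% of the first gigabyte beyond the first 300 MB (assumed for the AS/other uses)
    first = max(min(max_heap_mb, 1024) - 300, 0)
    # 75% of all memory beyond the first gigabyte
    rest = max(max_heap_mb - 1024, 0)
    return int((first * 0.5 + rest * 0.75) * 1024)

print(auto_max_reserve_kb(8 * 1024))  # 8 GB heap -> 5875712 KB (the 5738 MB above)
```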
When value caching is enabled, the BufferManager automatically begins using a canonical value cache once more than 25% of the reserve is in use. This can dramatically cut memory usage in situations where similar value sets are being read through Teiid, but it does introduce a lookup cost. If you are processing small or highly similar datasets through Teiid and wish to conserve memory, you should consider enabling value caching.
Memory consumption can be significantly more or less than the nominal target depending upon actual column values and whether value caching is enabled. Large non-built-in type objects can exceed their default size estimate. If out of memory errors occur, set a lower max-reserve-kb value. Also note that source lob values are held by memory references that are not cleared when a batch is persisted. With heavy lob usage you should ensure that buffers and other memory associated with lob references are appropriately sized.
max-processing-kb (default -1) - This setting determines the total size in kilobytes of batches that can be guaranteed for use by one active plan, and may be in addition to the memory held based on max-reserve-kb. The typical minimum memory required by Teiid when all plans are active is #active-plans * max-processing-kb. The default value of -1 will auto-calculate a typical maximum based upon the max heap available to the VM and the max active plans. The auto-calculated value assumes a 64-bit architecture and will limit processing batch usage to 10% of memory beyond the first 300 megabytes (which are assumed for use by the AS and other Teiid purposes).
With default settings including 20 active plans and an 8GB VM size, max-processing-kb will be: (((1024-300) * 0.1) + (7 * 1024 * 0.1)) / 20 = 789.2 MB / 20 = 39.46 MB, or 40407 KB, per plan. This implies a range between 0 and 789 MB that may be reserved, with roughly 40 MB per plan. Be cautious in adjusting max-processing-kb on your own. Typically it will not need to be adjusted unless you are seeing situations where plans seem memory constrained, such as performing three-pass sorts. In systems where large intermediate results are normal (scrolling cursors or sorting over millions of rows), you can consider increasing max-processing-kb and decreasing max-reserve-kb so that each request has access to an effectively smaller buffer space.
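The per-plan auto-calculation can be sketched the same way (again a simplified model of the documented heuristic, not Teiid's actual code):

```python
def auto_max_processing_kb(max_heap_mb, max_active_plans=20):
    """Approximate the auto-calculated per-plan max-processing-kb."""
    # 10% of the memory beyond the first 300 MB, divided across the active plans
    usable_mb = max(max_heap_mb - 300, 0)
    return int(usable_mb * 0.1 / max_active_plans * 1024)

print(auto_max_processing_kb(8 * 1024))  # 40407 KB, roughly 40 MB per plan
```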
max-file-size (default 2GB) - Each intermediate result buffer, temporary LOB, and temporary table is stored in its own set of buffer files, where an individual file is limited to max-file-size megabytes. If your installation makes use of internal materialization, makes heavy use of SQL/XML, or processes large row counts, consider increasing the storage space available to all such files by increasing max-buffer-space.
processor-batch-size (default 256) - Specifies the target row count of a batch of the query processor. A batch is used to represent both linear data stores, such as saved results, and temporary table pages. Teiid will adjust the processor-batch-size to a working size based upon an estimate of the data width of a row relative to a nominal expectation of 2KB. The base value can be doubled or halved up to three times depending upon the data width estimate. For example, a batch with a single small fixed-width column (such as an integer) will have a working size of processor-batch-size * 8 rows. A batch with hundreds of variable-width columns (such as strings) will have a working size of processor-batch-size / 8 rows.
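The power-of-two adjustment can be modeled as follows. This is a hypothetical sketch: the estimated row width and exact rounding are internal to Teiid, and only the capped doubling/halving behavior is taken from the text above:

```python
import math

def working_batch_size(base, est_row_width_bytes, nominal=2048):
    """Scale the base batch size by powers of two, capped at three doublings/halvings."""
    steps = math.floor(math.log2(nominal / est_row_width_bytes))
    steps = max(-3, min(3, steps))
    return int(base * 2 ** steps)

print(working_batch_size(256, 16))     # narrow single-integer row -> 2048 (256 * 8)
print(working_batch_size(256, 16384))  # very wide row -> 32 (256 / 8)
```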
Additional considerations are needed if large VM sizes and/or datasets are being used. Teiid has a non-negligible amount of overhead per batch/table page, on the order of 100-200 bytes. If you are dealing with datasets with billions of rows and you run into OutOfMemory issues, consider increasing the processor-batch-size to force the allocation of larger batches and table pages. A general guideline is to double processor-batch-size for every doubling of the effective heap for Teiid beyond 4 GB - processor-batch-size = 512 for an 8 GB heap, processor-batch-size = 1024 for a 16 GB heap, etc.
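The heap-doubling guideline can be expressed as a small helper. This is a sketch of the rule of thumb above, not a Teiid API:

```python
def suggested_processor_batch_size(effective_heap_gb, base=256):
    """Double the batch size for every doubling of effective heap beyond 4 GB."""
    size, heap = base, 4
    while heap * 2 <= effective_heap_gb:
        heap *= 2
        size *= 2
    return size

print(suggested_processor_batch_size(8))   # 512
print(suggested_processor_batch_size(16))  # 1024
```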
max-storage-object-size (default 8388608, or 8MB) - The maximum size of a buffered managed object in bytes; this represents the individual batch page size. If the processor-batch-size is increased and/or you are dealing with extremely wide result sets (several hundred columns), then the default setting of 8MB for the max-storage-object-size may be too low. Inline lobs also count toward this size if the batch contains them. The sizing for max-storage-object-size is in terms of serialized size, which will be much closer to the raw data size than the Java memory footprint estimation used for max-reserve-kb. max-storage-object-size should not be set too large relative to memory-buffer-space since it will reduce the performance of the memory buffer. The memory buffer supports only one concurrent writer for each max-storage-object-size of the memory-buffer-space. Note that this value does not typically need to be adjusted.
memory-buffer-space (default -1) - This controls the amount of on or off heap memory allocated as byte buffers for use by the Teiid buffer manager, measured in megabytes. This setting defaults to -1, which automatically determines a setting based upon whether it is on or off heap and the value of max-reserve-kb.
When left at the default setting, the calculated memory buffer space will be approximately one quarter of the max-reserve-kb size. If the memory buffer is off heap and max-reserve-kb is automatically calculated, the memory buffer space will be subtracted out of the effective max-reserve-kb to keep the VM size consistent.
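For the 8GB example above (max-reserve-kb = 5875712), the default works out to roughly 1.4 GB. A sketch of the "approximately one quarter" relationship (the exact rounding is an assumption):

```python
def default_memory_buffer_space_mb(max_reserve_kb):
    """Roughly one quarter of max-reserve-kb, expressed in MB (approximation)."""
    return max_reserve_kb // 4 // 1024

print(default_memory_buffer_space_mb(5875712))  # about 1434 MB
```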
memory-buffer-off-heap (default false) - Take advantage of the BufferManager memory buffer to access system memory without allocating it to the heap. Setting memory-buffer-off-heap to "true" will allocate the Teiid memory buffer off heap. Depending on whether your installation is dedicated to Teiid and the amount of system memory available, this may be preferable to on-heap allocation. The primary benefit is additional memory usage for Teiid without additional garbage collection tuning. This becomes especially important in situations where more than 32GB of memory is desired for the VM. Note that when using off-heap allocation, the memory must still be available to the java process and that setting the value of memory-buffer-space too high may cause the VM to swap rather than reside in memory. With large off-heap buffer sizes (greater than several gigabytes) you may also need to adjust VM settings.
For Sun VMs, the relevant VM settings are MaxDirectMemorySize and UseLargePages. For example, adding -XX:MaxDirectMemorySize=12g -XX:+UseLargePages to the VM process arguments would allow for an effective allocation of approximately an 11GB Teiid memory buffer (the memory-buffer-space setting), accounting for any additional direct memory that may be needed by the AS or applications running in the AS.
max-buffer-space (default -1) - For table pages and result batches, the buffer manager will have a limited number of files that are dedicated to a particular storage size. However, creation of Teiid lob values (for example through SQL/XML) will typically create one file per lob once the lob exceeds the allowable in-memory size of 8KB. In heavy usage scenarios, consider placing the buffer directory on a partition that is routinely defragmented. By default Teiid will use up to 50GB of disk space, tracked in terms of the number of bytes written by Teiid. For large data sets, you may need to increase the max-buffer-space setting.
It is also important to keep in mind that Teiid has memory and other hard limits, which break down along several lines: the number of storage objects tracked, disk storage, streaming data size/row limits, etc.
However, handling a source that has tera/petabytes of data does not by itself impact Teiid in any way. What matters is which processing operations are being performed and how much of that data needs to be stored on a temporary basis in Teiid. With a simple forward-only query, as long as the result row count is less than 2^31, Teiid will happily return a petabyte of data.
Each batch/table page requires an in-memory cache entry of approximately 128 bytes - thus the total number of tracked batches is limited by the heap, which is also why we recommend increasing the processing batch size on larger-memory systems or in scenarios making use of large internal materializations. The actual batch/table itself is managed by the buffer manager, which has a layered memory buffer structure with a spill-over facility to disk.
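To see why larger batches help with billions of rows, consider the per-page tracking overhead, using the approximate 128-byte figure above:

```python
def tracking_overhead_bytes(total_rows, batch_rows, bytes_per_entry=128):
    """Approximate heap needed just to track batch/table-page cache entries."""
    batches = -(-total_rows // batch_rows)  # ceiling division
    return batches * bytes_per_entry

# one billion rows at the default batch size vs. an increased one
print(tracking_overhead_bytes(10**9, 256))   # 500000000 bytes, ~500 MB
print(tracking_overhead_bytes(10**9, 2048))  # 62500096 bytes, ~62.5 MB
```

Increasing the working batch size by 8x cuts the tracking overhead by roughly the same factor, at the cost of larger individual pages.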
Internal materialization is based on the BufferManager. BufferManager settings may need to be updated based upon the desired amount of internal materialization performed by the deployed VDBs.