TM1 / SQL Asynchronous Parallel Processing Manual
Introduction
The mission of tm1_bedrock_py’s Asynchronous TM1/SQL Integration module is to provide a high-performance, robust, and scalable bridge between TM1 cubes and relational databases.
This module is designed for large-scale data operations where performance is critical. It achieves this by slicing large datasets into smaller, independent chunks and processing them concurrently across multiple worker threads.
The module provides two core, specialized orchestrators:
async_executor_tm1_to_sql(): For exporting and transforming data from a TM1 cube into a SQL database in parallel. It slices the work based on an input MDX query template.async_executor_sql_to_tm1(): For loading and transforming data from a single, large SQL table into a TM1 cube in parallel. It slices the data by paginating the SQL query itself.
Slicing Mechanisms: How Parallelization Works
The two executors use fundamentally different strategies to parallelize their workloads. Understanding this distinction is key for optimizing performance.
async_executor_tm1_to_sql (TM1 Parameter Slicing)
This function’s parallelization strategy is based on parameterizing an MDX query.
Generate Parameter Tuples: It executes the MDX set queries in
param_set_mdx_listto generate a list of unique element combinations (e.g., [(‘202401’,), (‘202402’,)]).Substitute into Template: It iterates through these tuples, substituting the element names into the
data_mdx_templatestring to create a unique MDX query for each worker.Assign Unique Work: Each worker thread is assigned one of these unique MDX queries and runs the full
load_tm1_cube_to_sql_tableprocess on that data slice.
This approach is ideal when the source data can be logically partitioned by one or more TM1 dimensions (like Period, Version, or Region).
async_executor_sql_to_tm1 (SQL Pagination Slicing)
This function’s parallelization strategy is based on SQL result set pagination. It does not use TM1 dimension elements for slicing.
Count Total Records: It runs a
COUNT(*)on the source SQL table to determine the total number of records.Calculate Slices: Based on the
slice_size, it calculates the number of chunks needed to process all records.Assign Paginated Queries: It populates the
sql_query_templatewith unique{offset}and{fetch}values for each worker, effectively giving each worker a “page” of the total result set.Execute in Parallel: Each worker thread executes its unique, paginated SQL query and runs the full
load_sql_data_to_tm1_cubeprocess on its data slice.
This approach is the correct choice for loading a single, large SQL table that cannot be easily partitioned before extraction.
Function Reference
async_executor_tm1_to_sql (TM1 -> SQL)
This function is a high-performance orchestrator designed to export large, sliceable datasets from a TM1 cube to a SQL table.
Parameter Reference
target_table_name (required, string)
The name of the destination table in the SQL database.
param_set_mdx_list (required, list[string])
The slicing definition. A list of MDX set queries that define the parameters for parallelization.
data_mdx_template (required, string)
The workload template. An MDX query string with $-prefixed placeholders matching the dimensions from param_set_mdx_list.
data_copy_function (optional, callable)
The function to be executed by each worker. Defaults to
bedrock.load_tm1_cube_to_sql_table.
clear_target (optional, boolean; default=False)
If True, the entire target SQL table is cleared once before any workers start.
sql_delete_statement (optional, string)
A specific SQL statement to use for clearing the target table.
If
clear_target=Trueandsql_delete_statement=None, it defaults to a truncate statement with MS SQL syntax. To make sure seamless clearing use a custom statement.
Example Workflow
import asyncio
from TM1_bedrock_py import bedrock
from TM1py import TM1Service
from sqlalchemy import types, create_engine
# 1. Define the slicing parameters (Actual and Budget versions)
params = ["[Version].[Version].['Actual', 'Budget']"]
# 2. Define the MDX template with a placeholder
mdx_tmpl = "SELECT {[Period].[Period].Leaves} ON 0 FROM [SourceCube] WHERE ([Version].[Version].[$Version])"
# 3. Define the explicit data types for the target SQL table
sql_types = {
'Version': types.VARCHAR(50), 'Period': types.VARCHAR(50), 'Value': types.FLOAT
}
# 4. Map dimension names of the TM1 cube to match the the names of the SQL table columns
related_dimensions={
"Version": "SalesVersion",
"Value": "SalesAmount"
}
# 5. Create the SQL engine
engine = create_engine("mssql+pyodbc://...", fast_executemany=True)
# 6. Run the executor
with TM1Service(address='localhost', user='admin', password='apple', ssl=True) as tm1:
asyncio.run(bedrock.async_executor_tm1_to_sql(
tm1_service=tm1,
sql_engine=engine,
target_table_name="SalesArchive",
related_dimensions=related_dimensions,
param_set_mdx_list=params,
data_mdx_template=mdx_tmpl,
max_workers=2,
sql_dtypes=sql_types,
skip_zeros=True,
async_write=False, # The executor handles the async part; the worker should be synchronous.
use_blob=True, # Defaults to False since True needs administrator privilege. Setting true significantly improves performance.
# Clearing the SQL table defaults to a truncate statement with MS SQL syntax, to make sure seamless clearing use a custom statement.
sql_delete_statement="TRUNCATE TABLE [SalesArchive]",
clear_target=True
))
async_executor_sql_to_tm1 (SQL -> TM1)
This function is a high-performance orchestrator for loading a single, large SQL table or query result into a TM1 cube.
Parameter Reference
sql_query_template (required, string)
The workload template. A SQL query string that must contain an
ORDER BYclause and two placeholders:{offset}and{fetch}.
sql_table_for_count (required, string)
The name of the SQL table to perform a
COUNT(*)on to calculate the total number of slices.
target_cube_name (required, string)
The name of the destination cube in TM1.
slice_size (optional, int; default=100000)
The number of rows each worker will fetch from the database.
data_copy_function (optional, callable)
The function to be executed by each worker. Defaults to
bedrock.load_sql_data_to_tm1_cube.
target_clear_set_mdx_list (optional, list[string])
A list of MDX set expressions defining the slice to be cleared in the target TM1 cube once before any workers start.
Example Workflow
import asyncio
from TM1_bedrock_py import bedrock
from TM1py import TM1Service
from sqlalchemy import create_engine
# 1. Define the SQL query template with ORDER BY and pagination placeholders
sql_tmpl = """
SELECT * FROM dbo.FactSales
ORDER BY SaleID
OFFSET {offset} ROWS FETCH NEXT {fetch} ROWS ONLY
"""
# 2. Set the name of the Value column to match the SQL table
sql_value_column_name = "SalesValue"
# 3. Create the SQL engine
engine = create_engine("mssql+pyodbc://...")
# 4. Run the executor
with TM1Service(address='localhost', user='admin', password='apple', ssl=True) as tm1:
asyncio.run(bedrock.async_executor_sql_to_tm1(
tm1_service=tm1,
sql_engine=engine,
sql_query_template=sql_tmpl,
sql_table_for_count="dbo.FactSales",
target_cube_name="Sales",
slice_size=250000,
max_workers=8,
sql_value_column_name=sql_value_column_name,
clear_target=True,
target_clear_set_mdx_list=["{[Version].[Version].[Actual]}"],
use_blob=True, # Defaults to False since True needs administrator privilege. Setting true significantly improves performance.
async_write=False # The executor handles the async part; the worker should be synchronous.
))
Developer Comments & Performance Tuning
Warning
ORDER BY Clause is Mandatory (for ``async_executor_sql_to_tm1``)
For the OFFSET/FETCH slicing to be reliable, the sql_query_template must include a deterministic ORDER BY clause. Without it, you may process duplicate rows or miss others entirely.
Warning
Database Compatibility (for ``async_executor_sql_to_tm1``)
The OFFSET ... FETCH syntax is standard for MS SQL Server, Oracle, and DB2. Other databases like PostgreSQL and MySQL use LIMIT {fetch} OFFSET {offset}. Adjust the sql_query_template accordingly.
Warning
Tested Databases
For the 1.1.1 release, this functionality has been explicitly tested against MS SQL Server and PostgreSQL. Behavior with other database backends is yet to be verified.
Note
Performance Tuning
``max_workers``: The most critical parameter. The optimal value depends on the source server’s (TM1 or SQL) ability to handle concurrent reads and the target server’s ability to handle concurrent writes.
Slicing Strategy: For TM1-to-SQL, choose slicing dimensions in
param_set_mdx_listthat create balanced workloads. For SQL-to-TM1, tune theslice_sizeto balance query overhead and memory usage.High-Speed Methods: Always use performance-enhancing features when possible, such as creating the
sql_enginewith fast_executemany=True for MS SQL and passing use_blob=True when writing to TM1.
Note
Clearing Behavior
In both executors, the clear_target functionality executes a single, global clear operation on the entire target (SQL table or TM1 cube slice) before any parallel workers begin their tasks.
Conclusion
The asynchronous SQL executors provide a powerful, scalable solution for integrating TM1 with relational databases. By choosing the correct executor for your data flow direction and understanding its specific slicing mechanism, you can build highly efficient and parallelized ETL processes capable of handling massive data volumes.