.. _data-copy-manual:

======================================
Data Copy / Data Copy Intercube Manual
======================================

.. contents:: Table of Contents
   :local:
   :depth: 2

.. _introduction:

Introduction
============

The **mission** of ``tm1_bedrock_py``’s **Data Copy** module is to serve as the spiritual successor of TM1 Bedrock’s ``cube.data.copy``. Its primary goal is to **read data from TM1**, transform it using optional **mapping data**, and then **write the results back to TM1**.

At its core, it provides functionality akin to Bedrock’s ``cube.data.copy``—with **upgraded features**. Key design priorities include:

- **Ease of use**
- **Integration flexibility**
- **Auto-parallelization**

In its current form, Data Copy supports:

- **MDX inputs** (e.g., view MDX for data or set MDX for dimension filters)
- **Lists**
- **Dictionaries**
- **Callable function inputs** (for read, write, or clear tasks), alongside **default TM1py-based** operations.

TM1py’s asynchronous capabilities enable **parallel reading and writing** for improved performance.

----

.. _default-operations:

Default Operations
==================

Whether you use **Data Copy** or **Data Copy Intercube**, the **default sequence** of operations is:

1. **Read source data** from TM1 (via TM1py or a custom function).
2. **Retrieve metadata** for both query and cube(s) involved.
3. **Complete the output DataFrame** by appending the relevant dimension and element data (using source metadata).
4. **Read mapping information** for any cube-based mappings.
5. **Perform mapping transformations** (replace, map-and-replace, map-and-join) on the DataFrame.
6. **Redimensionalize** the data as needed (only applicable in *Data Copy Intercube*).
7. **Rearrange** the output DataFrame’s columns/dimensions to match the target cube (using target metadata).
8. **Apply a value function** (optional) for scaling or other calculations.
9. **Clear** the target cube (if so requested).
10. **Write** the (possibly transformed) data back to TM1.

----

.. _data_copy_data_copy_intercube_key_differences:

data_copy, data_copy_intercube key differences
==============================================

- **Data Copy**

  - The target **is the same** as the source cube.
  - ``target_cube_name`` and ``target_metadata`` do **not** apply.
  - **No column joining or redimensionalization** is applicable; dimensionality must remain unchanged.

- **Data Copy Intercube**

  - The target cube **differs** from the source cube.
  - Gathers information about the target into ``target_metadata``.
  - **Column joining and redimensionalization** (column addition or removal) are permitted because the two cubes’ dimensionalities may differ.

----

.. _parameter-reference:

Parameter Reference
===================

Below is a complete, itemized listing of all possible function inputs. Most parameters are **optional** and have defaults, but some are required based on context (e.g., certain parameters are mandatory when reading data, some only apply to Intercube operations, etc.).

.. _tm1-service-and-mdx-execution:

TM1 Service and MDX Execution
-----------------------------

1. **tm1_service** *(optional)*

   - A ``TM1Service`` object used by the default read/write/clear functionality.
   - **Developer comment**: If you rely on the built-in MDX executor and metadata collector, a valid ``tm1_service`` must be supplied.

2. **data_mdx** *(optional, string)*

   - An MDX query for the **source data** (valid view MDX format).

3. **data_mdx_list** *(optional, list[string])*

   - A list of MDX queries for the source data.
   - When multiple MDX queries are provided, **asynchronous reading** is enabled.
   - **Developer comment**: Either ``data_mdx`` or ``data_mdx_list`` **must** be specified (unless using purely custom callables).

4. **mdx_function** *(optional, callable)*

   - A **custom function** used to execute the MDX queries for data retrieval.
   - Defaults to ``tm1py``’s ``execute_mdx_dataframe`` (or ``execute_mdx_dataframe_async`` if multiple queries).

----

.. _metadata-functions:

Metadata Functions
------------------

1. **data_metadata_function** *(optional, callable)*

   - A **custom metadata function** (or a “value as function”) for retrieving source metadata.
   - If omitted, metadata is determined automatically.

2. **target_metadata_function** *(optional, callable; only in *Data Copy Intercube*)*

   - A **custom metadata function** (or value) for retrieving the target’s metadata.
   - If omitted, metadata is determined automatically.
   - **Developer comment**: In ``data_copy_intercube``, you must supply either ``target_cube_name`` or ``target_metadata_function`` so the system knows which target cube to work with.

----

.. _skipping-values-and-cells:

Skipping Values and Cells
-------------------------

1. **skip_zeros** *(optional, boolean; default = False)*

   - Skip zero values in source data.

2. **skip_consolidated_cells** *(optional, boolean; default = False)*

   - Skip consolidated cells in source data.

3. **skip_rule_derived_cells** *(optional, boolean; default = False)*

   - Skip rule-derived cells in source data.

----

.. _target-cube-definition-intercube:

Target Cube Definition (Intercube)
----------------------------------

- **target_cube_name** *(optional, string; only in *Data Copy Intercube*)*

- Name of the target cube to which data is written.
- Retrieved from target metadata if omitted.
- **Developer comment**: Either ``target_cube_name`` or ``target_metadata_function`` must be specified.

----

.. _shared_mapping_data:

Shared Mapping data
-------------------

**Example structure:**

.. code-block:: python

    shared_mapping = {
        "mapping_df":shared_mapping_df_name
        "mapping_mdx":"SELECT ... ON COLUMNS, ... ON ROWS ... FROM ... WHERE"
        "mapping_metadata_function":shared_mapping_metadata_function
    }

**Step-Specific Mapping Arguments**:

- ``mapping_mdx``: MDX that returns a mapping DataFrame.
- ``mapping_df``: Pass a DataFrame directly if you already have it.
- ``mapping_metadata_function``: Custom metadata function for the mapping. Pass if you already have it, makes DataFrame retrieval faster.

----

.. _mapping-steps:

Mapping Steps
-------------

**mapping_steps** *(optional, list[dict])*

- **Core transformation input** for Data Copy and Data Copy Intercube.
- A **list of nested dictionaries**, each describing one mapping step.
- Steps run **in the order** they’re listed.

**Example structure**:

.. code-block:: python

    mapping_steps = [
      {
        "method": "replace",
        "mapping": {
          "dim1tochange": {"source": "target"},
          "dim2tochange": {"source3": "target3", "source4": "target4"}
        }
      },
      {
        "method": "map_and_replace",
        "mapping_mdx": "SELECT ...",
        "mapping_metadata_function": some_metadata_function,
        "mapping_df": some_mapping_dataframe,
        "mapping_filter": {
          "dim": "element",
          "dim2": "element2"
        },
        "mapping_dimensions": {
          "dimname_to_change_in_source": "dimname_in_mapping"
        },
        "relabel_dimensions": false
      },
      {
        "method": "map_and_join",
        "mapping_mdx": "SELECT ...",
        "mapping_metadata_function": another_metadata_function,
        "mapping_df": another_mapping_dataframe,
        "mapping_filter": {
          "dim": "element",
          "dim2": "element2"
        },
        "joined_columns": ["col1", "col2"],
        "dropped_columns": ["col3", "col4"]
      }
    ]

.. _method-replace:

Method: **replace**
~~~~~~~~~~~~~~~~~~~

- **Most basic** transformation step.
- Replaces specified **source values** (in specified dimensions) with corresponding target values.
- Supports multiple **dimensions** and multiple **source→target** pairs per dimension.
- Required key: ``"method": "replace"``.

**Mapping example**:

.. code-block:: json

    "mapping": {
      "dim_to_map": { "source1": "target1", "source2": "target2" },
      "dim2_to_map": { "source3": "target3" }
    }

**Example transformation**:

**Input DataFrame**

.. code-block:: text

    period   dim_to_map   dim2_to_map   measure   value
    202202   source1      source3       value     100
    202203   source2      other         value     100
    202202   other        source3       value     100
    202203   other        other         value     100

**Output DataFrame**

.. code-block:: text

    period   dim_to_map   dim2_to_map   measure   value
    202202   target1      target3       value     100
    202203   target2      other         value     100
    202202   other        target3       value     100
    202203   other        other         value     100

----

.. _method-map_and_replace:

Method: **map_and_replace**
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- **Joins** the source and mapping DataFrames on all **shared columns** (except those in ``"mapping_dimensions"``).
- **Replaces** source dimension values with **mapped** dimension values.
- Can optionally **relabel the column** name in the source DataFrame if ``"relabel_dimensions": true``.
- Required key: ``"method": "map_and_replace"``.
- You must define **either** step-specific or shared mapping data for it to work.

**Step-Specific Mapping Arguments**:

- ``mapping_mdx``: MDX that returns a mapping DataFrame.
- ``mapping_df``: Pass a DataFrame directly if you already have it.
- ``mapping_metadata_function``: Custom metadata function for the mapping.
- ``mapping_filter`` *(optional)*: Filters the mapping DataFrame locally. E.g., ``{"dimension": "element"}``.
- ``mapping_dimensions``: Dict specifying which dimension(s) to align between the source and mapping DataFrames.
- ``relabel_dimensions`` *(optional)*: Only applies to Data Copy Intercube. If ``true``, rename the dimension itself in the DataFrame after replacement.

**Example #1**

.. code-block:: python

    {
      "method": "map_and_replace",
      "mapping_df": employee_settings_df,
      "mapping_filter": {
        "Employee Settings": "ORG_UNIT_PARENT"
      },
      "mapping_dimensions": {
        "OrgUnit": "Value"
      },
      "relabel_dimensions": false
    }

**Source DataFrame**

.. code-block:: text

    Period   Employee    OrgUnit   Measure   Value
    202202   00000001    ABC123A   Value     120
    202202   00000002    ABC123A   Value     120
    202202   00000003    XYZ123X   Value     120
    202202   00000004    XYZ123X   Value     120

**Mapping DataFrame** (``employee_settings_df``)

.. code-block:: text

    Period   Employee    Employee Settings   Value
    202202   00000001    ORG_UNIT_PARENT     AAA0001
    202202   00000002    ORG_UNIT_PARENT     AAA0002
    202202   00000003    ORG_UNIT_PARENT     AAA0001
    202202   00000004    ORG_UNIT_PARENT     AAA0002
    202202   00000001    COST_POS_TYPE       AM
    202202   00000002    COST_POS_TYPE       KAM
    202202   00000003    COST_POS_TYPE       Team Leader
    202202   00000004    COST_POS_TYPE       Manager

**Output DataFrame**

.. code-block:: text

    Period   Employee    OrgUnit   Measure   Value
    202202   00000001    AAA0001   Value     120
    202202   00000002    AAA0002   Value     120
    202202   00000003    AAA0001   Value     120
    202202   00000004    AAA0002   Value     120

**Example #2**

.. code-block:: python

    {
      "method": "map_and_replace",
      "mapping_df": employee_to_orgunit,
      "mapping_dimensions": {
        "OrgUnit": "OrgUnit2"
      },
      "relabel_dimensions": true
    }

**Source DataFrame**

.. code-block:: text

    Period   Employee    OrgUnit   Measure   Value
    202202   00000001    ABC123A   Value     120
    202202   00000002    ABC123A   Value     120
    202202   00000003    XYZ123X   Value     120
    202202   00000004    XYZ123X   Value     120

**Mapping DataFrame** (``employee_to_orgunit``)

.. code-block:: text

    Period   Employee    OrgUnit2   Value
    202202   00000001    AAA0001    1
    202202   00000002    AAA0002    1
    202202   00000003    AAA0001    1
    202202   00000004    AAA0002    1

**Output DataFrame**

.. code-block:: text

    Period   Employee    OrgUnit2   Measure   Value
    202202   00000001    AAA0001    Value     120
    202202   00000002    AAA0002    Value     120
    202202   00000003    AAA0001    Value     120
    202202   00000004    AAA0002    Value     120

----

.. _method-map_and_join:

Method: **map_and_join**
~~~~~~~~~~~~~~~~~~~~~~~~

- **Joins** additional columns (``"joined_columns"``) from the mapping DataFrame to the source DataFrame based on shared dimensions.
- **Drops** columns listed under ``"dropped_columns"``.
- Required key: ``"method": "map_and_join"``.
- Must rely on either step-specific or shared mapping data.

**Step-Specific Mapping Arguments**:

- ``mapping_mdx``: MDX for retrieving the mapping DataFrame (if needed).
- ``mapping_df``: Use an existing DataFrame if you have one.
- ``mapping_metadata_function``: Metadata function for the mapping.
- ``mapping_filter``: Dict for filtering the mapping DataFrame.
- ``joined_columns`` *(required)*: Columns to join from the mapping into the source.
- ``dropped_columns`` *(optional)*: Columns to remove after joining.

**Example**

.. code-block:: python

    {
      "method": "map_and_join",
      "mapping_df": employee_settings_df,
      "joined_columns": ["Sales Channel", "Specialism"],
      "dropped_columns": ["Employee"]
    }

**Source DataFrame**

.. code-block:: text

    Period   Employee    Orgunit   Measure   Value
    202201   00000001    abc123a   cost001   10
    202201   00000002    abc123a   cost001   10
    202202   00000001    def345a   cost001   10
    202202   00000002    def345a   cost001   10

**Mapping DataFrame**

.. code-block:: text

    Period   Employee    Sales Channel   Specialism    Cost Position Type
    202201   00000001    temp            IT            KAM
    202201   00000002    perm            finance       AM
    202202   00000001    perm            engineering   Team Leader
    202202   00000002    perm            engineering   Manager

**Output DataFrame**

.. code-block:: text

    Period   Orgunit   Measure   Value   Sales Channel   Specialism
    202201   abc123a   cost001   10      temp            IT
    202201   abc123a   cost001   10      perm            finance
    202202   def345a   cost001   10      perm            engineering
    202202   def345a   cost001   10      perm            engineering

**Developer comments**:

- The column order will be further **rearranged** to match the **target cube**.
- **Numeric duplicates** (same dimensionality) can be **summed** if ``"sum_numeric_duplicates": true``.

----

.. _additional-intercube-parameters:

Additional Intercube Parameters
-------------------------------

The following parameters only apply to **Data Copy Intercube**, where source and target cubes differ in dimensionality or naming.

.. _source_dim_mapping:

1. **source_dim_mapping** *(optional, dict)*

   - Declares dimensions **present in the source** but **not present** in the target.
   - For each such dimension, specify an element to **filter**. Rows matching that element remain; all others are excluded. Then the dimension (column) is **dropped**.

   **Example**

   .. code-block:: text

       # Source DataFrame
       Period   Employee   Orgunit   Measure   Value
       202201   00000001   Total     cost001   1000
       202201   00000001   Total     cost002   2000
       202201   00000001   abc123a   cost001   10
       202201   00000001   def345a   cost002   20

       # source_dim_mapping
       {
         "Orgunit": "Total",
         "Measure": "cost001"
       }

       # Output DataFrame
       Period   Employee   Value
       202201   00000001   1000

.. _related_dimensions:

2. **related_dimensions** *(optional, dict)*

   - Defines relationships between **source dimension names** and **target dimension names**, preserving their **elements**.
   - Essentially **renames** the column in the DataFrame from source dimension to target dimension.

   **Example**

   .. code-block:: text

       # Source DataFrame
       Period   Employee   Orgunit   Measure   Value
       202201   00000001   Total     cost001   1000
       202201   00000001   Total     cost002   2000
       202201   00000001   abc123a   cost001   10
       202201   00000001   def345a   cost002   20

       # related_dimensions
       {
         "Employee": "Key Account Manager"
       }

       # Output DataFrame
       Period   Key Account Manager   Orgunit   Measure   Value
       202201   00000001             Total     cost001   1000
       202201   00000001             Total     cost002   2000
       202201   00000001             abc123a   cost001   10
       202201   00000001             def345a   cost002   20

.. _target_dim_mapping:

3. **target_dim_mapping** *(optional, dict)*

   - Declares dimensions **present in the target** but **not** in the (post-mapping) DataFrame.
   - For each missing dimension, a new column is **added** to the DataFrame, assigning a **single element** to all rows.

   **Example**

   .. code-block:: text

       # Source DataFrame
       Period   Employee   Orgunit   Value
       202201   00000001   Total     1000
       202201   00000001   Total     2000
       202201   00000001   abc123a   10
       202201   00000001   def345a   20

       # target_dim_mapping
       {
         "Lineitem": "Salary Costs",
         "Measure": "Value"
       }

       # Output DataFrame
       Period   Employee   Orgunit   Value   Lineitem       Measure
       202201   00000001   Total     1000    Salary Costs   Value
       202201   00000001   Total     2000    Salary Costs   Value
       202201   00000001   abc123a   10      Salary Costs   Value
       202201   00000001   def345a   20      Salary Costs   Value

----

.. _value-manipulation:

Value Manipulation
------------------

1. **value_function** *(optional, callable)*

- A function that **transforms** each numeric value in the DataFrame.

- Example:

  .. code-block:: python

      def multiply_by_two(x):
          return x * 2

- **Input DataFrame**

  .. code-block:: text

      Period   Employee   Orgunit   Value
      202201   00000001   Total     1000
      202201   00000001   Total     2000

- **Result**

  .. code-block:: text

      Period   Employee   Orgunit   Value
      202201   00000001   Total     2000
      202201   00000001   Total     4000

----

.. _clearing-options:

Clearing Options
----------------

1. **clear_target** *(optional, boolean; default = False)*
   - If ``True``, **clears** the target cube or slice before writing.

2. **clear_set_mdx_list** *(optional, list[string])*

   - A list of **set MDX** expressions for clearing.

   - **Example**

     .. code-block:: python

         clear_set_mdx_list = [
           "TM1FilterBylevel({TM1DrillDownMember({[Periods].[Fiscal Year].[2024]}, ALL, RECURSIVE)}, 0)",
           "{[Versions].[Versions].[Plan]}"
         ]

----

.. _performance-writing-modes:

Performance & Writing Modes
---------------------------

1. **async_write** *(optional, boolean; default = False)*
   - Use TM1py’s **asynchronous** write mode.

2. **use_ti** *(optional, boolean; default = False)*

   - Use an **unbound TurboIntegrator process** to perform the write.
   - Requires **admin privileges** and often yields higher performance.

3. **use_blob** *(optional, boolean; default = False)*

   - Use a **blob** for writing, also requiring **admin privileges**.
   - Typically **10× faster** than TI-based writes alone.

4. **increment** *(optional, boolean; default = False)*
   - If ``True``, **increments** the target cube values instead of overwriting them.

5. **sum_numeric_duplicates** *(optional, boolean; default = True)*
   - Whether to **sum** numeric values for rows that share **identical dimensionality**

----

.. _additional-keyword-arguments:

Additional Keyword Arguments
----------------------------

****kwargs** *(optional)*

- Extra keyword arguments passed to **custom callables** if needed.

----

.. _example_workflow:

Example Workflow
================

**Basic Data Copy Usage:**
--------------------------

.. code-block:: python

    from TM1py import TM1Service
    from TM1_bedrock_py import data_copy

    with TM1Service(address='localhost', user='admin', password='apple', ssl=True) as tm1:
        data_copy(
            tm1_service=tm1,
            data_mdx="SELECT ...",
            skip_zeros=True,
            target_cube_name="Target_Cube",
            mapping_steps=[{"method":"replace", "mapping":{"dimension":{"sourceelem":"targetelem"}}}],
            async_write=True
        )

**Function definitions with complete parameter lists and type declarations**
-----------------------------------------------------------------------------

.. code-block:: python

    def data_copy_intercube(
            tm1_service: Optional[Any],
            data_mdx: Optional[str] = None,
            mdx_function: Optional[Callable[..., DataFrame]] = None,
            data_mdx_list: Optional[list[str]] = None,
            skip_zeros: Optional[bool] = False,
            skip_consolidated_cells: Optional[bool] = False,
            skip_rule_derived_cells: Optional[bool] = False,
            target_cube_name: Optional[str] = None,
            target_metadata_function: Optional[Callable[..., DataFrame]] = None,
            data_metadata_function: Optional[Callable[..., DataFrame]] = None,
            mapping_steps: Optional[List[Dict]] = None,
            shared_mapping_df: Optional[DataFrame] = None,
            shared_mapping_mdx: Optional[str] = None,
            shared_mapping_metadata_function: Optional[Callable[..., Any]] = None,
            source_dim_mapping: Optional[dict] = None,
            related_dimensions: Optional[dict] = None,
            target_dim_mapping: Optional[dict] = None,
            value_function: Optional[Callable[..., Any]] = None,
            clear_set_mdx_list: Optional[List[str]] = None,
            clear_target: Optional[bool] = False,
            async_write: bool = False,
            use_ti: bool = False,
            use_blob: bool = False,
            increment: bool = False,
            sum_numeric_duplicates: bool = True,
            **kwargs
    ) -> None:
        pass

    def data_copy(
            tm1_service: Optional[Any],
            data_mdx: Optional[str] = None,
            mdx_function: Optional[Callable[..., DataFrame]] = None,
            data_mdx_list: Optional[list[str]] = None,
            skip_zeros: Optional[bool] = False,
            skip_consolidated_cells: Optional[bool] = False,
            skip_rule_derived_cells: Optional[bool] = False,
            data_metadata_function: Optional[Callable[..., DataFrame]] = None,
            mapping_steps: Optional[List[Dict]] = None,
            shared_mapping_df: Optional[DataFrame] = None,
            shared_mapping_mdx: Optional[str] = None,
            shared_mapping_metadata_function: Optional[Callable[..., Any]] = None,
            value_function: Optional[Callable[..., Any]] = None,
            clear_set_mdx_list: Optional[List[str]] = None,
            clear_target: Optional[bool] = False,
            async_write: bool = False,
            use_ti: bool = False,
            use_blob: bool = False,
            increment: bool = False,
            sum_numeric_duplicates: bool = True,
            **kwargs
    ) -> None:
        pass

.. _developer-comments:

Developer Comments
==================

- **If the default MDX executor and metadata collector** are used, you must pass a valid ``tm1_service``.
- **When applying** the asynchronous or TI-based writes, **administrator rights** may be required.
- **Column order** is eventually rearranged by ``dataframe_rearrange_dimensions`` to align with the target cube’s shape.
- **Numeric duplicates** can be automatically summed by enabling ``sum_numeric_duplicates``.
- **Either** ``data_mdx`` **or** ``data_mdx_list`` must be present (unless you supply purely custom read logic).
- For **Data Copy Intercube**, you must define either **``target_cube_name``** or **``target_metadata_function``**.

----

.. _conclusion:

Conclusion
==========

This manual describes **every aspect** of **tm1_bedrock_py**’s **Data Copy** and **Data Copy Intercube** modules. It details:

1. How to **read** source data (using MDX queries or custom functions)
2. **Transform** data through mapping steps, dimension manipulation, or value scaling
3. **Write** the resultant data to the same or a different cube (with optional clearance, asynchronous writing, TurboIntegrator, or blob usage)

By combining these building blocks, you can adapt your data-copy processes to a wide range of TM1 tasks while leveraging additional features like auto-parallelization, dimension re-labeling, and custom transformations.