How To: Dynamic Transposition in HANA
Transposition Use Case
There are cases in which “horizontal” data in HANA needs to be transposed into a “vertical” format. Reasons for this transformation include:
1) Data may need to be combined from multiple fact tables where one is “horizontally” structured and another is “vertically” structured. A common structure is required in order to combine them (typically via Union with Constant Values) in a shared data model.
2) “Vertical” data structures are much easier to work with for BI data modeling and reporting. (Less fields are required in the data model, and slicing/dicing/sorting on dimension values make much more sense than awkward display of different measure columns).
SAP FICO is a typical example of a system that has tables which should be transposed. COSS, for example, has separate measure columns for each month of the year. In other cases measure columns are defined by cost elements or cost element groups.
Problems with Physical Table Transposition
Data modelers have the option of transposing the physical tables via ETL transformations. Such an approach has several problems associated with it:
1) Extra work is required to implement the ETL process.
2) Vertical transposition can increase the number of rows of a table by a few orders of magnitude. This can lead to a table with more than 2 billion rows, forcing partitioning (which required additional maintenance and complexity).
3) Performance degradation given the orders of magnitude increase in table size.
3) The SAP HANA landscape can become very complex if additional ETL transformations are required on SAP source data coming through SLT. This is not recommended.
Benefits of Dynamic Transposition
Data modelers can also implement dynamic transposition, which pivots the data on-the-fly. This approach has the following benefits:
1) No required change to physical table structures.
2) Potential for automated maintenance of one additional table require for transposition.
3) Quick, easy, intuitive and maintainable data modeling.
4) High-performance data model.
Following is a simplified, step-by-step example of how to implement Dynamic Transposition on HANA. It demonstrates how to transpose tables on-the-fly via a graphical Calculation View. Special thanks goes to Tony Cheung, Adam Baryla and Imran Rashid of SAP America/Canada for sharing details of this approach with me on a prior project.
(There are other approaches to dynamic transposition, such as using a combination of filters and UNION/AGGREGATION nodes. Such approaches may have comparable performance, but the “hard-coded”, complex design makes them difficult to develop and maintain.)
Transposition Data Flow
The following shows the “Source” table that we want to transpose as well as the desired “Target” structure. Columns M1 through M3 represent aggregated measure values at a montly level (such as in COSS).
Implementing this transposition requires the following steps in a Calculation View:
1) Create an “Identity Matrix” table with any required DIMENSION columns and, optionally, and required SORT_ORDER columns.
2) Create constant value calculated columns in each table (in implementation these will be projection nodes in a Calculation View).
3) Cross Join the two tables by joining on constant value columns.
4) Create output measure (i.e. AMOUNT) as a calculated column.
5) Query required columns (aggregate AMOUNT, group by DIMENSION, sort by SORT_ORDER)
Step 1: Create an Identity Matrix table
Please note: If this table is large and/or changed frequently, consider generating and loading an associated CSV file in a programming language of your choice. This approach can be automated and easily maintained.
DROP TABLE MONTH_IDENT_MTRX;
CREATE COLUMN TABLE MONTH_IDENT_MTRX
PRIMARY KEY (MONTH)
INSERT INTO MONTH_IDENT_MTRX VALUES (‘JAN’, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘FEB’, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘MAR’, 3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘APR’, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘MAY’, 5, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘JUN’, 6, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘JUL’, 7, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘AUG’, 8, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘SEP’, 9, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘OCT’, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘NOV’, 11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0);
INSERT INTO MONTH_IDENT_MTRX VALUES (‘DEC’, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1);
For practice purposes, here is SQL for generating a small, sample fact table:
DROP TABLE SMALL_FACT;
CREATE COLUMN TABLE SMALL_FACT
PRIMARY KEY (K)
INSERT INTO SMALL_FACT VALUES (‘A’, 230, 232, 240, 300, 320, 280, 340, 290, 220, 242, 234, 320);
Step 2: Create constant value calculated columns in each table
a) Create a simple Analytic View that wraps SMALL_FACT called AN_FACT
b) Create a Calculation View called CA_TRANSPOSE_EX
c) Bring in AN_FACT and the table MONTH_IDENT_MTRX
d) Create projection nodes above AN_FACT and MONTH_IDENT_MTRX
e) Create the following calculated column in each projection node.
Step 3: Cross Join the two tables
Join both tables on calculated column ONE
Step 4: Create output measure (i.e. AMOUNT) as a calculated column.
(Please don’t literally copy the ellipsis below! The whole calculation goes from COL1/M1 to COL12/M12. Shortened below for display.)
Step 5: Query required columns (aggregate AMOUNT, group by DIMENSION, sort by SORT_ORDER)
A few comments
When examining the data model above, you might has some performance concerns. For example:
1) It’s generally recommended NOT to join on calculated columns in a Calculation View
2) Dynamic transposition can clearly still create very large intermediate result sets.
With regard to the join on calculated columns, you would be semi-correct. If you have the “ability” (i.e. if your data is loaded via ETL, or via SLT and you have the option of altering table structures (not always possible, i.e. if ABAP Accelerator expects same structure as source)), then you should of course consider creating the “ONE” column as a physical column. And even if this is not possible, “ONE” can still be created as a physical column on the “identity matrix” table.
However, in testing with large data sets on a previous project (hundred of millions of records in the fact table, hundreds of records in the identity matrix table), colleagues of mine (referenced at the beginning ) found very good performance, despite using calculated columns (and despite concerns regarding intermediate result set size). I suspect the very low cardinality of the identity matrix table, the simplicity of the calculation, and the fact that only certain columns are “multiplied out” lead to the great performance. As always, try various approaches for your scenario to meet an optimal functional/performance trade-off.
Thanks for reading, and I hope you find this How-To guide on Dynamic Transposition in HANA useful!