Best practices using SUMMARIZE and ADDCOLUMNS - SQLBI (2024)

UPDATE 2022-02-11 : The article has been updated using DAX.DO for the sample queries and removing the outdated part.
UPDATE 2023-03-17 : Fixed an incorrect description before example #11.

Most people using DAX are probably familiar with the SQL query language. Because of the similarities between Tabular data modeling and relational data modeling, there is an expectation that you can perform the same operations as those allowed in SQL. However, in its current implementation DAX does not permit all the operations that you can perform in SQL. This article describes how to use ADDCOLUMNS and SUMMARIZE, which can be used in any DAX expression, including measures. For DAX queries, you should consider using SUMMARIZECOLUMNS, starting with the Introducing SUMMARIZECOLUMNS article. You can also read the All the secrets of Summarize article for more insights about the inner workings of SUMMARIZE.
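For comparison, here is a minimal sketch of how the store count by country used later in this article could be written as a query with SUMMARIZECOLUMNS. It assumes the Contoso model available on DAX.DO. SUMMARIZECOLUMNS evaluates its expressions in the filter context of each group, so no explicit CALCULATE is written here.

```dax
-- Sketch: number of stores by country with SUMMARIZECOLUMNS
-- (assumes the DAX.DO Contoso model)
EVALUATE
SUMMARIZECOLUMNS (
    Store[Country],                  -- grouping column
    "Stores", COUNTROWS ( Store )    -- evaluated in the filter context of each country
)
```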

Extension Columns

Extension columns are columns that you add to existing tables. You can obtain extension columns by using either ADDCOLUMNS or SUMMARIZE. For example, the following query adds an Open Year column to the rows returned from the Store table.

EVALUATE
ADDCOLUMNS ( Store, "Open Year", YEAR ( Store[Open Date] ) )

| StoreKey | Store Code | Country | State | Name | Square Meters | Open Date | Close Date | Status | Open Year |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 1 | Australia | Australian Capital Territory | Contoso Store Australian Capital Territory | 595 | 2008-01-01 | (Blank) | (Blank) | 2008 |
| 20 | 2 | Australia | Northern Territory | Contoso Store Northern Territory | 665 | 2008-01-12 | 2016-07-07 | Closed | 2008 |
| 30 | 3 | Australia | South Australia | Contoso Store South Australia | 2,000 | 2012-01-07 | 2015-08-08 | Restructured | 2012 |
| 35 | 3 | Australia | South Australia | Contoso Store South Australia | 3,000 | 2015-12-08 | (Blank) | (Blank) | 2015 |
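ADDCOLUMNS accepts several name/expression pairs in a single call, as its syntax at the end of this article shows. A minimal sketch, assuming the same DAX.DO Contoso model; the Open Month and Is Closed column names are illustrative:

```dax
-- Sketch: several extension columns added by one ADDCOLUMNS call
EVALUATE
ADDCOLUMNS (
    Store,
    "Open Year", YEAR ( Store[Open Date] ),
    "Open Month", MONTH ( Store[Open Date] ),
    -- TRUE when the store has a close date
    "Is Closed", NOT ISBLANK ( Store[Close Date] )
)
```

Each expression is evaluated in the row context of the current Store row, so plain column references are valid without any aggregation. An expression cannot reference another column added in the same ADDCOLUMNS call; that requires nesting a second ADDCOLUMNS.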

You can also create an extension column by using SUMMARIZE. For example, you can count the number of stores for each country by using the following query (please note that this query is not a best practice – you will see why later in this article).

EVALUATE
SUMMARIZE ( Store, Store[Country], "Stores", COUNTROWS ( Store ) )

| Country | Stores |
|---|---|
| Australia | 7 |
| Canada | 7 |
| France | 7 |
| Germany | 10 |
| Italy | 3 |
| Netherlands | 5 |
| United Kingdom | 7 |
| United States | 27 |
| Online | 1 |

In practice, an extension column is a calculated column created within the query.

Query Projection

In a SELECT statement in SQL, you can choose the columns projected in the result, whereas in DAX you can only add columns to a table by creating extension columns. The only workaround available is to use SUMMARIZE to group the table by the columns you want to obtain in the output. As long as you do not need to see duplicated rows in the result, this solution has no particular side effects. For example, if you want to get just the list of store names and their corresponding open dates, you can write the following query.

EVALUATE
SUMMARIZE ( Store, Store[Name], Store[Open Date] )

| Name | Open Date |
|---|---|
| Contoso Store Australian Capital Territory | 2008-01-01 |
| Contoso Store Northern Territory | 2008-01-12 |
| Contoso Store South Australia | 2012-01-07 |
| Contoso Store South Australia | 2015-12-08 |
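Because SUMMARIZE groups by the requested columns, identical combinations collapse into a single row. A minimal sketch that compares the number of rows in Store with the number of rows returned by two projections, assuming the same model; the result column names are illustrative:

```dax
-- Sketch: raw row count versus grouped (projected) row counts
EVALUATE
ROW (
    "Store rows", COUNTROWS ( Store ),
    "Name and Open Date rows", COUNTROWS ( SUMMARIZE ( Store, Store[Name], Store[Open Date] ) ),
    "Country rows", COUNTROWS ( SUMMARIZE ( Store, Store[Country] ) )
)
```

If a projected count is lower than the Store row count, some rows share the same values in the projected columns, and SUMMARIZE returns them only once.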

Whenever you can create an extended column by using either ADDCOLUMNS or SUMMARIZE, you should always favor ADDCOLUMNS for performance reasons. For example, you can add the open year by using one of two techniques. First, you can just use SUMMARIZE.

EVALUATE
SUMMARIZE (
    Store,
    Store[Name],
    Store[Open Date],
    "Open Year", YEAR ( Store[Open Date] )
)

| Store[Name] | Store[Open Date] | Open Year |
|---|---|---|
| Contoso Store Australian Capital Territory | 2008-01-01 | 2008 |
| Contoso Store Northern Territory | 2008-01-12 | 2008 |
| Contoso Store South Australia | 2012-01-07 | 2012 |
| Contoso Store South Australia | 2015-12-08 | 2015 |

Second, you can use ADDCOLUMNS to add the Open Year column to the SUMMARIZE result.

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Store, Store[Name], Store[Open Date] ),
    "Open Year", YEAR ( Store[Open Date] )
)

| Store[Name] | Store[Open Date] | Open Year |
|---|---|---|
| Contoso Store Australian Capital Territory | 2008-01-01 | 2008 |
| Contoso Store Northern Territory | 2008-01-12 | 2008 |
| Contoso Store South Australia | 2012-01-07 | 2012 |
| Contoso Store South Australia | 2015-12-08 | 2015 |

Both queries produce the same result.

However, you should always favor the ADDCOLUMNS version. The rule of thumb is that you should never add extended columns by using SUMMARIZE, unless it is required due to at least one of the following conditions:

  • You want to use ROLLUP over one or more grouping columns in order to obtain subtotals
  • You are using non-trivial table expressions in the extended column, as you will see in the “Filter Context in SUMMARIZE and ADDCOLUMNS” section later in this article
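For the first condition, a minimal sketch of ROLLUP over the country column could look like the following, assuming the same model. ISSUBTOTAL only flags the row produced by the rollup; the Is Total column name is illustrative.

```dax
-- Sketch: store count by country plus a grand total row produced by ROLLUP
EVALUATE
SUMMARIZE (
    Store,
    ROLLUP ( Store[Country] ),
    "Stores", COUNTROWS ( Store ),
    -- TRUE on the row where Store[Country] is rolled up into the total
    "Is Total", ISSUBTOTAL ( Store[Country] )
)
```

ROLLUP is specified inside the SUMMARIZE call, which is why in this case the extended column stays in SUMMARIZE instead of moving to ADDCOLUMNS.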

The best practice is that, whenever possible, instead of writing

SUMMARIZE( <table>, <group_by_column>, <column_name>, <expression> )

you should write:

ADDCOLUMNS( SUMMARIZE( <table>, <group_by_column> ), <column_name>, CALCULATE( <expression> ))
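As an illustration of the template, here is the same aggregation written both ways, assuming the Store[Square Meters] column of the DAX.DO Contoso model; the Total Square Meters name is illustrative. A DAX query can contain more than one EVALUATE, so both result sets can be compared in a single run.

```dax
-- Discouraged: extension column computed inside SUMMARIZE
EVALUATE
SUMMARIZE ( Store, Store[Country], "Total Square Meters", SUM ( Store[Square Meters] ) )

-- Preferred: SUMMARIZE only groups, ADDCOLUMNS adds the column,
-- and CALCULATE turns the row context on Store[Country] into a filter context
EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Store, Store[Country] ),
    "Total Square Meters", CALCULATE ( SUM ( Store[Square Meters] ) )
)
```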

The CALCULATE you can see in the best practices template above is not always required, but you need it whenever the <expression> contains an aggregation function. The reason is that ADDCOLUMNS operates in a row context that does not automatically propagate into a filter context, whereas the same <expression> within a SUMMARIZE is evaluated in a filter context corresponding to the values in the grouped columns. The previous examples used a scalar expression over a column that was included in the SUMMARIZE output, so the reference to the column value was valid within the row context. Now, consider the following query that you have already seen at the beginning of this article.

EVALUATE
SUMMARIZE ( Store, Store[Country], "Stores", COUNTROWS ( Store ) )

If you rewrite this query by simply moving the Stores extended column out of SUMMARIZE and into an ADDCOLUMNS function, you obtain the following query, which produces the wrong result: it returns the number of rows in the entire Store table for each row of the result, instead of the number of stores in each country.

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Store, Store[Country] ),
    "Stores", COUNTROWS ( Store )
)

| Country | Stores |
|---|---|
| Australia | 74 |
| Canada | 74 |
| France | 74 |
| Germany | 74 |
| Italy | 74 |
| Netherlands | 74 |
| United Kingdom | 74 |
| United States | 74 |
| Online | 74 |

To obtain the result you want, you have to wrap the expression for the Stores extended column within a CALCULATE function. This way, the row context for Store[Country] is transformed into a filter context (context transition), and the COUNTROWS function only considers the stores belonging to the country of the current row.

EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Store, Store[Country] ),
    "Stores", CALCULATE ( COUNTROWS ( Store ) )
)

| Country | Stores |
|---|---|
| Australia | 7 |
| Canada | 7 |
| France | 7 |
| Germany | 10 |
| Italy | 3 |
| Netherlands | 5 |
| United Kingdom | 7 |
| United States | 27 |
| Online | 1 |

Thus, as a rule of thumb, wrap the expression of an extended column within a CALCULATE function whenever you move it out of SUMMARIZE into an ADDCOLUMNS function. Just pay attention to the caveats in the following section!
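When the expression is a measure reference, the CALCULATE is already implied: DAX wraps every measure reference in an implicit CALCULATE, so context transition happens without writing it. A minimal sketch, assuming the [Sales Amount] measure used in the queries of the next section:

```dax
EVALUATE
ADDCOLUMNS (
    SUMMARIZE ( Store, Store[Country] ),
    -- Plain aggregation: needs an explicit CALCULATE for context transition
    "Stores", CALCULATE ( COUNTROWS ( Store ) ),
    -- Measure reference: implicitly wrapped in CALCULATE
    "Sales", [Sales Amount]
)
```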

Filter Context in SUMMARIZE and ADDCOLUMNS

When describing the pattern of creating extended columns by using ADDCOLUMNS instead of SUMMARIZE, we mentioned that there are conditions in which this substitution is not possible because the result would be incorrect. For example, when the table argument of SUMMARIZE applies filters over columns that are not included in the grouped columns, and the extended column expression uses data coming from related tables, the filter context differs between SUMMARIZE and ADDCOLUMNS.

The following query returns, by product category and customer country, the margin made by the top 2 customers of each product. Thus, a category might include several customers, but no more than 2 per product:

EVALUATE
SUMMARIZE (
    GENERATE ( 'Product', TOPN ( 2, Customer, [Sales Amount] ) ),
    Product[Category],
    Customer[Country],
    "Margin", [Margin]
)
ORDER BY [Margin] DESC

| Product[Category] | Customer[Country] | Margin |
|---|---|---|
| Computers | United States | 711,304.76 |
| Home Appliances | United States | 401,760.06 |
| TV and Video | United States | 288,672.69 |
| Cell phones | United States | 205,773.77 |

In this case, moving the extended column out of SUMMARIZE into ADDCOLUMNS does not work, because the GENERATE used as the table argument of SUMMARIZE returns only a few combinations of products and customers, and SUMMARIZE only considers the sales related to these combinations. Consider the following query and its result:

EVALUATE
ADDCOLUMNS (
    SUMMARIZE (
        GENERATE ( 'Product', TOPN ( 2, Customer, [Sales Amount] ) ),
        Product[Category],
        Customer[Country]
    ),
    "Margin", [Margin]
)
ORDER BY [Margin] DESC

| Product[Category] | Customer[Country] | Margin |
|---|---|---|
| Computers | United States | 1,440,461.21 |
| Cell phones | United States | 612,576.04 |
| Home Appliances | United States | 537,585.70 |
| TV and Video | United States | 498,765.37 |
| … | Canada | … |

As you can see, the results differ: the Margin values are higher than in the initial result. This is because this query computes the Margin measure in a filter context that filters only Product[Category] and Customer[Country], ignoring the filter on the top 2 customers that was used in the table argument of SUMMARIZE. There, the result of GENERATE included all the columns of Product and Customer, but only for the top 2 customers of each product. Only these customers were considered in the Margin calculation inside SUMMARIZE, because SUMMARIZE was using only the customers filtered by TOPN.

If you wrap the SUMMARIZE into an ADDCOLUMNS, the extended columns created in ADDCOLUMNS work on a filter context defined by Product[Category] and Customer[Country], which considers many more sales than those used by the initial query. Thus, to generate the equivalent result by using ADDCOLUMNS, it is necessary to bring the same filter obtained by GENERATE into the evaluation of the Margin measure. We can do that by storing the result of GENERATE in the ProductsCustomers variable, which we reference in SUMMARIZE and pass as a filter argument to CALCULATE when evaluating the Margin measure in ADDCOLUMNS. The use of KEEPFILTERS is crucial: it keeps the filter on Product[Category] and Customer[Country] produced by the context transition and intersects it with the filter stored in ProductsCustomers, instead of letting ProductsCustomers override it.
This is the equivalent DAX query using ADDCOLUMNS to generate the extended column:

EVALUATE
VAR ProductsCustomers =
    GENERATE ( 'Product', TOPN ( 2, Customer, [Sales Amount] ) )
RETURN
    ADDCOLUMNS (
        SUMMARIZE ( ProductsCustomers, Product[Category], Customer[Country] ),
        "Margin", CALCULATE ( [Margin], KEEPFILTERS ( ProductsCustomers ) )
    )
ORDER BY [Margin] DESC

| Product[Category] | Customer[Country] | Margin |
|---|---|---|
| Computers | United States | 711,304.76 |
| Home Appliances | United States | 401,760.06 |
| TV and Video | United States | 288,672.69 |
| Cell phones | United States | 205,773.77 |
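To see why KEEPFILTERS matters, you can run the same query without it. This is a sketch for experimentation and its output is not reproduced here. Without KEEPFILTERS, the ProductsCustomers filter argument also contains Product[Category] and Customer[Country], so it overrides the filter produced by the context transition; we would therefore expect every row to show the same Margin, computed over all the product and customer combinations stored in ProductsCustomers.

```dax
-- Sketch: same query as above, with KEEPFILTERS removed on purpose
EVALUATE
VAR ProductsCustomers =
    GENERATE ( 'Product', TOPN ( 2, Customer, [Sales Amount] ) )
RETURN
    ADDCOLUMNS (
        SUMMARIZE ( ProductsCustomers, Product[Category], Customer[Country] ),
        -- ProductsCustomers replaces the context-transition filters
        -- on Product[Category] and Customer[Country]
        "Margin", CALCULATE ( [Margin], ProductsCustomers )
    )
ORDER BY [Margin] DESC
```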

The conclusion is that extended columns in a SUMMARIZE expression should not be moved out to an ADDCOLUMNS if the table used in SUMMARIZE applies particular filters and the extended column expression uses columns that are not part of the output. Even though you can create an equivalent ADDCOLUMNS query, the result is more complex and there are no performance benefits in this refactoring. The more complex query has the same (not so good) performance as the SUMMARIZE query: both queries in this section require almost 10 seconds to run on the sample Contoso database.

ADDCOLUMNS

Returns a table with new columns specified by the DAX expressions.

ADDCOLUMNS ( <Table>, <Name>, <Expression> [, <Name>, <Expression> [, … ] ] )

SUMMARIZE

Creates a summary of the input table grouped by the specified columns.

SUMMARIZE ( <Table> [, <GroupBy_ColumnName> [, [<Name>] [, [<Expression>] [, <GroupBy_ColumnName> [, [<Name>] [, [<Expression>] [, … ] ] ] ] ] ] ] )

SUMMARIZECOLUMNS

Creates a summary table for the requested totals over a set of groups.

SUMMARIZECOLUMNS ( [<GroupBy_ColumnName> [, [<FilterTable>] [, [<Name>] [, [<Expression>] [, <GroupBy_ColumnName> [, [<FilterTable>] [, [<Name>] [, [<Expression>] [, … ] ] ] ] ] ] ] ] ] )

ROLLUP

Identifies a subset of columns specified in the call to SUMMARIZE function that should be used to calculate subtotals.

ROLLUP ( <GroupBy_ColumnName> [, <GroupBy_ColumnName> [, … ] ] )

CALCULATE

Context transition

Evaluates an expression in a context modified by filters.

CALCULATE ( <Expression> [, <Filter> [, <Filter> [, … ] ] ] )

COUNTROWS

Counts the number of rows in a table.

COUNTROWS ( [<Table>] )

GENERATE

The second table expression will be evaluated for each row in the first table. Returns the crossjoin of the first table with these results.

GENERATE ( <Table1>, <Table2> )

TOPN

Returns a given number of top rows according to a specified expression.

TOPN ( <N_Value>, <Table> [, <OrderBy_Expression> [, [<Order>] [, <OrderBy_Expression> [, [<Order>] [, … ] ] ] ] ] )

KEEPFILTERS

CALCULATE modifier

Changes the CALCULATE and CALCULATETABLE function filtering semantics.

KEEPFILTERS ( <Expression> )
