Ready, SET, go – how does SQL server handle recursive CTE’s

This blogpost will cover some of the basics in recursive CTE’s and explain the approach done by the SQL Server engine.

First of all, a quick recap on what a recursive query is.

Recursive queries are useful when building hierarchies, traverse datasets and generate arbitrary rowsets etc. The recursive part (simply) means joining a rowset with itself an arbitrary number of times.

A recursive query is defined by an anchor set (the base rowset of the recursion) and a recursive part (the operation that should be done over the previous rowset).

The basics in recursive CTE

A recursive query helps in a lot of scenarios. For instance, where a dataset is built as a parent-child relationship and the requirement is to “unfold” this dataset and show the hierarchy in a ragged format.

A recursive CTE has a defined syntax – and can be written in general terms like this – and don’t run way because of the general syntax – a lot of examples (in real code) will come:

select result_from_previous.*
 from result_from_previous
 union all
 select result_from_current.*
 from set_operation(result_from_previous, mytable) as result_from_current

Or rewritten in another way:

select result_from_previous.*
 from result_from_previous
 union all
 select result_from_current.*
 from result_from_previous.*
 join mytable
 on condition(result_from_previous)

Another way to write the query (using cross apply):

select result_from_current.*
from result_from_previous
cross apply (
select result_from_previous.*
union all
select *
from mytable
where condition(result_from_previous.*)
) as result_from_current

The last one – with the cross apply – is row based and a lot slower than the other two. It iterates over every row from the previous result and computes the scalar condition (which returns true or false). The same row then gets compared to each row in mytable and the current row of result_from_previous. When these conditions are real – the query can be rewritten as a join. Why you should not use the cross apply for recursive queries.

The reverse – from join to cross apply – is not always true. To know this, we need to look at the algebra of distributivity.

Distributivity algebra

Most of us have already learned that below mathematics is true:

X x (Y + Z) = (X x Y) + (X x Z)

But below is not always true:

X ^ (Y x Z) = (X ^ Z) x (X ^ Y)

Or said with words, distributivity means that the order of operations is not important. The multiplication can be done after the addition and the addition can be done after the multiplication. The result will be the same no matter what.

This arithmetic can be used to generate the relational algebra – it’s pretty straight forward:

set_operation(A union all B, C) = set_operation(A, C) union all set_operation(B, C)

The condition above is true as with the first condition in the arithmetic.

So the union all over the operations is the same as the operations over the union all. This also implies that you cannot use operators like top, distinct, outer join (more exceptions here). The distribution is not the same between top over union all and union all over top. Microsoft has done a lot of good thinking in the recursive approach to reach one ultimate goal – forbid operators that do not distribute over union all.

With this information and knowledge our baseline for building a recursive CTE is now in place.

The first recursive query

Based on the intro and the above algebra we can now begin to build our first recursive CTE.

Consider a sample rowset (sampletree):

id parentId name
1 NULL Ditlev
2 NULL Claus
3 1 Jane
4 2 John
5 3 Brian

From above we can see that Brian refers to Jane who refers to Ditlev. And John refers to Claus. This is fairly easy to read from this rowset – but what if the hierarchy is more complex and unreadable?

A sample requirement could be to “unfold” the hierarchy in a ragged hierarchy so it is directly readable.

The anchor

We start with the anchor set (Ditlev and Claus). In this dataset the anchor is defined by parentId is null.

This gives us an anchor-query like below:

recursive CTE 1

Now on to the next part.

The recursive

 After the anchor part, we are ready to build the recursive part of the query.

The recursive part is actually the same query with small differences. The main select is the same as the anchor part. We need to make a self join in the select statement for the recursive part.

Before we dive more into the total statement – I’ll show the statement below. Then I’ll run through the details.

recursive CTE 2

Back to the self-reference. Notice the two red underlines in the code. The top one indicates the CTE’s name and the second line indicates the self-reference. This is joined directly in the recursive part in order to do the arithmetic logic in the statement. The join is done between the recursive results parentId and the id in the anchor result. This gives us the possibility to get the name column from the anchor statement.

Notice that I’ve also put in another blank field in the anchor statement and added the parentName field in the recursive statement. This gives us the “human readable” output where I can find the hierarchy directly by reading from left to right.

To get data from the above CTE I just have to make a select statement from this:

recursive CTE 3

And the results:

recursive CTE 4

I can now directly read that Jane refers to Ditlev and Brian refers to Jane.

But how is this done when the SQL engine executes the query – the next part tries to explain that.

The SQL engines handling

Given the full CTE statement above I’ll try to explain what the SQL engine does to handle this.

The documented semantics is as follows:

  1. Split the CTE into anchor and recursive parts
  2. Run the anchor member creating the first base result set (T0)
  3. Run the recursive member with Ti as an input and Ti+1 as an output
  4. Repeat step 3 until an empty result set is returned
  5. Return the result set. This is a union all set of T0 to Tn

So let me try to rewrite the above query to match this sequence.

The anchor statement we already know:

recursive CTE 5

First recursive query:

recursive CTE 6

Second recursive query:

recursive CTE 7

The n recursive query:

The union all statement:

This gives us the exactly same result as we saw before with the rewrite:

Notice that the statement that I’ve put in above named Tn is actually empty. This to give the example of the empty statement that makes the SQL engine stop its execution in the recursive CTE.

This is how I would describe the SQL engines handling of a recursive CTE.

Based on this very simple example, I guess you already can think of ways to use this in your projects and daily tasks.

But what about the performance and execution plan?


The execution plan for the original recursive CTE looks like this:

The top part of this execution plan is the anchor statement and the bottom part is the recursive statement.

Notice that I haven’t made any indexes in the table, so we are reading on heaps here.

But what if the data is more complex in structure and depth. Let’s try to base the answer on an example:

From the attached sql code you’ll find a script to generate +20.000 rows in a new table called complextree. This data is from a live solution and contains medical procedure names in a hierarchy. The data is used to show the relationships in medical procedures done by the Danish hospital system. It is both deep and complex in structure. (Sorry for the Danish letters in the data…).

When we run a recursive CTE on this data – we get the exactly same execution plan:

This is also what I would expect as the amount of data when read from heaps very seldom impact on the generated execution plan.

The query runs on my PC for 25 seconds.

Now let me put an index in the table and let’s see the performance and execution plan.

The index is only put on the parentDwId as, according to our knowledge from this article is the recursive parts join column.

The query now runs 1 second to completion and generates this execution plan:

The top line is still the anchor and the bottom part is the recursive part. Notice now the SQL engine uses the non-clustered index to perform the execution and the performance gain is noticeable.


I hope that you’ve now become more familiar with the recursive CTE statement and are willing to try it on your own projects and tasks.

The basics is somewhat straight forward – but beware that the query can become complex and hard to debug as the demand for data and output becomes stronger. But don’t be scared. As I always say – “Don’t do a complex query all at once, start small and build it up as you go along”.

Happy coding.

External links:

The with operator in T-SQL:

Recursive CTE’s from MSDN:

Wikipedia on distributivity:

Use of hierarchyid in SQL Server

I attended a TDWI conference in May 2016 in Chicago. Here I got a hint about the datatype hierarchyid in SQL Server which could optimize and eliminate the good old parent/child hierarchy.

Until then I (and several other in the class) haven’t heard about the hierarchyid datatype in SQL Server. So I had to find out and learn this.

Here’s a blogpost covering some of the aspects of the datatype hierarchyid – including:

  • Introduction
  • How to use it
  • How to optimize data in the table
  • How to work with data in the hierarchy-structure
  • Goodies


The datatype hierarchyid was introduced in the SQL Server as from version 2008. It is a variable length system datatype. The datatype can be used to represent a given element’s position in a hierarchy – e.g. an employee’s position within an organization.

The datatype is extremely compact. The storage is dependent in the average fanout (fanout = the number of children in all nodes). For smaller fanouts (0-7) the typical storage is about 6 x Log A * n bits. Where A is the average fanout and n in the total number of nodes in the tree. Given above formula an organization with 100,000 employees and a fanout of 6 levels will take around 38 bits – rounded to 5 bytes of total storage for the hierarchy structure.

Though the limitation of the datatype is 892 bytes there is a lot of room for extremely complex and deep structures.

When representing the values to and from the hierarchyid datatype the syntax is:

[level id 1]/[level id 2]/..[level id n]



The data between the ‘/ can be of decimal types e.g. 0.1, 2.3 etc.

Given two specific levels in the hierarchy a and b given that a < b means that b comes after a in a depth first order of comparison traversing the tree structure. Any search and comparison on the tree is done this way by the SQL engine.

The datatype directly supports deletions and inserts through the GetDescendant method (see later for full list of methods using this feature). This method enables generation of siblings to the right of any given node and to the left of any given node. Even between two siblings. NOTE: when inserting a new node between two siblings will produce values that are slightly less compact.

Hierarchyid in SQL Server how to use it

Given an example of data – see compete sql script at the end of this post to generate the example used in this post.

hierarchyid in SQL Server 1

The Num field is a simple ascending counter for each level member in the hierarchy.

There are some basic methods to be used in order to build the hierarchy using the hierarchy datatype.

GetRoot method

The GetRoot method gives the hierarchyid of the rootnode in the hierarchy. Represented by the EmployeeId 1 in above example.

The code and result could look like this:

hierarchyid in SQL Server 2

The value ‘0x’ from the OrgPath field is the representation of the string ‘/’ giving the root of the hierarchy. This can be seen using a simple cast to varchar statement:

hierarchyid in SQL Server 3

Building the new structure with the hierarchyid dataype using a recursive SQL statement:

hierarchyid in SQL Server 4

Notice the building of the path after the union all. This complies to the above mentioned syntax for building the hierarchy structure to convert to a hierarchyid datatype.

If I was to build the path for the EmployeeId 10 (Name = ‘Mads’) in above example it would look like this: ‘/2/2/’. A select statement converting the hierarchyid field OrgPath for the same record, reveals the same thing:

hierarchyid in SQL Server 5

Notice the use of the ToString method here. Another build in method to use for the hierarchyid in SQL Server.

GetLevel method

The GetLevel method returns the current nodes level with an index of 0 from the top:

hierarchyid in SQL Server 6

GetDescendant method

This method returns a new hierarchyid based on the two parameters child1 and child2.

The use of these parameters is described in the BOL HERE.

Below is showed some short examples on the usage.

Getting a new hierarchyid when a new employee referring to top manager is hired:

hierarchyid in SQL Server 7

Getting a new hierarchyid when a new hire is referring to Jane on the hierarchy:

hierarchyid in SQL Server 8

Dynamic insert new records in the hierarchy table – this can easily be converted into a stored procedure:

hierarchyid in SQL Server 9

Notice the new GetAncestor method which takes one variable (the number of steps up the hierarchy) and returns that levels Hierarchyid. In this case just 1 step up the hierarchy.

More methods

There are several more methods to use when working on a hierarchy table – as found on BOL:

GetDescendant – returns a new child node of a given parent. Takes to parameters.

GetLevel – returns the given level for a node (0 index)

GetRoot – returns a root member

ToString – converts a hierarchyid datatype to readable string

IsDescendantOf – returns boolean telling if a given node is a descendant of given parent

Parse – converts a string to a hierarchyid

Read – is used implicit in the ToString method. Cannot be called by the T-SQL statement

GetParentedValue – returns node from new root in case of moving a given node

Write – returns a binary representation of the hierarchyid. Cannot be called by the T-SQL statement.


As in many other scenarios of the SQL Server the usual approach to indexing and optimization can be used.

To help on the usual and most used queries I would make below two indexes on the example table:

hierarchyid in SQL Server 10

But with this like with any other indexing strategy – base it on the given scenario and usage.


So why use this feature and all the coding work that comes with it?

Well – from my perspective – it has just become very easy to quickly get all elements either up or down from a given node in the hierarchy.

Get all descendants from a specific node

If I would like to get all elements below Jane in the hierarchy I just have to run this command:

hierarchyid in SQL Server 11

Think of the work you would have to do if this was a non hierarchy structured table using only parent/child and recursice sql if the structure was very complex and deep.

I know what I would choose.


As seen above the datatype hierarchyid can be used to give order to the structure of a hierarchy in a way that is both efficient and fairly easy maintained.

If one should optimize the structure even further, then the EmployeeId and the ManagerId could be dropped as the EmployeeId is now as distinct as the OrgPath and can be replaced by this. The ManagerId is only used to build the structure – but this is now also given by the OrgPath.

Happy coding…

External references:

Hierarchyid from MSDN

Using hierarchyid from TechNet

Update SCD Type 2 dimension in one single transaction using only T-SQL

Merge logo new

Recently I got a request inside my organization to make sure that a SCD Type 2 dimension would keep track of the changes due to requrementes from the business.

This needed to be done in a single transaction in pure T-SQL code.

So – what to do and how to do it. Here’s one way.

The sourcetable looks like this:

SCD type 2 dimension

The request was to keep track of changes in the ManagerId according to CaseId.

I’ve created a SCD2 table like this:

CREATE TABLE [dbo].[CaseProjectManagerHistory](
	[dwid] [bigint] IDENTITY(1,1) NOT NULL,
	[CaseId] [int] NULL,
	[ManagerId] [int] NULL,
	[dwDateFrom] [date] NULL,
	[dwDateTo] [date] NULL,
	[dwIsCurrent] [bit] NULL,
	[dwChangeDate] [date] NULL

The fields are as follows:
dwid: Identifier for the table
CaseId: The caseid for the rows
ManagerId: The managerid for the row
dwDateFrom: The date from where the row is actual
dwDateTo: The date to where the row is actual
dwIsCurrent: Boolean that tells if the row is the current one or not
dwChangeDate: The date of the change (if the row has changed since the first write)

If you need to catch up on the history types in a dimension – then take a look at Kennie’s blogppost HERE.

First of all I started out with a merge statement that would insert all the new values not in the table and update the ones that needed update.

Something like this:

merge dbo.CaseProjectManagerHistory as target
	using (select CaseId, ManagerId, cast(getdate() as date) as startDate, datefromparts(2199,1,1) as endDate, 1 as [current], cast(getdate() as date) as changeDate from dbo.[Case]) as source
	on target.CaseId = source.CaseId
	when not matched by target
			insert (CaseId, ManagerId, dwDateFrom, dwDateTo, dwIsCurrent, dwChangeDate)
			values (source.CaseId, source.ManagerId, source.startDate, source.endDate, source.[current], source.changeDate)
	when matched 
		and target.dwIsCurrent = 1
		and exists (select source.CaseId, source.ManagerId
					select target.CaseId, target.ManagerId)
		and target.dwChangeDate <= source.ChangeDate
		and source.changeDate < target.dwDateTo
			update set dwIsCurrent = 0, target.dwChangeDate = source.changeDate, target.dwDateTo = dateadd(d,-1,source.startDate)

Those of you who haven’t tried and worked with a merge-statement – you can get the 101 from BOL here.

But this merge statement only inserts new rows and updates existing rows. The rows that are updated still needs to be in the table in order to fully apply to the SCD 2 rules.

This can be done by using the cluse ‘output’ from the merge-statement and then use the output rows to insert into the same table.

It will look like this:

insert into dbo.CaseProjectManagerHistory_demo (CaseId, ManagerId, dwDateFrom, dwDateTo, dwIsCurrent, dwChangeDate)
select CaseId, ManagerId, startDate, endDate, [current], changeDate 
from (
	merge dbo.CaseProjectManagerHistory_demo as target
	using (
			,cast(getdate() as date) as startDate
			,datefromparts(2199,1,1) as endDate
			,1 as [current]
			,cast(getdate() as date) as changeDate 
		where 1=1
			and caseid in (2005,2013,2015,2016,2019,2021,2023,2025,2027,2028)
			) as source
	on target.CaseId = source.CaseId
	when not matched by target -- indsæt nye rækker
			insert (CaseId, ManagerId, dwDateFrom, dwDateTo, dwIsCurrent, dwChangeDate)
			values (source.CaseId, source.ManagerId, source.startDate, source.endDate, source.[current], source.changeDate)
	when matched -- opdater eksisterende rækker
		and target.dwIsCurrent = 1
		and exists (select source.CaseId, source.ManagerId --filtrer kun på rækker der ikke allerede eksisterer i target
					select target.CaseId, target.ManagerId)
		and target.dwChangeDate <= source.ChangeDate
		and source.changeDate < target.dwDateTo
		update set dwIsCurrent = 0, target.dwChangeDate = source.changeDate, target.dwDateTo = dateadd(d,-1,source.startDate) 
		output $action ActionOut, source.CaseId, source.ManagerId, source.startDate, source.endDate, source.changeDate, source.[current]) as mergeOutput
where mergeOutput.ActionOut = 'UPDATE';

The mergestatement ‘output’ action is used to insert the same rows to the history table once more. The only change is the ‘end date’.

Happy coding!

Note: I did a short presentation with this at my workplace a few weeks ago, and here Kennie (l, b, t) told me that there is a bug in the merge statement that needs to be taken into account. Read more of that here.

Row level security in SQL Server 2016

With the release of SQL Server 2016 comes many great new features. One of these is the implementation of row level security in the database engine.

This blogpost will cover the aspects of this new feature – including:

  • Setup
  • Best practice
  • Performance
  • Possible security leaks


The row level security feature was released earlier this year to Azure – following Microsoft’s cloud-first release concept.

A past big issue with the SQL Server engine was that in only understands tables and columns. Then you had to simulate security using secured views, stored procedures or table value functions. The problem here was to make sure that there were no way to bypass them.

With SQL Server 2016, this is no longer an issue.

Now the SQL Server engine handles the security policy in a central controlled area.

row level security 1

Setup and best practice

The Row-level security is based on a special inline table valued function. This function returns either a single row with a 1 or no rows based on the users rights to that specific row.

Let us take an example:

First of all, I’ll create a database and some users to test with:

USE RowFilter;

 A table with examples and grant select to the new users:

CREATE TABLE dbo.SalesFigures (
[userCode] NVARCHAR(10),
[sales] MONEY)
INSERT  INTO dbo.SalesFigures
VALUES ('userBrian',100), ('userJames',250), ('userBrian',350)
GRANT SELECT ON dbo.SalesFigures TO userBrian
GRANT SELECT ON dbo.SalesFigures TO userJames

Now we’ll add a filter predicate function as below:

CREATE FUNCTION dbo.rowLevelPredicate (@userCode as sysname)
RETURN SELECT 1 AS rowLevelPredicateResult
WHERE @userCode = USER_NAME();

This illustrates that the current user must have associated records in order to get any results. Notice that the functions does not have access to the rows itself.

Furthermore the function can contain joins and lookup tables in the where clause – but beware of the performance hit here. Look further down this post for more info.

The last thing to do is to add a filter predicate to the table dbo.SalesFigures:

ADD FILTER PREDICATE dbo.rowLevelPredicate(userCode)
ON dbo.SalesFigures

That’s it.

Let’s test the results with the users added before:

 EXECUTE AS USER = 'userBrian';
SELECT * FROM dbo.SalesFigures;

This gives me 2 rows:

EXECUTE AS USER = 'userJames';
SELECT * FROM dbo.SalesFigures;

This gives me 1 row:

The execution plan shows a new filter predicate when this row level security is added:

To clean up the examples.

USE master;


Some might ask, “what about the performance – isn’t there a performance hit in this use of functions?”

The short answer is “It depends”.

If you only use a direct filter on the table there is very little to no impact on the performance. The filter is applied directly to the table as any other filter. Compared to the old way of doing the row filter with stored procedures or table valued functions this new approach is performing better.

If you plan to use lookup tables or joins in the predicate function, then you must beware of the helper tables’ indexes and how fast they can deliver data to the function. If the tables are large and slow performing (without indexes etc.) then you will experience bad performance in the row filter function. But that’s just like any other lookup or join that you might do in your solutions.

Best practices

There are some best practices given from Microsoft:

  • It is highly recommended to create a separate schema for the RLS objects (predicate function and security policy).
  • The ALTER ANY SECURITY POLICY permission is intended for highly-privileged users (such as a security policy manager). The security policy manager does not require SELECT permission on the tables they protect.
  • Avoid type conversions in predicate functions to avoid potential runtime errors.
  • Avoid recursion in predicate functions wherever possible to avoid performance degradation. The query optimizer will try to detect direct recursions, but is not guaranteed to find indirect recursions (i.e., where a second function calls the predicate function).
  • Avoid using excessive table joins in predicate functions to maximize performance.

Possible security leaks

This new row filter context can cause information leakage using some carefully codes queries.

Above example can be breached with the following query:

SELECT 1/([sales]-250) FROM dbo.SalesFigures
WHERE Usercode = 'userJames'

This will give an error: Divide by zero error encountered.

This will tell the user trying to access the table, that userJames has a sale of 250. So even though the row filter prevents users from accessing data that they are not allowed, hackers can still try to determine the data in the table using above method.


The new row level security feature has been very much a wanted feature for quite a while now, and with the function now in place, and planned to be released in the RTM version of SQL Server 2016, the DBA’s and other people working with security can use this out-of-the-box.

I hope this post makes a great start for you if you would like to try out the row level security function. Currently the feature is awailable in the latest CTP version (2.2) – which can be downloaded here:

The DBAs guide to stretch database

One of the new features in SQL Server 2016 – and there is a lot – is the ability to stretch the on premise databases to an Azure environment.

This blogpost will cover some of the aspects of this – including:

  • Primarily setup – how to get started
  • Monitoring state of databases that are in ‘stretch mode’
  • Daily work with stretch databases
  • Backup – what’s good to know

With the release of SQL Server 2016, the new feature called stretch database is also released. The feature lets you as a database administrator, make databases stretch (read: copy old data) to an Azure environment. The data is still able to respond to the normal queries that are used, in other way; there is no need to change the current setup for existing applications and other data-contracts to use this feature.

So when is the stretch database something you should consider

  • When you only sometimes need to query the historical data
  • The transactional data that are stored needs all historical data
  • The size of the database tables are growing out of control (but not an issue of bad design – then you need to take other actions…)
  • The backup times are too long in order to make the daily timeslots for maintenance

If you have one or more marks on the above list, then you have a database that are candidate for stretching into Azure.

A typical database in stretch mode are a transactional database with very large amounts of data (more than a billion rows) stored in a small number of tables.

The feature is applied to individual tables in the database – but a need for enabling the feature on database level is a prerequisite.

The limitations

No free goodies without limitations.

There are a list of limitations to a stretch database. Two types of limitations, datatypes and features.

The datatypes that are not supported for stretch database is:

  • filestream
  • timestamp
  • sql_variant
  • XML
  • geometry
  • geography
  • hierarchyid
  • CLR user-defined types (UDTs)

The features that are not supported:

  • Column Set
  • Computed Columns
  • Check constraints
  • Foreign key constraints that reference the table
  • Default constraints
  • XML indexes
  • Full text indexes
  • Spatial indexes
  • Clustered columnstore indexes
  • Indexed views that reference the table

Therefore, it is advisable to have an agreement with your developers if you plan to use the stretch feature. It is more likely that they can code without the above lists, but if they already have implemented features, and needs to work around them, then you are not in good standing for a while.


In order to handle and maintain the stretch feature the current user must be a member of the db_owner group and CONTROL DATABASE permissions is needed for enabling stretch on database level.

Setup – how to get started

First, get an Azure account. If you not already have one. Then…

A small change in sp_configure is needed to get the feature ready.

EXEC sp_configure 'remote data archive' , '1';

Enabling the database

It is a prerequisite to enable the database for stretch in order to enable its tables.

It is pretty straight forward – just right-click the database – choose tasks and select ‘Enable database for stretch’:

Then the SQL Server asks you to sign in to your Azure environment.

You need to choose a set of settings for your stretch database in Azure – including:

  • Location for the server
  • Credential for the server
  • Firewall rules

There is a summary page with all info – when complete, just hit ‘Finish’.

Note: the current applications and data-contracts are NOT able to access the data in Azure directly. The only way to access this data is through the normal on premise database. This database then makes the call to access the Azure database or not based on the current configuration and state of migration (see below for help in the latter).

Enabling tables for stretch

As easy as the database, so is the tables.

Right-click the table that you want to stretch – choose ‘Stretch’ and ‘Enable Stretch’.

Ass seen on the screenshot you can also here do the following tasks: Disable, Pause and Resume stretch. All 3 hopefully self-explainable.

Monitoring the state of databases and tables in stretch mode

There is released a list of Dynamic management views (DMVs) and updated to existing catalog views to help with the work of monitoring the state of stretch databases.

The DMV  sys.dm_db_rda_migration_status shows you the current state, in batches and rows, of the data in the stretched tables. For more information, refer to MSDN: sys.dm_db_rda_migration_status.

The catalog views sys.databases and sys.tables now also contains information about the stretch feature on each part respectively. See more as MSDN: sys.databases and sys.tables.

To view the remote databases and tables for stretch data use the two new catalog views sys.remote_data_archive_databases and sys.remote_data_archive_tables.

A big note for the current CTP 2.2 release:
This release only supports the stretch data for entire tables. This meaning that an architectural decision needs to be taken to move historical data to separate tables. I will assume that the final release will contain a query based configuration in order to find and detect the historical data to be moved to the Azure environment.

Backup and restore

The backup and restore is the same as before the stretch feature. The same strategy must be taken and also the same precautions for data storage in Azure.

One must keep in mind that the on premise backup only happens with on premise data.

The restore process adds a step to the checklist when restoring a database with stretch enabled.

Upon the end of restore a connection to the stretched database in Azure must be reestablished with the stored procedure sys.sp_reauthorize_remote_data_archive.

When this SP is executed, the vertical arrow on this illustration is reestablished:


The stretch database feature is a very nice and good feature to get with the release of SQL Server 2016. It enables the DBA to handle historical data and storage capacity without having the consult the developers and/or architects of new solutions. Also current applications and solutions can be configured to use this new feature.

This post makes a great place to begin with the stretch feature of SQL Server 2016. Personally, I hope that the final feature has a bit more configuration to handle the historical data.

Enlarge AdventureWorks2012


Just recently I had to have a big datawarehouse solution to test some performance optimization using BIML.
I could use the AdventureWorks2012 database, but I needed the clean datawarehouse tables in order to have minimum data maintennance when testing the BIML scripts.

I could not find it, and figures out it was faster to make my own.

So heavily inspired by this post from Jonathan Kehayias (blog), I’ve made a script that can be used to enlarge the dbo.FactInternetSales table.

The script creates a new table called dbo.FactInternetSalesEnlarged and copies data from dbo.FactInternetSales into it with a randomizer. Exploding the data to a 100 times bigger table – est. 6 mio rows.

Get the script here:


Happy coding 🙂

Split delimited string into rows

tangled string

On several occasions I’ve been assigned the task to split delimited string
into rows.

I’ve done this in different ways, but never thought about the performance or stability of the different approaches for doing this.

So here’s my 25 cents and findings.

My first solution to this was to code a function to traverse through the string and insert a new value to a temp table for every delimiter found in the string:

CREATE FUNCTION [dbo].[list_to_table] (
            @list varchar(4000)

            item int


IF CHARINDEX(',',@list) = 0 or CHARINDEX(',',@list) is null
    INSERT INTO @tab (item) VALUES (@list);

DECLARE @l_pos INt;

SET @c_pos = 0;
SET @n_pos = CHARINDEX(',',@list,@c_pos);

WHILE @n_pos > 0
    INSERT INTO @tab (item) VALUES (CAST(SUBSTRING(@list,@c_pos+1,@n_pos - @c_pos-1) as int));
    SET @c_pos = @n_pos;
    SET @l_pos = @n_pos;
    SET @n_pos = CHARINDEX(',',@list,@c_pos+1);

INSERT INTO @tab (item) VALUES (CAST(SUBSTRING(@list,@l_pos+1,4000) as int));


Then I ran into performance issues with very long strings – pushing me to find a better solution with more performance. I began to look into the xml-aproach for solving the issue – and ended up with this:

declare @string as nvarchar(4000)
	r.value('@value','int') as Kategori
from (
	select cast('<A value = "'+ replace(@string,',','"/><A value = "')+ '"/>' as xml
	) as xml_str) xml_cte 
cross apply xml_str.nodes('/A') as x(r)

Performance for the two different solutions is shown below:

String contains all numbers from 0 to 100 with comma as delimiter, machine 4 cores 16 gb ram and ssd.

Function: 7 ms (on average)
XML: 3 ms (on average)

No matter how long a string I send to the XML code it runs around 3 ms – the function just climbs and climbs in time (naturally).

Anybody who has done the same and found an even better way to dynamically split delimited string into rows and want to share?

Multidimensional ROLAP partitions and updatable columnstore indexes – the new black?


I came across a colleague of mine, who asked me if the new updatable columnstore index and ROLAP partitions in a Multidimensional cube is the new trend of fast and no-latency Business Intelligence.

Well – here is my 25 cents.

I’ll start with the updatable columnstore indexes.

With SQL Server 2014 Microsoft introduces updatable columnstore indexes. Which in short terms defines that the columnstore no longer has to be dropped/disabled when loading data to the table. When the new data is loaded to the table, it is therefore first loaded to a temporary deltastore where background processes splits it into rowgroups, does indexing and compression.

This progress can be observed through the DMV sys.column_store_row_groups. Books online documents that every rowgroup can have 3 statuses: OPEN, CLOSED and COMPRESSED.

OPEN: A rowgroup in read/write state that is accepting new records. An OPEN rowgroup is still in rowstore format and has not been compressed to columnstore format yet.

CLOSED: A rowgroup which is filled and therefore locked for read/write. This rowgroup still needs compression

COMPRESSED: A rowgroup that has been locked and compressed and a part of the CS index.

The first two states are not yet part of the CS Index – but resides as internal objects in the database. They are not to be found in sys.objects or sys.partitions. They are there… They can be found trough the sys.system_internals_partitions view.

The latter one above, sys.partitions, only shows row groups that are in COMPRESED state and can therefore give wrong answers if it’s used to for example calculate space used.

Therefore, when a query is executed against a table that contains a CS index that has rowgroups in OPEN or CLOSED state we will not get full benefit of the CS index, as the engine then needs to read from the rowgroups and not the CS index. Even worse is it when data already resides in the CS index and new data is loaded to the table. Then the engine needs to look at both the CS index and the rowgroups. There is no guarantee that the CS index contains all the data that is required for the query, so even if it is so, then a scan of the rowgroups are still executed.

One could wish that the background process of getting the rowgroups from state OPEN to COMPRESSED are blazing fast. But it is not. It is slow slow slow. A quick test on a virtual machine (4 cores 6 GB ram) shows that the background process of getting the rowgroups to COMPRESSED state is averaging with approx. 180.000 rows pr. second.

Next up – the ROLAP partitions in Multidimensional

I will not go through the explanation of building a ROLAP partitions as they are found in very good blogs around the BI universe.

But I’ll take a look at the way the SSAS engine gets the data from a ROLAP partition. Based in the dimensional relationships inside the SSAS cube the engine generates a t-sql statement to get the data.

This statement is not pretty and is not optimized in any way. It cannot be altered or user defined in any way. I have seen these queries in Xevents – and they do not perform very good overall.


If the data is fully loaded every night in the layer just before the cube, and this data is more than a couple of million rows, then the combination is not a good option. It will take way to long time for the background process to get the CS index up and running, resulting in very poor performance at the end users.

If the updated data is small (a couple of 100.000 rows) and is only loaded at nighttime, then there can be arguments to use the combination. But then again, why not just load the data to a normal table and process the corresponding partition.

So – it depends. Partly on the solution build, the environment and the architecture.

Happy coding.

SSISDB crash – Certificate, asymmetric key or private key data invalid


One of my longterm customers had a database-server crash a while ago. The server would not spin up after reboot. The SSISDB crash also gave problems.

The whole installation including tweeks and other stuff had to be re-attached to a new and fresh installation of SQL Server 2012.

It was s fairly young BI-server, so the damage was not that big after all.

Untill today…

SSISDB crash

The development of SSIS project to maintain the BI sollution had to be deployed to the SSISDB catalog.

I got an error like below:

The certificate, asymmetric key, or private key data is invalid. Changed database context to ‘SSISDB’.
(Microsoft SQL Server, Error: 15297)

SSISDB key invalid

Microsoft has defined the error but not yet documentet it – this link is displayed when I press the copy button in the dialog from above and paste into notepad:

I found this blogpost – which guided me to the sollution.
A place in his guide – he mentions this code:


But that this not do the trick – I got an error telling me that I could not force regenerate the key.
I tried to remove the ‘FORCE’ command and things started to happen upon execution.

I got a ‘command completed successfully’ and thought that now I could carry on with Gilberts guide. But but – again I ran into troubles.

I ran the above script but another error apeared. The login was allready existing on the server. But when browsing the login folder I then came to a halt.
I do not know what caused this, me scripting or the guy reinstating the server from the crash. But the login as mentioned above had changed name – not much – but still enough for me not to notice at first glanse.

The prefix and suffix with ‘##’ were gone from the login. The login had been changed from the correct ‘[##MS_SQLEnableSystemAssemblyLoadingUser##]’ to ‘[MS_SQLEnableSystemAssemblyLoadingUser]’.

I manually changed the login name and added the two ‘##’ in each end of the name – and voila!

Everything is now up and running again.


I got a good guide from Gilbert Quevauvilliers – thank you so much. I may have found out myself – but the time to come to the thought that the login had changed name somehow – that would have been the last thing I would check on the list of – well – alot of things.

I would hope that Microsoft could be a little more explicit in their product help – but as allways it’s a hard prioritized list of things for them to do, and they cannot do it all at once.

I’ve learned a but from this – and hope that You the reader could use this help.