Blog

Enlarge AdventureWorks2012

sql-banner

Just recently I had to have a big datawarehouse solution to test some performance optimization using BIML.
I could use the AdventureWorks2012 database, but I needed the clean datawarehouse tables in order to have minimum data maintennance when testing the BIML scripts.

I could not find it, and figures out it was faster to make my own.

So heavily inspired by this post from Jonathan Kehayias (blog), I’ve made a script that can be used to enlarge the dbo.FactInternetSales table.

The script creates a new table called dbo.FactInternetSalesEnlarged and copies data from dbo.FactInternetSales into it with a randomizer. Exploding the data to a 100 times bigger table – est. 6 mio rows.

Get the script here:

EnlargeAdventureWorksDW2012

Happy coding 🙂

SSIS expressions I often use

expression

If either you are doing your SSIS expressions by hand or using the BIML framework, you’ve came across the expressions and the expression-builder.

This is a helper list, with my most often used, and wich are allways forgotten when I need them, of my commonly used SSIS expressions.

Strings

Filename from fully qualified pathstring

  • RIGHT([FilePath],FINDSTRING(REVERSE([FilePath]),”\\”,1) – 1)

Folderpath from fully qualified pathstring

  • SUBSTRING([FilePath], 1, LEN([FilePath]) – FINDSTRING(REVERSE([FilePath] ), “\\” ,1 ) + 1)

Foldername from fully qualified pathstring

  • TOKEN[FilePath],”\\”,TOKENCOUNT([FilePath],”\\”) – 1)

This is only for SSIS2012 and onwards. The TOKEN and TOKENCOUNT expressions are not in prior versions of SQL Server

For prior versions of SQL Server:

  • SUBSTRING([FilePath],LEN([FilePath]) – FINDSTRING(REVERSE([FilePath]),”\\”,2) + 2,(LEN([FilePath]) – FINDSTRING(REVERSE([FilePath]),”\\”,1)) – (LEN([FilePath]) – FINDSTRING(REVERSE([FilePath]),”\\”,2)) – 1)

Replace empty strings

With SQL Server 2012 the new REPLACENULL function was implemented, making it alot easier to replace empty values.

  • REPLACENULL([ColumnName], [replace value])

For earlier versions of SQL Server

  • LEN([ColumnName]) == 0 ? [replace value] : [ColumnName]
  • ISNULL([ColumnName]) ? [replace value] : [ColumnName]

Date and time

Date from datetime

If you want to remove the time element in a datetime object, you should cast it to DT_DBDATE. But because that datatype is very inconvenient to use, you should cast it back to the original datatype. That will set the time to 0:00.

  • (DT_DATE)(DT_DBDATE)@[User::datetimeVariable]
  • (DT_DATE)(DT_DBDATE)[datetimeColumn]
  • (DT_DBTIMESTAMP)(DT_DBDATE)GETDATE()

Time from datetime

If you want to remove the date element in a datetime object, you should cast it to DT_DBTIME. And optional cast it to a string.

  • (DT_STR,8,1252)(DT_DBTIME)@[User::datetimeVariable]
  • (DT_STR,8,1252)(DT_DBTIME)[datetimeColumn]
  • (DT_STR,8,1252)(DT_DBTIME)GETDATE()

First day of the current month

If you want to get the first day of the current month, you take the current datetime and deduct the current day number (minus 1). Optional you can remove the time part:

  • DATEADD(“d”, -DAY(GETDATE()) + 1, GETDATE())
  • (DT_DBTIMESTAMP)(DT_DBDATE)DATEADD(“d”, -DAY(GETDATE()) + 1, GETDATE())

Last day of the current month

If you want to get the last day of the current month, you add 1 month and deduct the current day number. Optional you can remove the time part:

  • DATEADD(“d”, -DAY(GETDATE()), DATEADD(“m”, 1, GETDATE()))
  • (DT_DBTIMESTAMP)(DT_DBDATE)DATEADD(“d”, -DAY(GETDATE()), DATEADD(“m”, 1, GETDATE()))

And if you realy want the last second of the current month 30-06-2011 23:59:59

  • DATEADD(“s”, -1,DATEADD(“d”, -DAY(GETDATE()) + 1, DATEADD(“m”, 1, (DT_DBTIMESTAMP)(DT_DBDATE)GETDATE())))

Weeknumber of the month

1-june-2012 is weeknumber 23 in the year, but weeknumber 1 of the month june 2012.

  • (DATEPART(“ww”,[YourDate]) – DATEPART(“ww”,DATEADD(“d”, -DAY([YourDate]) + 1, [YourDate]))) + 1

Datetime as concatenated string

  • (DT_STR, 4, 1252)DATEPART(“yyyy”, @[System::StartTime]) +
    RIGHT(“0” + (DT_STR, 2, 1252)DATEPART(“mm”, @[System::StartTime]), 2) +
    RIGHT(“0” + (DT_STR, 2, 1252)DATEPART(“dd”, @[System::StartTime]), 2) +
    RIGHT(“0” + (DT_STR, 2, 1252)DATEPART(“hh”, @[System::StartTime]), 2) +
    RIGHT(“0” + (DT_STR, 2, 1252)DATEPART(“mi”, @[System::StartTime]), 2) +
    RIGHT(“0” + (DT_STR, 2, 1252)DATEPART(“ss”, @[System::StartTime]), 2)

I just got 15.000 new colleagues

rehfeldims

Guess what. I just got 15.000 new colleagues.

Rehfeld Partners is to be acquired by IMS Health. IMS Health is a leading global information and technology services company, with more than 60 years experience, providing clients in the healthcare industry with comprehensive solutions to measure and improve their performance.

Rehfeld has long history – back in 1998 it started with the current directors Ditlev Moltke {linkedin} and Claus Rehfeld {linkedin}. From the start Rehfeld has focused on business intelligence. The foundation started in the healthcare part of Denmark and has expanded ever since to be the biggest Business Intelligence consulting company in Denmark.

Now a new era is about begin. Ditlev and Claus is handing over the company to IMS Health, including all staff.

Am I scared you might ask? A bit, but positively – absolutely. It is a big step for a company like Rehfeld to be part of the globally biggest information and technology company. We are adding our professionalism and know-how about business intelligence, knowledge of the Nordic customers and a strong technologic staff to IMS Health and IMS Health is adding international experience, good contacts with the life science indutry and even more technology.

Rehfeld and the staff is going to do excactly the same job in the future. Nothing has changed there. We are going to get new processes and new way of doing the work, but the work and projects are the same. Still the best in class on delivering business intelligence to the Denmark in order to make Denmark a more effective society.

I’m so looking forward to the next page of my work-life. It is going to be thrilling, fast and good.

Fully automate the BIML expansion

Mist 4.0

With all the BIML code done and the production scenarios start to emerge, it would be great to have the BIML code automatic expanded and create new packages if there are changes to the underlying metadata. Either configured manually or directly from the sources. Also known as BIML expansion.

With the MIST application from Varigence – this is possible through the command line util that ships with the installation.

When the installation of MIST has finished, you’ll find a new .exe-file in the installation folder called bimlc.exe.

This file is the core engine of the command line util.

According to the online documentation found here – the command line util is pretty straight forward.
The observant reader will notice that the link points to a documentation for hadron.exe – this is the old name for the same tool. See this documentation from Varigence.

The syntax is


bimlc.exe -s="<sourcefile>" -t="<targetfolder>"

I’ve made a quick demo biml-project with below code:


<# var meta_connectionstring = "Provider=SQLNCLI11;Server=[myServer];Initial Catalog=[myDatabase];Integrated Security=SSPI;"; #>
<#
DataTable tables;
tables = ExternalDataAccess.GetDataTable(meta_connectionstring, "[dbo].[storedProcedureThatGivesMeTheData]");
#>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="BIMLMETA" CreateInProject="true" ConnectionString="Provider=SQLNCLI11;Data Source=[myServer];Integrated Security=SSPI;Initial Catalog=[myDatabase]" />
</Connections>
<Packages>
<# foreach(DataRow table in tables.Rows) { #>
<Package Name="SSIS package <#=table["schema"] #>_<#=table["table"] #>">
</Package>
<# } #>
</Packages>
</Biml>
<#@ import namespace="System.Data" #>
<#@ import namespace="System.Linq"#>

This file is called AutoCompile.biml, and I want to put the generated files in c:\AutoCompilePackages. Then the commandline would be:

bimlc.exe -s="c:\AutoCompile\AutoCompile.biml" -t="c:\AutoCompilePackages"

When I then press ENTER the compiler starts up and does it’s thing:

BIMLAutoCompileCommand

The output tells us that there are 9 packages created in each of its own projects.

I can find the compiled projects and packages in my defined output folder.

BIMLAutoCompileOutput

And the content of each folder.

BIMLAutoCompileOutputContent

Thats it. Easy job 🙂 thanks to Varigence and their MIST product.

Referencing the same custom script task in SSIS with BIML

Biml

In these times of data warehouse automation and fast track SSIS, BIML has become a huge contributor to my daily work.
On one of my most recent projects we had a custom script task in SSIS that we needed to implement in the same package several times.

The code of the custom script is in its own biml-file in the project and referenced from the main biml-file.

We kept getting a mysterious error when executing the package in debug mode from Visual Studio. (No warning upon building the packages).

After alot of debugging and hair pulling hours, we finally got our arms around the bug.

The error was that the ProjectCoreName of the new 3 script tasks in the same package had the same name.

My learning from this, and yes, I’m not a .NET developer, is that upon executing the custom script task, the SSIS engine must compile the code and store it for usage upon handling the data. This storage is shared with the whole package, and therefore cannot contain the same .NET project name. [Written from my personal understading – correct me if I’m wrong].

So for any future BIML users who wants to add the same custom script task to the BIML project – remember to add a variable to the ProjectCoreName of your biml-script.

This can be done fairly easy, as your biml-scripts can relate to the variables calling the biml-file. Just add the standard

<#=variablename#>

to he ProjectCoreName-tag of your biml-file containing the custom script.

Happy BIML-ing…

Install SQL Server in 5 minutes

sql server 2014 logo

You know this everybody – you need to install a whole new SQL Server on the customers Windows Server.
And the customer doesn’t have the time or money to wait for you to click around the GUI and sit around and watch the progress-bar move every second. You can actually install SQL Server in 5 minutes.

So why not do this in a fully automatic way and in the same time save both effort and expensive time doing so?

Using the command prompt

There is a command line util that gives you the ability to write a very very very long command and in that way install the SQL Server from there. But I personally do not want to sit around and write these commands every time.

Using config files

A very more efficient and reusable way is to make your own configuration file and use this every time you need to install a new SQL Server.

When you click your way through the UI to install the SQL Server the UI actually make a config file and then use this file to install using a very small commandline:

setup.exe /configurationfile=config.ini (if ‘config.ini’ is your config-file).

The configurationfile is easy to model around with and in that way you can have your own personal config-file that can be adapted to every environment and every customer. And if the customer for some reason needs to reinstall the server again, they just need the media and the config-file.

The upside

When you use configurationfiles and the command line the installation can be done in, for my example on my laptop, in aprox. 5 minutes.

2014-11-04 21_18_27-Summary.txt - Notepad

How long do you normally wait for the installation to finish?

Downsides…

I really can’t find any. Once you’ve got the config-file up and running, it’s easy-peasy to install a new SQL Server. You can also reuse the config-file for a newer version of SQL Server – just add or modify the file to fulfill the needs for that version and that environment.

If you have encountered any downsides that I haven’t seen – then please leave a comment and I’ll get back to you.

Further reading

Microsoft has good documentation for this:

Command prompt installation

Configuration file installation

Power Query – October 2014 update

power bi for office 365

The Power Query tool has just been updated and there are some pretty neat features rolled out:

  1. New Transfomation: Use Headers as First Row.
  2. Column Name field in Merge Columns dialog.
  3. Quick Access Toolbar in Query Editor dialog.
  4. Warning for stale data previews in new queries.
  5. Function Invocation from Search.
  6. Preserve Excel Number formats in the worksheet on query refresh.

Microsoft has posted a video containing a brief overview of the update:

Along side this update – there is also an updated Salesforce connector for Power Query which you need to install after installing the above update.

There is a MSDN blog with more info here: MSDN

Downloads:

Power Query update

Salesforce connector

Split delimited string into rows

tangled string

On several occasions I’ve been assigned the task to split delimited string
into rows.

I’ve done this in different ways, but never thought about the performance or stability of the different approaches for doing this.

So here’s my 25 cents and findings.

My first solution to this was to code a function to traverse through the string and insert a new value to a temp table for every delimiter found in the string:

CREATE FUNCTION [dbo].[list_to_table] (
            @list varchar(4000)
            )

RETURNS @tab TABLE (
            item int
            )

BEGIN

IF CHARINDEX(',',@list) = 0 or CHARINDEX(',',@list) is null
BEGIN
    INSERT INTO @tab (item) VALUES (@list);
    RETURN;
END

DECLARE @c_pos INT;
DECLARE @n_pos INT;
DECLARE @l_pos INt;

SET @c_pos = 0;
SET @n_pos = CHARINDEX(',',@list,@c_pos);

WHILE @n_pos > 0
BEGIN
    INSERT INTO @tab (item) VALUES (CAST(SUBSTRING(@list,@c_pos+1,@n_pos - @c_pos-1) as int));
    SET @c_pos = @n_pos;
    SET @l_pos = @n_pos;
    SET @n_pos = CHARINDEX(',',@list,@c_pos+1);
END;

INSERT INTO @tab (item) VALUES (CAST(SUBSTRING(@list,@l_pos+1,4000) as int));

RETURN;
END

Then I ran into performance issues with very long strings – pushing me to find a better solution with more performance. I began to look into the xml-aproach for solving the issue – and ended up with this:

declare @string as nvarchar(4000)
select 
	r.value('@value','int') as Kategori
from (
	select cast('<A value = "'+ replace(@string,',','"/><A value = "')+ '"/>' as xml
	) as xml_str) xml_cte 
cross apply xml_str.nodes('/A') as x(r)

Performance for the two different solutions is shown below:

String contains all numbers from 0 to 100 with comma as delimiter, machine 4 cores 16 gb ram and ssd.

Function: 7 ms (on average)
XML: 3 ms (on average)

No matter how long a string I send to the XML code it runs around 3 ms – the function just climbs and climbs in time (naturally).

Anybody who has done the same and found an even better way to dynamically split delimited string into rows and want to share?

I’m speaking at SQL Saturday Copenhagen and Edinburgh

sqlsat275_web

With a little push from some of my colleagues, I have submitted a speaksersession to SQL Saturday in Copenhagen and Edinburgh.

To my big surprise both sessions (with the same subject) was chosen as a part of the sessions to be held. I’m very excited and thrilled that these possibilities was given to me.

sqlsat281_web

The SQL Saturday in Copenhagen is going to be my virgin session, where I’ll speak about a dynamic approach to partition kubes – bassed on the multidimensional model. The Tabular model’s dynamic partitioning can be read in a blogpost HERE.

I hope to see you at the events – it’s free. You can sign up here: Copenhagen Edinburgh.

en_USEnglish
da_DKDanish en_USEnglish