Welcome to the second month of the SQL Server Performance Tuning series. This month will be one of the most interesting and challenging ones - it’s the month where we are talking exclusively about Indexing, Indexing, and Indexing in SQL Server. But it’s worthwhile - trust me.
Today I’m covering so-called Heap Tables, and in the following 3 weeks we are talking about Clustered Indexes, Non-Clustered Indexes, and Indexing Strategies for your SQL Server database. Let’s talk about heap tables. Heap tables are tables without a Clustered Index. A table in SQL Server can have a Clustered Index, then it’s called a Clustered Table, and without a Clustered Index, it’s called a heap table.
In a heap table, the data is not sorted in any way, it’s just a pile of unordered, unstructured records. When you access a heap table through a SELECT statement, SQL Server will use a Table Scan operator in the Execution Plan, when you have no suitable Non-Clustered Index defined. There is no Table Seek operator available. That’s very important.
As you will see next week, when we talk about Clustered Indexes, you access a Clustered Index through a Clustered Index Scan and through a Clustered Index Seek operator. On a heap table you have only a Table Scan operator. And Table Scan means that you have to scan the whole table. And it will not scale with the amount of data you have. The more data you have, the longer the operation takes.
A Table Scan is alwas a O(n) operation more about the Big O notation - it will not scale when your table gets larger and large. Let’s have now a more detailed look on the advantages and disadvantages about heap tables in SQL Server.
Heap tables are very, very, very fast - for inserting data. As we have said earlier it’s just a pile of data - nothing more. When you crack out every page from a traditional phone book and place the invidiual pages on the desk in front of you, you have created a heap table. Inserting new phone book records into that heap table is very fast: you allocate a new page (of 8kb), write the new records onto that page, and finally put the page to the other ones in front of you. Finished. No sorting order has to be guaranteed.
It’s the same with SQL Server: a new page is allocated, the records are stored onto that page, and the page is assigned to the heap table. Finished. This is a very fast approach, because SQL Server doesn’t has to ensure any sorting order. It’s up to SQL Server where to put the new records.
For that reason it can be sometimes a good idea to leave some tables as heap tables in a database schema: tables on which you have a huge, parallel INSERT activity. Just think about your logging/auditing tables. But I’m never ever recommending using heap tables everywhere. There are only a few specific use cases where they make sense. But not everywhere!
Besides the advantage that heap tables are very fast for inserting data, there is also a huge amount of disadvantages, that you also have to incorporate in your thoughts when you decide to create a table as a heap table.
As a first disadvantage, a heap table leads to random I/O in the storage subsystem when you access the table data. Imagine you are executing a simple SELECT statement against your heap table. If the data isn’t yet cached in the Buffer Pool, SQL Server has to issue physical reads to your storage subsystem. These read will be random I/O, because the pages of the heap table are stored somewhere in your data files, but not next to each other.
If you are using traditional rotational storage (which is currently still the most common case), you will have performance problems on your storage level, because random I/O is slow, very slow. SSD drives are a big game changer here, because it’s doesn’t really matter anymore, if you are performing random or sequential I/O on SSDs. Both operations are almost the same regarding their speed (random I/O is still “a little” bit slower than sequential I/O).
Another big problem that you will definitely have with your heap tables are so-called Forwarding Records. Records that are stored on a heap table can move in some circumstances from one page to another one. If this occurs, SQL Server stores on the original page a forwarding record that points to the new location where the record is stored.
When you are accessing your data, SQL Server still accesses the original page, and fetches the record through the forwarding record, which is an additional page read that you need. And this will also slow down your read performance tremendously.
Heap tables have their eligibility - in some cases. I always recommend to create a Clustered Table (define a Clustered Index on it), and think in some specific use cases about it, if a heap table would serve your business needs in a better way (regarding the performance characteristics). If you want to have a more detailed explanation when heap tables can be suitable for your workload, I also highly recommend to read Thomas Kejser’s blog posting Clustered Indexes vs. Heaps about it. Thomas gives you a very controverse, but still valid insight when heap tables can make sense for you.
Next week I will talk more about Clustered Indexes in SQL Server. You will learn how to choose your right Clustered Key, and when they are good, and when they are bad. Stay tuned, and see you very soon.