Article Source
- Title: Understanding the SQL Server Columnstore Index
- Authors: Adam Jorgensen, Steven Wort, Ross LoForte, Brian Knight
Understanding the SQL Server Columnstore Index
Columnstore index is a new type of index introduced in SQL Server 2012. It is a column-based non-clustered index geared toward increasing query performance for workloads that involve large amounts of data, typically found in data warehouse fact tables.
This new type of index stores data column-wise instead of row-wise, as indexes currently do. For example, consider an Employee table containing employee data, as shown in Table 1.
Table 1: Sample Employee Table</p>
FirstName | LastName | HireDate | Gender |
Adam | Jorgensen | 5/9/2008 | Male |
Sherri | McDonald | 7/1/2009 | Female |
Brian | McDonald | 09/15/2009 | Male |
Jose | Chinchilla | 1/10/2010 | Male |
Tim | Murphy | 7/1/2009 | Male |
Tim | Moolic | 6/1/2008 | Male |
In a row-based index, the data in the Employee table is stored in one or more data pages, as shown in Figure 1.
In a column-based index, the data in the Employee table is stored in separate pages for each of the columns, as shown in Figure 2.
Performance advantages in columnstore indexes are possible by leveraging the VertiPaq compression technology, which enables large amounts of data to be compressed in-memory. This in-memory compressed store reduces the number of disk reads and increases buffer cache hit ratios because only the smaller column-based data pages that need to satisfy a query are moved into memory.</p>
For wide tables, such as those commonly found in data warehouses, columnstore indexes come in handy as you essentially reduce the amount and size of data needed to be accessed for any given query. For example, consider the following query: ```sql SELECT FirstName, LastName, FROM EmployeeTable WHERE HireDate >= '1/1/2010' ``` A *column-store index* is more efficient for this example because only one smaller-sized (*compressed*) data page is needed to satisfy the query. In this case, the columnstore index for the HireDate column satisfies the WHERE clause. A row-based index is not as efficient because it may need to load one or more larger-sized data pages into memory and read the entire rows, including columns not needed to satisfy the query. A larger-sized data page and additional unnecessary columns increases data size, memory usage, disk reads, and overall query time. Imagine if this table had 20 or more columns! Columnstore indexes have some requirements and limitations, as shown in Table 2.
Table 2: Requirements and Limitations of Columnstore Index
Description | Requirement/Limitation |
No. of columnstore indexes per table | 1 |
Index record size limit of 900 bytes | No limit/Not applicable |
Index limit of 16 key columns | No limit/Not applicable |
Table partitioning support | Yes, as a partition aligned index. |
Can be combined with row-based indexes? | Yes, if clustered index, all columns must be present in columnstore index. |
Update, Delete, Insert, Merge supported? | No, columnstore indexes are read-only but workarounds exist. Refer to Books Online: Best Practices: Updating Data in a Columnstore Index. |
Data types that can be included in a columnstore index | Char, varchar except varchar(max), nchar, nvarchar except nvarchar(max), decimal and numeric except with precision greater than 18 digits, int, bigint, smallint, tinyint, float, real, bit, money, smallmoney, all date and time data types except datetimeoffset with scale greater than 2. |
Data types that cannot be included in a columnstore index | Binary, varbinary, ntext, text, image, varchar(max), nvarchar(max), uniqueidentifier, rowversion, timestamp, sql_variant, decimal and numeric with precision greater than 18 digits, datetimeoffset with scale greater than 2, CLR types including hierarchyid and spatial types, xml. |