What I will discuss below, and over a few future posts, is equally consequential to application developers, working within the 2GB limits of SQL compact, as it is to developers, such as myself, who measure space requirements in terabytes.
Space constraints aside; inappropriate datatypes can have a significant impact on performance. Smaller data types mean:
- more rows can be stored per data page, and so the more rows can be retrieved for the same I/O cost.
- less network traffic generated during ETL processes, and when returning query results to a client machine.
- more rows can be cached, meaning a reduction in repeat data reads for similar or repeated queries.
String Data
There are really just 3 string data types in SQL Server, although all are available in unicode and non-unicode varieties. Each of the unicode equivalents (nChar, nVarchar and nText) require 2 bytes per character storage but are still constrained by the same storage limit meaning the maximum number of characters is halved.
- Char(x)
Storage requirements | 1 byte per character |
Description | Char is a fixed width datatype; should you define a field as Char(50), it will be stored as 50 bytes, irrespective of the number of characters being held. A maximum of 8000 characters can be stored in the char datatype. |
Good Usage | The char data type should be used with consistent length string values. As a rule of thumb; Char should be used in preference to Varchar whenever the average string length is within two characters of the maximum string length. |
Bad Usage | The char data type should not be used when storing strings with a large distribution of lengths. Choosing a character limit that is too large will result in a space penalty for every row in the table. |
- Varchar(x) and Varchar(max)
Storage Requirements | 1 byte per character + 2 bytes |
Description | Varchar is a variable width datatype; should you define a field as varchar(50) and use it to store a 10 character string, SQL Server will only use 12 bytes of storage. The 2 byte overhead is so that the variable start and end positions of the string can be stored. The standard varchar datatype can store up to 8000 characters, whereas varchar(max) can be used to store strings of up to 2GB in size. Data held as a varchar(max) datatype will be held in row, and so will have no additional query overhead, as long as the entire row stays within the 8000 byte limit. Should the row length exceed 8000 bytes, the contents of the varchar(max) field will be store separate to the rest of the row, in a blob, and will result in an additional query overhead. |
Good Usage | The Varchar data type should be used for storing strings with significant variance in length. As a rule of thumb; Varchar should be used in preference to Char whenever the average string length is less than the maximum string length minus 2. |
Bad Usage | Varchar(1)!!! Varchar(1) limits the string to a single character, but imposes a 2 byte overhead per value, giving it a similar storage requirement to a char(3). A char(1) would be the correct data type in this instance. |
- Text
Storage Requirements | 1 byte per character |
Description | Text is a variable length datatype which can be used to store strings of up to 2GB in size. All Text data is stored outside of the data row, in a blob, and so will have a query overhead. The text datatype has been replaced by varchar(max), and will be removed in a future version of SQL Server, and so should be avoided. |
Next up...
In the next instalment of this series, I will discuss the options available for storing date and time data.