So, what is it?
Well it is not really rocket science, it is nothing more than a table containing consecutive number from 0 or 1, up to whatever level is required to satisfy a particular need.
And How can that help me?
The beauty of a tally table is that, if used correctly, it can be used as an alternative to a loop, but without the performance overheads of 'rbar' (row by agonising row) operations.
OK, so show me an example
So lets create a 'Tally' table;
Create Table #Numbers
(
number tinyint
)
;
Declare @num tinyint
Set @num = 1
;
While @num <= 50
begin
insert #Numbers (number)
Select @num
;
Select @num = @num + 1
;
end
Select top 5 number
from #Numbers
order by number
Having thought long and hard about how to demonstrate the use of a numbers table, I decided to recreate soundex.
What is Soundex?
Soundex is an algorithm developed early in the 20th century for matching English names, irrespective of variations in spelling. The algorithm works by matching names phonetically, in other words based on how it sounds.
The soundex key is generated by applying the following 4 rules to name data:
- Where there are multiple letters with the same sound value adjacent in the name, the second is removed.
- The first letter is always retained, but all subsequent occurrences of the following letters are removed: a, e, i, o, u, y, h, w.
- The remaining letters, other than the first, are replaced by a number signifying how they sound.
- Retain a maximum of 4 characters, or if the remaining string is less than 4 characters then append zeros to make up the difference.
Sound Value | Letters | |
1 | b, f, p, v | |
2 | c, g, j, k, q, s, x, z | |
3 | d, t | |
4 | l | |
5 | m, n | |
6 | r |
These sound values in SQL Server's implementation can be proven by the following method:
Create table #sound_Lookup
(
letter char(1),
soundval tinyint
)
Declare @ascii smallint
Select @ascii = ASCII('a')
while @ascii <= ASCII('z')
begin
insert #sound_Lookup
Select char(@ascii),
SUBSTRING(soundex('a' + char(@ascii)),2,1) soundval
;
Select @ascii = @ascii + 1
;
end
Select *
from #sound_Lookup
where soundval > 0
order by 1
letter | soundval | |
b | 1 | |
c | 2 | |
d | 3 | |
f | 1 | |
g | 2 | |
j | 2 | |
k | 2 | |
l | 4 | |
m | 5 | |
n | 5 | |
p | 1 | |
q | 2 | |
r | 6 | |
s | 2 | |
t | 3 | |
v | 1 | |
x | 2 | |
z | 2 |
So, how does the numbers table help?
To be able to implement the soundex rules we will need to iterate through each letter in a string, but that will be slow and can only be performed on one record at a time. The following script uses the numbers table to split the string in to its constituent letters:
Declare @Name varchar(50)
Select @Name = 'William'
Select n.Number,
SUBSTRING(@Name, n.number, 1) Letter,
l.Soundval
from #Numbers n
left join #sound_Lookup l ON SUBSTRING(@Name, n.number, 1) = l.letter
where SUBSTRING(@Name, n.number, 1) <> ''
Number | Letter | Soundval | |
1 | W | 0 | |
2 | i | 0 | |
3 | l | 4 | |
4 | l | 4 | |
5 | i | 0 | |
6 | a | 0 | |
7 | m | 5 |
Applying the rules in order:
1. Where there are multiple letters with the same sound value adjacent in the name, the second is removed, in this case the second 'l'.
Declare @Name varchar(50)
Select @Name = 'William'
Select n.number,
SUBSTRING(@Name, n.number, 1) Letter,
l.soundval
from #Numbers n
join #sound_Lookup l ON SUBSTRING(@Name, n.number, 1) = l.letter
left join #sound_Lookup p ON SUBSTRING(@Name, n.number - 1, 1) = p.letter -- previous letter in string
where SUBSTRING(@Name, n.number, 1) <> ''
and l.soundval <> ISNULL(p.soundval , 9)
number | Letter | soundval | |
1 | W | 0 | |
2 | i | 0 | |
3 | l | 4 | |
5 | i | 0 | |
6 | a | 0 | |
7 | m | 5 |
2. The first letter is always retained, but all subsequent occurrences of the following letters are removed: a, e, i, o, u, y, h, w.
Declare @Name varchar(50)
Select @Name = 'William'
Select n.number,
SUBSTRING(@Name, n.number, 1) Letter,
l.soundval
from #Numbers n
join #sound_Lookup l ON SUBSTRING(@Name, n.number, 1) = l.letter
left join #sound_Lookup p ON SUBSTRING(@Name, n.number - 1, 1) = p.letter -- previous letter in string
where SUBSTRING(@Name, n.number, 1) <> ''
and l.soundval <> ISNULL(p.soundval , 9)
and (
SUBSTRING(@Name, number, 1) not in ('a','e','h','i','o','u','w','y')
or number = 1 -- Retain first letter
)
number | Letter | soundval | |
1 | W | 0 | |
3 | l | 4 | |
7 | m | 5 |
3. The remaining letters, other than the first, are replaced by a number signifying how they sound.
Declare @Name varchar(50)
Select @Name = 'William'
Select n.number,
case when n.number = 1 then SUBSTRING(@Name, n.number, 1)
else cast(l.soundval as CHAR(1))
end [Character]
from #Numbers n
join #sound_Lookup l ON SUBSTRING(@Name, n.number, 1) = l.letter
left join #sound_Lookup p ON SUBSTRING(@Name, n.number - 1, 1) = p.letter -- previous letter in string
where SUBSTRING(@Name, n.number, 1) <> ''
and l.soundval <> ISNULL(p.soundval , 9)
and (
SUBSTRING(@Name, number, 1) not in ('a','e','h','i','o','u','w','y')
or number = 1 -- Retain first letter
)
number | Character | |
1 | W | |
3 | 4 | |
7 | 5 |
This is all well and good, but we are not wanting a record set. What we are wanting to return is a single string. The following script uses the XML resordset functionality (introduced in SQL Server 2005) to recombine the remaining characters in to a string:
Declare @Name varchar(50)
Select @Name = 'William'
Select case when n.number = 1 then SUBSTRING(@Name, n.number, 1)
else cast(l.soundval as CHAR(1))
end
from #Numbers n
join #sound_Lookup l ON SUBSTRING(@Name, n.number, 1) = l.letter
left join #sound_Lookup p ON SUBSTRING(@Name, n.number - 1, 1) = p.letter -- previous letter in string
where SUBSTRING(@Name, n.number, 1) <> ''
and l.soundval <> ISNULL(p.soundval, 9)
and (
SUBSTRING(@Name, number, 1) not in ('a','e','h','i','o','u','w','y')
or number = 1 -- Retain first letter
)
for XML path ('')
4. Retain a maximum of 4 characters, or if the remaining string is less than 4 characters then append zeros to make up the difference.
So, we need to pad the string out with zeros:
Declare @Name varchar(50)
Select @Name = 'William'
Select CAST((
Select case when n.number = 1 then SUBSTRING(@Name, n.number, 1)
else cast(l.soundval as CHAR(1))
end
from #Numbers n
join #sound_Lookup l ON SUBSTRING(@Name, n.number, 1) = l.letter
left join #sound_Lookup p ON SUBSTRING(@Name, n.number - 1, 1) = p.letter
where SUBSTRING(@Name, n.number, 1) <> ''
and l.soundval <> ISNULL(p.soundval, 9)
and (
SUBSTRING(@Name, number, 1) not in ('a','e','h','i','o','u','w','y')
or number = 1 -- Retain first letter
)
for XML path ('')
) + '0000' as char(4))
Now, lets see how the code works on a bigger sample, and how it compares to the built in Soundex function.
Create Table #Names
(
Name varchar(50)
)
;
insert #Names values ('Jonathan')
insert #Names values ('Johnathon')
insert #Names values ('William')
insert #Names values ('Wilheim')
insert #Names values ('ThisIsAVeryLongTestString')
insert #Names values ('Tymczak')
insert #Names values ('Ashcraft')
;
Select Name,
CAST( (
Select case when n.number = 1 then SUBSTRING(Name, n.number, 1)
else cast(l.soundval as CHAR(1))
end
from #Numbers n
join #sound_Lookup l ON SUBSTRING(Name, n.number, 1) = l.letter
left join #sound_Lookup p ON SUBSTRING(Name, n.number - 1, 1) = p.letter
where SUBSTRING(Name, n.number, 1) <> ''
and l.soundval <> ISNULL(p.soundval, 9)
and (
SUBSTRING(Name, number, 1) not in ('a','e','h','i','o','u','w','y')
or number = 1
)
for XML path ('')
) + '0000' as Char(4)) MySoundex,
SOUNDEX(Name) RealSoundex
from #Names x
Name | MySoundex | RealSoundex | |
Jonathan | J535 | J535 | |
Johnathon | J535 | J535 | |
William | W450 | W450 | |
Wilheim | W450 | W450 | |
ThisIsAVeryLongTestString | T221 | T221 | |
Tymczak | T522 | T522 | |
Ashcraft | A226 | A226 |