Tuesday, January 13, 2009

Finding the Last Occurrence of a Pattern in a String

When dealing with string data, one of the most useful Transact-SQL functions you can have in your toolbox is PATINDEX. Aside from it’s many creative uses (and there are quite a few), it allows you to search for an expression (including wildcard operators) within any valid SQL Server character type. It returns the position of the first occurrence of the pattern within the string, and returns zero (0) if the pattern is not found. As always, an example is worth 1000 bon mots:

SELECT PATINDEX('%c%''abcdefgabcdefga')

The above snippet returns 3, because the first instance of “c” within “abcdefgabcdefga” is at the third character.

On occasion, you may be required to find the last occurrence of the pattern rather than the first. This usually happens when you’re dealing with data from external systems, or when you’ve inherited custody of an inappropriately denormalized system (one where more than one “logical” data item has been stored in a single, concatenated field). I have the same requirement for my “mad science” series, so I developed a function to take care of it. Here it is:

CREATE FUNCTION dbo.REVERSEPATINDEX(@StringToFind    nvarchar(max),

@StringToSearch nvarchar(max))

RETURNS bigint

AS

BEGIN

--LastIndex will store the "previous" index of the pattern

--while we search for the "next" index inside our string.

DECLARE @LastIndex bigint, @NextIndex bigint


--We will chop off the "searched" portion of our string

--After each search iteration

DECLARE @CurrentStringToSearch nvarchar(max)


--Assign the value of the input parameter to @CurrentStringToSearch

SET @CurrentStringToSearch = @StringToSearch


--Grab the first occurrence, if it exists, using PATINDEX

SET @NextIndex = PATINDEX(@StringToFind, @CurrentStringToSearch)


--If we've found the pattern within the string, continue and find

--subsequent occurrences until there are no more to find.

WHILE @NextIndex > 0

BEGIN

--Add @NextIndex to @LastIndex, effectively storing within

--@LastIndex the index of the occurence that we have just found,

--based upon the original (non-truncated) string.

SET @LastIndex = COALESCE(@LastIndex,0) + @NextIndex


--Truncate @CurrentStringToSearch, removing the portion of the

--string that we have already searched.

SET @CurrentStringToSearch
= RIGHT(@CurrentStringToSearch, LEN(@CurrentStringToSearch) - @NextIndex)


--Find the NEXT occurence of the pattern in the remaining portion of

--the string to search.

SET @NextIndex = PATINDEX(@StringToFind, @CurrentStringToSearch)

END


--We will COALESCE @LastIndex and @NextIndex; @LastIndex will be NULL

--if the pattern does not exist in the string (as the loop above will

--never have been executed).

RETURN COALESCE(@LastIndex, @NextIndex)

END

GO

And to test:

SELECT dbo.REVERSEPATINDEX('%c%', 'abcdefgabcdefga')

Returns 10 – the correct answer, given that the last occurrence of “c” in “abcdefgabcdefga” is at the tenth character.

No comments: