'UNDOCUMENTED FEATURE when SELECT in VARCHAR with trailing whitespace SQL Server

I hope this is an interesting puzzle for an SQL expert out there.

When I run the following query, I would expect it to return no results.

-- Create a table variable Note: This same behaviour occurs in standard tables.

DECLARE @TestResults TABLE (Id int IDENTITY(1,1) NOT NULL, Foo VARCHAR(100) NOT NULL, About VARCHAR(1000) NOT NULL)

-- Add some test data Note: Without space, space prefix and space suffix

INSERT INTO @TestResults(Foo, About) VALUES('Bar', 'No spaces')
INSERT INTO @TestResults(Foo, About) VALUES('Bar ', 'Space Suffix')
INSERT INTO @TestResults(Foo, About) VALUES(' Bar', 'Space prefix')

-- SELECT statement that is filtered by a value without a space and also a value with a space suffix

SELECT 
     t.Foo
     , t.About
FROM @TestResults t
WHERE t.Foo like 'Bar '
AND t.Foo like 'Bar'
AND t.Foo = 'Bar '
AND t.Foo = 'Bar'

The results return a single row:

[Foo]  [About]
Bar    Space Suffix

I need to know more about this behaviour and how I should work around it.

It is also worth noting that LEN(Foo) is odd too, as follows:

DECLARE @TestResults TABLE (Id int IDENTITY(1,1) NOT NULL, Foo VARCHAR(100) NOT NULL, About VARCHAR(1000) NOT NULL)
INSERT INTO @TestResults(Foo, About) VALUES('Bar', 'No spaces')
INSERT INTO @TestResults(Foo, About) VALUES('Bar ', 'Space Suffix')
INSERT INTO @TestResults(Foo, About) VALUES(' Bar', 'Space prefix')

SELECT 
     t.Foo
     , LEN(Foo) [Length]
     , t.About
FROM @TestResults t

Gives the following results:

[Foo]   [Length]  [About]
Bar     3         No spaces
Bar     3         Space Suffix
 Bar    4         Space prefix

Without any lateral thinking, what do I need to change my WHERE clause to in order to return 0 results as expected?



Solution 1:[1]

The answer is to add the following clause:

AND DATALENGTH(t.Foo) = DATALENGTH('Bar')

Running the following query...

DECLARE @Chars TABLE (CharNumber INT NOT NULL)

DECLARE @CharNumber INT = 0

WHILE(@CharNumber <= 255)
    BEGIN
        INSERT INTO @Chars(CharNumber) VALUES(@CharNumber)

        SET @CharNumber = @CharNumber + 1

    END

SELECT 
    CharNumber
    , IIF('Test' = 'Test' + CHAR(CharNumber),1,0) ['Test' = 'Test' + CHAR(CharNumber)]
    , IIF('Test' LIKE 'Test' + CHAR(CharNumber),1,0) ['Test' LIKE 'Test' + CHAR(CharNumber)]
    , IIF(LEN('Test') = LEN('Test' + CHAR(CharNumber)),1,0) [LEN('Test') = LEN('Test' + CHAR(CharNumber))]
    , IIF(DATALENGTH('Test') = DATALENGTH('Test' + CHAR(CharNumber)),1,0) [DATALENGTH('Test') = DATALENGTH('Test' + CHAR(CharNumber))]
FROM @Chars
WHERE ('Test' = 'Test' + CHAR(CharNumber))
OR ('Test' LIKE 'Test' + CHAR(CharNumber))
OR (LEN('Test') = LEN('Test' + CHAR(CharNumber)))
ORDER BY CharNumber

...produces the following results...

CharNumber  'Test' = 'Test' + CHAR(CharNumber)  'Test' LIKE 'Test' + CHAR(CharNumber)   LEN('Test') = LEN('Test' + CHAR(CharNumber))    DATALENGTH('Test') = DATALENGTH('Test' + CHAR(CharNumber))
0           1                                   1                                       0                                               0
32          1                                   0                                       1                                               0
37          0                                   1                                       0                                               0

DATALENGTH can be used to test the equality of two VARCHAR, therefore the original query can be corrected as follows:

-- Create a table variable Note: This same behaviour occurs in standard tables.

DECLARE @TestResults TABLE (Id int IDENTITY(1,1) NOT NULL, Foo VARCHAR(100) NOT NULL, About VARCHAR(1000) NOT NULL)

-- Add some test data Note: Without space, space prefix and space suffix

INSERT INTO @TestResults(Foo, About) VALUES('Bar', 'No spaces')
INSERT INTO @TestResults(Foo, About) VALUES('Bar ', 'Space Suffix')
INSERT INTO @TestResults(Foo, About) VALUES(' Bar', 'Space prefix')

-- SELECT statement that is filtered by a value without a space and also a value with a space suffix

SELECT 
     t.Foo
     , t.About
FROM @TestResults t
WHERE t.Foo like 'Bar '
AND t.Foo like 'Bar'
AND t.Foo = 'Bar ' 
AND t.Foo = 'Bar' 
AND DATALENGTH(t.Foo) = DATALENGTH('Bar') -- Additional clause

I also made a function to be used instead of =

ALTER FUNCTION dbo.fVEQ( @VarCharA VARCHAR(MAX), @VarCharB VARCHAR(MAX) ) 
RETURNS BIT 
WITH SCHEMABINDING
AS
BEGIN
    -- Added by WonderWorker on 18th March 2020

    DECLARE @Result BIT = IIF(
        (@VarCharA = @VarCharB AND DATALENGTH(@VarCharA) = DATALENGTH(@VarCharB))

    , 1, 0)

    RETURN @Result

END

..Here is a test for all 256 characters used as trailing characters to prove that it works..

-- Test fVEQ with all 256 characters

DECLARE @Chars TABLE (CharNumber INT NOT NULL)

DECLARE @CharNumber INT = 0

WHILE(@CharNumber <= 255)
    BEGIN
        INSERT INTO @Chars(CharNumber) VALUES(@CharNumber)

        SET @CharNumber = @CharNumber + 1

    END

SELECT 
    CharNumber
    , dbo.fVEQ('Bar','Bar' + CHAR(CharNumber)) [fVEQ Trailing Char Test]
    , dbo.fVEQ('Bar','Bar') [fVEQ Same test]
    , dbo.fVEQ('Bar',CHAR(CharNumber) + 'Bar') [fVEQ Leading Char Test]
FROM @Chars
WHERE (dbo.fVEQ('Bar','Bar' + CHAR(CharNumber)) = 1)
AND (dbo.fVEQ('Bar','Bar') = 0)
AND (dbo.fVEQ('Bar',CHAR(CharNumber) + 'Bar') = 1)

Solution 2:[2]

The reason why trailing whitespace is disregarded in string comparison, is because of the notion of fixed-length string fields, in which any content shorter than the fixed length is automatically right-padded with spaces. Such fixed-length fields cannot distinguish meaningful trailing spaces from padding.

The rationale for why fixed-length string fields even exist, is that they improve performance significantly in many cases, and when SQL was designed it was common for character-based terminals (which usually treated trailing spaces equivalent to padding), reports printed with monospaced fonts (which used trailing spaces for padding and alignment), and data storage and exchange formats (which used fixed-length fields in place of extensive and costly delimiters and complicated parsing logic), to all be oriented around fixed-length fields, so there was a tight integration with this concept at all stages of processing.

When comparing two fixed-length fields of the same fixed length, a literal comparison would of course be possible and would produce correct results.

But when comparing a fixed-length field of a given fixed length, to a fixed-length field of a different fixed length, the desired behaviour would never be to include the trailing spaces in the comparison, since two such fields could never match literally simply by virtue of their differing fixed lengths. The shorter field could be cast and padded to the length of the longer (at least conceptually if not physically), but the trailing space would still then be considered as padding rather than as meaningful.

When comparing a fixed-length field to a variable-length field, the desired behaviour is also probably never to include trailing spaces in the comparison. More complicated approaches which attempt to attribute meaning to trailing spaces in the variable-length side of the comparison, would only come at the cost of slower comparison logic and additional conceptual complexity and potential for error.

In terms of why variable-length to variable-length comparisons ignore trailing spaces, since here spaces can be meaningful in principle, the rationale is probably maintaining consistency in comparison behaviour as when fixed-length fields are involved, and the avoidance of the most common kind of error, since trailing spaces are spurious in databases far more often than they are meaningful.

Nowadays, a database system designed in every respect from scratch would probably forsake fixed-length fields, and probably perform all comparisons literally, leaving the developer to deal explicitly with spurious trailing spaces, but in my experience this would result in extra development effort and far more frequent error than the current arrangement in SQL, where errors in program logic involving the silent disregard of trailing spaces usually only occurs when designing complex string-shredding logic to be used against un-normalised data (which is a kind of data that SQL is specifically not optimised for handling).

So to be clear, this is not an undocumented feature, but a prominent feature that exists by design.

Solution 3:[3]

If you change the query to

SELECT 
     Foo
     , About
     , CASE WHEN Foo LIKE 'Bar ' THEN 'T' ELSE 'F' END As Like_Bar_Space
     , CASE WHEN Foo LIKE 'Bar'  THEN 'T' ELSE 'F' END As Like_Bar
     , CASE WHEN Foo =    'Bar ' THEN 'T' ELSE 'F' END As EQ_Bar_Space
     , CASE WHEN Foo =    'Bar'  THEN 'T' ELSE 'F' END As EQ_Bar
FROM @TestResults

it gives you a better overview, as you see the result of the different conditions separately:

Foo     About         Like_Bar_Space   Like_Bar   EQ_Bar_Space   EQ_Bar
------  ------------  ---------------  ---------  -------------  ------
Bar     No spaces      F                T          T              T
Bar     Space Suffix   T                T          T              T
 Bar    Space prefix   F                F          F              F

It looks like equals = ignores trailing spaces in both searched string and pattern. LIKE, however, does not ignore the trailing space in the pattern but ignores an extra trailing space in the searched string. Leading spaces are never ignored.

I don't know how wrong entries got in there, but you can fix them with

UPDATE @TestResults SET Foo = TRIM(Foo)

You can make a trailing space sensitive test with:

WHERE t.Foo + ";" = pattern + ";" 

You can make a trailing space insensitive test with:

WHERE RTRIM(t.Foo) = RTRIM(pattern)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3