Handling NULL values in SQL comparison logic

In SQL, NULL represents an "unknown" value. Because of this, it does not behave like a standard piece of data when used in comparison operators.
If you don't account for this behavior, you may inadvertently exclude rows from your results.

tl;dr

Never assume that NOT covers everything the primary condition missed. Always consider whether a column allows NULL values and, if it does, explicitly define how the query should handle them.

The Problem: "Missing" Rows

Consider a table of users where the name column allows NULLs. We have three users: Alice, Bob, and one user with a NULL name.

If you attempt to update or filter these users using a standard toggle, you might run the following queries:

SELECT * FROM users WHERE name = 'Alice';
SELECT * FROM users WHERE NOT name = 'Alice';

Logic might suggest that these two queries combined would cover the entire table. However:

Query 1 returns Alice.
Query 2 returns Bob.
The NULL user is omitted from both.

The Cause: Three-Valued Logic

Most programming languages use Boolean logic. SQL uses Three-Valued Logic, where a comparison can result in TRUE, FALSE, or UNKNOWN.

When you compare any value to NULL using standard operators (like = or !=), the result is always UNKNOWN.

The critical rule is that a WHERE clause only includes rows where the condition evaluates strictly to TRUE. Since NOT (UNKNOWN) is still UNKNOWN, rows containing NULL fail both the positive and negative check.

The Solution: Accounting for Unknowns

To ensure your queries capture NULL values, you must handle them explicitly.

Option 1: Using `IS NULL`

The most portable way to handle this is by adding an explicit check for the NULL state:

-- This ensures you catch both 'Bob' and the 'NULL' user
UPDATE users 
SET status = 'not_alice' 
WHERE name != 'Alice' OR name IS NULL;

Option 2: Using `IS DISTINCT FROM` (PostgreSQL)

PostgreSQL provides a cleaner way to handle this using the IS DISTINCT FROM operator Show archive.org snapshot . You can think of it as a "null-safe" version of "not equal to."

Unlike the standard != operator, IS DISTINCT FROM treats NULL as a comparable state rather than an unknown. It simply asks: "Are these two things different?"

-- This captures every row that is not 'Alice', including NULLs
SELECT * FROM users 
WHERE name IS DISTINCT FROM 'Alice';

By using this operator, the NULL row is successfully included because it is "distinct" from the string 'Alice'. This removes the need for an extra OR name IS NULL clause.

Option 3: Using `COALESCE`

A third way to handle NULLs is to use the COALESCE function. It allows you to provide a fallback value for any NULL entries, effectively converting the "unknown" state into a known value that standard operators can understand.

For example, if we treat a NULL name as an empty string, we can then use the standard "not equal" operator:

-- Converts NULL to '', then checks if the result is not 'Alice'
SELECT * FROM users 
WHERE COALESCE(name, '') != 'Alice';

Careful: This prevents the database from using an index for the comparison. It's fine to use when performance does not matter, or on tables with only a few entries.

Arne Hartherz 2022-01-21

Handling NULL values in SQL comparison logic

tl;dr

The Problem: "Missing" Rows

The Cause: Three-Valued Logic

The Solution: Accounting for Unknowns

Option 1: Using IS NULL

Option 2: Using IS DISTINCT FROM (PostgreSQL)

Option 3: Using COALESCE

Option 1: Using `IS NULL`

Option 2: Using `IS DISTINCT FROM` (PostgreSQL)

Option 3: Using `COALESCE`