I was debugging a query that was supposed to return unprocessed jobs — rows where the
processed_at column hadn't been set yet. Simple enough.
I wrote:
SELECT * FROM jobs WHERE processed_at = NULL;
Zero rows. Every time. Even when I could see unprocessed jobs sitting in the table.
I'd made this mistake before. I knew the fix — IS NULL instead of = NULL. I changed it,
got my results, and moved on. But this time I stopped and actually looked up why. Why does
= NULL silently return nothing instead of throwing an error? Why does IS NULL work when
= NULL doesn't? And what does that say about how SQL thinks about data?
The answer turned out to be more interesting — and more consequential — than I expected.
📜 Where NULL Came From
NULL wasn't an accident or an oversight. It was a deliberate design decision made by Edgar F. Codd, the mathematician who invented the relational model in his landmark 1970 paper "A Relational Model of Data for Large Shared Data Banks."
Codd's problem was straightforward: real-world data is incomplete. A person's middle name might
not be known. A delivery date might not be set yet. A measurement might not have been taken.
Any honest data model needs a way to represent "this information is missing" — and
Codd didn't want that to be an empty string, or a zero, or a magic sentinel value like -1.
Those all carry meaning of their own and corrupt the data's semantics.
His solution was NULL: a marker meaning "value unknown or inapplicable." And to make it
consistent, he argued that comparisons involving an unknown value must themselves produce an
unknown result — which gave SQL its three-valued logic: TRUE, FALSE, and UNKNOWN.
Codd actually went further than SQL did. In later work he proposed two distinct kinds of NULL: "missing but applicable" (a phone number that exists but isn't recorded) and "missing and inapplicable" (a phone number for someone who doesn't have one). SQL collapsed both into a single NULL, which Codd himself criticised. His collaborator C.J. Date went further and argued that NULL was a mistake entirely — that missing information should be handled at the application layer, not embedded in the type system of the database.
That debate never fully resolved. But SQL kept NULL, and the three-valued logic it requires is now fundamental to every relational database in use today — which is why understanding it isn't optional.
🤔 The Rule and the Reason
In SQL, NULL is not a value. It's the absence of a known value. It means "I don't know
what this is." And that distinction changes everything.
When you write processed_at = NULL, SQL evaluates: "Is this unknown value equal to unknown?"
The answer isn't yes or no. It's also unknown. You're asking whether something you don't
know equals something else you don't know. There's no way to determine that.
So the expression processed_at = NULL doesn't return TRUE or FALSE. It returns NULL.
And a WHERE clause only keeps rows where the condition evaluates to TRUE. NULL is not
TRUE, so those rows are silently discarded.
-- This compares a value to "unknown" — result is always NULL
WHERE processed_at = NULL -- ❌ returns NULL, not TRUE or FALSE
-- This asks "does this column have no value?" — that's a known fact
WHERE processed_at IS NULL -- ✅ returns TRUE or FALSE
IS NULL is not a comparison operator. It's a test for the presence or absence of a value.
That's why it works where = doesn't.
🧩 Three-Valued Logic
Standard boolean logic has two values: TRUE and FALSE. SQL has three: TRUE, FALSE, and
NULL (also called UNKNOWN). This is called three-valued logic and it's the root of
almost every NULL-related surprise in SQL.
Here's the truth table for AND and OR with NULL:
AND TRUE FALSE NULL
TRUE → TRUE FALSE NULL
FALSE → FALSE FALSE FALSE
NULL → NULL FALSE NULL
OR TRUE FALSE NULL
TRUE → TRUE TRUE TRUE
FALSE → TRUE FALSE NULL
NULL → TRUE NULL NULL
Two things stand out:
FALSE AND NULL = FALSE — because regardless of what NULL turns out to be, AND
with a known FALSE can never be TRUE. SQL can determine the outcome without resolving the
unknown.
TRUE OR NULL = TRUE — same logic. Regardless of what NULL is, the OR with a
known TRUE is always TRUE.
But NULL AND NULL = NULL. NULL OR NULL = NULL. And critically, NULL = NULL = NULL.
This is why you can never test for NULL equality — including testing whether two NULL values are equal to each other.
💣 The NOT IN Trap
This is where three-valued logic causes real production bugs. Consider a query to find users who are not in a list of suspended accounts:
SELECT id FROM users
WHERE id NOT IN (SELECT user_id FROM suspensions);
This works fine until the suspensions table has a row where user_id is NULL. At that
point, the query returns no rows at all.
Here's why. NOT IN expands to a series of != comparisons AND'd together:
-- NOT IN (1, 2, NULL) expands to:
WHERE id != 1 AND id != 2 AND id != NULL
id != NULL evaluates to NULL for any value of id. And any value AND NULL where the
rest of the chain is TRUE gives you NULL. So the entire predicate evaluates to NULL for
every row, and the WHERE clause eliminates everything.
-- Safe version: use NOT EXISTS instead
SELECT id FROM users u
WHERE NOT EXISTS (
SELECT 1 FROM suspensions s
WHERE s.user_id = u.id
);
-- Or filter NULLs out explicitly
SELECT id FROM users
WHERE id NOT IN (
SELECT user_id FROM suspensions
WHERE user_id IS NOT NULL
);
NOT EXISTS handles NULL gracefully because it tests for the existence of a matching row,
not for value equality. If user_id is NULL, the inner query finds no matching row and
NOT EXISTS returns TRUE.
This is the kind of bug that's invisible until production data contains a NULL that your
test data didn't.
📊 How Aggregates Handle NULL
NULL has consistent — but often surprising — behaviour inside aggregate functions. The rule is simple: most aggregates ignore NULLs entirely.
-- Table: scores (values: 10, 20, NULL, 30)
SELECT COUNT(*) FROM scores; -- 4 (counts all rows)
SELECT COUNT(score) FROM scores; -- 3 (ignores NULL)
SELECT SUM(score) FROM scores; -- 60 (ignores NULL)
SELECT AVG(score) FROM scores; -- 20 (60 / 3, not 60 / 4)
SELECT MAX(score) FROM scores; -- 30
SELECT MIN(score) FROM scores; -- 10
The AVG behaviour is the one that bites people. If you're averaging a column where NULL
means "zero" in your domain, the result will be wrong. You need to make that explicit:
-- If NULL should be treated as 0
SELECT AVG(COALESCE(score, 0)) FROM scores; -- 15 (60 / 4)
COUNT(*) vs COUNT(column) is the other common trap. They look similar but mean different
things — one counts rows, the other counts non-NULL values in a specific column.
🛠️ The Tools for Handling NULL
COALESCE returns the first non-NULL argument. Use it to substitute a default value
wherever NULL would cause problems:
-- Fallback chain
SELECT COALESCE(nickname, first_name, 'Anonymous') AS display_name
FROM users;
-- Safe arithmetic (NULL + anything = NULL)
SELECT COALESCE(discount, 0) + base_price AS total FROM orders;
NULLIF is the inverse — it returns NULL if two expressions are equal. Useful for
converting sentinel values to proper NULLs:
-- Treat empty string as NULL
SELECT NULLIF(description, '') FROM products;
-- Avoid division by zero
SELECT total / NULLIF(divisor, 0) FROM calculations;
IS DISTINCT FROM is PostgreSQL-specific and treats NULL as a known, comparable value
rather than unknown. It returns FALSE when both sides are NULL:
-- Regular equality — NULL = NULL is NULL, not TRUE
SELECT NULL = NULL; -- NULL
-- IS DISTINCT FROM — NULL treated as a value
SELECT NULL IS DISTINCT FROM NULL; -- FALSE (they are the same)
SELECT 1 IS DISTINCT FROM NULL; -- TRUE (they are different)
SELECT 1 IS NOT DISTINCT FROM NULL; -- FALSE (they are different)
IS NOT DISTINCT FROM is useful when you want to compare two columns that might both be
NULL and treat that as a match — something = cannot do:
-- Find rows where old and new status are the same (including both NULL)
WHERE old_status IS NOT DISTINCT FROM new_status
🎯 The Rules to Internalize
-
NULL = NULLisNULL, notTRUE. Nothing equalsNULL. Not evenNULLitself. -
WHEREfilters onTRUE. AnyNULLresult from a condition silently excludes the row. -
NOT INwith a subquery containingNULLreturns nothing. UseNOT EXISTSinstead. -
Aggregates ignore
NULL.COUNT(*)andCOUNT(column)are not the same thing. -
COALESCEfor defaults.NULLIFfor sentinel-to-NULL conversion.IS DISTINCT FROMwhen you need NULL-safe equality.
✨ Final Thoughts
The = NULL mistake is one of those things that feels like a gotcha until you understand
the reasoning behind it — and then it becomes completely logical. NULL is unknown. Comparing
unknown to anything produces unknown. The WHERE clause, which only wants TRUE, discards
the row. No error, because nothing broke. SQL did exactly what three-valued logic required.
What I find interesting is how consistent this model is once you accept it. The NOT IN
trap, the aggregate behaviour, the COALESCE patterns — they all follow from the same
underlying principle. NULL is not a value. It's the absence of one. Every rule about NULL
in SQL flows from that single decision.
The trap isn't the rule. The trap is not knowing the rule exists.