Why = NULL Never Works in PostgreSQL (And What SQL is Actually Doing) 🕳️



I was debugging a query that was supposed to return unprocessed jobs — rows where the processed_at column hadn't been set yet. Simple enough.

I wrote:

SELECT * FROM jobs WHERE processed_at = NULL;

Zero rows. Every time. Even when I could see unprocessed jobs sitting in the table.

I'd made this mistake before. I knew the fix — IS NULL instead of = NULL. I changed it, got my results, and moved on. But this time I stopped and actually looked up why. Why does = NULL silently return nothing instead of throwing an error? Why does IS NULL work when = NULL doesn't? And what does that say about how SQL thinks about data?

The answer turned out to be more interesting — and more consequential — than I expected.


📜 Where NULL Came From

NULL wasn't an accident or an oversight. It was a deliberate design decision made by Edgar F. Codd, the mathematician who invented the relational model in his landmark 1970 paper "A Relational Model of Data for Large Shared Data Banks."

Codd's problem was straightforward: real-world data is incomplete. A person's middle name might not be known. A delivery date might not be set yet. A measurement might not have been taken. Any honest data model needs a way to represent "this information is missing" — and Codd didn't want that to be an empty string, or a zero, or a magic sentinel value like -1. Those all carry meaning of their own and corrupt the data's semantics.

His solution was NULL: a marker meaning "value unknown or inapplicable." And to make it consistent, he argued that comparisons involving an unknown value must themselves produce an unknown result — which gave SQL its three-valued logic: TRUE, FALSE, and UNKNOWN.

Codd actually went further than SQL did. In later work he proposed two distinct kinds of NULL: "missing but applicable" (a phone number that exists but isn't recorded) and "missing and inapplicable" (a phone number for someone who doesn't have one). SQL collapsed both into a single NULL, which Codd himself criticised. His collaborator C.J. Date went further and argued that NULL was a mistake entirely — that missing information should be handled at the application layer, not embedded in the type system of the database.

That debate never fully resolved. But SQL kept NULL, and the three-valued logic it requires is now fundamental to every relational database in use today — which is why understanding it isn't optional.


🤔 The Rule and the Reason

In SQL, NULL is not a value. It's the absence of a known value. It means "I don't know what this is." And that distinction changes everything.

When you write processed_at = NULL, SQL evaluates: "Is this unknown value equal to unknown?" The answer isn't yes or no. It's also unknown. You're asking whether something you don't know equals something else you don't know. There's no way to determine that.

So the expression processed_at = NULL doesn't return TRUE or FALSE. It returns NULL. And a WHERE clause only keeps rows where the condition evaluates to TRUE. NULL is not TRUE, so those rows are silently discarded.

-- This compares a value to "unknown" — result is always NULL
WHERE processed_at = NULL    -- ❌ returns NULL, not TRUE or FALSE

-- This asks "does this column have no value?" — that's a known fact
WHERE processed_at IS NULL   -- ✅ returns TRUE or FALSE

IS NULL is not a comparison operator. It's a test for the presence or absence of a value. That's why it works where = doesn't.


🧩 Three-Valued Logic

Standard boolean logic has two values: TRUE and FALSE. SQL has three: TRUE, FALSE, and NULL (also called UNKNOWN). This is called three-valued logic and it's the root of almost every NULL-related surprise in SQL.

Here's the truth table for AND and OR with NULL:

AND          TRUE    FALSE   NULL
TRUE      →  TRUE    FALSE   NULL
FALSE     →  FALSE   FALSE   FALSE
NULL      →  NULL    FALSE   NULL

OR           TRUE    FALSE   NULL
TRUE      →  TRUE    TRUE    TRUE
FALSE     →  TRUE    FALSE   NULL
NULL      →  TRUE    NULL    NULL

Two things stand out:

FALSE AND NULL = FALSE — because regardless of what NULL turns out to be, AND with a known FALSE can never be TRUE. SQL can determine the outcome without resolving the unknown.

TRUE OR NULL = TRUE — same logic. Regardless of what NULL is, the OR with a known TRUE is always TRUE.

But NULL AND NULL = NULL. NULL OR NULL = NULL. And critically, NULL = NULL = NULL.

This is why you can never test for NULL equality — including testing whether two NULL values are equal to each other.


💣 The NOT IN Trap

This is where three-valued logic causes real production bugs. Consider a query to find users who are not in a list of suspended accounts:

SELECT id FROM users
WHERE id NOT IN (SELECT user_id FROM suspensions);

This works fine until the suspensions table has a row where user_id is NULL. At that point, the query returns no rows at all.

Here's why. NOT IN expands to a series of != comparisons AND'd together:

-- NOT IN (1, 2, NULL) expands to:
WHERE id != 1 AND id != 2 AND id != NULL

id != NULL evaluates to NULL for any value of id. And any value AND NULL where the rest of the chain is TRUE gives you NULL. So the entire predicate evaluates to NULL for every row, and the WHERE clause eliminates everything.

-- Safe version: use NOT EXISTS instead
SELECT id FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM suspensions s
  WHERE s.user_id = u.id
);

-- Or filter NULLs out explicitly
SELECT id FROM users
WHERE id NOT IN (
  SELECT user_id FROM suspensions
  WHERE user_id IS NOT NULL
);

NOT EXISTS handles NULL gracefully because it tests for the existence of a matching row, not for value equality. If user_id is NULL, the inner query finds no matching row and NOT EXISTS returns TRUE.

This is the kind of bug that's invisible until production data contains a NULL that your test data didn't.


📊 How Aggregates Handle NULL

NULL has consistent — but often surprising — behaviour inside aggregate functions. The rule is simple: most aggregates ignore NULLs entirely.

-- Table: scores (values: 10, 20, NULL, 30)

SELECT COUNT(*)        FROM scores;  -- 4  (counts all rows)
SELECT COUNT(score)    FROM scores;  -- 3  (ignores NULL)
SELECT SUM(score)      FROM scores;  -- 60 (ignores NULL)
SELECT AVG(score)      FROM scores;  -- 20 (60 / 3, not 60 / 4)
SELECT MAX(score)      FROM scores;  -- 30
SELECT MIN(score)      FROM scores;  -- 10

The AVG behaviour is the one that bites people. If you're averaging a column where NULL means "zero" in your domain, the result will be wrong. You need to make that explicit:

-- If NULL should be treated as 0
SELECT AVG(COALESCE(score, 0)) FROM scores;  -- 15 (60 / 4)

COUNT(*) vs COUNT(column) is the other common trap. They look similar but mean different things — one counts rows, the other counts non-NULL values in a specific column.


🛠️ The Tools for Handling NULL

COALESCE returns the first non-NULL argument. Use it to substitute a default value wherever NULL would cause problems:

-- Fallback chain
SELECT COALESCE(nickname, first_name, 'Anonymous') AS display_name
FROM users;

-- Safe arithmetic (NULL + anything = NULL)
SELECT COALESCE(discount, 0) + base_price AS total FROM orders;

NULLIF is the inverse — it returns NULL if two expressions are equal. Useful for converting sentinel values to proper NULLs:

-- Treat empty string as NULL
SELECT NULLIF(description, '') FROM products;

-- Avoid division by zero
SELECT total / NULLIF(divisor, 0) FROM calculations;

IS DISTINCT FROM is PostgreSQL-specific and treats NULL as a known, comparable value rather than unknown. It returns FALSE when both sides are NULL:

-- Regular equality — NULL = NULL is NULL, not TRUE
SELECT NULL = NULL;              -- NULL

-- IS DISTINCT FROM — NULL treated as a value
SELECT NULL IS DISTINCT FROM NULL;      -- FALSE (they are the same)
SELECT 1    IS DISTINCT FROM NULL;      -- TRUE  (they are different)
SELECT 1    IS NOT DISTINCT FROM NULL;  -- FALSE (they are different)

IS NOT DISTINCT FROM is useful when you want to compare two columns that might both be NULL and treat that as a match — something = cannot do:

-- Find rows where old and new status are the same (including both NULL)
WHERE old_status IS NOT DISTINCT FROM new_status

🎯 The Rules to Internalize

  1. NULL = NULL is NULL, not TRUE. Nothing equals NULL. Not even NULL itself.

  2. WHERE filters on TRUE. Any NULL result from a condition silently excludes the row.

  3. NOT IN with a subquery containing NULL returns nothing. Use NOT EXISTS instead.

  4. Aggregates ignore NULL. COUNT(*) and COUNT(column) are not the same thing.

  5. COALESCE for defaults. NULLIF for sentinel-to-NULL conversion. IS DISTINCT FROM when you need NULL-safe equality.


✨ Final Thoughts

The = NULL mistake is one of those things that feels like a gotcha until you understand the reasoning behind it — and then it becomes completely logical. NULL is unknown. Comparing unknown to anything produces unknown. The WHERE clause, which only wants TRUE, discards the row. No error, because nothing broke. SQL did exactly what three-valued logic required.

What I find interesting is how consistent this model is once you accept it. The NOT IN trap, the aggregate behaviour, the COALESCE patterns — they all follow from the same underlying principle. NULL is not a value. It's the absence of one. Every rule about NULL in SQL flows from that single decision.

The trap isn't the rule. The trap is not knowing the rule exists.