11-slowly-changing-dimensions

Analytics asks, “What did we believe about this customer on June 1?” Your current table only keeps the latest address, so answers drift over time. You need to keep each historical version with a validity window for accurate as‑of queries.

Core Concept

Store one row per version with [valid_from, valid_to) where valid_to is NULL for the current version. Update the previous open row’s valid_to on change and INSERT a new row starting at the change time.

Implementation (step by step SQL)

Table

CREATE TABLE customer_dim (
  customer_id INT NOT NULL,
  name        TEXT NOT NULL,
  address     TEXT NOT NULL,
  valid_from  TIMESTAMPTZ NOT NULL,
  valid_to    TIMESTAMPTZ,
  PRIMARY KEY (customer_id, valid_from)
);

Insert first version

INSERT INTO customer_dim (customer_id, name, address, valid_from)
VALUES (1,'Alice','123 Main St','2025-01-01');

Apply change (close old row, add new)

UPDATE customer_dim
  SET valid_to = '2025-07-01'
WHERE customer_id=1 AND valid_to IS NULL;

INSERT INTO customer_dim (customer_id, name, address, valid_from)
VALUES (1,'Alice','456 Oak Ave','2025-07-01');

Current snapshot

SELECT * FROM customer_dim WHERE customer_id=1 AND valid_to IS NULL;

As of a date

SELECT * FROM customer_dim
WHERE customer_id=1
  AND '2025-06-01' >= valid_from
  AND ('2025-06-01' < valid_to OR valid_to IS NULL);

Notes

Optional helpers

CREATE OR REPLACE VIEW customer_current AS
SELECT * FROM customer_dim WHERE valid_to IS NULL;

Trigger: enforce continuous, valid intervals (tstzrange + range_agg)

Use a simple trigger to check three things per customer_id when inserting/updating a version row:

This uses tstzrange to build ranges and range_agg to get a normalized multirange we can scan for gaps.

CREATE OR REPLACE FUNCTION customer_dim_enforce_continuity()
RETURNS trigger LANGUAGE plpgsql AS $$
DECLARE
  mr tstzmultirange;
  gaps int;
  _cid int := NEW.customer_id;
  _new_range tstzrange := tstzrange(NEW.valid_from, NEW.valid_to, '[)');
BEGIN
  -- 1) Per-row validity
  IF NEW.valid_to IS NOT NULL AND NEW.valid_from >= NEW.valid_to THEN
    RAISE EXCEPTION USING
      MESSAGE = '[SCD_0001] invalid validity window',
      DETAIL  = json_build_object('customer_id', _cid, 'valid_from', NEW.valid_from, 'valid_to', NEW.valid_to)::text,
      HINT    = 'Use half-open [valid_from, valid_to) with valid_from < valid_to';
  END IF;

  -- 2) Explicit overlap check against existing rows (allow adjacency only)
  IF EXISTS (
    SELECT 1
    FROM customer_dim d
    WHERE d.customer_id = _cid
      AND (TG_OP = 'INSERT' OR (d.valid_from, d.valid_to) IS DISTINCT FROM (OLD.valid_from, OLD.valid_to))
      AND tstzrange(d.valid_from, d.valid_to, '[)') && _new_range
      AND NOT upper(tstzrange(d.valid_from, d.valid_to, '[)')) = lower(_new_range)
      AND NOT upper(_new_range) = lower(tstzrange(d.valid_from, d.valid_to, '[)'))
  ) THEN
    RAISE EXCEPTION USING
      MESSAGE = '[SCD_0002] overlapping versions for customer',
      DETAIL  = json_build_object('customer_id', _cid)::text,
      HINT    = 'Close previous row at valid_to = new.valid_from, then insert the new version';
  END IF;

  -- 3) Continuity check (no gaps): aggregate all ranges (including NEW) and
  -- ensure every next range starts exactly at the previous end
  SELECT range_agg(r ORDER BY lower(r))
  INTO mr
  FROM (
    SELECT tstzrange(valid_from, valid_to, '[)') AS r
    FROM customer_dim
    WHERE customer_id = _cid
    UNION ALL
    SELECT _new_range
  ) q;

  -- Count places where next.lower <> prev.upper
  SELECT count(*) INTO gaps
  FROM (
    SELECT upper(r) AS prev_end,
           lead(lower(r)) OVER (ORDER BY lower(r)) AS next_start
    FROM unnest(mr) AS r
  ) s
  WHERE next_start IS NOT NULL AND next_start <> prev_end;

  IF gaps > 0 THEN
    RAISE EXCEPTION USING
      MESSAGE = '[SCD_0003] non-contiguous history for customer',
      DETAIL  = json_build_object('customer_id', _cid, 'gaps', gaps)::text,
      HINT    = 'Ensure each version starts at the previous valid_to (half-open [) semantics)';
  END IF;

  RETURN NEW;
END;
$$;

DROP TRIGGER IF EXISTS trg_customer_dim_enforce ON customer_dim;
CREATE TRIGGER trg_customer_dim_enforce
  BEFORE INSERT OR UPDATE ON customer_dim
  FOR EACH ROW
  EXECUTE FUNCTION customer_dim_enforce_continuity();

Write helpers: close-and-insert (CTE and MERGE)

WITH close AS (
  UPDATE customer_dim
     SET valid_to = $1
   WHERE customer_id = $2
     AND valid_to IS NULL
)
INSERT INTO customer_dim (customer_id, name, address, valid_from, valid_to)
VALUES ($2, $3, $4, $1, NULL);
-- Bind $1=new_valid_from, $2=customer_id, $3=name, $4=address

Alternative with MERGE (PostgreSQL 15+): use MERGE to close the open row, then INSERT. MERGE cannot both update and insert for the same source row in one statement.

-- Close the open row if any
MERGE INTO customer_dim t
USING (VALUES ($1::int, $2::timestamptz)) s(customer_id, new_from)
ON (t.customer_id = s.customer_id AND t.valid_to IS NULL)
WHEN MATCHED THEN
  UPDATE SET valid_to = s.new_from;

-- Then insert the new current version
INSERT INTO customer_dim (customer_id, name, address, valid_from, valid_to)
VALUES ($1, $3, $4, $2, NULL);
-- Bind $1=customer_id, $2=new_valid_from, $3=name, $4=address

Recap (Short Summary)

Type 2 lets you reconstruct past truth and audit changes cleanly. Each new version is an append; nothing destructive to history.

Limits of explicit valid_from and valid_to

When change frequency is modest these are acceptable; if changes are very frequent, consider an append-only snapshot table and derive ranges at query time.

Alternate: snapshot approach (append-only)

Prefer a minimal-write log and derive intervals on read? See the dedicated follow-up chapter:

Slowly Changing Dimensions (Type 2 History Made Simple)

Problem / Context