Blogmark

Validating Data Types from Semi-Structured Data Loads in Postgres with pg_input_is_valid

via jbranchaud@gmail.com

https://www.crunchydata.com/blog/validating-data-types-from-semi-structured-data-loads-in-postgres-with-pg_input_is_valid
PostgreSQL Data Validation

The pg_input_is_valid function is a handy way to check for whether rows of data conform to (are castable to) a specific data type. Let's say you have a ton of records with a payload text column. Those fields mostly represent valid JSON, but some are non-JSON error messages. You could write an update SQL statement like this:

UPDATE my_table
SET json_payload = CASE 
    WHEN pg_input_is_valid(payload, 'jsonb') 
    THEN payload::jsonb 
    ELSE '{}'::jsonb 
    END;

The post also shows off a nice technique for loading a ton of CSV data in where you may not be sure that every single row conforms to the various data types. It would be a shame to run a copy statement that loads 95% of the data and then suddenly fails and rolls back because of an errant field.

Instead, load everything in as text:

CREATE TEMP TABLE staging_customers (
    customer_id TEXT,
    name TEXT,
    email TEXT,
    age TEXT,
    signup_date TEXT
);

-- copy in the data to the temp table
COPY staging_customers FROM '/path/to/customers.csv' CSV HEADER;

Then use an approach similar to what I described in the first code block to migrate the valid values over to other rows in the same or a different table.

In Elizabeth's example, the invalid records are ignored while the rest are moved to the new table:

INSERT INTO customers (name, email, age, signup_date)
SELECT name, email, age::integer, signup_date::date
FROM staging_customers
WHERE pg_input_is_valid(age, 'integer')
  AND pg_input_is_valid(signup_date, 'date');