Blogmark
Validating Data Types from Semi-Structured Data Loads in Postgres with pg_input_is_valid
via jbranchaud@gmail.com
The pg_input_is_valid function is a handy way to check whether values conform to (are castable to) a specific data type. Let's say you have a ton of records with a payload text column. Those fields mostly contain valid JSON, but some are non-JSON error messages. You could write an UPDATE statement like this:
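-- Assumes the table also has a jsonb column, json_payload, alongside the text payload column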
UPDATE my_table
SET json_payload = CASE
WHEN pg_input_is_valid(payload, 'jsonb')
THEN payload::jsonb
ELSE '{}'::jsonb
END;
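pg_input_is_valid (available since PostgreSQL 16) takes the value and a type name and returns a boolean, so you can also spot-check individual values directly:
SELECT pg_input_is_valid('{"status": "ok"}', 'jsonb'); -- true
SELECT pg_input_is_valid('connection refused', 'jsonb'); -- false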
The post also shows off a nice technique for loading a ton of CSV data when you aren't sure that every single row conforms to the target data types. It would be a shame to run a COPY statement that loads 95% of the data and then suddenly fails and rolls back because of an errant field. Instead, load everything in as text:
CREATE TEMP TABLE staging_customers (
customer_id TEXT,
name TEXT,
email TEXT,
age TEXT,
signup_date TEXT
);
-- copy in the data to the temp table
COPY staging_customers FROM '/path/to/customers.csv' CSV HEADER;
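Before migrating anything, it can be worth counting how many staging rows would fail validation. This check isn't in the original post, but it uses the same function:
SELECT count(*) AS invalid_rows
FROM staging_customers
WHERE NOT pg_input_is_valid(age, 'integer')
   OR NOT pg_input_is_valid(signup_date, 'date');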
Then use an approach similar to the one in the first code block to migrate the valid values into properly typed columns, either in the same table or a different one.
In Elizabeth's example, the invalid records are ignored while the rest are moved to the new table:
INSERT INTO customers (name, email, age, signup_date)
SELECT name, email, age::integer, signup_date::date
FROM staging_customers
WHERE pg_input_is_valid(age, 'integer')
AND pg_input_is_valid(signup_date, 'date');
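And if you want to know why a given value failed, PostgreSQL 16 also ships a companion function, pg_input_error_info, which returns the error message the cast would have raised. Reviewing the rejected rows isn't part of the original post, but a sketch might look like:
SELECT customer_id, age,
       (pg_input_error_info(age, 'integer')).message
FROM staging_customers
WHERE NOT pg_input_is_valid(age, 'integer');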