Solveddbt core Record failing rows for tests into an auditable table
βοΈAccepted Answer
I support and would use the use case of reporting on failed tests. I am more interested in using them for auditing than for debugging βΒ I would like a nice easy table to query via Looker for a conditional "Go Fix ETL" alert.
When I went looking to do this feature I naturally reached for hooks, as I'm using them already for audit purposes much as described in the documentation.
I'm logging the individual model run:
on-run-start:
- "create table if not exists {{ target.schema }}.audit_dbt_runs (model text, state text, time timestamp)"
models:
pre-hook:
- "insert into {{ target.schema }}.audit_dbt_runs (model, state, time) values ('{{this.name}}', 'start', getdate())"
post-hook:
- "insert into {{ target.schema }}.audit_dbt_runs (model, state, time) values ('{{this.name}}', 'end', getdate())"
As well as the overall results:
on-run-start:
- "create table if not exists {{ target.schema }}.audit_dbt_results (node text, status text, execution_time decimal(10,2), time timestamp)"
on-run-end:
- "insert into {{ target.schema }}.audit_dbt_results (node, status, execution_time, time) values {{ results_values(results) }}"
(I have a simple macro results_values
to turn the results
into insertable values.)
In this case I would generally like the exact same thing for tests. Something exactly analogous to the on-run-start
/on-run-end
with the end run context results
would suit my needs:
on-test-start:
- "create table if not exists {{ target.schema }}.audit_dbt_test_results (node text, status text, execution_time decimal(10,2), time timestamp)"
on-test-end:
- "insert into {{ target.schema }}.audit_dbt_test_results (node, status, execution_time, time) values {{ results_values(results) }}"
Other Answers:
Hi @Aylr and @gordonhwong!
In case it's still useful for anyone, the following macro is what I use:
{% macro results_values(results) %}
{% for res in results -%}
{% if loop.index > 1 %},{% endif %}
('{{ res.node.alias }}', '{{ res.status }}', {{ res.execution_time }}, getdate())
{% endfor %}
{% endmacro %}
This pairs with:
on-run-start:
- "create table if not exists {{ target.schema }}.audit_dbt_results (node text, status text, execution_time decimal(10,2), time timestamp)"
on-run-end:
- "insert into {{ target.schema }}.audit_dbt_results (node, status, execution_time, time) values {{ results_values(results) }}"
Hey @bashyroger, thanks for bringing this topic back to the fore. I just opened a new issue to sketch out our latest thinking about how dbt should store tests in the database: #2593
Feature
Feature description
dbt should record rows that fail schema or data tests into tables, to be used for auditing. There are two general ways this could work:
errors
table with a schema like:The baked-in schema tests will need to be rewritten to provide debuggable information in these tables.
This should probably be an opt-in feature for tests. It could be configured either in
schema.yml
, though it would also be good to configure whole swaths of tests at once.Who will this benefit?
dbt users could use this table to quickly determine why their tests failed. At present, it's surprisingly difficult to debug a failed test.