Decisions Made on Bad Data Are Worse Than No Decision
The Silent Data Quality Problem
Bad data is rarely dramatic. It does not cause a system outage or a visible error. It appears in a dashboard as a number that is slightly wrong — 9.4 million in revenue instead of 10.2 million, because two days of records were dropped by a pipeline that retried without deduplication. It appears as a customer count that is 3% higher than the actual count because a migration introduced duplicates that no one caught. It appears as a report that the finance team stopped trusting — not because anyone proved it was wrong, but because the numbers don't match what they expected, and nobody can explain why.
Data quality problems erode trust. Once business stakeholders stop trusting a dashboard, they stop using it — and the data platform investment delivers no value regardless of how technically sound the underlying infrastructure is.
A data quality framework prevents this by making data problems visible and actionable before they reach business consumers.
The Four Dimensions We Address
Completeness
Is the data all there? Are records missing? Are required fields null when they should not be? Completeness checks identify gaps in extraction (source system records not captured), gaps in transformation (records dropped by a transformation error), and gaps in loading (partial loads that didn't complete).
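To make this concrete, a completeness check can often be expressed as a reconciliation query between the landing zone and the transformed table. A minimal sketch, assuming hypothetical `raw.orders` and `analytics.orders` tables with an `order_date` column:

```sql
-- Completeness sketch: compare daily row counts between the raw landing
-- table and the transformed table; any date with a shortfall indicates
-- records dropped somewhere in the pipeline.
select
    r.order_date,
    r.raw_count,
    coalesce(t.transformed_count, 0) as transformed_count,
    r.raw_count - coalesce(t.transformed_count, 0) as missing_records
from (
    select order_date, count(*) as raw_count
    from raw.orders
    group by order_date
) r
left join (
    select order_date, count(*) as transformed_count
    from analytics.orders
    group by order_date
) t on r.order_date = t.order_date
where r.raw_count > coalesce(t.transformed_count, 0)
order by r.order_date;
```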
Validity
Does the data conform to expected formats, ranges, and business rules? A date column that contains future dates for historical records. A price column that contains negative values. An order status that contains a value not in the valid status list. Validity checks surface data that has been technically ingested but is logically incorrect.
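In dbt, a validity rule like "no negative prices" can be written as a custom data test: a query that returns the offending rows, so the test fails when any exist. A minimal sketch, assuming a model named `order_lines` with a `unit_price` column (names hypothetical):

```sql
-- tests/assert_no_negative_prices.sql
-- dbt custom data test: the test fails if this query returns any rows.
select
    order_line_id,
    unit_price
from {{ ref('order_lines') }}
where unit_price < 0
```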
Freshness
Is the data as current as it should be? A dashboard that claims to show "yesterday's sales" but is actually showing data from three days ago is a data quality problem. Freshness monitoring tracks when each table was last successfully updated and alerts when tables fall behind their expected refresh schedule.
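dbt supports this natively through source freshness checks. A minimal sketch, assuming a CRM source landing table with a `_loaded_at` timestamp column (names and thresholds hypothetical):

```yaml
# models/sources.yml -- dbt source freshness configuration.
# `dbt source freshness` warns or errors when the newest _loaded_at
# value falls behind the thresholds below.
version: 2

sources:
  - name: crm
    database: raw
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: customers
```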
Consistency
Does the data agree with itself across tables and datasets? The order totals in the orders table should match the sums of the corresponding line items in the order lines table. The customer count in the CRM export should match the customer dimension in the warehouse. Consistency checks detect integration failures and transformation bugs that cause the same entity to be counted differently in different places.
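Using the order example, a consistency check is a reconciliation query across the two tables. A sketch with hypothetical `order_total` and `line_amount` columns:

```sql
-- Consistency sketch: every order's line-item total should match the
-- total recorded on the order itself. Rows returned here are mismatches.
select
    o.order_id,
    o.order_total,
    sum(l.line_amount) as line_item_total
from analytics.orders o
join analytics.order_lines l on l.order_id = o.order_id
group by o.order_id, o.order_total
having abs(o.order_total - sum(l.line_amount)) > 0.01;
```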
Tools We Use
dbt tests are the primary tool: schema tests (not null, unique, accepted values, referential integrity) and custom data tests live in the dbt project alongside the transformation models. Cloud Dataplex data quality rules handle warehouse-level validation, Google Cloud Monitoring provides freshness SLA alerting, and Great Expectations serves organizations that need a standalone data quality framework outside of dbt.
- dbt schema test configuration: not null, unique, accepted values, relationships (see the sketch after this list)
- Custom dbt data tests for business rule validation
- Cloud Dataplex data quality rule configuration
- Data freshness monitoring and SLA alerting
- Completeness validation: missing records and null field detection
- Validity checks: format, range, and business rule enforcement
- Consistency checks: cross-table and cross-dataset reconciliation
- Anomaly detection for statistical outliers in data volumes and values
- Data quality dashboard design: health scores, failure rates, trend tracking
- Data quality incident runbook: failure classification and remediation procedures
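The first two items are the workhorses. A minimal sketch of dbt schema tests covering all four built-in types, assuming an `orders` model (column names and accepted values hypothetical):

```yaml
# models/schema.yml -- dbt schema tests for the orders model.
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'delivered', 'returned']
      - name: customer_id
        tests:
          - not_null
          - relationships:   # referential integrity against the customers model
              to: ref('customers')
              field: customer_id
```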
How We Deliver This Service
Quality Requirements Definition
For each data domain and each critical metric, we define what "good data" means: which fields must not be null, which values are valid, what the expected refresh frequency is, and what cross-table consistency should look like. This is a business exercise, not a technical one.
Quality Rule Design
Quality rules designed to cover the four dimensions for each critical table and pipeline. Prioritized by business impact: rules that protect the metrics used in board-level reporting get built first.
Framework Implementation
Quality rules implemented in dbt, Dataplex, or a standalone framework depending on the data platform architecture. Freshness monitoring configured. Alerting channels connected to the data engineering team.
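One implementation detail worth showing: dbt tests accept a severity config, which is how warn-level and error-level failures get routed differently once alerting channels are connected. A sketch (columns hypothetical):

```yaml
# Severity routing: `warn` surfaces in logs and reports without failing
# the run; `error` fails the run and triggers the alerting channel.
version: 2

models:
  - name: orders
    columns:
      - name: coupon_code
        tests:
          - not_null:
              config:
                severity: warn    # report, but do not block the pipeline
      - name: order_id
        tests:
          - not_null:
              config:
                severity: error   # fail the run and alert the team
```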
Baseline Validation
Quality rules run against existing data to establish a baseline: what quality issues already exist, what their severity is, and what remediation is needed before the framework is used for ongoing monitoring.
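With dbt, the baseline run is the test suite executed with failure storage turned on, so the offending rows can be inspected and triaged rather than just counted. A sketch:

```sh
# Run all tests and persist failing rows to audit tables in the warehouse
# (dbt writes one table per failing test when --store-failures is set).
dbt test --store-failures

# Establish how far behind each source currently is.
dbt source freshness
```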
Operational Handover
Data quality dashboards, alerting policies, and a runbook for data quality incident response. The data engineering team trained on how to add new quality rules as new pipelines are built.