A migration is only as good as what you can prove about it. Moving data from a source system to a target, whether that’s a legacy ERP to a cloud data lake, a CRM consolidation, or a full-scale database migration on AWS, doesn’t complete when the last row transfers. It completes when you can demonstrate that what arrived matches what left: in count, structure, content, and business meaning.
That proof is data migration validation, and it is the discipline most teams underinvest in. The consequences show up later: reports that don’t reconcile, dashboards that contradict each other, compliance gaps in regulated data, and the organizational damage of a business that stops trusting its own numbers. Gartner has estimated that poor data quality costs organizations an average of $12.9 million per year, and a migration that moves bad data (or introduces new errors in transit) compounds that cost rather than reducing it.
This guide covers how data migration validation and data migration testing work in practice: what to check, when to check it, which techniques apply at each layer, and how a foundation-first approach to data migration services ensures validation is built into the process rather than bolted on at the end.
Why Data Migration Validation Is Not Optional
The assumption that data migrates cleanly is one of the most persistent and costly myths in data engineering. In practice, data inconsistencies emerge from multiple sources during migration: format differences between source and target systems, datatype mapping errors, character encoding mismatches, lost referential integrity between related tables, and transformation logic that behaves differently against production data than it did in a test environment.
The risks compound at the edges. Slowly changing dimensions in historical datasets may not carry forward correctly. Orphan records (rows whose foreign key references a parent row that no longer exists in the target) create silent integrity failures that only surface when a downstream application tries to join across them. Precision and scale differences in numeric fields can introduce rounding errors that look correct in isolation but produce materially wrong aggregates in a financial report.
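To make the rounding risk concrete, here is a small illustrative sketch. It assumes a hypothetical source column stored at scale 3 that lands in a target column at scale 2 with half-up rounding; the values are invented for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical source values stored with three decimal places.
source_values = [Decimal("0.125")] * 1000

# Assumed target behavior: each value is rounded to two decimal places on load.
target_values = [
    value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    for value in source_values
]

source_total = sum(source_values)
target_total = sum(target_values)
drift = target_total - source_total

print(source_total, target_total, drift)
```

Each row differs by only half a cent, which is easy to wave through in a spot check, yet the aggregate is off by 5 on a total of 125.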
For organizations subject to regulatory compliance requirements, the stakes are higher still. PII that lands in the wrong access tier, or that loses required encryption in transit or at rest, can become a compliance event. Audit trail continuity and data lineage documentation must survive the migration intact or be reconstructed in the target. These aren’t edge cases; they are predictable failure modes that a disciplined validation process is specifically designed to catch. For a structured look at how these risks surface across project types and how to address them, see our guide to data migration risks and mitigation.
The Layers of Data Migration Validation
Effective data migration testing operates at multiple layers simultaneously. Each layer catches a different class of error — and missing any one of them leaves a category of risk undetected.
Schema Validation
Before any data moves, schema validation confirms that the structural contract between source and target is well-defined and compatible. This means verifying that every table, column, datatype mapping, primary key, foreign key, and constraint in the source has a corresponding and compatible representation in the target.
Common failure modes at this layer include character encoding differences (UTF-8 vs. Latin-1), precision and scale mismatches in numeric columns, inconsistent time zone normalization, and SQL dialect gaps between source and target platforms. These are cheap to fix before migration begins and expensive to discover in production. Data normalization and data cleansing applied during the ETL or ELT process can resolve many of these proactively, but only if they’re identified first through thorough data profiling.
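A schema comparison of this kind can be sketched in a few lines. The example below assumes both schemas have already been extracted into plain dictionaries mapping each column name to a (type, precision, scale) tuple; the `compare_schemas` helper and the sample columns are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: column name -> (type, precision, scale).
ColumnSpec = Tuple[str, Optional[int], Optional[int]]
Schema = Dict[str, ColumnSpec]


def compare_schemas(source: Schema, target: Schema) -> List[str]:
    """Return human-readable mismatches between a source and target schema."""
    issues: List[str] = []
    for column, (src_type, src_prec, src_scale) in source.items():
        if column not in target:
            issues.append(f"{column}: missing in target")
            continue
        tgt_type, tgt_prec, tgt_scale = target[column]
        if src_type != tgt_type:
            issues.append(f"{column}: type {src_type} -> {tgt_type}")
        # A smaller precision or scale in the target can truncate or round values.
        if src_prec is not None and tgt_prec is not None and tgt_prec < src_prec:
            issues.append(f"{column}: precision {src_prec} -> {tgt_prec}")
        if src_scale is not None and tgt_scale is not None and tgt_scale < src_scale:
            issues.append(f"{column}: scale {src_scale} -> {tgt_scale}")
    for column in target:
        if column not in source:
            issues.append(f"{column}: unexpected column in target")
    return issues


source_schema: Schema = {
    "invoice_id": ("INTEGER", None, None),
    "amount": ("DECIMAL", 18, 4),
    "customer_name": ("VARCHAR", 255, None),
}
target_schema: Schema = {
    "invoice_id": ("INTEGER", None, None),
    "amount": ("DECIMAL", 18, 2),
    "customer_name": ("VARCHAR", 255, None),
}

schema_issues = compare_schemas(source_schema, target_schema)
for issue in schema_issues:
    print(issue)
```

In a real migration the type names rarely match one-to-one across platforms, so a check like this would normally sit behind an agreed datatype mapping table rather than a literal string comparison.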
Record-Level and Field-Level Validation
Once data has moved, record-level validation and field-level validation compare source and target at the row and column level. Row counts are the starting point: the number of records in each table should match between source and target, with any discrepancies traced to explicit transformation rules (deduplication, filtering) rather than silent data loss.
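As a rough sketch of that row count reconciliation, the example below uses two in-memory SQLite databases as stand-ins for the source and target. The table names, the sample rows, and the `expected_drops` allowance for documented deduplication are all assumptions made for illustration.

```python
import sqlite3
from typing import Dict, Iterable, Optional


def table_counts(conn: sqlite3.Connection, tables: Iterable[str]) -> Dict[str, int]:
    """Count rows per table. Table names come from a trusted, hard-coded list."""
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in tables
    }


def unexplained_differences(
    source_counts: Dict[str, int],
    target_counts: Dict[str, int],
    expected_drops: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Return per-table count gaps not covered by documented transformation rules."""
    expected_drops = expected_drops or {}
    gaps = {}
    for table, src in source_counts.items():
        gap = src - target_counts.get(table, 0) - expected_drops.get(table, 0)
        if gap != 0:
            gaps[table] = gap
    return gaps


source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")

source.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "b@example.com"), (4, "c@example.com")],
)
target.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (4, "c@example.com")],
)
source.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2), (12, 4)])
target.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 2)])

tables = ["customers", "orders"]
unexplained = unexplained_differences(
    table_counts(source, tables),
    table_counts(target, tables),
    expected_drops={"customers": 1},  # one duplicate removed by a documented rule
)
print(unexplained)
```

The customers gap is explained by the deduplication rule and drops out of the report; the missing order is not, so it is the one that needs tracing.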
Beyond counts, checksums and hash comparison provide a stronger integrity signal. A hash computed over a row’s combined field values will, with a strong hash function, detect practically any change to those values, including changes that leave the record count untouched. Field-level spot checks and statistical sampling validate that value distributions, ranges, and formats match expectations. Duplicate detection confirms that deduplication logic hasn’t collapsed distinct records, and that records haven’t been replicated unexpectedly. Orphan records are caught through referential integrity checks across related tables.
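The hash comparison and orphan check described above might look like the following minimal sketch. It assumes rows have been fetched as dictionaries and that values are already normalized to a canonical text form; the helper names and sample records are hypothetical.

```python
import hashlib
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def row_hash(row: Row, columns: Sequence[str]) -> str:
    """SHA-256 over the chosen columns, with a separator and an explicit NULL marker."""
    parts = ["\x00NULL" if row[col] is None else str(row[col]) for col in columns]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def diff_by_hash(source_rows: List[Row], target_rows: List[Row],
                 key: str, columns: Sequence[str]) -> Dict[str, list]:
    """Compare rows by key and report missing, unexpected, and changed records."""
    src = {row[key]: row_hash(row, columns) for row in source_rows}
    tgt = {row[key]: row_hash(row, columns) for row in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


def find_orphans(child_rows: List[Row], fk: str,
                 parent_rows: List[Row], pk: str) -> List[Row]:
    """Child rows whose foreign key has no matching parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


source_customers = [
    {"id": 1, "name": "Ana", "amount": "10.50"},
    {"id": 2, "name": "José", "amount": "20.00"},
    {"id": 3, "name": "Li", "amount": "5.25"},
]
target_customers = [
    {"id": 1, "name": "Ana", "amount": "10.50"},
    {"id": 2, "name": "Jos?", "amount": "20.00"},  # encoding damage in transit
]
target_orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # parent customer never arrived
]

result = diff_by_hash(source_customers, target_customers, "id", ["name", "amount"])
orphans = find_orphans(target_orders, "customer_id", target_customers, "id")
print(result)
print(orphans)
```

The normalization assumption matters: the same number rendered as `10.5` in one system and `10.50` in another will hash differently, so values should be cast to an agreed canonical form before hashing.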
Business Rules and KPI Parity
Technical validation confirms that data arrived structurally intact. Business rules validation confirms that it means the same thing. This is where data reconciliation and KPI parity testing become essential: comparing aggregates, totals, and key metrics between source and target to confirm that the business logic embedded in reports and dashboards produces the same results against both datasets.
This layer is often where the most consequential errors surface. A semantic layer that calculates revenue differently across two systems, because a field name is the same but the business definition isn’t, can pass every structural validation check and fail every business one. BI regression testing against critical dashboards and reports is the discipline that catches this: running production reports against migrated data and comparing outputs against a known-good baseline from the source system.
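A simple form of KPI parity testing is to compute the same aggregate from both datasets and report where they diverge. The sketch below assumes a hypothetical "revenue by region" metric and row-level data held as dictionaries; a real test would run the actual report logic against each system.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List


def revenue_by_region(rows: List[dict]) -> Dict[str, Decimal]:
    """Aggregate a hypothetical revenue KPI per region using exact decimals."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in rows:
        totals[row["region"]] += Decimal(row["amount"])
    return dict(totals)


def kpi_diff(source_kpis: Dict[str, Decimal],
             target_kpis: Dict[str, Decimal]) -> Dict[str, Decimal]:
    """Return target minus source for every group where the KPI differs."""
    diffs = {}
    for group in sorted(source_kpis.keys() | target_kpis.keys()):
        src = source_kpis.get(group, Decimal("0"))
        tgt = target_kpis.get(group, Decimal("0"))
        if src != tgt:
            diffs[group] = tgt - src
    return diffs


source_rows = [
    {"region": "EMEA", "amount": "100.00"},
    {"region": "EMEA", "amount": "50.25"},
    {"region": "APAC", "amount": "75.00"},
]
target_rows = [
    {"region": "EMEA", "amount": "100.00"},
    {"region": "EMEA", "amount": "50.25"},
    {"region": "APAC", "amount": "70.00"},
]

diffs = kpi_diff(revenue_by_region(source_rows), revenue_by_region(target_rows))
print(diffs)
```

Both datasets here have the same row count and the same structure, so only the aggregate comparison exposes the APAC shortfall.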
For organizations migrating ERP data — NetSuite, SAP, Deltek — this step is particularly critical. Financial fields, project cost allocations, and revenue recognition logic are frequently calculated differently between platforms, and the discrepancies compound when viewed in aggregate. The validation standard here is not just structural parity but data parity against the business metrics leadership uses to run the organization.
Validation Across the Migration Lifecycle
Validation is not a single phase; it is a continuous discipline that runs from pre-migration through hypercare. The question at each stage is different, but the need for structured verification is constant.
Pre-Migration: Data Profiling and Baseline Establishment
Before any data moves, a thorough data assessment and data profiling exercise establishes the baseline. This means documenting the actual state of your source data (completeness rates, value distributions, duplicate detection results, data gaps, and critical data elements) so that post-migration validation has a clean reference point to compare against.
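A baseline profile can start very small. The following sketch assumes source rows are available as dictionaries and computes a row count, per-column completeness rates, and duplicated key values; the `profile_rows` helper and the sample data are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


def profile_rows(rows: List[Dict[str, Any]], columns: Sequence[str],
                 key_columns: Sequence[str]) -> Dict[str, Any]:
    """Build a minimal baseline: row count, completeness per column, duplicate keys."""
    total = len(rows)
    completeness = {}
    for column in columns:
        # Treat None and empty strings as missing values.
        populated = sum(1 for row in rows if row.get(column) not in (None, ""))
        completeness[column] = populated / total if total else 0.0
    key_counts = Counter(tuple(row.get(c) for c in key_columns) for row in rows)
    duplicate_keys = {key: n for key, n in key_counts.items() if n > 1}
    return {
        "row_count": total,
        "completeness": completeness,
        "duplicate_keys": duplicate_keys,
    }


rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 2, "email": "b@example.com"},
    {"customer_id": 3, "email": None},
]

baseline = profile_rows(rows, ["customer_id", "email"], ["customer_id"])
print(baseline)
```

Stored alongside the migration artifacts, a profile like this becomes the reference that post-migration results are compared against.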
This step also surfaces validation rules and thresholds that should be agreed upon with business stakeholders before migration begins.
What row count variance is acceptable? What numeric tolerances apply to financial aggregates? Which fields are absolutely critical to match exactly, and which can accommodate controlled transformation?
Defining tolerances in advance prevents the post-migration debate about whether a 0.3% variance in a calculated field is acceptable or a blocker: a conversation that always takes longer than it should when it happens under cutover pressure. Decisions made without agreed thresholds also drive scope expansion and rework, which are among the leading contributors to elevated data migration costs.
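One way to keep those agreements from being renegotiated later is to record them as data and evaluate results against them mechanically. In the sketch below, the metric names and thresholds (0.5% for a revenue total, exact match for an invoice count) are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Tolerance:
    metric: str
    max_relative_variance: Decimal  # Decimal("0") means the values must match exactly


def evaluate(tolerance: Tolerance, source_value: Decimal, target_value: Decimal) -> str:
    """Classify a source/target pair as 'pass' or 'blocker' under an agreed tolerance."""
    if source_value == 0:
        return "pass" if target_value == 0 else "blocker"
    variance = abs(target_value - source_value) / abs(source_value)
    return "pass" if variance <= tolerance.max_relative_variance else "blocker"


# Hypothetical thresholds agreed with stakeholders before migration.
tolerances = {
    "total_revenue": Tolerance("total_revenue", Decimal("0.005")),
    "invoice_count": Tolerance("invoice_count", Decimal("0")),
}

# Hypothetical measured values: (source, target).
measurements = {
    "total_revenue": (Decimal("1000000.00"), Decimal("997000.00")),  # 0.3% variance
    "invoice_count": (Decimal("1200"), Decimal("1199")),
}

results = {
    metric: evaluate(tolerances[metric], src, tgt)
    for metric, (src, tgt) in measurements.items()
}
print(results)
```

With the thresholds written down, the 0.3% variance passes without discussion and the single missing invoice is flagged as a blocker.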
Test Migration: Parallel Validation Before Cutover
A parallel run (running source and target systems simultaneously and comparing outputs) is the gold standard for pre-cutover validation. It’s also the most operationally demanding approach, because it requires maintaining two environments in sync, often using CDC (Change Data Capture) to propagate ongoing changes from source to target during the validation window.
For organizations that cannot sustain a full parallel run, representative test migrations (moving a subset of data that covers the full range of data types, edge cases, and transformation complexity) provide a meaningful signal at lower operational cost. The output of each test migration is a structured exception report: records that failed validation rules, fields that didn’t match, aggregates that diverged beyond tolerances. Each exception is either resolved in the transformation logic or explicitly accepted with stakeholder sign-off before the production migration proceeds.
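The resolve-or-accept rule can be enforced as a simple gate over the exception report. This sketch assumes a hypothetical `ValidationException` record with a status and an optional sign-off field; the sample entries are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ValidationException:
    table: str
    rule: str
    detail: str
    status: str = "open"  # "open", "resolved", or "accepted"
    signed_off_by: Optional[str] = None


def blocking_exceptions(report: List[ValidationException]) -> List[ValidationException]:
    """Return the exceptions that still block the production migration."""
    blocking = []
    for item in report:
        if item.status == "resolved":
            continue
        # An accepted exception only clears the gate if someone signed off on it.
        if item.status == "accepted" and item.signed_off_by:
            continue
        blocking.append(item)
    return blocking


report = [
    ValidationException("orders", "row_count", "1 row missing in target",
                        status="resolved"),
    ValidationException("invoices", "amount_scale", "scale reduced from 4 to 2",
                        status="accepted", signed_off_by="finance lead"),
    ValidationException("customers", "email_format", "12 values failed pattern check"),
]

blockers = blocking_exceptions(report)
for item in blockers:
    print(item.table, item.rule, item.detail)
```

The production migration proceeds only when this list is empty, which turns the sign-off requirement into something a pipeline can check.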
This is also where idempotency (the property that running the migration multiple times produces the same result) becomes critical. A migration pipeline that produces different outputs on different runs introduces validation ambiguity that is very difficult to resolve under cutover pressure. Building idempotent pipelines is a discipline, not a default.
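A minimal illustration of an idempotent load step, assuming an in-memory SQLite table as a stand-in for the target and a hypothetical `customers` table keyed on `id`. Writes are keyed on the primary key, so re-running the same batch replaces rows instead of duplicating them.

```python
import sqlite3
from typing import List, Tuple


def load_batch(conn: sqlite3.Connection, rows: List[Tuple[int, str]]) -> None:
    """Load rows keyed on the primary key; a re-run overwrites rather than appends."""
    conn.executemany(
        "INSERT OR REPLACE INTO customers (id, email) VALUES (?, ?)", rows
    )
    conn.commit()


def snapshot(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Return the full table contents in a stable order for comparison."""
    return conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")

batch = [(1, "a@example.com"), (2, "b@example.com")]

load_batch(conn, batch)
first_run = snapshot(conn)

load_batch(conn, batch)  # simulate a retry or a repeated test migration
second_run = snapshot(conn)

print(first_run == second_run)
```

A plain `INSERT` into a table without a key would have doubled the row count on the second run, which is exactly the kind of run-to-run difference that makes validation results hard to interpret.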
Cutover Validation: Zero Downtime and Rollback Readiness
Cutover validation runs under the most time pressure of any validation phase. The cutover window is when the final reconciliation happens: confirming that the last state of the source matches the first state of the production target.
At this stage, access controls, encryption in transit and encryption at rest configurations, and data lineage documentation should all be verified in the production environment before the source system is retired.
Post-Migration: Audit Trail and Hypercare
After go-live, validation shifts to monitoring. Audit trail integrity, confirming that the chain of custody for sensitive and regulated data is documented and intact, must be verified in the production environment. Data lineage in tools like Databricks Unity Catalog should reflect the migration event and the current state of all governed tables.
Hypercare monitoring watches for the issues that only emerge at production scale: aggregation errors that didn’t appear in test sampling, data inconsistencies that surface when real users run real queries against real edge cases. BI regression testing continues through this period, with stakeholders actively comparing dashboard outputs against pre-migration baselines until confidence is established.
A Foundation-First Approach to Migration Validation
The organizations that validate most effectively are the ones that treat validation as a design requirement, not a post-migration activity. That means establishing validation rules and thresholds during the architecture phase, building data profiling into the pre-migration assessment, and structuring data reconciliation checkpoints at every stage of the migration lifecycle rather than running a single validation pass before cutover.
This is the foundation-first approach dbSeer brings to every data migration consulting engagement. Our data assessment process begins by profiling source data (identifying data gaps, duplicate records, referential integrity gaps, and critical data elements) before any architecture decisions are made. Validation requirements are defined alongside migration requirements, not after them.
Whether your migration involves CRM consolidation, ERP modernization, data lake buildout, or a complex multi-source integration, the validation discipline is the same: establish the baseline, define the rules, test iteratively, and don’t call it done until the business trusts the numbers. That is how we approach it, and it is what separates migrations that finish on time from those that require prolonged remediation.
If your team is planning a migration and wants to understand what a structured validation framework looks like for your environment, start with a conversation.
