Autonomous in the loop. Accountable at the gates. An agentic offering by Plainsight

Migrate to Databricks with an AI agent fleet.

The fleet documents your SSIS, ADF, Synapse and T-SQL estate into one knowledge base, then builds, tests and operates it on Databricks: delivered as dbt or in your own framework, iterating until the tests pass.

Databricks logo

What migrating to Databricks actually involves

Databricks is a unified analytics and data-engineering platform built on the lakehouse architecture: open Delta Lake table storage on cloud object storage, governed centrally by Unity Catalog, with Apache Spark as the compute engine. The pieces a Microsoft estate lands on are specific and current. Transformation logic runs either as notebooks (Python, SQL, Scala, R) or as Lakeflow Declarative Pipelines; queries are served through Databricks SQL on SQL warehouses; ingestion comes through Lakeflow Connect, Auto Loader or COPY INTO; and everything is orchestrated by Workflows. Governance isn’t assembled from SQL Server schemas, roles and Azure RBAC the way it was on the source. It’s built in: Unity Catalog gives one three-level catalog.schema.object namespace with access control, automatic data lineage, audit logging, and discovery across tables, views, volumes, functions and models.

That difference is the whole point, and it’s also where a naive lift-and-shift goes wrong. Almost none of a Microsoft warehouse’s physical design survives a literal port. A Synapse dedicated pool’s hash distributions, round-robin staging, replicated dimensions and single-column partition switching are tuned to a 60-distribution MPP architecture that simply doesn’t exist on the lakehouse: physical layout is a Delta concern, handled by liquid clustering and OPTIMIZE, and the Spark optimizer decides broadcast joins on its own. The T-SQL dialect isn’t Spark SQL either: function names, type names, identifier quoting and IDENTITY semantics all differ. And native SSIS execution isn’t a GA lift-and-shift target here, so packages are rebuilt, not re-hosted on a runtime that doesn’t exist.

What the fleet changes is the order of operations and the accountability. The Documenter reads the source logic (not the labels) and writes it into the knowledge base: the access patterns behind a distribution key, the SCD transform behind a wizard, the dependency graph behind a set of precedence constraints. A human signs that inventory off at the assessment gate. The same logic is then rebuilt in your chosen flavor: as a dbt project against a SQL warehouse, where the merge strategy compiles to MERGE INTO and snapshots produce dbt_valid_from / dbt_valid_to history; or in your framework as Delta tables, notebooks, Lakeflow Declarative Pipelines with expectations, and Workflows. Then the build-test-run loop runs the suite and iterates until green (row counts, SCD spans and aggregate parity all matching source) before a second human approves the design and a third approves promotion. Autonomous in the loop, accountable at the gates. Counts come straight from config, so you reach a full inventory in minutes, not weeks.

Your legacy estate
DimCustomer.dtsxpipeline.jsonload_sales.ipynbdbo.usp_loadEXTERNAL TABLEvw_ordersadf_triggerfact_sales.dtsxOPENROWSET
SSIS · ADF · Synapse · T-SQL
dbt or your framework

Two delivery flavors on Databricks.

Already deep into dbt? We build dbt, open-source and free to run. Built your own framework over the years? We build that.

What gets generated

dbt
  • A dbt project running against a Databricks SQL warehouse via dbt-databricks (+file_format: delta), with the ref()/source()-built DAG, sources.yml, macros and generated docs and lineage
  • Models materialized as Delta table, view, incremental, or materialized_view (the latter requiring Unity Catalog) in the customer's catalog and schemas
  • incremental models using incremental_strategy='merge' (the Delta default, compiles to MERGE INTO on unique_key); insert_overwrite, replace_where or microbatch chosen where partition-aware or selective rewrites fit better
  • dbt **snapshots** (timestamp or check strategy) on Delta for SCD Type 2, producing dbt_valid_from / dbt_valid_to / dbt_scd_id history columns
  • Generic and singular **tests** (unique, not_null, accepted_values, relationships plus in-project assertions), seeds for static lookups, and a generated docs site

Pattern mapping

Source patterns mapped to deliberate Databricks targets.

Source pattern
Databricks · dbt
Databricks · your framework
SSIS Data Flow (source → transforms → destination)
dbt model(s): SELECT-based staging/marts as table, view, or incremental on Delta
Lakeflow Declarative Pipelines (streaming tables / materialized views) or a Spark SQL / PySpark notebook
SSIS SCD Wizard / Slowly Changing Dimension (Type 2)
dbt **snapshot** (timestamp or check strategy) → dbt_valid_from / dbt_valid_to on Delta
MERGE INTO against a Delta dimension implementing SCD2, or Lakeflow Declarative Pipelines change processing
ADF / Synapse Copy activity (raw ingest)
dbt does not ingest raw files: pair with a load step, then model
Lakeflow Connect connector, Auto Loader (cloudFiles), COPY INTO, or CREATE TABLE AS (CTAS)
OLE DB Command / per-row upsert
dbt incremental model with incremental_strategy='merge' on unique_key
MERGE INTO in Spark SQL or DeltaTable.merge in PySpark
Synapse dedicated SQL pool, hash distribution on a chosen column
dbt table / incremental on Delta; liquid_clustered_by or partition_by for layout (distribution discarded)
Delta table in Unity Catalog; liquid clustering and OPTIMIZE; the 60-distribution key re-derived, not copied
Synapse serverless SQL OPENROWSET / external table over the lake
dbt view / table model over a Unity Catalog table; raw files declared in sources.yml
Unity Catalog external table + view over Delta; Auto Loader for incremental file landing
T-SQL stored procedure (procedural logic, dynamic SQL)
Decompose into ordered, idempotent models + pre/post-hooks; complex logic to a dbt Python model
SQL scripting and CREATE PROCEDURE in Unity Catalog (confirm Runtime gates), or a PySpark notebook
Precedence constraints / pipeline dependsOn ordering
dbt DAG (ref() dependencies) determines run order; the schedule runs dbt build
**Workflows** tasks with dependencies and conditional control flow
Package-level orchestration / SQL Agent schedule / pipeline triggers
dbt invoked as a scheduled, orchestrated task
**Workflows** (Lakeflow Jobs): jobs, tasks and triggers: scheduled, file-arrival, event
The Databricks crew

The agents that build, test and operate on Databricks.

The Documenters and the knowledge base are shared; the builder follows your delivery flavor above: only the Build, Test and Operate crews differ per platform.

Databricks dbt Builder

databricks-dbt-builder

The Databricks dbt Builder generates a dbt project that runs on Databricks SQL warehouses: models materialized to Delta tables, snapshots for slowly changing dimensions, your macros and tests. Incremental models use Databricks' MERGE so large tables update in place rather than rebuilding.

Target Design Spec → a dbt project targeting Databricks SQL

Databricks Test Agent

databricks-test-agent

The Databricks Test Agent writes and runs the suite on Databricks: data quality expectations, schema and row-count parity against documented legacy behavior, and the gates that decide whether an asset is trustworthy. Failures feed straight back into the loop rather than onto a list.

built assets + documented behavior → a passing/failing test suite

Databricks Operator

databricks-operator

The Databricks Operator runs the same closed loop on Databricks: deploy to DEV, run the Workflows and pipelines, diagnose each failure, patch through the builders and re-run. It keeps iterating until the suite is green, logging each pass.

built assets → a green run in DEV (via the build-test-run loop)

Cutover Agent

cutover-agent

The Cutover Agent owns the endgame. It plans the parallel-run period, keeps watermarks in sync between old and new, sequences the switchover itself, and produces the legacy decommissioning checklist, so going live is a controlled step, not a leap.

a green, signed-off build → a planned, executed cutover
Runs in Every migration
Good questions

Migrating to Databricks, answered.

Should we deliver as dbt or in our own framework on Databricks?

Both run natively, so the choice is about what you want to own afterwards. The dbt flavor delivers a dbt project against a Databricks SQL warehouse through `dbt-databricks`: portable `SELECT`-based models, `ref()`-built lineage, tests, snapshots for SCD2, and a generated docs site: the most fully featured adapter for incremental work, with the widest strategy set. Your-framework delivers native Databricks objects (PySpark / Spark SQL notebooks, Lakeflow Declarative Pipelines, Delta tables in Unity Catalog, and Workflows for orchestration) which fits estates with heavy procedural logic or an existing Databricks convention to match. The fleet rebuilds the same documented logic either way; the build-test-run loop iterates until green against the same parity tests regardless of flavor.

What does the `merge` incremental strategy actually compile to on Databricks?

It compiles to Delta `MERGE INTO`. With `incremental_strategy='merge'` and a `unique_key`, dbt generates a `MERGE` that joins the existing target to the staged new rows on the key: matched rows are updated, unmatched rows inserted, SCD Type 1 overwrite semantics. `merge` is the default strategy for Delta in `dbt-databricks`, and omitting `unique_key` makes it behave like `append`. When you need history preserved rather than overwritten, that is a dbt snapshot (SCD Type 2), not a merge. The adapter also offers `append`, `insert_overwrite` (partition-aware), `replace_where` for selective overwrites, `delete+insert`, and `microbatch`, so the fleet picks the strategy that matches the source pattern rather than forcing everything through merge.

How does the fleet handle T-SQL stored procedures on Databricks?

It re-expresses them rather than porting them verbatim, because T-SQL is not Spark SQL: `ISNULL` becomes `coalesce`/`nvl`, `GETDATE()` becomes `current_timestamp()`, `TOP n` becomes `LIMIT n`, `[bracketed]` identifiers become backticks, and every function is checked against the Spark SQL built-in set. On the dbt flavor a procedure decomposes into ordered, idempotent models with pre/post-hooks, and genuinely procedural logic becomes a dbt Python model. On your-framework, Databricks now offers native SQL scripting (`BEGIN...END`, `IF`, `WHILE`, condition handlers) and `CREATE PROCEDURE` in Unity Catalog, but these are gated behind recent Databricks Runtime versions, so the fleet confirms the customer's Runtime before relying on them rather than assuming availability, falling back to PySpark notebooks otherwise.

What replaces SSIS or ADF data validation on Databricks?

Data-quality logic that legacy estates scripted in SSIS data flows or T-SQL check steps becomes **expectations** on a Lakeflow Declarative Pipelines dataset. An expectation is a named SQL Boolean constraint evaluated per record with one of three actions: retain (the default: invalid records are still written, but pass/fail metrics are tracked), `ON VIOLATION DROP ROW` (invalid records dropped before write), or `ON VIOLATION FAIL UPDATE` (the update fails and the transaction rolls back atomically). Metrics are queryable from the pipeline event log. On the dbt flavor the same intent is expressed as generic and singular tests in the build-test-run loop. Either way the validation is declared as part of the rebuild, not bolted on afterwards.

Can we lift-and-shift SSIS packages straight onto Databricks?

No: there is no GA runtime that executes native SSIS packages on Databricks, so the honest answer is rebuild, not re-host. A `.dtsx` package is read for its real intent (the data flows, the SCD transforms, the precedence constraints, the connection managers) and rebuilt as Databricks objects: data flows become Lakeflow Declarative Pipelines or notebooks, control-flow ordering becomes Workflows task dependencies, connection managers become Unity Catalog connections, external locations and secret scopes, and SQL Agent or package-level schedules become Workflows triggers. The Documenter captures all of this in the knowledge base first, and a human signs it off at the design gate before any rebuild begins.

How is ingestion handled, since dbt doesn't load raw data?

dbt transforms data that is already landed, so the copy or ingest step is handled separately and then modeled. On Databricks the ADF / SSIS copy step maps to a **Lakeflow Connect** connector for managed source ingestion (enterprise apps and databases including SQL Server, with built-in CDC and schema evolution), **Auto Loader** (the `cloudFiles` source) for incremental file landing zones (recommended for high-volume ingestion of millions of files over time) `COPY INTO` for simpler idempotent SQL-driven loads of thousands of files, or `CREATE TABLE AS` for straightforward batch loads. The fleet picks the mechanism that matches the source volume and shape, lands the raw data in Delta under Unity Catalog, and the dbt or framework models build from there.

Let's talk

Ready to migrate to Databricks?

Tell us about your estate and we'll show you exactly what the fleet builds on Databricks, in dbt or your framework.

Plan my migration

A short form, no spam. We usually reply within one business day.

Plan my migration