Question 1

Should we deliver as dbt or in our own framework on Databricks?

Accepted Answer

Both run natively, so the choice is about what you want to own afterwards. The dbt flavor delivers a dbt project against a Databricks SQL warehouse through `dbt-databricks`: portable `SELECT`-based models, `ref()`-built lineage, tests, snapshots for SCD2, and a generated docs site. It is the most fully featured adapter for incremental work, with the widest strategy set. Your-framework delivers native Databricks objects (PySpark / Spark SQL notebooks, Lakeflow Declarative Pipelines, Delta tables in Unity Catalog, and Workflows for orchestration), which fits estates with heavy procedural logic or an existing Databricks convention to match. The fleet rebuilds the same documented logic either way: the build-test-run loop iterates until green against the same parity tests regardless of flavor.

Question 2

What does the `merge` incremental strategy actually compile to on Databricks?

Accepted Answer

It compiles to Delta `MERGE INTO`. With `incremental_strategy='merge'` and a `unique_key`, dbt generates a `MERGE` that joins the existing target to the staged new rows on the key: matched rows are updated and unmatched rows are inserted, which gives SCD Type 1 overwrite semantics. `merge` is the default strategy for Delta in `dbt-databricks`, and omitting `unique_key` makes it behave like `append`. When you need history preserved rather than overwritten, use a dbt snapshot (SCD Type 2) instead of a merge. The adapter also offers `append`, `insert_overwrite` (partition-aware), `replace_where` for selective overwrites, `delete+insert`, and `microbatch`, so the fleet picks the strategy that matches the source pattern rather than forcing everything through merge.
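
The merge semantics described above can be sketched in plain Python. This is a toy model of the behavior, not what `dbt-databricks` executes: the function name `apply_incremental` and the sample rows are hypothetical, and it assumes the key is already unique in the target.

```python
from typing import Optional


def apply_incremental(target, new_rows, unique_key: Optional[str] = None):
    """Return the target rows after a merge-style incremental run.

    With a key, matched rows are overwritten (SCD Type 1) and unmatched
    rows are inserted. Without a key, new rows are simply appended.
    """
    if unique_key is None:
        # No key to match on, so the run degrades to a plain append.
        return list(target) + list(new_rows)

    # Index the existing target by key (assumes keys are unique in the target).
    merged = {row[unique_key]: dict(row) for row in target}
    for row in new_rows:
        # Same effect as WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT.
        merged[row[unique_key]] = dict(row)
    return list(merged.values())


# Hypothetical sample data: id 2 changes city, id 3 is new.
target = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}]
staged = [{"id": 2, "city": "Milan"}, {"id": 3, "city": "Lima"}]

merged = apply_incremental(target, staged, unique_key="id")
appended = apply_incremental(target, staged)
```

With the key, the old value for id 2 is overwritten and no history survives, which is why history-preserving cases belong in a snapshot. Without the key, id 2 ends up in the result twice.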

Question 3

How does the fleet handle T-SQL stored procedures on Databricks?

Accepted Answer

It re-expresses them rather than porting them verbatim, because T-SQL is not Spark SQL: `ISNULL` becomes `coalesce`/`nvl`, `GETDATE()` becomes `current_timestamp()`, `TOP n` becomes `LIMIT n`, `[bracketed]` identifiers become backticks, and every function is checked against the Spark SQL built-in set. On the dbt flavor a procedure decomposes into ordered, idempotent models with pre/post-hooks, and genuinely procedural logic becomes a dbt Python model. On your-framework, Databricks now offers native SQL scripting (`BEGIN...END`, `IF`, `WHILE`, condition handlers) and `CREATE PROCEDURE` in Unity Catalog, but these are gated behind recent Databricks Runtime versions. The fleet therefore confirms the customer's Runtime before relying on them, and falls back to PySpark notebooks otherwise.
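
To make the token-level rewrites concrete, here is a minimal regex sketch of the four substitutions listed above. It is illustrative only: the function name `tsql_to_spark_sql` and the sample query are hypothetical, it handles a single `SELECT` with no subqueries, and it would wrongly rewrite brackets inside string literals. A real conversion needs a SQL parser and the function-by-function check described above.

```python
import re


def tsql_to_spark_sql(sql: str) -> str:
    """Apply a few token-level T-SQL to Spark SQL rewrites (illustrative only)."""
    # [bracketed] identifiers become backticked identifiers.
    out = re.sub(r"\[([^\]]+)\]", r"`\1`", sql)
    # Two-argument ISNULL(a, b) maps onto coalesce(a, b).
    out = re.sub(r"\bISNULL\s*\(", "coalesce(", out, flags=re.IGNORECASE)
    # GETDATE() becomes current_timestamp().
    out = re.sub(r"\bGETDATE\s*\(\s*\)", "current_timestamp()", out, flags=re.IGNORECASE)
    # SELECT TOP n ... becomes SELECT ... LIMIT n (single statement only).
    top = re.match(
        r"\s*SELECT\s+TOP\s+(\d+)\s+(.*?);?\s*$",
        out,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if top:
        out = f"SELECT {top.group(2)} LIMIT {top.group(1)}"
    return out


# Hypothetical legacy query.
legacy = (
    "SELECT TOP 5 [Order ID], ISNULL([Region], 'n/a') AS region, "
    "GETDATE() AS loaded_at FROM [dbo].[Orders]"
)
converted = tsql_to_spark_sql(legacy)
print(converted)
```

Schema names such as `dbo` are carried over unchanged here. Mapping them onto a Unity Catalog catalog and schema is a separate decision that the sketch does not attempt.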

Question 4

What replaces SSIS or ADF data validation on Databricks?

Accepted Answer

Data-quality logic that legacy estates scripted in SSIS data flows or T-SQL check steps becomes **expectations** on a Lakeflow Declarative Pipelines dataset. An expectation is a named SQL Boolean constraint evaluated per record with one of three actions: retain (the default: invalid records are still written, but pass/fail metrics are tracked), `ON VIOLATION DROP ROW` (invalid records dropped before write), or `ON VIOLATION FAIL UPDATE` (the update fails and the transaction rolls back atomically). Metrics are queryable from the pipeline event log. On the dbt flavor the same intent is expressed as generic and singular tests in the build-test-run loop. Either way the validation is declared as part of the rebuild, not bolted on afterwards.
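
The three actions can be modeled in a few lines of plain Python. This is a sketch of the semantics only, not the Lakeflow API: `apply_expectation`, `UpdateFailed`, and the action labels `"retain"`, `"drop"`, and `"fail"` are names invented for this illustration.

```python
class UpdateFailed(Exception):
    """Raised when a fail-action expectation is violated."""


def apply_expectation(rows, name, predicate, action="retain"):
    """Evaluate one named Boolean constraint per record.

    action: "retain" keeps invalid rows, "drop" filters them out,
    "fail" aborts the update so that nothing is written.
    Returns (rows_to_write, metrics).
    """
    if action not in {"retain", "drop", "fail"}:
        raise ValueError(f"unknown action: {action}")
    passed = [r for r in rows if predicate(r)]
    failed = [r for r in rows if not predicate(r)]
    # Pass/fail counts are tracked for every action, including retain.
    metrics = {"expectation": name, "passed": len(passed), "failed": len(failed)}
    if failed and action == "fail":
        raise UpdateFailed(f"{name}: {len(failed)} record(s) violated the constraint")
    kept = passed if action == "drop" else list(rows)
    return kept, metrics


def valid_amount(row):
    return row["amount"] >= 0


# Hypothetical records: the second one violates the constraint.
rows = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": -4.0}]

retained, retain_metrics = apply_expectation(rows, "valid_amount", valid_amount)
dropped, drop_metrics = apply_expectation(rows, "valid_amount", valid_amount, action="drop")
```

Both calls report one failed record in their metrics. Only the drop action changes what is written, and the fail action would raise before anything is returned.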

Question 5

Can we lift-and-shift SSIS packages straight onto Databricks?

Accepted Answer

No: there is no GA runtime that executes native SSIS packages on Databricks, so the honest answer is rebuild, not re-host. A `.dtsx` package is read for its real intent (the data flows, the SCD transforms, the precedence constraints, the connection managers) and rebuilt as Databricks objects: data flows become Lakeflow Declarative Pipelines or notebooks, control-flow ordering becomes Workflows task dependencies, connection managers become Unity Catalog connections, external locations and secret scopes, and SQL Agent or package-level schedules become Workflows triggers. The Documenter captures all of this in the knowledge base first, and a human signs it off at the design gate before any rebuild begins.
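
As an illustration of how control-flow ordering turns into task dependencies, the sketch below converts a map of precedence constraints into a task list with `task_key` and `depends_on` entries, the shape a Workflows job definition uses. The task names and the `to_workflow_tasks` helper are hypothetical, and the sketch models only on-success ordering, not failure or completion constraints.

```python
from graphlib import TopologicalSorter


def to_workflow_tasks(precedence):
    """Turn {task: [upstream tasks]} into a job-style task list.

    Tasks are emitted so that every upstream task appears before the
    tasks that depend on it. A cycle raises graphlib.CycleError.
    """
    order = TopologicalSorter(precedence).static_order()
    return [
        {
            "task_key": key,
            "depends_on": [{"task_key": up} for up in sorted(precedence.get(key, []))],
        }
        for key in order
    ]


# Hypothetical package: two loads must finish before the fact build.
constraints = {
    "load_customers": [],
    "load_orders": [],
    "build_fact_sales": ["load_orders", "load_customers"],
    "refresh_report": ["build_fact_sales"],
}
tasks = to_workflow_tasks(constraints)
```

Sorting topologically up front also catches circular constraints before anything is deployed, which is cheaper than discovering them at run time.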

Question 6

How is ingestion handled, since dbt doesn't load raw data?

Accepted Answer

dbt transforms data that is already landed, so the copy or ingest step is handled separately and then modeled. On Databricks the ADF / SSIS copy step maps to one of four mechanisms: a **Lakeflow Connect** connector for managed source ingestion (enterprise apps and databases including SQL Server, with built-in CDC and schema evolution), **Auto Loader** (the `cloudFiles` source) for incremental file landing zones (recommended for high-volume ingestion of millions of files over time), `COPY INTO` for simpler idempotent SQL-driven loads of thousands of files, or `CREATE TABLE AS` for straightforward batch loads. The fleet picks the mechanism that matches the source volume and shape, lands the raw data in Delta under Unity Catalog, and the dbt or framework models build from there.
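
The selection rule can be written down as a small decision function. This is a sketch under stated assumptions: the source-kind labels, the `pick_ingestion` name, and especially the numeric cutoff between "thousands" and "millions" of files are hypothetical placeholders, not figures from Databricks or from the fleet.

```python
# Hypothetical cutoff between "thousands" and "millions" of files.
AUTO_LOADER_FILE_THRESHOLD = 100_000


def pick_ingestion(source_kind, expected_files=0, incremental=True):
    """Return the ingestion mechanism for a source, following the mapping above.

    source_kind: "database" or "saas_app" for managed sources, "files" otherwise.
    """
    if source_kind in {"database", "saas_app"}:
        # Managed connectors cover CDC and schema evolution.
        return "Lakeflow Connect"
    if source_kind != "files":
        raise ValueError(f"unsupported source kind: {source_kind}")
    if not incremental:
        # One-off batch load with no incremental state to track.
        return "CREATE TABLE AS"
    if expected_files >= AUTO_LOADER_FILE_THRESHOLD:
        return "Auto Loader"
    return "COPY INTO"


choice_sql_server = pick_ingestion("database")
choice_landing_zone = pick_ingestion("files", expected_files=5_000_000)
choice_small_drop = pick_ingestion("files", expected_files=3_000)
choice_one_off = pick_ingestion("files", incremental=False)
```

In practice the decision also weighs source shape and operational preferences, so a rule like this is a starting point for the design conversation rather than a final answer.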

Migrate to Databricks with an AI agent fleet.

What migrating to Databricks actually involves

Every Microsoft legacy source, onto Databricks.

SSIS

ADF

Azure Synapse

T-SQL

Two delivery flavors on Databricks.

What gets generated

Pattern mapping

The agents that build, test and operate on Databricks.

Databricks dbt Builder

Databricks Framework Builder

Databricks Test Agent

Databricks Operator

Cutover Agent

Migrating to Databricks, answered.
