Migrate T-SQL to Databricks.
Procedural T-SQL, dynamic SQL and stored procedures rebuilt as Delta-native logic on Databricks, accountable at every gate.
Why T-SQL to Databricks is a dialect crossing, not a copy
Moving T-SQL onto Databricks looks deceptively simple (it’s SQL going to a SQL engine), but it crosses a dialect boundary. T-SQL and Spark SQL share keywords and diverge underneath them. A naive lift-and-shift that pastes a stored procedure into a notebook will fail on the first GETDATE(), the first ISNULL, the first TOP n, the first [bracketed] identifier. Worse, the failures that don’t throw (the ones that run and return subtly wrong numbers) are the expensive ones. The agent fleet treats this pair as a rebuild that preserves behavior, not a syntax find-and-replace, and it proves equivalence with a test suite rather than asserting it.
The bulk of the business logic in a Microsoft estate usually sits in the T-SQL layer: stored procedures, views, scalar and table-valued functions, and ad-hoc batches. The Documenter reads that layer for what it does, not just what it is named. It separates business logic from plumbing, traces dependency chains across objects and databases, and flags the constructs that will not survive the move unchanged: dynamic SQL, cursors and procedural loops, cross-database and linked-server references, and dialect-specific built-ins. That documentation becomes the contract the builders work against and the Reconciler checks against.
What actually breaks in a lift-and-shift
Three categories drive the rewrite, and each is in the legacy code precisely because it was convenient in SQL Server.
Procedural control flow. Long imperative procedures - cursors, WHILE loops, IF/ELSE branching, multi-step transactions, temp tables - are the costliest to convert. Historically Spark SQL had no procedural form at all, which is why such procedures often map to notebooks or PySpark. That has changed: Databricks now supports SQL scripting (the SQL/PSM standard) inside BEGIN...END blocks with IF, CASE, WHILE, REPEAT, LOOP, FOR, LEAVE/ITERATE, local variables and condition handlers that SIGNAL/RESIGNAL. It also supports native CREATE PROCEDURE stored procedures in Unity Catalog. But these arrived in recent Databricks Runtime versions, and cursor processing (OPEN/FETCH/CLOSE) is gated to an even more recent one. The fleet never assumes a Runtime; the Architect confirms it at the design gate, and where the feature isn’t present, set-based rewrites and PySpark fill the gap.
Dynamic SQL. Strings assembled at runtime and run through EXEC() or sp_executesql are the single most common home for undocumented behavior, because the executed statement is never present statically. The Documenter traces how each string is built before anything is rebuilt. Where the dynamic shape was really a template, it becomes compile-time Jinja on the dbt path; where it must run at runtime, the your-framework path uses EXECUTE IMMEDIATE in Spark SQL or spark.sql(...) in PySpark. The parameterization and security model differ from sp_executesql, so this is reviewed case by case rather than mechanically translated.
Constructs with no clean equivalent. A T-SQL MERGE maps to Delta Lake MERGE INTO - semantically close, but Delta-specific and without the T-SQL OUTPUT clause. Triggers have no equivalent and must be re-expressed as explicit pipeline or job steps. Scalar UDFs and multi-statement table-valued functions become SQL UDFs, Python UDFs, or set-based logic, with different semantics and performance. Three- and four-part / linked-server names become Unity Catalog’s catalog.schema.table three-level namespace and federation - a redesign, not a port. IDENTITY and sequences map to Delta GENERATED ... AS IDENTITY or hash-based surrogate keys, which behave differently from their T-SQL counterparts.
Two flavors, one provable result
Both delivery flavors land on Delta Lake under Unity Catalog; what differs is the shape of the deliverable. On the dbt path, each transformation is a SELECT-based model and dbt owns materialization, run order via ref(), and schema management. Upserts become incremental models with incremental_strategy='merge' (the Delta default), which compiles to MERGE INTO on a unique_key, SCD Type 1 overwrite semantics. History-preserving SCD Type 2 becomes a dbt snapshot, which maintains dbt_valid_from, dbt_valid_to and dbt_scd_id and requires the Delta file format on Databricks. Imperative procedures are decomposed into ordered, idempotent models, with Python models reserved for logic pure SQL can’t express. Runtime control flow lives in the orchestrator, never in a model: Jinja is compile-time templating, not runtime branching.
On the your-framework path, the fleet builds into the way your team already ships on Databricks: Spark SQL and PySpark notebooks, SQL scripting blocks, native stored procedures, and Lakeflow Jobs for the orchestration that nested procedures used to carry. The before/after above shows the same hand-rolled SCD2 procedure two ways, as a declarative dbt snapshot, and as an effective-dating MERGE INTO in a notebook task.
Whichever flavor you choose, the loop is the same. The Operator deploys to DEV and runs for real; the Test Agent asserts schema parity, row-count reconciliation, SCD effective dating, and null and key integrity; the Reconciler runs legacy and rebuilt side by side until the delta is zero or explained. A failure goes back to the builders, not onto a backlog, and the loop re-runs, iterating until green. Autonomous in the loop, accountable at the gates: your experts hold the assessment, design and promotion gates, and the build-test-run loop does the iterating in between. If your target is Microsoft Fabric instead, the same approach is described in Migrate T-SQL to Microsoft Fabric.
T-SQL to Databricks, stage by stage.
For T-SQL → Databricks, the same five stages run with the agents that handle this pair named and linked below. Inside the loop they iterate on their own; your experts hold the three gates.
Orchestrated end-to-end by The Conductor , The Librarian and The Chronicler across every T-SQL → Databricks asset.
Built your way, on Databricks.
Take T-SQL to Databricks as an open-source dbt project (free to run), or built into the framework your team has trusted for years: same migration, your standards either way.
What gets generated
- dbt models on a Databricks SQL warehouse via the dbt-databricks adapter (
view,table,incremental,materialized_view) - Incremental models using the
mergestrategy (the Delta default), compiling to Delta LakeMERGE INTOon aunique_key - dbt snapshots for SCD Type 2 over mutable dimensions, with
dbt_valid_from/dbt_valid_to/dbt_scd_id - Macros that replace repeated dynamic-SQL string-building with compile-time Jinja
- Generic and singular data tests, sources in
sources.yml, and generated docs / lineage
- Spark SQL and PySpark notebooks for procedural logic that has no declarative form
- SQL scripting
BEGIN...ENDblocks (IF,WHILE,CASE, condition handlers) on a recent Databricks Runtime CREATE PROCEDUREstored procedures in Unity Catalog, invoked withCALLand governed byGRANT- Delta Lake
MERGE INTOfor upserts and SCD logic against managed Delta tables - Lakeflow Jobs for orchestration and control flow that used to live in nested stored procedures
Pattern mapping
Each T-SQL pattern, mapped to a deliberate Databricks target.
CREATE PROCEDURE with multi-step writesCREATE PROCEDURE in Unity Catalog running a SQL-scripting BEGIN...END block, or a notebook taskWHILE loop over a row setSELECT; runtime iteration not a dbt conceptOPEN/FETCH/CLOSE only on a very recent RuntimeMERGE upsert into a dimensionincremental model, incremental_strategy='merge' on unique_key (SCD-1 overwrite)MERGE INTO, or DeltaTable.merge in PySparkMERGE INTO implementing effective-dating against a Delta dimensionsp_executesql / EXEC()EXECUTE IMMEDIATE (Spark SQL) or string-built spark.sql(...) in PySparkSELECTIF/ELSE control-of-flow in a procif for compile-time branching; runtime branching to the orchestratorIF/CASE in a BEGIN...END block, or PySparksource() / ref() against Unity Catalog catalog.schema.table#temp table stagingGETDATE(), ISNULL, TOP n, [brackets]current_timestamp(), coalesce/nvl, LIMIT n, backticked identifiersBefore & after
A real T-SQL snippet and what the builders generate on Databricks.
CREATE PROCEDURE dbo.usp_LoadDimCustomer
AS
BEGIN
SET NOCOUNT ON;
-- close superseded versions, then insert the new ones (hand-rolled SCD2)
DECLARE @now DATETIME = GETDATE();
UPDATE d
SET d.ValidTo = @now, d.IsCurrent = 0
FROM dbo.DimCustomer d
INNER JOIN stg.Customer s ON s.CustomerKey = d.CustomerKey
WHERE d.IsCurrent = 1
AND (ISNULL(d.Email, '') <> ISNULL(s.Email, '')
OR ISNULL(d.Segment, '') <> ISNULL(s.Segment, ''));
INSERT INTO dbo.DimCustomer (CustomerKey, Email, Segment, ValidFrom, ValidTo, IsCurrent)
SELECT s.CustomerKey, s.Email, s.Segment, @now, NULL, 1
FROM stg.Customer s
LEFT JOIN dbo.DimCustomer d
ON d.CustomerKey = s.CustomerKey AND d.IsCurrent = 1
WHERE d.CustomerKey IS NULL
OR ISNULL(d.Email, '') <> ISNULL(s.Email, '')
OR ISNULL(d.Segment, '') <> ISNULL(s.Segment, '');
END
snapshots:
- name: dim_customer
relation: ref('stg_customer')
config:
unique_key: customer_key
strategy: check
check_cols: ['email', 'segment']
# delta file format is required for snapshots on databricks
file_format: delta
# dbt closes the old version and inserts a new one, maintaining
# dbt_valid_from / dbt_valid_to / dbt_scd_id, no hand-rolled effective dating.
-- SQL scripting + Delta MERGE INTO, run as a notebook task in Lakeflow Jobs.
-- Close changed current rows, then insert new versions (SCD Type 2).
MERGE INTO main.gold.dim_customer AS d
USING main.silver.stg_customer AS s
ON d.customer_key = s.customer_key
AND d.is_current = true
AND (coalesce(d.email, '') <> coalesce(s.email, '')
OR coalesce(d.segment, '') <> coalesce(s.segment, ''))
WHEN MATCHED THEN
UPDATE SET d.valid_to = current_timestamp(), d.is_current = false;
INSERT INTO main.gold.dim_customer
(customer_key, email, segment, valid_from, valid_to, is_current)
SELECT s.customer_key, s.email, s.segment, current_timestamp(), NULL, true
FROM main.silver.stg_customer s
LEFT JOIN main.gold.dim_customer d
ON d.customer_key = s.customer_key AND d.is_current = true
WHERE d.customer_key IS NULL
OR coalesce(d.email, '') <> coalesce(s.email, '')
OR coalesce(d.segment, '') <> coalesce(s.segment, '');
Iterated until green, and logged.
A failure on this pair doesn't go on a list. It goes back to the builders, and the loop re-runs until the suite passes.
Green on iteration 4, SCD2 effective dating and row counts reconcile to a zero delta against the legacy run.
T-SQL → Databricks, answered.
Does Databricks support stored procedures, or does every T-SQL proc become a notebook?
Both paths exist. Databricks now offers native `CREATE PROCEDURE` stored procedures in Unity Catalog (invoked with `CALL`, governed by `GRANT`) and SQL scripting (`BEGIN...END`, `IF`, `WHILE`, condition handlers), so a procedural T-SQL proc can map to a SQL stored procedure rather than always becoming PySpark. These capabilities are gated behind recent Databricks Runtime versions, so the Architect confirms the customer's Runtime at the design gate before committing to a SQL-scripting build; where it isn't available, or where logic is genuinely programmatic, the proc lands as a notebook instead.
How do cursors and `WHILE` loops translate to Spark SQL?
Wherever possible the fleet rewrites the loop as set-based logic, which is faster and more idiomatic on Spark. Spark SQL does support `OPEN`/`FETCH`/`CLOSE` cursor processing, but only on a very recent Databricks Runtime, so the fleet does not assume it. On the dbt path runtime iteration isn't a concept at all: the logic is re-expressed as a `SELECT`, and any genuine row-by-row need moves to a Python model or the orchestrator.
What happens to dynamic SQL built with `sp_executesql` or `EXEC()`?
It depends on why the SQL was dynamic. If the string was assembled to template a repeated shape, that becomes compile-time Jinja in dbt or a parameterized pattern in your framework. If it must run at runtime, the your-framework path uses `EXECUTE IMMEDIATE` in Spark SQL or string-built `spark.sql(...)` in PySpark. The parameterization and security model differ from `sp_executesql`, so the Documenter traces how the string was constructed first, dynamic SQL is one of the highest-risk constructs because the executed statement is not present statically.
Will a T-SQL `MERGE` port one-for-one to Delta `MERGE INTO`?
Semantically close, but not a literal port. Delta Lake `MERGE INTO` performs the upsert, but it is Delta-specific and does not carry the T-SQL `MERGE` `OUTPUT` clause. For SCD Type 1 (latest value overwrites) the dbt `incremental` model with `incremental_strategy='merge'` on a `unique_key` is the natural fit; for SCD Type 2 the fleet uses a dbt snapshot or an effective-dating `MERGE INTO`, because a snapshot preserves history while a plain merge overwrites it.
How are scalar UDFs and T-SQL-specific functions handled?
Every T-SQL function is checked against the Spark SQL built-in set. Common ones remap cleanly: `ISNULL` to `coalesce`/`nvl`, `GETDATE()` to `current_timestamp()`, `TOP n` to `LIMIT n`, bracketed identifiers to backticks. Scalar UDFs are re-expressed as SQL UDFs, Python UDFs, or inlined set-based logic, and their performance characteristics differ, so the Test Agent reconciles output rather than assuming equivalence. Functions with no direct equivalent are flagged at documentation time, not discovered at runtime.
Keep exploring
T-SQL migration
How the Documenter reads T-SQL, with both destinations beyond the Databricks build on this page.
Migrate to Databricks
What the fleet builds when you take T-SQL to Databricks, in dbt or your framework.
Migrate T-SQL to Fabric
The same source, the other destination, compare the fit.
Migrate SSIS to Databricks
Another source onto Databricks, run by the same fleet.
Migrate ADF to Databricks
Another source onto Databricks, run by the same fleet.
Ready to migrate T-SQL to Databricks?
Tell us about your T-SQL estate and we'll run the assessment, the Surveyor scores it before you commit to Databricks.
Plan my migrationA short form, no spam. We usually reply within one business day.