Autonomous in the loop. Accountable at the gates. An agentic offering by Plainsight

Migrate T-SQL to Databricks.

Procedural T-SQL, dynamic SQL and stored procedures rebuilt as Delta-native logic on Databricks, accountable at every gate.

Databricks logo

Why T-SQL to Databricks is a dialect crossing, not a copy

Moving T-SQL onto Databricks looks deceptively simple (it’s SQL going to a SQL engine), but it crosses a dialect boundary. T-SQL and Spark SQL share keywords and diverge underneath them. A naive lift-and-shift that pastes a stored procedure into a notebook will fail on the first GETDATE(), the first ISNULL, the first TOP n, the first [bracketed] identifier. Worse, the failures that don’t throw (the ones that run and return subtly wrong numbers) are the expensive ones. The agent fleet treats this pair as a rebuild that preserves behavior, not a syntax find-and-replace, and it proves equivalence with a test suite rather than asserting it.

The bulk of the business logic in a Microsoft estate usually sits in the T-SQL layer: stored procedures, views, scalar and table-valued functions, and ad-hoc batches. The Documenter reads that layer for what it does, not just what it is named. It separates business logic from plumbing, traces dependency chains across objects and databases, and flags the constructs that will not survive the move unchanged: dynamic SQL, cursors and procedural loops, cross-database and linked-server references, and dialect-specific built-ins. That documentation becomes the contract the builders work against and the Reconciler checks against.

What actually breaks in a lift-and-shift

Three categories drive the rewrite, and each is in the legacy code precisely because it was convenient in SQL Server.

Procedural control flow. Long imperative procedures - cursors, WHILE loops, IF/ELSE branching, multi-step transactions, temp tables - are the costliest to convert. Historically Spark SQL had no procedural form at all, which is why such procedures often map to notebooks or PySpark. That has changed: Databricks now supports SQL scripting (the SQL/PSM standard) inside BEGIN...END blocks with IF, CASE, WHILE, REPEAT, LOOP, FOR, LEAVE/ITERATE, local variables and condition handlers that SIGNAL/RESIGNAL. It also supports native CREATE PROCEDURE stored procedures in Unity Catalog. But these arrived in recent Databricks Runtime versions, and cursor processing (OPEN/FETCH/CLOSE) is gated to an even more recent one. The fleet never assumes a Runtime; the Architect confirms it at the design gate, and where the feature isn’t present, set-based rewrites and PySpark fill the gap.

Dynamic SQL. Strings assembled at runtime and run through EXEC() or sp_executesql are the single most common home for undocumented behavior, because the executed statement is never present statically. The Documenter traces how each string is built before anything is rebuilt. Where the dynamic shape was really a template, it becomes compile-time Jinja on the dbt path; where it must run at runtime, the your-framework path uses EXECUTE IMMEDIATE in Spark SQL or spark.sql(...) in PySpark. The parameterization and security model differ from sp_executesql, so this is reviewed case by case rather than mechanically translated.

Constructs with no clean equivalent. A T-SQL MERGE maps to Delta Lake MERGE INTO - semantically close, but Delta-specific and without the T-SQL OUTPUT clause. Triggers have no equivalent and must be re-expressed as explicit pipeline or job steps. Scalar UDFs and multi-statement table-valued functions become SQL UDFs, Python UDFs, or set-based logic, with different semantics and performance. Three- and four-part / linked-server names become Unity Catalog’s catalog.schema.table three-level namespace and federation - a redesign, not a port. IDENTITY and sequences map to Delta GENERATED ... AS IDENTITY or hash-based surrogate keys, which behave differently from their T-SQL counterparts.

Two flavors, one provable result

Both delivery flavors land on Delta Lake under Unity Catalog; what differs is the shape of the deliverable. On the dbt path, each transformation is a SELECT-based model and dbt owns materialization, run order via ref(), and schema management. Upserts become incremental models with incremental_strategy='merge' (the Delta default), which compiles to MERGE INTO on a unique_key, SCD Type 1 overwrite semantics. History-preserving SCD Type 2 becomes a dbt snapshot, which maintains dbt_valid_from, dbt_valid_to and dbt_scd_id and requires the Delta file format on Databricks. Imperative procedures are decomposed into ordered, idempotent models, with Python models reserved for logic pure SQL can’t express. Runtime control flow lives in the orchestrator, never in a model: Jinja is compile-time templating, not runtime branching.

On the your-framework path, the fleet builds into the way your team already ships on Databricks: Spark SQL and PySpark notebooks, SQL scripting blocks, native stored procedures, and Lakeflow Jobs for the orchestration that nested procedures used to carry. The before/after above shows the same hand-rolled SCD2 procedure two ways, as a declarative dbt snapshot, and as an effective-dating MERGE INTO in a notebook task.

Whichever flavor you choose, the loop is the same. The Operator deploys to DEV and runs for real; the Test Agent asserts schema parity, row-count reconciliation, SCD effective dating, and null and key integrity; the Reconciler runs legacy and rebuilt side by side until the delta is zero or explained. A failure goes back to the builders, not onto a backlog, and the loop re-runs, iterating until green. Autonomous in the loop, accountable at the gates: your experts hold the assessment, design and promotion gates, and the build-test-run loop does the iterating in between. If your target is Microsoft Fabric instead, the same approach is described in Migrate T-SQL to Microsoft Fabric.

How the fleet runs it

T-SQL to Databricks, stage by stage.

For T-SQL → Databricks, the same five stages run with the agents that handle this pair named and linked below. Inside the loop they iterate on their own; your experts hold the three gates.

Build → Test → Operate iterates until green on Databricks

Orchestrated end-to-end by The Conductor , The Librarian and The Chronicler across every T-SQL → Databricks asset.

dbt or your framework

Built your way, on Databricks.

Take T-SQL to Databricks as an open-source dbt project (free to run), or built into the framework your team has trusted for years: same migration, your standards either way.

What gets generated

dbt
  • dbt models on a Databricks SQL warehouse via the dbt-databricks adapter (view, table, incremental, materialized_view)
  • Incremental models using the merge strategy (the Delta default), compiling to Delta Lake MERGE INTO on a unique_key
  • dbt snapshots for SCD Type 2 over mutable dimensions, with dbt_valid_from / dbt_valid_to / dbt_scd_id
  • Macros that replace repeated dynamic-SQL string-building with compile-time Jinja
  • Generic and singular data tests, sources in sources.yml, and generated docs / lineage

Pattern mapping

Each T-SQL pattern, mapped to a deliberate Databricks target.

Source pattern
Databricks · dbt
Databricks · your framework
CREATE PROCEDURE with multi-step writes
Decomposed into ordered, idempotent models + pre/post-hooks; complex logic to Python models
CREATE PROCEDURE in Unity Catalog running a SQL-scripting BEGIN...END block, or a notebook task
Cursor / WHILE loop over a row set
Re-expressed set-based as a SELECT; runtime iteration not a dbt concept
Set-based Spark SQL where possible; cursor OPEN/FETCH/CLOSE only on a very recent Runtime
MERGE upsert into a dimension
incremental model, incremental_strategy='merge' on unique_key (SCD-1 overwrite)
Delta Lake MERGE INTO, or DeltaTable.merge in PySpark
Hand-rolled SCD Type 2 history table
dbt **snapshot** (timestamp or check strategy) on Delta
MERGE INTO implementing effective-dating against a Delta dimension
Dynamic SQL via sp_executesql / EXEC()
Jinja templating to generate the SQL at compile time
EXECUTE IMMEDIATE (Spark SQL) or string-built spark.sql(...) in PySpark
Scalar UDF in a SELECT
Macro for reusable SQL, or inline the logic in the model
SQL UDF or Python UDF, semantics and performance differ from T-SQL
IF/ELSE control-of-flow in a proc
Jinja if for compile-time branching; runtime branching to the orchestrator
SQL scripting IF/CASE in a BEGIN...END block, or PySpark
Three/four-part / linked-server reference
source() / ref() against Unity Catalog catalog.schema.table
Unity Catalog three-level namespace; federation, not a literal port
#temp table staging
Ephemeral model (inlined CTE) or a staging model
Temporary view, CTE, or a staging Delta table
GETDATE(), ISNULL, TOP n, [brackets]
current_timestamp(), coalesce/nvl, LIMIT n, backticked identifiers
Same Spark SQL built-ins; every T-SQL function checked against the Spark set

Before & after

A real T-SQL snippet and what the builders generate on Databricks.

usp_LoadDimCustomer.sql legacy
CREATE PROCEDURE dbo.usp_LoadDimCustomer
AS
BEGIN
    SET NOCOUNT ON;

    -- close superseded versions, then insert the new ones (hand-rolled SCD2)
    DECLARE @now DATETIME = GETDATE();

    UPDATE d
    SET d.ValidTo = @now, d.IsCurrent = 0
    FROM dbo.DimCustomer d
    INNER JOIN stg.Customer s ON s.CustomerKey = d.CustomerKey
    WHERE d.IsCurrent = 1
      AND (ISNULL(d.Email, '') <> ISNULL(s.Email, '')
        OR ISNULL(d.Segment, '') <> ISNULL(s.Segment, ''));

    INSERT INTO dbo.DimCustomer (CustomerKey, Email, Segment, ValidFrom, ValidTo, IsCurrent)
    SELECT s.CustomerKey, s.Email, s.Segment, @now, NULL, 1
    FROM stg.Customer s
    LEFT JOIN dbo.DimCustomer d
      ON d.CustomerKey = s.CustomerKey AND d.IsCurrent = 1
    WHERE d.CustomerKey IS NULL
       OR ISNULL(d.Email, '') <> ISNULL(s.Email, '')
       OR ISNULL(d.Segment, '') <> ISNULL(s.Segment, '');
END
dbt
snapshots/dim_customer.yml dbt
snapshots:
  - name: dim_customer
    relation: ref('stg_customer')
    config:
      unique_key: customer_key
      strategy: check
      check_cols: ['email', 'segment']
      # delta file format is required for snapshots on databricks
      file_format: delta
# dbt closes the old version and inserts a new one, maintaining
# dbt_valid_from / dbt_valid_to / dbt_scd_id, no hand-rolled effective dating.
The loop that closes itself

Iterated until green, and logged.

A failure on this pair doesn't go on a list. It goes back to the builders, and the loop re-runs until the suite passes.

t-sql-to-databricks · iteration log build-test-operate
iter 1 iter 1, FAIL: `ISNULL(d.Email,'')` ported verbatim; Spark SQL has no `ISNULL` two-arg form. Rewritten to `coalesce`.
iter 2 iter 2, FAIL: `GETDATE()` and `[bracketed]` identifiers rejected by the parser. Swapped to `current_timestamp()` and backticks; new versions duplicated on re-run.
iter 3 iter 3, FAIL: MERGE match predicate missed `is_current = true`, so old versions were reopened. Predicate tightened.
iter 4 iter 4, GREEN: row counts reconcile, SCD2 effective dating matches legacy, delta 0 against the Reconciler baseline.

Green on iteration 4, SCD2 effective dating and row counts reconcile to a zero delta against the legacy run.

Questions for this migration

T-SQL → Databricks, answered.

Does Databricks support stored procedures, or does every T-SQL proc become a notebook?

Both paths exist. Databricks now offers native `CREATE PROCEDURE` stored procedures in Unity Catalog (invoked with `CALL`, governed by `GRANT`) and SQL scripting (`BEGIN...END`, `IF`, `WHILE`, condition handlers), so a procedural T-SQL proc can map to a SQL stored procedure rather than always becoming PySpark. These capabilities are gated behind recent Databricks Runtime versions, so the Architect confirms the customer's Runtime at the design gate before committing to a SQL-scripting build; where it isn't available, or where logic is genuinely programmatic, the proc lands as a notebook instead.

How do cursors and `WHILE` loops translate to Spark SQL?

Wherever possible the fleet rewrites the loop as set-based logic, which is faster and more idiomatic on Spark. Spark SQL does support `OPEN`/`FETCH`/`CLOSE` cursor processing, but only on a very recent Databricks Runtime, so the fleet does not assume it. On the dbt path runtime iteration isn't a concept at all: the logic is re-expressed as a `SELECT`, and any genuine row-by-row need moves to a Python model or the orchestrator.

What happens to dynamic SQL built with `sp_executesql` or `EXEC()`?

It depends on why the SQL was dynamic. If the string was assembled to template a repeated shape, that becomes compile-time Jinja in dbt or a parameterized pattern in your framework. If it must run at runtime, the your-framework path uses `EXECUTE IMMEDIATE` in Spark SQL or string-built `spark.sql(...)` in PySpark. The parameterization and security model differ from `sp_executesql`, so the Documenter traces how the string was constructed first, dynamic SQL is one of the highest-risk constructs because the executed statement is not present statically.

Will a T-SQL `MERGE` port one-for-one to Delta `MERGE INTO`?

Semantically close, but not a literal port. Delta Lake `MERGE INTO` performs the upsert, but it is Delta-specific and does not carry the T-SQL `MERGE` `OUTPUT` clause. For SCD Type 1 (latest value overwrites) the dbt `incremental` model with `incremental_strategy='merge'` on a `unique_key` is the natural fit; for SCD Type 2 the fleet uses a dbt snapshot or an effective-dating `MERGE INTO`, because a snapshot preserves history while a plain merge overwrites it.

How are scalar UDFs and T-SQL-specific functions handled?

Every T-SQL function is checked against the Spark SQL built-in set. Common ones remap cleanly: `ISNULL` to `coalesce`/`nvl`, `GETDATE()` to `current_timestamp()`, `TOP n` to `LIMIT n`, bracketed identifiers to backticks. Scalar UDFs are re-expressed as SQL UDFs, Python UDFs, or inlined set-based logic, and their performance characteristics differ, so the Test Agent reconciles output rather than assuming equivalence. Functions with no direct equivalent are flagged at documentation time, not discovered at runtime.

Let's talk

Ready to migrate T-SQL to Databricks?

Tell us about your T-SQL estate and we'll run the assessment, the Surveyor scores it before you commit to Databricks.

Plan my migration

A short form, no spam. We usually reply within one business day.

Plan my migration