Why dbt Isn't Just SQL With Extra Steps

When I first encountered dbt, my instinct was dismissive. It's just SQL in Jinja templates, right? You still write SELECT statements. You're still just transforming tables. What's the point of the abstraction?

That view is wrong, and it took me about six months of real use to understand why.

The Problem dbt Is Actually Solving

Most data teams I've worked with or talked to have some version of the same mess: a collection of SQL scripts in a shared folder (or worse, inside a BI tool), no clear ownership, transformations duplicated across three different dashboards with slightly different logic, and no way to know if anything is actually working correctly.

This isn't a tooling problem. It's a lack of engineering discipline in data work. dbt is what happens when software engineering principles finally reach the SQL layer.

Version Control Is the Foundation

The most underrated feature of dbt isn't the DAG visualisation or the macros or even the testing framework. It's the fact that your transformations live in a git repository.

This sounds trivial. It isn't. When your SQL lives in git, you get:

A full history of every change to every model
Pull request reviews for changes to data logic, which is where most data bugs are born
Branches, so you can develop a new metric definition without breaking production
CI/CD pipelines that run your tests before anything reaches production

Previously, changing a metric definition meant editing a view in Snowflake directly, hoping you remembered what it did three months later, and telling no one.

Testing Changes Everything

dbt's testing framework is deceptively simple. Its built-in tests let you assert that a column is not null (not_null), that values are unique (unique), that foreign keys point at a real parent row (relationships), and that a column only contains values from a defined set (accepted_values). But the downstream effect is profound.

When I added dbt tests to our existing models, we found four bugs on the first day. Joins that were producing duplicate rows. A date field that was sometimes NULL when the business logic said it couldn't be. A join on customer ID that was happening at the wrong grain.

None of these would have been caught by a human reviewing a dashboard. They were invisible in the output but wrong in the logic. Tests made the invisible visible.
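To make that concrete: a dbt test compiles down to a SELECT that returns the rows violating an assertion, and the test passes when the query returns nothing. The sketch below imitates that idea in plain Python with an in-memory SQLite database. It is not dbt itself, and the table names, column names, and the failing_rows helper are all hypothetical, chosen only to reproduce the kinds of bugs described above.

```python
import sqlite3

# Hypothetical tables and data, for illustration only (not from a real project).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, signup_date TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, '2024-01-05'), (2, NULL), (2, '2024-02-01');
    INSERT INTO orders VALUES (10, 1), (11, 2);
""")


def failing_rows(sql):
    """Run a test query; an empty result means the assertion holds."""
    return conn.execute(sql).fetchall()


# "unique"-style check: any key that appears more than once is a failure.
duplicate_keys = failing_rows("""
    SELECT customer_id, COUNT(*) AS n
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
""")

# "not null"-style check: any row with a missing value is a failure.
null_dates = failing_rows("""
    SELECT customer_id FROM customers WHERE signup_date IS NULL
""")

# Fan-out check: a duplicated customer key multiplies order rows after the join.
fanned_out_orders = failing_rows("""
    SELECT o.order_id, COUNT(*) AS n
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY o.order_id
    HAVING COUNT(*) > 1
""")

results = {
    "unique": duplicate_keys,
    "not_null": null_dates,
    "fan_out": fanned_out_orders,
}
for name, rows in results.items():
    print(name, "PASS" if not rows else f"FAIL {rows}")
```

All three checks fail on this sample data, and none of the failures would be obvious from a dashboard total. The duplicated customer key quietly counts one order twice, which is exactly the kind of bug that looks plausible in the output.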

Documentation That Actually Gets Read

dbt generates a documentation site from your project, using the descriptions in your model YAML files. More importantly, because the docs live next to the code, they are far more likely to stay in sync: the description of a model and its columns can go in the same pull request as the change to the model itself.

I've seen too many data catalogues that were built with good intentions and abandoned within six months because maintaining them was a separate job from doing the work. dbt makes the two the same job.

The AI-Ready Data Layer

Here's the part that's become important to me recently. When you want an AI to reason over your data, the quality of your semantic layer matters more than almost anything else. An AI that has access to well-named, well-described, well-tested dbt models, with metric definitions and business logic documented in YAML, has far more context to work with than one pointed at raw warehouse tables, and in my experience its outputs are noticeably better.

This is not a coincidence. dbt's design philosophy, built around explicit models, tests, and docs, is exactly what you need to build a reliable AI-native data layer. The investment compounds.

My Take

dbt isn't just SQL with extra steps. It's SQL with version control, tests, documentation, and a dependency graph. Those aren't extra steps. They're the difference between a script and a system.