Why dbt Isn't Just SQL With Extra Steps
Every data team I know started with a folder of SQL files. dbt isn't a fancier way to run that SQL — it's a fundamentally different way to think about your data layer.
When I first encountered dbt, my instinct was dismissive. It's just SQL in Jinja templates, right? You still write SELECT statements. You're still just transforming tables. What's the point of the abstraction?
That view is wrong, and it took me about six months of real use to understand why.
The Problem dbt Is Actually Solving
Most data teams I've worked with or talked to have some version of the same mess: a collection of SQL scripts in a shared folder (or worse, inside a BI tool), no clear ownership, transformations duplicated across three different dashboards with slightly different logic, and no way to know if anything is actually working correctly.
This isn't a tooling problem. It's a lack of engineering discipline in data work. dbt is what happens when software engineering principles finally reach the SQL layer.
Version Control Is the Foundation
The most underrated feature of dbt isn't the DAG visualisation or the macros or even the testing framework. It's the fact that your transformations live in a git repository.
This sounds trivial. It isn't. When your SQL lives in git, you get:
- A full history of every change to every model
- Pull request reviews for changes to data logic, the place where most data bugs are born
- Branches, so you can develop a new metric definition without breaking production
- CI/CD pipelines that run your tests before a change reaches production
Before dbt, changing a metric definition meant editing a view in Snowflake directly, telling no one, and hoping you remembered what it did three months later.
Testing Changes Everything
dbt's testing framework is deceptively simple — you can assert that a column is not null, that values are unique, that foreign keys are valid, that a column only contains values from a defined set. But the downstream effect is profound.
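To make those four assertions concrete, here is a minimal Python sketch that uses the standard library's sqlite3 in place of a warehouse. It leans on one idea from dbt: a test is a query that selects the rows violating an assertion, and it passes when that query returns nothing. The `orders` and `customers` tables, their columns, and the `failing_rows` helper are all hypothetical, and the SQL is a hand-written approximation rather than what dbt actually compiles.

```python
import sqlite3

# Hypothetical tables standing in for two models in a warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES
        (100, 1, 'placed'),
        (101, 2, 'shipped'),
        (101, 2, 'shipped'),   -- duplicate order_id
        (102, NULL, 'placed'), -- missing customer_id
        (103, 9, 'lost');      -- unknown customer, unexpected status
""")

# Each check selects the rows that violate an assertion; zero rows means "pass".
CHECKS = {
    "not_null(customer_id)": """
        SELECT order_id FROM orders WHERE customer_id IS NULL
    """,
    "unique(order_id)": """
        SELECT order_id FROM orders
        GROUP BY order_id HAVING COUNT(*) > 1
    """,
    "relationships(customer_id)": """
        SELECT o.order_id FROM orders AS o
        LEFT JOIN customers AS c ON o.customer_id = c.customer_id
        WHERE o.customer_id IS NOT NULL AND c.customer_id IS NULL
    """,
    "accepted_values(status)": """
        SELECT order_id FROM orders
        WHERE status NOT IN ('placed', 'shipped', 'returned')
    """,
}


def failing_rows(connection, sql):
    """Return the order_id of every row that violates the assertion."""
    return [row[0] for row in connection.execute(sql)]


results = {name: failing_rows(conn, sql) for name, sql in CHECKS.items()}
for name, failures in results.items():
    status = "PASS" if not failures else "FAIL"
    print(f"{status} {name}: {failures}")
```

All four checks fail on this sample data, and each one points at the specific rows responsible. In a real project you would declare these assertions in YAML next to the model rather than write the queries by hand.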
When I added dbt tests to our existing models, we found four bugs on the first day. Joins that were producing duplicate rows. A date field that was sometimes NULL when the business logic said it couldn't be. A join on customer ID at the wrong grain.
None of these would have been caught by a human reviewing a dashboard. They were invisible in the output but wrong in the logic. Tests made the invisible visible.
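The duplicate-row bug is worth a worked example, because it shows how a wrong number can look perfectly plausible. In this Python sketch (sqlite3 standing in for a warehouse, with hypothetical `orders` and `payments` tables and a hypothetical `order_payments` model), one order paid in two instalments appears twice after the join, so summing the order amount overstates revenue. A uniqueness check on `order_id` is enough to flag it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount INTEGER);
    CREATE TABLE payments (payment_id INTEGER, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100), (2, 50);
    -- Order 1 was paid in two instalments, so it matches two payment rows.
    INSERT INTO payments VALUES (10, 1), (11, 1), (12, 2);
    -- Hypothetical model: meant to hold one row per order, but the join fans out.
    CREATE VIEW order_payments AS
        SELECT o.order_id, o.amount, p.payment_id
        FROM orders AS o
        JOIN payments AS p ON p.order_id = o.order_id;
""")

# Revenue computed from the source table versus from the joined model.
true_revenue = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
reported_revenue = conn.execute(
    "SELECT SUM(amount) FROM order_payments"
).fetchone()[0]

# Uniqueness check: any order_id appearing more than once is a failing row.
duplicated_orders = [
    row[0]
    for row in conn.execute(
        "SELECT order_id FROM order_payments "
        "GROUP BY order_id HAVING COUNT(*) > 1"
    )
]

print(true_revenue, reported_revenue, duplicated_orders)
```

Here the joined model reports 250 against a true total of 150. Nothing on a dashboard would mark 250 as suspicious, but the uniqueness check returns order 1 immediately.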
Documentation That Actually Gets Read
dbt generates a documentation site from your model YAML files. More importantly, because the docs live next to the code, they stay in sync. The description of a model and its columns is in the same pull request as the model itself.
I've seen too many data catalogues that were built with good intentions and abandoned within six months because maintaining them was a separate job from doing the work. dbt makes the two the same job.
The AI-Ready Data Layer
Here's the part that's become important to me recently. When you want an AI to reason over your data, the quality of your semantic layer is everything. An AI that has access to well-named, well-described, well-tested dbt models — with metric definitions and business logic documented in YAML — can produce dramatically better outputs than one pointed at raw warehouse tables.
This is not a coincidence. dbt's design philosophy, built around explicit models, tests, and docs, is exactly what you need to build a reliable AI-native data layer. The investment compounds.
My Take
dbt isn't just SQL with extra steps. It's SQL with version control, tests, documentation, and a dependency graph. That's not extra steps — that's the difference between a script and a system.
Questions or thoughts? nithinprasad93@gmail.com