SQL to dbt: Get Started with Analytics Engineering

There are many data analysts stuck in old ways of using SQL for data analytics when they could be learning the mindset and tools for Analytics Engineering that could change things dramatically.

Aug 23, 2025

Hi there! Alejandro here 😊
Suscribe if you like to read about technical data & AI learnings, deep dives!
Enjoy the reading and let me know in the comments what you think about it 👨🏻‍💻

SQL was my first analytics tool, just like for many data pros.

I wrote custom SQL queries in Looker Studio, created aggregations, and debugged 1000-line SQL queries someone else had over-engineered—then got excited when I discovered cool features like WINDOW FUNCTIONS.

But SQL also caused some headaches.

Scheduled queries (also called stored procedures) are a pain. They often contain long blocks of repeated code, changed only by a CASE WHEN condition.

There was no clear documentation explaining what each TABLE or VIEW did or what the column metadata meant. In the best-case scenario, some good soul was adding a TABLE description that you could check to have some context.

When errors happened, we played "guess which script broke the pipeline" or wondered if a colleague's latest changes had ruined our carefully crafted transformations.

Developers fighting custom requests often apply patches "here and there," which result in Frankensteins scattered all over the place. This is not a SQL thing per se, but it's usually how SQL creates technical debt.

Because of all this, dbt can become your new best friend.

⚠️ Disclaimer: I've used dbt for over three years and haven't explored other options like SDF (acquired by dbt), SQLMesh, or Dataform deeply enough to compare. Saying good things about dbt doesn't mean the others aren't good. I encourage you to try them yourself!

Let's get started.

Generated image — Image generated by ChatGPT.

🎯 You Already Know 90% Of What You Need

dbt isn't a new language. It's SQL with "superpowers." You keep using SQL but within a context where everything is properly connected to enhance logic transformations chains across all your data layers.

Whether you use PostgreSQL, BigQuery, Snowflake, or Redshift, the transition is smooth.

You'll learn skills like Jinja templating, dbt commands, and data modeling techniques. These will help you get the most out of moving from plain SQL to dbt.

At its core, dbt runs dependent SQL queries in a logical order. It sounds simple, yet it took a long time for this to appear in the market.

dbt transforms data to make it ready for analytics.

dbt handles the T in ETL (Extract, Transform, Load). Many papers credit dbt as a key reason for the shift from ETL to ELT, where transformations happen after loading.

🔍 SQL Vs. dbt: A Deep Dive On Queries

Your current SQL might look like this:

-- some_important_analysis.sql 

SELECT customer_id, 
       SUM(order_amount) as total_spent 
  FROM (
      SELECT table_1.column1, 
             table_b.column2, 
             table_a.column3,... 
      FROM table_a 
      JOIN table_b 
        ON table_a.id = table_b.id
      ) as raw_orders 
 WHERE order_date >= '2024-01-01' 
 GROUP BY customer_id;

In dbt, the same logic becomes:

-- models/marts/customer_summary.sql 

SELECT customer_id, 
       SUM(order_amount) as total_spent 
  FROM {{ ref('staging_orders') }} 
 WHERE order_date >= '2024-01-01' 
 GROUP BY customer_id

The {{ ref('staging_orders') }} handles dependency management for you.

In the first example, raw_orders is a VIEW in your database with some logic, probably created by other analyses or backend processes.

dbt compiles SQL cleanly, making your code easy to trace and ready for new team members.

⚔️ What Concepts Does dbt Bring To The Table?

The biggest improvements are modularity, dependency management, testing, version control, and Jinja templating.

🧘 Modularity: Break The Problem Into Pieces

Old way: SQL files scattered across your database. You open a schema and see a list of TABLEs and VIEWs with no context or order. Good luck finding anything without debug mode.

dbt way: Everything lives in a structured project with organized folders. The only risk is messy projects where people ignore clean structure and put files anywhere.

Thanks for reading The Pipe & The Line! This post is public so feel free to share it.

🆘 No More "Table Not Found" Panic Attacks

dbt adds the ref() macro, which builds a dependency graph.

dbt runs everything in the right order, every time.

Remember that earlier code? The ref() macro makes sure everything compiles cleanly and runs smoothly across your project.

You might write a 500-line file, but with ref(), it looks like a 50-line file.

With AI-powered IDEs like Cursor, you can add rules to scan dependencies faster. This speeds up understanding projects when making precise changes.

✅ Fast Testing And Deployment

dbt has built-in tests to catch nulls, duplicates, and broken relationships before production.

You can even come up with your own custom checks to test edge cases for your particular business needs.

You can run these tests during development and deployment to keep everything on track.

⚒️ Version Control With Git

One big problem in database development is tracking changes over time. Without version control, tracing issues is hard.

dbt solves this by being repository-driven, so you know exactly when changes happened and rollback without all the struggle of figuring out what happened when.

🧙 Macros With Jinja Templating

This is probably the only new concept you need to learn, and it's intuitive enough for you to follow.

Jinja lets you write dynamic SQL that adapts based on conditions, variables, and logic.

These are dbt macros ways to reuse code without rewriting it.

If you need variables or dynamic logic, you set them inside a macro. The code then behaves as you want without complex edge-case handling.

For example:

{% set important_date = '2024-01-01' %} 

SELECT * 
  FROM {{ ref('orders') }} 
 WHERE order_date >= '{{ important_date }}' 
{% if var('include_refunds', false) %} 
   AND status != 'refunded' 
{% endif %}

At runtime, this compiles to:

SELECT * 
  FROM (
    SELECT columns 
      FROM orders
     ) 
 WHERE order_date >= '2024-01-01'

Why? important_date is a static value. include_refunds defaults to false, so the extra status is not considered during compilation.

The beauty is in the simplicity: you're still writing SQL, just with some template magic.

Imagine a 20-condition CASE WHEN using Jinja templates. Amazing!

What dbt isn’t

There's a common idea that dbt = Analytics Engineering, but that's not quite right.

Yes, dbt is mostly used by that role, but it's also a tool for data modeling and that doesn't mean dbt equals data modeling.

I explain why having a strong cross-functional skill set is valuable for your data career in this one in case you are interested:

Why Being A Data Analyst "On Steroids" Is The Best Data Career Path To Pursue

Alejandro Aboy

Aug 13

Read full story

🥇 The Medallion Architecture

If you've worked with data modeling, you know concepts like normalization, denormalization, star schema, dimensions, and facts. These are clean ways to design database schemas with business in mind.

This post isn't about data modeling, but I'll show how dbt helps improve it using medallion architecture principles.

Qué es la arquitectura Medallion en un contexto Data Lakehouse? — Source: https://blog.bismart.com/arquitectura-medallion-data-lakehouse

This architecture breaks down into Bronze, Silver, and Gold layers. dbt usually calls these staging, intermediate, and marts/facts:

🥉 Bronze Layer (Raw Ingestion - Staging)

Use source() to reference raw tables.

FROM {{ source('raw_system', 'customer_data') }} as stg_customers

🥈 Silver Layer (Cleaned & Standardized - Intermediate)

Clean and standardize data in staging models.

SELECT UPPER(TRIM(customer_name)) as customer_name, 
       PARSE_DATE('%Y-%m-%d', date_string) as clean_date 
  FROM {{ ref('stg_customers') }} as int_customers

🥇 Gold Layer (Business Ready Analytics - Marts/Facts)

Aggregate data for business use.

SELECT region, 
       SUM(revenue) as total_revenue 
  FROM {{ ref('int_customer') }} 
  LEFT JOIN {{ ref('int_sales') }} 
    ON int_customer.customer_id = int_sales.customer_id 
GROUP BY region

🚀 My Learning Path Recommendation To Become An Analytics Engineer

Take the dbt fundamentals course. It's free and easy to follow.
Find a real project and slowly implement dbt layers: source, staging, intermediate, and so on.
Remember common issues and use built-in or custom dbt tests to catch them before production.
Identify repeated patterns and convert them into macros.

Plan this as a 3 month journey, depending on your workload.

Work with real data. Sample projects are good to start but can leave you stuck if you don't try real scenarios.

Learning dbt doesn't just make you more relevant.

It transforms you from a data analyst into an analyst on steroids or a data engineer with strong analytical skills (or Analytics Engineer).

But being Analytics Engineer is not dependant on knowing dbt, it’s about transforming business logic into analytics ready assets with the right mix of data modeling and technology.

You become someone who can build production level data pipelines, create self-documenting data models and scale analytics across entire organizations.

📝 TL;DR

• dbt is 90% SQL you already know, plus 10% templating magic
• ref() and source() replace hard-coded table names and manage dependencies
• Use Jinja templates to add variables and logic to your SQL without losing sanity
• Catch data quality issues before they reach production
• Use the Medallion Architecture for clean, self-documenting data pipelines
• Transform from a "SQL analyst" into an Analytics Engineer

If you enjoyed the content, hit the like ❤️ button, share, comment, repost, and all those nice things people do when like stuff these days. Glad to know you made it to this part!

Hi, I am Alejandro Aboy. I am currently working as a Data Engineer. I started in digital marketing at 19. I gained experience in website tracking, advertising, and analytics. I also founded my agency. In 2021, I found my passion for data engineering. So, I shifted my career focus, despite lacking a CS degree. I'm now pursuing this path, leveraging my diverse experience and willingness to learn.

Follow me on Linkedin

The Pipe & The Line