Run Adhoc queries in DBT with Jinja
DBT Models are great in separating SQL into their own enclosures which can be summoned anytime by Directed A-cyclic Graph code scripted in Python.
I have mostly witnessed plain SQL sprinkled with a few built-in DBT methods such as ref() and source() within the model files. I did wonder if the logic within these models can be changed during run time like we do with DAG code (obviously via Airflow’s variables).
With SQL injection option out of the window; Conditional block provided by Jinja was the obvious answer.
Jinja?
For Novice users; Jinja is a templating language that references certain variables; values for which are then rendered during runtime! In short, it is possible to advice DBT on how to handle certain conditions that might occur during runtime.
This link should provide more enlightenment over Jinja templating.
Where do we use this?
There are specific use cases where this trick comes in handy:
- Adhoc code execution: when DBT is expected to run a specific logic only on a special occasion or for a bug that doesn’t occur too often. An if-else block here could house code for both the usual as well as the Adhoc logic.
- Re-instating only certain parts of a dataset: This could be useful to re-instate only a specific section or partition of the table to cut down on costs or while dealing with with large datasets.
Cool! How do I set it up?
Apart from the usual step of establishing an integration between DBT and Airflow [link of the article];
we will need the following pieces:
- Variable defined within the project YML file to perform the switch between the if and the else block.
In this case execution_mode defined within the project yml file will be used to perform the switch.
Note: Providing a default value in the yml file provides DBT a context on the data type of the variable. Also, DBT uses this variable during compilation.
2. The if-else code block within the DBT model file (referencing the variable defined in the previous step). link
The model file has an if block which will execute a partial refresh for 2 out of 5 categories of the Market_Segment column.
Followed by the else block which will then refresh metrics for all the categories of the Market_Segment column.
Remember to end the block with endif statement.
3. Code to pass a value to the DBT variable from the Airflow DAG file.
The vars section of the bash command will specify the variable name (defined within the DBT project) along with the value to be passed. In this case, partial_refresh will be passed to trigger the if block of the model.
A change in the value of the variable from partial_refresh to complete_refresh will then trigger the else block of the model.
In case this concept is new or confusing; it is always helpful to execute the DBT models in the local CLI/airflow environment to get a deeper understanding of the inner workings of Jinja in DBT.
To illustrate the above mentioned effect and to debug, the following exercise can be done:
- Modify the value of the variable in the project yml file to either
- partial_refresh
- complete_refresh
2. run dbt compile in the terminal window.
The following result should be visible within the project’s target/compiled folder.
Notice that the SQL within the compiled model file changes as and when the variable value is modified and compiled.
In short Jinja provides a slick way of choosing a particular block for execution during runtime, making DBT more versatile and dynamic.
Imagine the possibilities!
Let’s Airflow it!
While most of the tweaking happens on the DBT side; the changes on the scheduling end are minimum.
The only job that Airflow performs here is to pass a value to the Variable defined within the DBT project file.
checkout the git code to follow along.
Results in snowflake can be viewed with this SQL code:
To start the illustration an initial run is performed by passing complete_refresh to the switch variable.
This completely refreshes the snowflake table.
Modifying the variable value from complete_refresh to partial_refresh and re-executing the DAG will refresh only part of the table.
Checking out the snowflake table after the run:
Notice that metrics for only 2 categories named AUTOMOBILE & MACHINERY have been refreshed while others stay untouched.
What just happened?
If the above didn’t make sense then do not fret! It does take a couple of runs to get a grip of Jinja’s actual capabilities.
To summarize, it is possible to divide the DBT model file into multiple sections with a conditional block that depends on a variable (defined in the project YML file).
When Airflow/Terminal window passes a value to the DBT variable while performing a run command; DBT re-compiles the model file with the code from either the if block or the else block.
Thus Jinja templating can be used to infuse dynamic behavior into DBT.
Future Enhancements
Airflow DAG code was passing a hard coded value to the DBT variable (named execution_mode). This was done to illustrate the value being passed during each run.
Note: Hard coding values in Production code isn’t advisable.
As an enhancement; try storing this switch variable value in an airflow variables file (usually from Airflow UI) and passing that variable directly into the bash command.
This makes the DAG more dynamic and avoids the need to refresh the DAG code every time the variable value is changed.