As a data engineer, I spend a lot of time working with YAML files. These files contain information about the data that we’re working with, including the column names, data types, and other metadata. But I’ve always found it frustrating to manually reorganize these files, especially when I’m dealing with large datasets.
That’s when I took inspiration from an existing dbt macro called generate_model_yaml and decided to create a custom macro that automatically generates the YAML for new dbt models and organizes the columns. The columns are ordered this way to align with our company’s YAML naming standards. The existing macro includes unnecessary boilerplate code, always adds description rows, and does not organize the columns in any way; the new macro addresses each of these gaps. In addition, I’ve added error handling and validation to the macro: it checks whether the model_name provided as an argument exists in the dbt project, which helps prevent issues caused by incorrect input.
Now let’s dive into the custom macro implementation.
This macro organizes columns into four categories based on their names:
- ID columns: Columns that end with _id
- Date columns: Columns that end with _date or _at
- Fivetran metadata columns: Columns that start with an underscore (_*)
- Everything else: All other columns
The macro then sorts the columns alphabetically within each category and concatenates them into a single ordered list. Finally, the macro generates a YAML file with the organized columns.
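To make the ordering rules concrete, here is a minimal Python sketch of the same idea. It is not the macro itself, which is written in Jinja. The function names (categorize, order_columns, render_yaml) are hypothetical, and three things are assumptions on my part: the categories are concatenated in the order listed above, a leading underscore takes precedence over an _id or _date suffix, and the output follows the usual dbt properties layout.

```python
from typing import List

# Bucket indexes, in the assumed concatenation order.
ID, DATE, METADATA, OTHER = 0, 1, 2, 3


def categorize(column_name: str) -> int:
    """Return the bucket a column belongs to, based only on its name."""
    name = column_name.lower()
    # Assumption: a leading underscore wins over any suffix,
    # so a column like _fivetran_id counts as metadata.
    if name.startswith("_"):
        return METADATA
    if name.endswith("_id"):
        return ID
    if name.endswith(("_date", "_at")):
        return DATE
    return OTHER


def order_columns(columns: List[str]) -> List[str]:
    """Sort alphabetically within each bucket, then concatenate the buckets."""
    buckets: List[List[str]] = [[], [], [], []]
    for column in columns:
        buckets[categorize(column)].append(column)
    ordered: List[str] = []
    for bucket in buckets:
        ordered.extend(sorted(bucket))
    return ordered


def render_yaml(model_name: str, columns: List[str]) -> str:
    """Render a bare-bones model entry (assumed layout, no description rows)."""
    lines = [
        "version: 2",
        "",
        "models:",
        f"  - name: {model_name}",
        "    columns:",
    ]
    for column in order_columns(columns):
        lines.append(f"      - name: {column}")
    return "\n".join(lines)


if __name__ == "__main__":
    example_columns = [
        "created_at", "name", "_fivetran_synced",
        "customer_id", "order_date", "account_id", "amount",
    ]
    print(render_yaml("orders", example_columns))
```

Running the script prints a model entry for a made-up orders model, with the ID columns first, then the date columns, then the Fivetran metadata column, then everything else.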
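To use the generate_ordered_yaml macro, include it in your dbt project under the /macros directory. Then run the macro with dbt run-operation, passing the model name as the model_name argument and replacing <your_dbt_model> with the name of the dbt model for which you want to generate the YAML file, for example: dbt run-operation generate_ordered_yaml --args '{"model_name": "<your_dbt_model>"}'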
Here’s an example of the generated YAML file after running the macro:
Creating custom dbt macros like generate_ordered_yaml can significantly streamline your data engineering tasks and help you work more efficiently with large datasets. By leveraging the power of Jinja and dbt, you can easily generate and organize YAML files for your dbt models. For more information on creating custom dbt macros, including the fundamentals of Jinja, check out this tutorial by Madison Mae.