In modern data pipelines, continuous data integration and frequent updates can cause unintended data errors. Blue-green deployment provides a final checkpoint to ensure accurate results. In this blog, we detail how to implement blue-green deployment in a data warehouse project.
Have you ever experienced an unexpected breakdown in your data pipeline that caused your dependencies to crash? Did it result in blank charts on your dashboards, inconsistent KPIs, or machine learning models trained with incomplete or outdated data? Discovering data quality issues from users is never a pleasant experience for the data team. Even worse, it can cause internal clients to lose confidence in the data. If you have faced this situation before, Blue-Green Deployment is a solution that can help.
In modern data pipelines, the source data is ingested continuously, and the data warehouse is rebuilt repeatedly on top of the latest data. A new data warehouse version, built with the latest transformation specifications, is released to users at ever-shortening intervals: weekly, daily, multiple times a day, or even hourly, depending on the situation.
From time to time, these updates to data and transformations have unintended consequences, and failures happen. These failures may cause some or all of the final exposed data to be missing or wrong. Of course, there are usually tests (standard tests plus custom tests for previously known scenarios) that catch possible failures and alert the owners that something went wrong - but the data still reaches production in that state, with errors. While the root cause is investigated and fixes are applied, the data continues to be exposed with errors or remains unavailable. This is not a desired outcome at all.
Ideally, there needs to be a final checkpoint to ensure everything has run as expected. If yes, the new data warehouse version can be exposed to the users. Blue-green Deployment does just that with the steps below:
While one environment (Blue) is in production, available for its dependencies, the other environment (Green) is waiting for the next data refresh.
When the data is updated, the transformation steps are complete, and all tests indicate that the update was successful, the two environments "swap" places.
The swap happens almost instantaneously in the data platform, and your ML models, reports, and dashboards now all point to the new data version - without interruption and with minimized (ideally eliminated) risk of disruption.
If any test fails, the “swap” does not take place. The applications continue consuming data from the Blue environment, so data unavailability and inaccurate data scenarios are avoided. During this time, the root cause of the test failure is investigated and addressed, and the refresh is rerun.
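In Snowflake, the swap itself is a single atomic statement. A minimal sketch, using the example database names from the implementation below:

alter database my_transformation_prod_blue swap with my_transformation_prod_green;

-- After this statement, the name my_transformation_prod_blue points at the freshly
-- built data, while my_transformation_prod_green holds the previous production data.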
Please note: The implementation steps below are based on a dbt and Snowflake environment. The essential concepts in this article will still be helpful even if dbt and Snowflake are not part of your current data stack.
Below, we walk you through all the steps to implement blue-green deployment in a dbt project (let's say the project is called my_transformation) that uses Snowflake as the data warehouse.
First, we must ensure that we have the necessary structure for the blue-green deployment. For that, we need the following:
In this example, we will use the following:
my_transformation_prod_blue - the production database that applications read from
my_transformation_prod_green - the database where each new refresh is built and tested
my_transformation_snapshots - the database where dbt snapshots are stored
All applications should read from the “blue” database, and your development branches should be merged to the green database.
We'll use the same schema name, my_schema, for the blue and green databases.
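If these objects do not exist yet, they can be created in Snowflake along these lines (a sketch; adjust ownership and grants to your environment):

create database if not exists my_transformation_prod_blue;
create database if not exists my_transformation_prod_green;
create database if not exists my_transformation_snapshots;

-- the blue and green databases share the same schema name
create schema if not exists my_transformation_prod_blue.my_schema;
create schema if not exists my_transformation_prod_green.my_schema;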
Add the target_database configuration for snapshots in your dbt_project.yml:
snapshots:
  +target_database: my_transformation_snapshots
a. ref()
This macro overrides the default behavior of dbt's ref macro. Instead of returning the fully qualified path for a table, it omits the database name. This is important because we will eventually swap the blue and green databases - and we do not want a hard-coded green database reference to persist.
E.g.,
Default behavior: database_name.schema_name.table_name
New behavior: schema_name.table_name
{% macro ref(model_name) %}
    {# Resolve the relation using dbt's built-in ref #}
    {% set rel = builtins.ref(model_name) %}
    {% if rel.database == 'my_transformation_snapshots' %}
        {# Snapshots live in a fixed database, so keep the fully qualified path #}
        {% do return(builtins.ref(model_name)) %}
    {% else %}
        {# Drop the database name so the reference survives the blue-green swap #}
        {% do return(builtins.ref(model_name).include(database=false)) %}
    {% endif %}
{% endmacro %}
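With this override in place, a model that selects from another model compiles without a database prefix. A quick illustration (the model names are hypothetical):

-- models/my_model.sql (hypothetical)
select * from {{ ref('stg_orders') }}

-- compiles to:  select * from my_schema.stg_orders
-- instead of:   select * from my_transformation_prod_green.my_schema.stg_orders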
b. get_snapshot_write_schema()
Based on the dbt target name, this macro returns the schema name where snapshot models will be written. The behavior we want is to write to the production snapshots schema only when we are running in production.
{% macro get_snapshot_write_schema() %}
    {% if target.name == 'prod' %}
        {# In production, write snapshots to the main snapshots schema #}
        {% set var_target_schema = 'snapshots' %}
    {% else %}
        {# In other environments, write to a target-specific schema, e.g. snapshots_dev #}
        {% set var_target_schema = 'snapshots_' ~ target.name %}
    {% endif %}
    {% do return(var_target_schema) %}
{% endmacro %}
Make sure you add the following config to all snapshot models in your dbt project:
{{
config(
target_schema=get_snapshot_write_schema()
)
}}
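Putting it together, a snapshot definition might look like the sketch below (the source model, unique key, and timestamp strategy are hypothetical):

{% snapshot my_snapshot_model %}

{{
    config(
        target_schema=get_snapshot_write_schema(),
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- hypothetical source; replace with the model you want to snapshot
select * from {{ ref('stg_orders') }}

{% endsnapshot %}

The target_database comes from the dbt_project.yml configuration added earlier, so it does not need to be repeated here.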
c. ref_snapshot()
This macro gets the relation for a snapshot model. Even in a dev environment, we want our models to read from production snapshots, which helps us emulate the production environment more closely while developing.
{% macro ref_snapshot(snapshot_name) %}
    {% set snapshot_relation = ref(snapshot_name) %}
    {# Always resolve snapshots to the production snapshots database and schema #}
    {% set prod_snapshot_relation = adapter.get_relation(
        database='my_transformation_snapshots',
        schema='snapshots',
        identifier=snapshot_relation.identifier
    ) %}
    {{ return(prod_snapshot_relation) }}
{% endmacro %}
Make sure you replace the ref macro with ref_snapshot when referencing a snapshot model, e.g.:
with some_cte as (
select * from {{ref('non_snapshot_model')}}
),
some_snapshot as (
select * from {{ref_snapshot('my_snapshot_model')}}
)
...
d. swap_database()
This macro uses Snowflake's swap feature to exchange the blue and green databases. It is invoked with a run-operation command in the next step.
{% macro swap_database() %}
    {# Atomically swap the blue and green databases in Snowflake #}
    {% set sql = 'alter database my_transformation_prod_blue swap with my_transformation_prod_green' %}
    {% do run_query(sql) %}
    {{ log("database swapped", info=True) }}
{% endmacro %}
Add the swap database operation to the commands list in your dbt Cloud job, after the dbt build command.
dbt build
dbt run-operation swap_database
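dbt build only succeeds - and the swap only runs - when every model builds and every test passes, so the tests in your project act as the final gate. A small illustrative schema test file (the model and column names are hypothetical):

# models/schema.yml (illustrative)
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: order_amount
        tests:
          - not_null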
Set the target name
Ensure the target name under your environment equals “prod”.
Set the database name.
Make sure the database name under your environment is equal to “my_transformation_prod_green”.
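For reference, if you run the same job with dbt Core instead of dbt Cloud, the equivalent settings live in profiles.yml. A minimal sketch, assuming a Snowflake connection (the account, user, role, and warehouse values are placeholders):

my_transformation:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: your_account                              # placeholder
      user: dbt_service_user                             # placeholder
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"    # placeholder
      role: transformer                                  # assumption
      warehouse: transforming                            # assumption
      database: my_transformation_prod_green
      schema: my_schema
      threads: 4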
Whenever your project runs with the above blue-green implementation, the following events occur:
dbt build builds your models in the green database and runs all tests.
If every step succeeds, the swap_database run-operation swaps the green and blue databases, and the freshly built version immediately becomes the one your applications read.
If any model or test fails, the job stops before the swap, and applications keep reading the last known-good version from the blue database while the issue is fixed.
This checkpoint-before-deployment approach is one of the essential elements of data quality in your data warehouse.
Blue-green deployment becomes even more effective when a comprehensive test set is built into your transformation project. You can find the 205 Data Lab data quality checklist in this blog post: Data Quality Part I
Additionally, if you have any questions or would like to learn more about our specialized solutions, please don't hesitate to reach out to us at hello@205datalab.com