
Blue-Green Deployment: Say Goodbye to Disruptions In Your Data Warehouse

In modern data pipelines, continuous data ingestion and frequent updates can introduce unintended data errors. Blue-green deployment provides a final checkpoint so that only verified results reach users. In this blog, we explain how to implement blue-green deployment in a data warehouse project.


Have you ever experienced an unexpected breakdown in your data pipeline that caused downstream dependencies to crash? Did it result in blank charts on your dashboards, inconsistent KPIs, or machine learning models trained on incomplete or outdated data? Discovering data quality issues from users is never a pleasant experience for a data team. Even worse, it can cause internal clients to lose confidence in the data. If you have faced this situation before, Blue-Green Deployment is a solution that can help.

1. What is Blue-Green Deployment?

In modern data pipelines, the source data is ingested continuously, and the data warehouse is rebuilt repeatedly on top of the latest data. A new data warehouse version, built with the latest transformation specifications, is released to users at ever-shortening intervals: weekly, daily, multiple times a day, or even hourly, depending on the situation.

From time to time, these changes in data and transformations have unintended consequences, and failures happen. Such failures can leave some or all of the exposed data missing or wrong. There usually are tests (standard tests plus custom tests for previously known scenarios) that catch possible failures and alert the owners that something went wrong - but the data still reaches production in that state, with errors. While the issue is investigated and fixes are applied, the data remains exposed with errors or stays unavailable. This is not a desired outcome at all.

Ideally, there should be a final checkpoint that ensures everything has run as expected; only then is the new data warehouse version exposed to users. Blue-green deployment does just that, with the steps below:

  • Step 1. Create a separate, secondary data warehouse (green).
  • Step 2. Build the data transformations in this data warehouse, independent of the one in production (blue).
  • Step 3. When all processes have run correctly and all tests have passed, replace the production version with the new one (the blue and green databases swap).

2. How does blue-green deployment in a data warehouse work?

While one environment (Blue) is in production and available to its dependencies, the other environment (Green) waits for the next data refresh.

Once the data is updated, the transformation steps are complete, and all tests indicate that the update was successful, the two environments "swap" places.

This swap happens in an instant within the data platform, and your ML models, reports, and dashboards now all point to the new data version - without interruption and with minimal (ideally zero) risk of disruption.
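Under the hood, this relies on Snowflake's database swap, a single metadata operation that atomically exchanges two databases. A minimal sketch, assuming for illustration a production (blue) database named my_transformation_prod and the green database my_transformation_prod_green used later in this walkthrough:

-- atomically swap the two databases (equivalent to renaming each to the other's name)
alter database my_transformation_prod swap with my_transformation_prod_green;

Because only metadata pointers change, consumers that query the blue database name keep working and simply see the fresh data after the swap.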

3. What if the tests indicate that the update was unsuccessful?

In this case, the "swap" does not take place. Applications continue consuming data from the Blue environment, so data unavailability and inaccurate data are avoided. Meanwhile, the root cause of the test failure is investigated and addressed, and the refresh is rerun.

4. How to implement Blue-Green Deployment?

Please note: The implementation steps below are based on a dbt and Snowflake environment. If dbt and Snowflake are not part of your current data stack, this article will still be helpful for the essential concepts.

Below we walk you through all the steps to implement blue-green deployment in a dbt project (let's say the project is called my_transformation) that uses Snowflake as the data warehouse.

  • 1. Prepare your database setup
  • 2. Configure your dbt_project.yml file
  • 3. Create the macros you need
    1. ref()
    2. get_snapshot_write_schema()
    3. ref_snapshot()
    4. swap_database()
  • 4. Configure the dbt Cloud job and environment
    1. Add the operation
    2. Set the target name
    3. Set the database name

Prepare your database setup

First, we must ensure that we have the necessary structure for the blue-green deployment. For that, we need the following:

  • A blue,
  • a green,
  • and a snapshots database

In this example, we will use my_transformation_prod_green as the green database and my_transformation_snapshots as the snapshots database; the blue database is the existing production database that your applications read from (assumed in the sketches below to be named my_transformation_prod).

All applications should read from the blue database, and your development branches should be merged and built into the green database.

We'll use the same schema name, my_schema, for the blue and green databases.
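If these databases do not exist yet, a minimal Snowflake setup sketch could look like the following (the blue database name my_transformation_prod is an assumption; the green and snapshots names match the configuration used later):

-- blue (production) database: applications read from here; the name is assumed for illustration
create database if not exists my_transformation_prod;
create schema if not exists my_transformation_prod.my_schema;

-- green database: dbt builds the next warehouse version here
create database if not exists my_transformation_prod_green;
create schema if not exists my_transformation_prod_green.my_schema;

-- snapshots database: never swapped, so snapshot history stays in one place
create database if not exists my_transformation_snapshots;
create schema if not exists my_transformation_snapshots.snapshots;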

Configure your dbt_project.yml file

Add a snapshots configuration that sets the target_database:

snapshots:
  +target_database: my_transformation_snapshots
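For reference, this configuration sits at the top level of dbt_project.yml alongside the project settings; a minimal sketch, where the profile name, version fields, and snapshot path are assumptions (my_transformation matches the project name mentioned earlier):

name: my_transformation
profile: my_transformation
version: "1.0.0"
config-version: 2
snapshot-paths: ["snapshots"]

snapshots:
  +target_database: my_transformation_snapshots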

Create the macros you need

a. ref()

This macro overrides the default behavior of dbt's ref macro. Instead of returning the full path for a table, it will omit the database name. This is important because we will eventually swap the blue and green databases - and we do not want a hard-coded green database reference to persist.

E.g.,

Default behavior: database_name.schema_name.table_name

New behavior: schema_name.table_name

{% macro ref(model_name) %}
    {% set rel = builtins.ref(model_name) %}
    {% if rel.database == 'my_transformation_snapshots' %}
        {% do return(builtins.ref(model_name)) %}
    {% else %}
        {% do return(builtins.ref(model_name).include(database=false)) %}
    {% endif %}
{% endmacro %}
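To illustrate the difference, a model referencing another model (both model names are hypothetical) now compiles without the database prefix:

-- models/orders_enriched.sql (hypothetical)
select * from {{ ref('stg_orders') }}

-- default ref() compiles to:     <target_database>.my_schema.stg_orders
-- overridden ref() compiles to:  my_schema.stg_orders

Because the compiled references are database-agnostic, views built in the green database keep resolving correctly after it is swapped into production.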

b. get_snapshot_write_schema()

Based on the dbt target name, this macro returns the schema name that the snapshot model will be written to. The behavior we want is to write to the production snapshots schema only when we are in production.

{% macro get_snapshot_write_schema() %}
    {% if target.name in ('prod') %}
        {% set var_target_schema = 'snapshots' %}
    {% else %}
        {% set var_target_schema = 'snapshots_' ~ target.name %}
    {% endif %}
    {% do return(var_target_schema) %}
{% endmacro %}
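As a quick illustration, here is what the macro resolves to for a few target names (targets other than prod are hypothetical):

-- target.name = 'prod'  ->  snapshots
-- target.name = 'dev'   ->  snapshots_dev
-- target.name = 'ci'    ->  snapshots_ci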

Make sure you add the following config to all snapshot models in your dbt project:

{{ config( target_schema=get_snapshot_write_schema() ) }}
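Putting it together, a snapshot model using this config might look like the sketch below; the snapshot name matches the example used later, while the source, unique key, strategy, and timestamp column are assumptions:

{% snapshot my_snapshot_model %}

{{
    config(
        target_schema=get_snapshot_write_schema(),
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- the source below is hypothetical; the target database comes from dbt_project.yml
select * from {{ source('my_source', 'my_table') }}

{% endsnapshot %}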

c. ref_snapshot()

This macro will get the relation for the snapshot model. Even in a dev environment, we want our models to read from production snapshots. This helps us emulate the production environment more closely while working in development environments, by:

  • accessing the production snapshots to "read" data,
  • while "writing" to development snapshots, without creating unnecessary records in the production environment.

{% macro ref_snapshot(snapshot_name) %}
    {% set snapshot_relation = ref(snapshot_name) %}
    {% set prod_snapshot_relation = adapter.get_relation(
        database='my_transformation_snapshots',
        schema='snapshots',
        identifier=snapshot_relation.identifier
    ) %}
    {{ return(prod_snapshot_relation) }}
{% endmacro %}

Make sure you replace the ref macro with ref_snapshot when referencing a snapshot model, e.g.:

with some_cte as (
    select * from {{ ref('non_snapshot_model') }}
),

some_snapshot as (
    select * from {{ ref_snapshot('my_snapshot_model') }}
)
...
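Based on the macros above, those two references compile differently: the regular model stays database-agnostic, while the snapshot resolves to the production snapshots database:

-- {{ ref('non_snapshot_model') }}          ->  my_schema.non_snapshot_model
-- {{ ref_snapshot('my_snapshot_model') }}  ->  my_transformation_snapshots.snapshots.my_snapshot_model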

d. swap_database()

This macro uses Snowflake's database swap feature. It is invoked with a run-operation command in the next step.

{% macro swap_database() %}
    {# Assumption: the production (blue) database is named my_transformation_prod; #}
    {# replace it with your own blue database name. The green database name matches #}
    {# the dbt Cloud environment configured in the next step. #}
    {% set sql = 'alter database my_transformation_prod swap with my_transformation_prod_green' %}
    {% do run_query(sql) %}
    {{ log("database swapped", info=True) }}
{% endmacro %}
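If you want to verify the macro before adding it to a job, you can run it manually with dbt's run-operation command (using the prod target configured in the next step):

dbt run-operation swap_database --target prod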

Configure the dbt Cloud job and environment

Add the operation

Add the swap_database operation to the command list in your dbt Cloud job, after the dbt build command.

dbt build
dbt run-operation swap_database

Set the target name

Ensure the target name under your environment equals "prod".

Set the database name

Make sure the database name under your environment is equal to "my_transformation_prod_green".
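If you run dbt Core rather than dbt Cloud, the same two settings (target name prod, database my_transformation_prod_green) live in profiles.yml. A minimal sketch, with the connection values as placeholders you would replace:

my_transformation:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: <your_account>        # placeholder
      user: <your_user>              # placeholder
      password: <your_password>      # placeholder (or use key-pair / SSO auth)
      role: <your_role>              # placeholder
      warehouse: <your_warehouse>    # placeholder
      database: my_transformation_prod_green
      schema: my_schema
      threads: 4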

5. Conclusion

Whenever your project runs with the above blue-green implementation, the following events occur:

  • The models are created
  • Tests are run
  • If all the models are created without errors and all tests pass in your green database, the blue database is swapped with the green - which happens instantly, without downtime.

This checkpoint-before-deployment approach is one of the essential elements of data quality in your data warehouse.

Blue-green deployment can be further optimized when a comprehensive test suite is built into your transformation project. You can find the 205 Data Lab data quality checklist in this blog post: Data Quality Part I.

Additionally, if you have any questions or would like to learn more about our specialized solutions, please don't hesitate to reach out to us at hello@205datalab.com


