Automation is the backbone of modern development processes!

How to Automate your DWH Deployments

Efficient development processes are based on automated processes to test, integrate, and deploy applications into production. With typical applications, this is quite an easy task. For database applications, it’s a bit harder. This article explains why and how to execute this process successfully.

Jasmin Fluri · Published in codeburst · Sep 24, 2020

Some time ago, I wrote about the Challenges in Automating Multi-Schema Database Deployments. It was about why automated database deployments are crucial to efficient development processes and why they are challenging to establish with multi-schema applications.

After implementing many automated DWH deployment processes, I have summarized the most important steps to get there. I also explain why not only technological changes are needed but also process changes, and why those are crucial.

A deployment process from hell

Manual deployments include all the processes that require manual steps to install your software into production. Those manual steps can be:

  • Emailing the install scripts to other people (lack of centralized repository).
  • Manual execution of installation scripts (lack of automated pipelines).
  • Copying files around (lack of centralized repository).
  • Sending emails (lack of automated processes).
  • Execution of the installation by people outside the team (lack of ownership).
Example of a manual deployment process where communication happens over email.

The picture above shows an example of a manual deployment process. A team member sends an installation package to the DBA over email. The DBA manually installs the scripts on the target database and then emails the team member whether the deployment was successful or not. If it wasn’t, they attach a log file with the error. The team member then has to start looking for the problem and send a fix to the DBA again…

Such a deployment process has many problems:

  • Involving other teams is very time-consuming. The waiting times for the development team members are long, which makes the process very expensive when it is repeated often.
  • Manual executions are prone to errors. They are not repeatable and therefore don’t reduce risk throughout the deployment pipeline.
  • Automated testing is not possible because no mechanism ensures automatic test execution.

Multi-Schema Database Applications

Multi-schema database applications are applications that include more than one database schema. They share or access data across schemas but live physically on the same database instance. Typically, this is seen in data warehouses, as shown in the picture below, where each non-grey box represents a database schema. Those schemas are used by different applications that share and exchange data (DWH Core and Marts).

Example of a Data Warehouse Application with a Staging Layer and Data Marts on Top.

If a data warehouse contains separate schemas for its layers, there is most likely one schema that contains all the metadata and configuration data for the ETL pipelines and the data loads. Standard functionality like logging is also usually installed in such a separate schema.

This approach leads to an application that stretches across multiple schemas and is therefore called a multi-schema database application.

The data marts will most likely also have some conformed dimensions that are physically referenced in multiple tables. Conformed dimensions provide references to the same structures, attributes, values, definitions, and concepts, such as time and dates, which is why they are used across many fact tables. So even if the individual data marts could be split into separate schemas, there would still be a dependency on the conformed dimensions that all the marts share.
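To deploy such an application reliably, the cross-schema dependencies have to be made explicit somewhere. The following minimal sketch (in Python, with hypothetical schema names and a hypothetical dependency map) shows one way to record them and derive a deployment order from them:

```python
# Minimal sketch of a deployment manifest for a multi-schema DWH.
# Schema names (LOGGING, METADATA, STAGING, CORE, MART_SALES, MART_FINANCE)
# are hypothetical; the point is making the dependency order explicit.
# Shared schemas such as LOGGING and METADATA must be installed before
# the schemas that use them.
SCHEMA_DEPENDENCIES = {
    "LOGGING": [],
    "METADATA": ["LOGGING"],
    "STAGING": ["LOGGING", "METADATA"],
    "CORE": ["STAGING", "METADATA"],
    "MART_SALES": ["CORE"],      # also references conformed dimensions in CORE
    "MART_FINANCE": ["CORE"],
}

def deployment_order(deps: dict[str, list[str]]) -> list[str]:
    """Topologically sort schemas so dependencies are installed first."""
    ordered: list[str] = []
    visited: set[str] = set()

    def visit(schema: str) -> None:
        if schema in visited:
            return
        visited.add(schema)
        for dep in deps[schema]:
            visit(dep)
        ordered.append(schema)

    for schema in deps:
        visit(schema)
    return ordered

print(deployment_order(SCHEMA_DEPENDENCIES))
# ['LOGGING', 'METADATA', 'STAGING', 'CORE', 'MART_SALES', 'MART_FINANCE']
```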

Advantages of automated deployment

The goal of continuous integration and automated deployments is to achieve stable releases with high quality. We also save time by eliminating repetitive manual tasks through automation. With improved stability and higher quality, less unplanned work should occur. And the automation is a huge timesaver that also saves money, because the developers’ time can be used to create actual value for the product we are working on.

In their book “Accelerate: The Science of Lean Software and DevOps”, Nicole Forsgren, Jez Humble, and Gene Kim published the research findings from their “State of DevOps” reports. They found that high performance in development teams is possible with all kinds of systems and is independent of the technologies the teams use.

“High performance is possible with all kinds of systems and is independent of the technologies the teams use!” — Accelerate

Another interesting finding is that where code deployments are most painful, you will find the poorest organizational performance and culture. So if your deployment process hurts, the culture around it most likely isn’t the best either.

“Where code deployments are most painful, you’ll find the poorest organizational performance and culture” — Accelerate

The third finding I want to highlight is that software delivery performance positively impacts organizational performance. This implies that with automation and better development and delivery processes, not only do the software teams improve their performance, but the whole organization profits from those improvements.

“Software delivery performance positively impacts organizational performance!” — Accelerate

Three stages of optimizing your database deployment

Now, if you want to improve your continuous integration and delivery process, certain preconditions have to be established. The most basic one concerns how changes are handled. If multiple changes are put into one change package (see the left stage in the image below), this is the first thing to optimize. Every change in the application code needs to be treated as an individual change, for example one change or deployment per implemented feature. This way, failing features don’t stop the successful features from getting installed in production. This splitting results in the second stage in the image below. The third stage is automating the deployment of all those changes, as sketched after the image.

The three stages of optimizing your deployment: from one big deployment package that someone else deploys, to single deployment packages, to single packages that get deployed automatically.
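A minimal sketch of what the second and third stages can look like in practice: every feature becomes its own change package that is deployed independently, so a failing package doesn’t block the others. The directory layout, the connect string, and the SQL*Plus call are assumptions for illustration, not a prescription:

```python
# Sketch: every feature is its own change package, deployed independently,
# so one failing package does not block the others.
from pathlib import Path
import subprocess

CHANGES_DIR = Path("changes")  # e.g. changes/feature-123/, changes/feature-124/

def deploy_change(package: Path) -> bool:
    """Apply all scripts of one change package in order."""
    for script in sorted(package.glob("*.sql")):
        # Run the script with SQL*Plus. The scripts are assumed to contain
        # WHENEVER SQLERROR EXIT FAILURE so that errors show up in the
        # process exit code (SQL*Plus otherwise exits with 0 on SQL errors).
        result = subprocess.run(
            ["sqlplus", "-s", "deployer@target_db", f"@{script}"]
        )
        if result.returncode != 0:
            return False
    return True

for package in sorted(p for p in CHANGES_DIR.iterdir() if p.is_dir()):
    status = "deployed" if deploy_change(package) else "FAILED"
    print(f"{package.name}: {status}")
```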

How to get there

But how do you accomplish automated deployments? There are two crucial things to implement:

  1. Establish the Practices of Continuous Integration
    Martin Fowler describes very well what is necessary to implement a continuous integration process that’s robust and reliable.
  2. Build Deployment Pipelines
    Pipelines are needed to automate the installations all the way to production. The three kinds of pipelines that are most likely required are described in the image below.
Example of Continuous Integration and Deployment Pipelines.

Continuous Integration Pipelines ensure that changes on feature branches are deployable and don’t break existing functionality.

If your application is a database, a build is the same thing as a deployment.
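As an illustration, a CI job for a database change might execute steps like the following sketch, where the “build” installs the feature branch into a throwaway schema and runs tests there. All script names and connect strings are hypothetical; utPLSQL is mentioned only as one possible test framework:

```python
# Sketch of the steps a CI job for a database change might execute.
# Build == deployment: the job installs the feature branch into a
# throwaway schema and tests it there. All names are hypothetical.
import subprocess

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)  # any failing step fails the CI job

# 1. Create a disposable schema for this pipeline run.
run(["sqlplus", "-s", "admin@ci_db", "@create_ci_schema.sql"])
# 2. "Build" the change: install the feature branch scripts.
run(["sqlplus", "-s", "ci_user@ci_db", "@install.sql"])
# 3. Run automated tests (e.g. with utPLSQL) against the fresh install.
run(["sqlplus", "-s", "ci_user@ci_db", "@run_tests.sql"])
# 4. Drop the schema so the next run starts from scratch.
run(["sqlplus", "-s", "admin@ci_db", "@drop_ci_schema.sql"])
```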

Nightly Build Pipelines check the health of the files in your main branch and ensure that the complete installation is installable from scratch.
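One concrete health check such a nightly pipeline can run after the full installation is to query the Oracle data dictionary for invalid objects. A minimal sketch, assuming the python-oracledb driver and placeholder connection details:

```python
# Sketch of a nightly health check: after the full installation from the
# main branch, verify that no database object is invalid. Uses the
# python-oracledb driver; user, password, and DSN are placeholders.
import oracledb

with oracledb.connect(user="nightly", password="...", dsn="nightly_db") as conn:
    with conn.cursor() as cursor:
        # USER_OBJECTS is a standard Oracle data dictionary view.
        cursor.execute(
            "SELECT object_type, object_name FROM user_objects "
            "WHERE status = 'INVALID'"
        )
        invalid = cursor.fetchall()

for object_type, object_name in invalid:
    print(f"INVALID: {object_type} {object_name}")
if invalid:
    raise SystemExit(1)  # fail the nightly pipeline
print("Installation healthy: no invalid objects.")
```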

Deployment Pipelines provide the automated functionality to deploy through your environment pipeline, from test to integration to production, in a safe and repeatable way. Always the same way.
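“Always the same way” can be made literal in code: one parameterized deployment routine that is reused for every environment, as in this hypothetical sketch:

```python
# Sketch: one parameterized deployment routine reused for every stage,
# so test, integration, and production are deployed the exact same way.
# Environment names and the deploy steps are hypothetical placeholders.
ENVIRONMENTS = ["test", "integration", "production"]

def deploy(environment: str, release: str) -> None:
    """The steps never change; only the connection target does."""
    print(f"Deploying release {release} to {environment} ...")
    # install change packages, run smoke tests, write a deployment log, ...

for env in ENVIRONMENTS:
    deploy(env, release="2020.09.1")
    # A real pipeline would stop here and wait for the stage's quality
    # gate (tests, approvals) before promoting to the next environment.
```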

More about the DevOps Report Studies can be found on the DORA Research Program website.


👩‍💻 Database & Automation Engineer @ schaltstelle.ch 💻 Oracle ACE Pro ♠ — Writes about databases, automation and software engineering — 🐦 @jasminfluri