Deployment and Rollback Strategies for Cloud Applications

Ram Vadranam
2 min read · Sep 3, 2020


Release Guidelines:

  • Define mandatory rules for releasing into production by setting up rollback alarms. A rollback is triggered during the deployment process if any one of the alarms fires.
  • Define the rules as a set of lambda functions that roll back services at each stage of the rollback process.
  • Adopt the rules across all pipelines over a defined period of time.
  • Put pipelines that have not adopted the rules in quarantine.
  • Use a break-glass approach so that pipelines can still support critical releases.
  • Classify pipelines, and let pipelines that do not directly impact customers opt out.
  • Run pre-production tests before the canary deployment.
  • Promote deployments with canary releases and traffic shifts to target a limited number of customers and test the impact of the release.
  • Run soak tests to validate the canary deployment.
  • Use synthetic traffic to mimic customer experiences and to validate the metrics and rollback alarms.
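
To make the alarm-gated canary flow above concrete, here is a minimal Python sketch. It is an illustration only: the `Alarm` class, the `run_canary` function, and the `shift_traffic` and `rollback` callbacks are hypothetical names, not part of any specific cloud SDK. In a real pipeline they would wrap whatever deployment and monitoring services you use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Alarm:
    """Hypothetical rollback alarm: a name plus a check that says if it is firing."""
    name: str
    is_firing: Callable[[], bool]


def run_canary(
    stages: Sequence[int],
    alarms: Sequence[Alarm],
    shift_traffic: Callable[[int], None],
    rollback: Callable[[], None],
) -> Dict[str, object]:
    """Shift traffic stage by stage and roll back as soon as any alarm fires."""
    if not stages:
        raise ValueError("at least one traffic stage is required")
    for percent in stages:
        shift_traffic(percent)
        # Any single firing alarm is enough to trigger the rollback.
        firing: List[str] = [a.name for a in alarms if a.is_firing()]
        if firing:
            rollback()
            return {"status": "rolled_back", "at_percent": percent, "alarms": firing}
    return {"status": "promoted", "at_percent": stages[-1], "alarms": []}
```

Calling `run_canary([10, 50, 100], alarms, shift_traffic, rollback)` would expose the release to 10% of traffic first and widen the exposure only while every alarm stays quiet.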

Classifying Pipelines:

  • Customer impacting pipelines
  • Non-customer impacting pipelines

Setting a Rollback Approach with Standard Metrics:

Anomaly detection with standard service metrics

  • Anomaly detection using HTTP 5XX fault code metrics
  • Anomaly detection using HTTP 4XX error code metrics
  • Anomaly detection using traffic latency

Anomaly detection with standard instance metrics

  • Anomaly detection using CPU utilization
  • Anomaly detection using disk utilization
  • Anomaly detection using memory utilization

Anomaly detection with standard runtime metrics

  • Anomaly detection using heap utilization
  • Anomaly detection using garbage collection
  • Anomaly detection using thread metrics

Training Anomaly detectors prior to deployment:

Prior to deployment, check the metrics generated over a 3-hour window and train the anomaly detector on their average and standard deviation so that it works as expected. Roll back the deployment when an alarm threshold is breached.
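
The training step above can be sketched in a few lines of Python. This is a simplified stand-in for a managed anomaly detector, not a description of any particular service. The names `train_band` and `is_anomalous` are hypothetical, and the band width of 3 standard deviations is an assumption you would tune per metric.

```python
from statistics import mean, pstdev
from typing import NamedTuple, Sequence


class Band(NamedTuple):
    """Expected range for a metric, learned from pre-deployment samples."""
    lower: float
    upper: float


def train_band(samples: Sequence[float], width: float = 3.0) -> Band:
    """Build an expected band of mean +/- width * standard deviation.

    `samples` would be the metric datapoints from the window before the
    deployment (for example, 3 hours of one-minute datapoints).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to train a band")
    mu = mean(samples)
    sigma = pstdev(samples)  # population standard deviation of the window
    return Band(lower=mu - width * sigma, upper=mu + width * sigma)


def is_anomalous(value: float, band: Band) -> bool:
    """A datapoint outside the band counts as an alarm threshold breach."""
    return value < band.lower or value > band.upper
```

After the release, each new datapoint is checked with `is_anomalous`, and a breach becomes the signal that feeds the rollback alarm.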

Rollback using traffic drop:

Use anomaly detection to catch a sustained drop after the release, and trigger a rollback based on the impact of the event.
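
One way to separate a sustained drop from a momentary dip is to require several consecutive datapoints below the expected floor. The sketch below assumes you already have the pre-release average and standard deviation of the metric. The function name `sustained_drop` and the defaults (`width=3.0`, `min_consecutive=5`) are illustrative assumptions.

```python
from typing import Sequence


def sustained_drop(
    post_release: Sequence[float],
    baseline_mean: float,
    baseline_std: float,
    width: float = 3.0,
    min_consecutive: int = 5,
) -> bool:
    """Return True if the metric stays below the expected floor for
    `min_consecutive` datapoints in a row after the release."""
    floor = baseline_mean - width * baseline_std
    run = 0
    for value in post_release:
        # Extend the run on a low datapoint, reset it on a healthy one.
        run = run + 1 if value < floor else 0
        if run >= min_consecutive:
            return True
    return False
```

A single low datapoint resets as soon as the metric recovers, so only a drop that persists triggers the rollback.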

Rollback using a combination of multiple anomaly detectors:

A rollback can be triggered based on a combination of anomaly detectors instead of a single anomaly detector.
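
As a rough illustration, a combined rule can be expressed as "roll back when at least N detectors are in breach". The function name `should_rollback` and the default `min_breaches=2` are assumptions made for this sketch. The detector names in the usage line are examples only.

```python
from typing import Mapping


def should_rollback(detector_results: Mapping[str, bool], min_breaches: int = 2) -> bool:
    """Combine several anomaly detectors into one rollback decision.

    `detector_results` maps a detector name to True when that detector
    currently reports an anomaly.
    """
    breaches = sum(1 for breached in detector_results.values() if breached)
    return breaches >= min_breaches
```

For example, `should_rollback({"http_5xx": True, "latency": True, "cpu": False})` returns `True`, while a lone CPU anomaly would not trigger a rollback under this rule.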

Rollback using fault spike:

Trigger a rollback on a spike in HTTP 5XX and 4XX errors.
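
A fault spike can be approximated by comparing error rates before and after the release. The sketch below is one possible rule, not a standard: the names `error_rate` and `fault_spike`, the `factor` of 2.0, and the `min_rate` floor of 1% are all assumptions to adjust for your own traffic.

```python
from typing import Sequence


def error_rate(status_codes: Sequence[int], low: int, high: int) -> float:
    """Fraction of responses whose status code falls in [low, high]."""
    if not status_codes:
        return 0.0
    hits = sum(1 for code in status_codes if low <= code <= high)
    return hits / len(status_codes)


def fault_spike(
    before: Sequence[int],
    after: Sequence[int],
    factor: float = 2.0,
    min_rate: float = 0.01,
) -> bool:
    """Return True if the 5XX or 4XX rate after the release is above
    `min_rate` and more than `factor` times the pre-release rate."""
    for low, high in ((500, 599), (400, 499)):
        base = error_rate(before, low, high)
        now = error_rate(after, low, high)
        if now >= min_rate and now > factor * base:
            return True
    return False
```

The `min_rate` floor keeps a jump from zero errors to one stray error from counting as a spike on low-traffic services.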
