Mastering Automated MLOps on AWS: Build, Deploy, and Monitor Machine Learning Pipelines

A Step-by-Step Guide with Best Practices, Code Examples, and Local Setup for Production-Ready ML Models

Sep 19, 2024

In today’s data-driven world, machine learning models are crucial for making real-time decisions, optimizing processes, and gaining valuable insights. However, developing models is just one piece of the puzzle. Managing machine learning models in production requires scalability, monitoring, and automation to ensure that models deliver consistent value over time. This is where MLOps (Machine Learning Operations) comes into play.

MLOps bridges the gap between machine learning development and operationalization by enabling seamless integration, continuous deployment, and monitoring of models. With AWS providing a comprehensive set of tools, implementing an MLOps pipeline that is robust, scalable, and automated is more achievable than ever.

In this post, we’ll explore how to architect an end-to-end MLOps pipeline using AWS services like SageMaker, Lambda, Step Functions, and CodePipeline. We’ll walk through a retail demand forecasting use case, provide detailed code examples, and outline architectural best practices for running models both locally and in production.

Business Use Case: Retail Demand Forecasting with MLOps

Imagine you’re working for a retail chain that needs to predict product demand across various stores. Accurately forecasting product demand can help optimize inventory, minimize wastage, and avoid stockouts. The MLOps pipeline needs to handle:

Data ingestion: Gather sales data, weather conditions, holidays, and promotions.
Data preprocessing: Clean, transform, and normalize data.
Model training: Train machine learning models for forecasting.
Model deployment: Serve the model in production for real-time API predictions.
Monitoring and retraining: Track model performance and trigger retraining based on drift detection.

To streamline this process, we will set up the MLOps pipeline on AWS and run part of it locally for development and testing.

Detailed Project Structure

To build a production-ready MLOps pipeline, let’s first define the project structure:

mlops-forecasting
│
├── data/                   # Store raw and processed data
│   ├── raw/                # Raw sales data
│   └── processed/          # Processed and cleaned data
│
├── models/                 # Directory for saving models
│   └── model.joblib        # Trained model
│
├── notebooks/              # Jupyter notebooks for experimentation
│   └── exploratory.ipynb
│
├── src/                    # Source code for the ML pipeline
│   ├── preprocessing.py    # Data cleaning and transformation
│   ├── train.py            # Training script
│   ├── inference.py        # Inference logic
│   └── monitor.py          # Monitoring and drift detection
│
├── scripts/                # Utility scripts for CI/CD and automation
│   ├── buildspec.yml       # AWS CodeBuild configuration
│   └── deploy.sh           # Deployment script
│
├── config/                 # Configuration files for environment variables
│   └── config.yml          # Hyperparameters and environment settings
│
├── tests/                  # Unit tests for the pipeline
│   ├── test_preprocessing.py
│   ├── test_train.py
│   └── test_inference.py
│
├── terraform/              # Infrastructure as code for AWS setup
│   ├── s3.tf               # S3 bucket setup
│   ├── sagemaker.tf        # SageMaker model deployment
│   └── monitoring.tf       # Model monitoring setup
│
├── Dockerfile              # Dockerfile for containerized development
├── requirements.txt        # Python dependencies
└── README.md               # Project documentation

Step-by-Step Workflow

1. Data Ingestion and Preprocessing (src/preprocessing.py)

We’ll start by cleaning and transforming the raw sales data.

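import pandas as pd


def preprocess_data(input_path, output_path, window=7):
    # Load raw sales data (assumed to be sorted chronologically)
    df = pd.read_csv(input_path)

    # Feature engineering: moving average of the previous `window` days of sales.
    # shift(1) keeps each day's own sales out of its feature (no target leakage);
    # min_periods=1 lets the average use fewer days at the start or around gaps.
    df['sales_moving_avg'] = (
        df['sales'].shift(1).rolling(window=window, min_periods=1).mean()
    )

    # Drop missing values *after* feature engineering, so rows whose
    # moving average could not be computed are removed too
    df = df.dropna().reset_index(drop=True)

    # Normalize sales data (z-score); in production, compute the mean and
    # standard deviation on training data only and reuse them at inference
    df['sales'] = (df['sales'] - df['sales'].mean()) / df['sales'].std()

    # Save processed data
    df.to_csv(output_path, index=False)


if __name__ == "__main__":
    preprocess_data('data/raw/sales_data.csv', 'data/processed/cleaned_sales_data.csv')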

You can run this preprocessing script locally:

python src/preprocessing.py
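The rolling average above treats the whole file as one time series. Because the use case spans many stores, a per-store version keeps one store's sales out of another store's feature. The sketch below assumes hypothetical `store_id` and `date` columns in the raw data (the original snippet does not name its columns) and uses `groupby(...).transform` so the result stays aligned with the original rows:

```python
import pandas as pd


def add_store_moving_average(df, window=7):
    """Add a per-store moving average of the previous `window` sales values.

    Assumes hypothetical 'store_id', 'date', and 'sales' columns.
    """
    # Sort so that each store's rows are in chronological order
    df = df.sort_values(['store_id', 'date']).copy()
    # shift(1) within each store so a row's feature never includes its own sales
    df['sales_moving_avg'] = df.groupby('store_id')['sales'].transform(
        lambda s: s.shift(1).rolling(window=window, min_periods=1).mean()
    )
    return df


if __name__ == "__main__":
    raw = pd.DataFrame({
        'store_id': [1, 1, 1, 2, 2, 2],
        'date': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'] * 2),
        'sales': [10.0, 20.0, 30.0, 100.0, 200.0, 300.0],
    })
    print(add_store_moving_average(raw, window=2))
```

The first row of each store has no history and gets NaN, which the subsequent `dropna` step removes.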

Best Practice: Validate incoming data (expected columns, types, value ranges, and missing-value rates) before preprocessing, so that schema mismatches or bad records fail fast instead of silently corrupting downstream steps.
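A minimal validation sketch in pandas. It assumes a hypothetical schema (`date`, `store_id`, and `sales` columns; the post does not specify the raw file's columns) and an arbitrary 5% missing-value budget per column:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical required columns; adjust to match your raw sales export.
REQUIRED_COLUMNS = ['date', 'store_id', 'sales']


def validate_sales_data(df, required=REQUIRED_COLUMNS, max_missing_fraction=0.05):
    """Return a list of validation errors; an empty list means the data passed."""
    errors = []

    # Schema check: every required column must be present
    missing = [c for c in required if c not in df.columns]
    if missing:
        errors.append(f"missing columns: {missing}")

    # Type and range checks on the target column
    if 'sales' in df.columns:
        if not is_numeric_dtype(df['sales']):
            errors.append("column 'sales' is not numeric")
        elif (df['sales'] < 0).any():
            errors.append("negative sales values found")

    # Missing-value budget per column
    for col in df.columns:
        frac = df[col].isna().mean()
        if frac > max_missing_fraction:
            errors.append(f"column {col!r} is {frac:.0%} missing")

    return errors
```

Calling it at the top of `preprocess_data` and raising a `ValueError` when the list is non-empty stops the pipeline before a bad file reaches training. Libraries such as pandera or Great Expectations offer richer, declarative versions of the same idea.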

2. Model Training and Versioning (src/train.py)

Now, let’s train a Random Forest model to forecast sales.

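import os

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def train_model(data_path, model_path):
    df = pd.read_csv(data_path)

    # Split data into features and target; keep only numeric feature columns
    # (dates, store names, etc. must be encoded as numbers to be used)
    X = df.drop(columns=['sales']).select_dtypes(include='number')
    y = df['sales']

    # Hold out the most recent 20% of rows. shuffle=False keeps the split
    # chronological (assuming rows are sorted by date), so the model is
    # evaluated on a later period than the one it was trained on.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=False
    )

    # Train Random Forest model (fixed random_state for reproducible runs)
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate the model; MAE is in standardized units because
    # preprocessing z-scores the sales column
    y_pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, y_pred)
    print(f"Model MAE: {mae:.4f}")

    # Save the trained model, creating the output directory if needed
    os.makedirs(os.path.dirname(model_path) or '.', exist_ok=True)
    joblib.dump(model, model_path)
    return mae


if __name__ == "__main__":
    train_model('data/processed/cleaned_sales_data.csv', 'models/model.joblib')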

Run the training script:

python src/train.py
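The script writes every run to the same `models/model.joblib`, so older models are overwritten. A minimal sketch of one way to version artifacts locally: save each model to its own timestamped directory together with a `metadata.json` that records its metrics and a hash of the training data. The function name and directory layout are illustrative assumptions, not part of the original project; in production, a registry such as SageMaker Model Registry serves the same purpose.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

import joblib


def save_versioned_model(model, metrics, data_path, models_dir='models'):
    """Save `model` to models_dir/<UTC timestamp>/ with a metadata.json beside it."""
    version = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%S%fZ')
    version_dir = Path(models_dir) / version
    version_dir.mkdir(parents=True, exist_ok=False)  # never overwrite a version

    # Hash the training data so each version can be traced back to its input
    data_hash = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()

    joblib.dump(model, str(version_dir / 'model.joblib'))
    metadata = {
        'version': version,
        'data_path': str(data_path),
        'data_sha256': data_hash,
        'metrics': metrics,
    }
    (version_dir / 'metadata.json').write_text(json.dumps(metadata, indent=2))
    return version_dir
```

In `train.py` this could replace the final `joblib.dump` call, for example `save_versioned_model(model, {'mae': float(mae)}, data_path)`. Any later step that expects the fixed path `models/model.joblib` would then need to look up the chosen version instead.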

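Best Practice: Use cross-validation and hyperparameter tuning to optimize model performance. For time-ordered data such as daily sales, use a time-aware scheme (for example, scikit-learn's TimeSeriesSplit) so that every validation fold comes after the data the model was trained on.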
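A sketch of the tuning step with scikit-learn's `GridSearchCV` and `TimeSeriesSplit`, assuming the rows of `X` and `y` are already sorted by date. The parameter grid is illustrative, not a recommendation:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


def tune_random_forest(X, y, n_splits=5):
    """Grid-search RandomForestRegressor hyperparameters with time-ordered folds.

    TimeSeriesSplit always validates on rows that come after the training rows,
    so X and y must already be in chronological order.
    """
    param_grid = {
        'n_estimators': [100, 200],
        'max_depth': [None, 10],
        'min_samples_leaf': [1, 5],
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=42),
        param_grid,
        cv=TimeSeriesSplit(n_splits=n_splits),
        scoring='neg_mean_absolute_error',
    )
    search.fit(X, y)
    # GridSearchCV maximizes the score, so MAE is negated; flip it back
    return search.best_estimator_, search.best_params_, -search.best_score_
```

Passing `n_jobs=-1` to `GridSearchCV` runs the 8 candidate settings × `n_splits` folds in parallel. Because `refit=True` by default, `best_estimator_` is retrained on all rows and could be saved in place of the fixed 100-tree model.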

3. Unit Testing

Testing is critical in MLOps to ensure the integrity of each component. Here’s an example for testing the preprocessing step:

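import pandas as pd
from src.preprocessing import preprocess_data


def test_preprocessing(tmp_path):
    # preprocess_data reads from and writes to file paths, so write the
    # sample data (including one missing value) to a temporary CSV first
    input_path = tmp_path / 'raw_sales.csv'
    output_path = tmp_path / 'cleaned_sales.csv'
    pd.DataFrame({'sales': [200, 300, 400, None, 500]}).to_csv(input_path, index=False)

    preprocess_data(input_path, output_path)
    processed_df = pd.read_csv(output_path)

    # Some rows should survive preprocessing, and none may contain missing values
    assert len(processed_df) > 0
    assert processed_df.isnull().sum().sum() == 0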

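Run the test from the project root. Invoking pytest through python -m adds the current directory to sys.path, so the src package can be imported:

python -m pytest tests/test_preprocessing.py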

Best Practice: Ensure that all modules (preprocessing, training, inference) have corresponding unit tests and that you maintain high test coverage.

4. Docker for Local Development

To ensure consistent environments for local development, containerize the project using Docker.

Dockerfile:

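FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached until requirements.txt changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the project source
COPY . /app

CMD ["python", "src/train.py"]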

Build the Docker image:

docker build -t mlops-forecasting .

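Run the container, mounting both data/ (input) and models/ (output) so the trained model is written back to the host instead of being lost when the container exits:

docker run -v "$(pwd)/data:/app/data" -v "$(pwd)/models:/app/models" mlops-forecasting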

Best Practice: Use multi-stage builds in Docker to optimize image size and security.

5. CI/CD with AWS CodePipeline

To automate the training and deployment pipeline, set up AWS CodePipeline with CodeBuild.

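buildspec.yml (kept in scripts/ in this layout; CodeBuild looks for buildspec.yml in the repository root by default, so either move it there or set the CodeBuild project's buildspec path to scripts/buildspec.yml):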

version: 0.2

phases:
  install:
    commands:
      - pip install -r requirements.txt
  build:
    commands:
      - python src/train.py
  post_build:
    commands:
      - aws s3 cp models/model.joblib s3://my-bucket/model/
artifacts:
  files:
    - models/model.joblib

Best Practice: Ensure that CI/CD pipelines include automated validation and testing before deploying models.
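The buildspec above trains and uploads the model without checking it. A minimal quality-gate sketch that a build step could run after training: it fails when MAE exceeds an absolute ceiling or is more than a tolerance worse than the current production model. The metrics file, its keys, and the threshold values are hypothetical; `train.py` as written only prints its MAE, so it would also need to write this file.

```python
import json


def passes_quality_gate(mae, max_mae, baseline_mae=None, tolerance=0.05):
    """Decide whether a candidate model may be deployed.

    Fails if MAE exceeds an absolute ceiling, or if it is more than
    `tolerance` (as a fraction) worse than the current production model.
    """
    if mae > max_mae:
        return False
    if baseline_mae is not None and mae > baseline_mae * (1 + tolerance):
        return False
    return True


def main(argv):
    """CLI-style entry point: main([metrics_path, max_mae]) -> exit code.

    Expects a hypothetical JSON file like {"mae": 0.42, "baseline_mae": 0.45}.
    Returns 0 if the gate passes and 1 if it fails, so a non-zero exit
    marks the CodeBuild phase as failed.
    """
    metrics_path, max_mae = argv[0], float(argv[1])
    with open(metrics_path) as f:
        metrics = json.load(f)
    ok = passes_quality_gate(metrics['mae'], max_mae, metrics.get('baseline_mae'))
    print(f"Quality gate {'passed' if ok else 'failed'}: MAE={metrics['mae']}")
    return 0 if ok else 1
```

Saved as a hypothetical scripts/check_model.py ending in `if __name__ == "__main__": sys.exit(main(sys.argv[1:]))`, it could run as the last build command, e.g. `python scripts/check_model.py models/metrics.json 0.8`. CodeBuild still runs post_build after a failed build phase, so guard the S3 upload there (for example, on the `CODEBUILD_BUILD_SUCCEEDING` environment variable) or move the upload into a later pipeline stage.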

6. Deployment and Monitoring on SageMaker

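Deploy the trained model to SageMaker for real-time predictions and implement monitoring for drift detection. SageMaker loads model artifacts from a gzipped tar archive rather than a bare .joblib file, so package and upload the model first (for example, in the post_build phase of buildspec.yml):

tar -czf model.tar.gz -C models model.joblib
aws s3 cp model.tar.gz s3://my-bucket/model/model.tar.gz

Then create and deploy the model:

from sagemaker.sklearn import SKLearnModel

# Points at the packaged archive, not the raw .joblib file
model_uri = 's3://my-bucket/model/model.tar.gz'

# framework_version selects the SageMaker scikit-learn serving container;
# src/inference.py must define at least model_fn() to load model.joblib
model = SKLearnModel(model_data=model_uri,
                     role='arn:aws:iam::123456789012:role/SageMakerRole',
                     entry_point='src/inference.py',
                     framework_version='1.2-1')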

predictor = model.deploy(instance_type='ml.m5.large', initial_instance_count=1)
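The deployment code points `entry_point` at src/inference.py, which the post does not show. The SageMaker scikit-learn container looks for handler functions named `model_fn`, `input_fn`, `predict_fn`, and `output_fn` in that script, and `model_fn` must be supplied. A minimal sketch, assuming the archive contains `model.joblib` and that clients send CSV with a header row naming the feature columns (that request format is an assumption of this sketch). `output_fn` is omitted, so the container's default serializer handles the response:

```python
import io
import os

import joblib
import pandas as pd


def model_fn(model_dir):
    """Load the model; SageMaker extracts model.tar.gz into model_dir."""
    return joblib.load(os.path.join(model_dir, 'model.joblib'))


def input_fn(request_body, request_content_type):
    """Parse a CSV request whose first row names the feature columns."""
    if request_content_type != 'text/csv':
        raise ValueError(f"Unsupported content type: {request_content_type}")
    if isinstance(request_body, bytes):
        request_body = request_body.decode('utf-8')
    return pd.read_csv(io.StringIO(request_body))


def predict_fn(input_data, model):
    """Run the model on the parsed DataFrame and return a NumPy array."""
    return model.predict(input_data)
```

Parsing into a DataFrame with named columns means the model sees the same feature names it was trained on.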

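Drift detection with SageMaker Model Monitor has two parts: capturing the requests and responses that reach the endpoint, and comparing the captured data against a baseline on a schedule (in the SageMaker Python SDK, DefaultModelMonitor.suggest_baseline and DefaultModelMonitor.create_monitoring_schedule). The first part, enabling data capture, looks like this: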

from sagemaker.model_monitor import DataCaptureConfig

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri='s3://my-bucket/data-capture/'
)

predictor.update_data_capture_config(data_capture_config=data_capture_config)
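Data capture only stores traffic; something still has to decide whether that traffic has drifted. For the src/monitor.py module in the project layout, one simple, self-contained option is the Population Stability Index (PSI), sketched below for a single numeric feature. This is a swapped-in technique, not what Model Monitor computes internally. Quantile binning on the baseline and the `eps` smoothing are design choices of this sketch, and the commonly quoted thresholds (below 0.1 stable, above 0.25 significant shift) are rules of thumb rather than standards:

```python
import numpy as np


def population_stability_index(baseline, current, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a current sample.

    Bin edges are quantiles of the baseline, so each bin holds roughly
    1/n_bins of the baseline; values outside its range fall in the end bins.
    """
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior cut points from baseline quantiles (duplicates removed for ties)
    cuts = np.unique(np.quantile(baseline, np.linspace(0, 1, n_bins + 1)[1:-1]))
    n = len(cuts) + 1
    base_frac = np.bincount(np.searchsorted(cuts, baseline, side='right'), minlength=n) / len(baseline)
    curr_frac = np.bincount(np.searchsorted(cuts, current, side='right'), minlength=n) / len(current)

    # eps avoids division by zero and log(0) for empty bins
    base_frac = np.clip(base_frac, eps, None)
    curr_frac = np.clip(curr_frac, eps, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_sales = rng.normal(100, 15, 5000)
    print(population_stability_index(train_sales, rng.normal(100, 15, 1000)))  # similar data
    print(population_stability_index(train_sales, rng.normal(130, 15, 1000)))  # shifted demand
```

A scheduled job could compute PSI per feature between the training data and recently captured requests, and trigger retraining when a feature crosses the chosen threshold.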

Best Practice: Add explainability and bias checks (for example, with Amazon SageMaker Clarify) to help surface unfair or unexpected model behavior.
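SageMaker Clarify is the managed option on AWS. For a quick local check during development, scikit-learn's `permutation_importance` (a different, model-agnostic technique; it addresses explainability, not bias) shows which features the model relies on by measuring how much MAE worsens when each feature is shuffled. A sketch on synthetic data; the feature names `promo` and `noise` are placeholders:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def feature_importance_report(model, X, y, n_repeats=10, random_state=0):
    """Rank features by how much shuffling each one increases MAE.

    Pass held-out data: importances measured on training data can
    overstate features the model has merely memorized.
    """
    result = permutation_importance(
        model, X, y,
        n_repeats=n_repeats,
        random_state=random_state,
        scoring='neg_mean_absolute_error',
    )
    return (pd.Series(result.importances_mean, index=X.columns)
              .sort_values(ascending=False))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = pd.DataFrame({
        'promo': rng.integers(0, 2, 500),   # drives demand in this toy data
        'noise': rng.normal(size=500),      # unrelated to demand
    })
    y = 50 + 30 * X['promo'] + rng.normal(scale=2, size=500)
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X.iloc[:400], y.iloc[:400])
    print(feature_importance_report(model, X.iloc[400:], y.iloc[400:]))
```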

Conclusion

In this post, we walked through building an MLOps pipeline on AWS, from local development to deployment and monitoring. By following practices such as modularity, versioning, automated monitoring, and secure deployments, you can make your pipeline scalable, automated, and ready for production. Start small with a local setup, then move the pipeline to AWS for fully automated management of the machine learning model lifecycle.

Need Help with MLOps? Contact ThamesTech AI for Consulting

If you’re looking to implement MLOps pipelines or need consulting on AWS SageMaker, machine learning, or cloud infrastructure, ThamesTech AI can help. Our team specializes in creating scalable, automated, and production-ready machine learning solutions tailored to your business needs.

Contact us for a consultation:

Website: thamestech.ai
Email: info@thamestech.ai

Let us help you accelerate your journey to AI-driven success!