Stopping / Starting EC2 & RDS Automatically

In my previous blog post, I discussed the importance of AWS cost housekeeping by releasing unallocated Elastic IPs. Following that, I received a customer request to automate the process of stopping their AWS infrastructure during non-business hours and restarting it before their staff came online.

This request highlights the significance of cost-cutting measures during the COVID-19 pandemic, as businesses try to optimize their expenses in various areas.

ref — blog.lewislovelock.com/checks-for-unallocat..


Let's begin by analyzing the customer's AWS architecture.

The architecture includes a front-end proxy and jump-box (which also functions as a NAT gateway), an application server (both on EC2), and a back-end RDS database. The proxy is located in a public subnet, while the application server is in a private subnet.

To ensure a smooth connection to the RDS, it's crucial that the Java applications running on EC2 start last and stop first. That's why a 15-minute gap is configured between the EC2 and RDS start and stop times.

The Customer Requirement

Now that we understand the architecture we are working with, we can review the customer request:

Turn off the EC2 & RDS ‘out of hours’ to reduce AWS costs

  • Monday — Friday — Switch off 21:00–06:00 (allowing 15 mins on each side for the RDS to be ready first)

  • Saturday — Sunday — Switch off


Solution

Automating the start/stop process for both an EC2 instance and an RDS database can be achieved with ease by creating a Lambda function using Python. This function can then be scheduled to run at specific times using a cron expression in EventBridge.

EC2

  • First, we need to allow Lambda access to EC2 with an IAM Policy
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Start*",
        "ec2:Stop*"
      ],
      "Resource": "*"
    }
  ]
}
  • Create a Lambda role using the above policy

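If you'd rather script this step, here is a minimal boto3 sketch, assuming hypothetical role and policy names (the same role can of course be created in the IAM console):

import json

import boto3

iam = boto3.client('iam')

# Trust policy that lets the Lambda service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

# The policy document shown above, as a Python dict.
ec2_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
         "Resource": "arn:aws:logs:*:*:*"},
        {"Effect": "Allow",
         "Action": ["ec2:Start*", "ec2:Stop*"],
         "Resource": "*"}
    ]
}

# Hypothetical names; use whatever fits your naming convention.
iam.create_role(
    RoleName='lambda-ec2-start-stop',
    AssumeRolePolicyDocument=json.dumps(trust_policy)
)
iam.put_role_policy(
    RoleName='lambda-ec2-start-stop',
    PolicyName='ec2-start-stop',
    PolicyDocument=json.dumps(ec2_policy)
)

The same approach works for the RDS role later on, just with the RDS policy document in place of ec2_policy.
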
  • Create the Lambda function to stop the EC2 instances

import boto3
region = 'us-west-1'
instances = ['i-12345cb6de4f78g9h', 'i-08ce9b2d7eccf6d26']
ec2 = boto3.client('ec2', region_name=region)

def lambda_handler(event, context):
    ec2.stop_instances(InstanceIds=instances)
    print('stopped your instances: ' + str(instances))

Boto3 is the name of the Python SDK for AWS. It allows you to directly create, update, and delete AWS resources from your Python scripts

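If you prefer not to hard-code instance IDs, boto3 can also discover them at runtime by tag. A minimal sketch, assuming a hypothetical AutoShutdown=true tag on the instances (this would also need ec2:DescribeInstances added to the IAM policy above):

import boto3

region = 'us-west-1'
ec2 = boto3.client('ec2', region_name=region)

def instances_by_tag(key='AutoShutdown', value='true'):
    # Look up instances carrying the hypothetical AutoShutdown tag.
    response = ec2.describe_instances(
        Filters=[{'Name': 'tag:' + key, 'Values': [value]}]
    )
    return [
        instance['InstanceId']
        for reservation in response['Reservations']
        for instance in reservation['Instances']
    ]

def lambda_handler(event, context):
    instances = instances_by_tag()
    if instances:
        ec2.stop_instances(InstanceIds=instances)
        print('stopped your instances: ' + str(instances))
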
  • Create a second Lambda function to start the EC2 instances

import boto3
region = 'us-west-1'
instances = ['i-12345cb6de4f78g9h', 'i-08ce9b2d7eccf6d26']
ec2 = boto3.client('ec2', region_name=region)

def lambda_handler(event, context):
    ec2.start_instances(InstanceIds=instances)
    print('started your instances: ' + str(instances))

RDS

  • Create an IAM policy to allow Lambda access to RDS
  "Version": "2012-10-17",
  "Statement": [
      {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": [
              "rds:DescribeDBClusterParameters",
              "rds:StartDBCluster",
              "rds:StopDBCluster",
              "rds:DescribeDBEngineVersions",
              "rds:DescribeGlobalClusters",
              "rds:DescribePendingMaintenanceActions",
              "rds:DescribeDBLogFiles",
              "rds:StopDBInstance",
              "rds:StartDBInstance",
              "rds:DescribeReservedDBInstancesOfferings",
              "rds:DescribeReservedDBInstances",
              "rds:ListTagsForResource",
              "rds:DescribeValidDBInstanceModifications",
              "rds:DescribeDBInstances",
              "rds:DescribeSourceRegions",
              "rds:DescribeDBClusterEndpoints",
              "rds:DescribeDBClusters",
              "rds:DescribeDBClusterParameterGroups",
              "rds:DescribeOptionGroups"
          ],
          "Resource": "*"
      }
  ]
}
  • Create a Lambda role using the above policy

  • Create the Lambda function to stop the RDS instances

import os

import boto3

def shut_rds_all():
    region=os.environ['REGION']
    key=os.environ['KEY']
    value=os.environ['VALUE']


    client = boto3.client('rds', region_name=region)
    response = client.describe_db_instances()
    v_readReplica=[]
    for i in response['DBInstances']:
        readReplica=i['ReadReplicaDBInstanceIdentifiers']
        v_readReplica.extend(readReplica)

    for i in response['DBInstances']:
        # Skip Aurora engines here; the boto3 calls for stopping Aurora clusters differ and are handled in the cluster loop below.
        if i['Engine'] not in ['aurora-mysql','aurora-postgresql']:
            # Skip read replicas and any instance that has read replicas.
            if i['DBInstanceIdentifier'] not in v_readReplica and len(i['ReadReplicaDBInstanceIdentifiers']) == 0:
                arn=i['DBInstanceArn']
                resp2=client.list_tags_for_resource(ResourceName=arn)
                # Check whether the RDS instance is part of the auto-shutdown group.
                if 0==len(resp2['TagList']):
                    print('DB Instance {0} is not part of autoshutdown'.format(i['DBInstanceIdentifier']))
                else:
                    for tag in resp2['TagList']:
                        # If the tag matches, stop the instance after validating its current status.
                        if tag['Key']==key and tag['Value']==value:
                            if i['DBInstanceStatus'] == 'available':
                                client.stop_db_instance(DBInstanceIdentifier = i['DBInstanceIdentifier'])
                                print('stopping DB instance {0}'.format(i['DBInstanceIdentifier']))
                            elif i['DBInstanceStatus'] == 'stopped':
                                print('DB Instance {0} is already stopped'.format(i['DBInstanceIdentifier']))
                            elif i['DBInstanceStatus']=='starting':
                                print('DB Instance {0} is in starting state. Please stop the cluster after starting is complete'.format(i['DBInstanceIdentifier']))
                            elif i['DBInstanceStatus']=='stopping':
                                print('DB Instance {0} is already in stopping state.'.format(i['DBInstanceIdentifier']))
                        elif tag['Key']!=key and tag['Value']!=value:
                            print('DB instance {0} is not part of autoshutdown'.format(i['DBInstanceIdentifier']))
                        elif len(tag['Key']) == 0 or len(tag['Value']) == 0:
                            print('DB Instance {0} is not part of autoshutdown'.format(i['DBInstanceIdentifier']))
            elif i['DBInstanceIdentifier'] in v_readReplica:
                print('DB Instance {0} is a Read Replica. Cannot shutdown a Read Replica instance'.format(i['DBInstanceIdentifier']))
            else:
                print('DB Instance {0} has a read replica. Cannot shutdown a database with Read Replica'.format(i['DBInstanceIdentifier']))

    response=client.describe_db_clusters()
    for i in response['DBClusters']:
        cluarn=i['DBClusterArn']
        resp2=client.list_tags_for_resource(ResourceName=cluarn)
        if 0==len(resp2['TagList']):
            print('DB Cluster {0} is not part of autoshutdown'.format(i['DBClusterIdentifier']))
        else:
            for tag in resp2['TagList']:
                if tag['Key']==key and tag['Value']==value:
                    if i['Status'] == 'available':
                        client.stop_db_cluster(DBClusterIdentifier=i['DBClusterIdentifier'])
                        print('stopping DB cluster {0}'.format(i['DBClusterIdentifier']))
                    elif i['Status'] == 'stopped':
                        print('DB Cluster {0} is already stopped'.format(i['DBClusterIdentifier']))
                    elif i['Status']=='starting':
                        print('DB Cluster {0} is in starting state. Please stop the cluster after starting is complete'.format(i['DBClusterIdentifier']))
                    elif i['Status']=='stopping':
                        print('DB Cluster {0} is already in stopping state.'.format(i['DBClusterIdentifier']))
                elif tag['Key'] != key and tag['Value'] != value:
                    print('DB Cluster {0} is not part of autoshutdown'.format(i['DBClusterIdentifier']))
                else:
                    print('DB Cluster {0} is not part of autoshutdown'.format(i['DBClusterIdentifier']))

def lambda_handler(event, context):
    shut_rds_all()

  • Create a second Lambda function to start the RDS, this time using a start_rds_all() function (see the sketch after this list)

  • The target RDS databases are identified by the REGION, KEY, and VALUE environment variables set on each Lambda function; any instance or cluster tagged with that key/value pair is included in the auto-shutdown group

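A minimal sketch of the start function, mirroring the stop logic above (only the status checks and the boto3 calls change); the same REGION, KEY, and VALUE environment variables are assumed:

import os

import boto3

def start_rds_all():
    region = os.environ['REGION']
    key = os.environ['KEY']
    value = os.environ['VALUE']

    client = boto3.client('rds', region_name=region)

    # Start tagged single-instance databases that are currently stopped.
    for i in client.describe_db_instances()['DBInstances']:
        if i['Engine'] in ['aurora-mysql', 'aurora-postgresql']:
            continue  # Aurora members are started via their cluster below.
        tags = client.list_tags_for_resource(ResourceName=i['DBInstanceArn'])['TagList']
        if any(t['Key'] == key and t['Value'] == value for t in tags):
            if i['DBInstanceStatus'] == 'stopped':
                client.start_db_instance(DBInstanceIdentifier=i['DBInstanceIdentifier'])
                print('starting DB instance {0}'.format(i['DBInstanceIdentifier']))
            else:
                print('DB Instance {0} is in state {1}, skipping'.format(
                    i['DBInstanceIdentifier'], i['DBInstanceStatus']))

    # Start tagged Aurora clusters that are currently stopped.
    for c in client.describe_db_clusters()['DBClusters']:
        tags = client.list_tags_for_resource(ResourceName=c['DBClusterArn'])['TagList']
        if any(t['Key'] == key and t['Value'] == value for t in tags):
            if c['Status'] == 'stopped':
                client.start_db_cluster(DBClusterIdentifier=c['DBClusterIdentifier'])
                print('starting DB cluster {0}'.format(c['DBClusterIdentifier']))
            else:
                print('DB Cluster {0} is in state {1}, skipping'.format(
                    c['DBClusterIdentifier'], c['Status']))

def lambda_handler(event, context):
    start_rds_all()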

EventBridge (Formerly CloudWatch Rules)

Finally, I created four EventBridge rules, each configured with a cron expression, to stop and start both the EC2 instances and the RDS database:

M-F 21:00 - Stop EC2 - 00 21 ? * MON-FRI *

M-F 21:15 - Stop RDS - 15 21 ? * MON-FRI *

M-F 05:45 - Start RDS - 45 05 ? * MON-FRI *

M-F 06:00 - Start EC2 - 00 06 ? * MON-FRI *
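
These rules can be set up in the EventBridge console, or scripted. A minimal boto3 sketch for the first rule, assuming hypothetical rule and function names and that the Lambda function's ARN is known:

import boto3

region = 'us-west-1'
events = boto3.client('events', region_name=region)
lambda_client = boto3.client('lambda', region_name=region)

rule_name = 'stop-ec2-weeknights'  # hypothetical rule name
function_arn = 'arn:aws:lambda:us-west-1:123456789012:function:stop-ec2'  # hypothetical ARN

# Rule firing at 21:00 Monday to Friday (EventBridge cron expressions are evaluated in UTC).
rule_arn = events.put_rule(
    Name=rule_name,
    ScheduleExpression='cron(00 21 ? * MON-FRI *)',
    State='ENABLED'
)['RuleArn']

# Allow EventBridge to invoke the Lambda function.
lambda_client.add_permission(
    FunctionName=function_arn,
    StatementId=rule_name,
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule_arn
)

# Point the rule at the Lambda function.
events.put_targets(
    Rule=rule_name,
    Targets=[{'Id': 'stop-ec2', 'Arn': function_arn}]
)

Repeat with the other three cron expressions and Lambda functions to complete the schedule.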