Run scheduled tasks with AWS Lambda and cron jobs in AWS EC2 Container Service

Scheduled events using AWS Lambda & Cron Jobs

Almost any application needs to be able to run tasks on a schedule. This is usually achieved by setting up cron jobs on a server to run periodically.

When we bring containers into the mix, we’ll notice quickly that running a cron daemon inside a container goes against one of the principles of (at least) Docker: “Only run one command in a container”.

Sure, there are ways to get around this easily. You could concatenate commands and run them in a sub-shell, or get fancy by setting up a process manager like supervisord. But I’m trying to keep it simple.

Different types of scheduled tasks

After reviewing several applications, I sorted their scheduled tasks into two groups:

  1. Scheduled tasks that need to run once per application
  2. Scheduled tasks that need to run once per container

This distinction was important to me because we run applications using several containers and multiple Docker hosts behind a load balancer.

Scheduled tasks that need to run once per application

The simplest example of this type is a scheduled task that triggers a long-running database query: it should run only once per application, not once per container.

A co-worker suggested a distributed cron with a Redis lock as one way to prevent multiple containers from running the same scheduled task several times, but I wanted a solution that works for any type of application. I landed on AWS Lambda with CloudWatch Events.
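For comparison, here is a minimal sketch of the Redis-lock idea in Python. It assumes a hypothetical `FakeRedis` class that imitates Redis's `SET key value NX EX ttl` in memory so the example runs on its own; with the real redis-py client the equivalent call would be `client.set(key, value, nx=True, ex=ttl)`. The helper `run_once_per_application`, the key format and the slot labels are illustrative, not part of the setup described in this post.

```python
import time


class FakeRedis:
    """Hypothetical in-memory stand-in for Redis's SET key value NX EX ttl."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def set_nx_ex(self, key, value, ttl_seconds):
        now = time.time()
        current = self._store.get(key)
        if current is not None and current[1] > now:
            return False  # another container holds an unexpired lock
        self._store[key] = (value, now + ttl_seconds)
        return True


def run_once_per_application(lock_store, job_name, slot, container_id, job,
                             ttl_seconds=600):
    """Run `job` only if this container wins the lock for this schedule slot.

    Every container's cron fires for the same slot (e.g. '12:15'), so they all
    race for the same key and only the first SET succeeds.
    """
    key = 'cron-lock:{}:{}'.format(job_name, slot)
    if lock_store.set_nx_ex(key, container_id, ttl_seconds):
        job()
        return True
    return False


# Three containers fire the same 12:15 run; only the first one executes the job.
lock_store = FakeRedis()
ran = [run_once_per_application(lock_store, 'long-query', '12:15', c, lambda: None)
       for c in ('container-a', 'container-b', 'container-c')]
print(ran)  # [True, False, False]
```

The TTL keeps a crashed winner from blocking later runs forever, and keying on the slot means the lock never needs an explicit release. It also shows the catch: every application has to carry Redis and this code path, which is the per-application plumbing the Lambda approach avoids.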

AWS Lambda

In Amazon’s own words, Lambda lets you run code without provisioning or managing servers. Currently, you can write this code in C#, Java 8, Node.js, and Python. A Lambda function can be as simple or as complex as you want it to be, as long as it finishes within the 300-second execution limit.

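Lambda bills for the number of requests plus compute time, measured in GB-seconds (allocated memory multiplied by execution time), and tends to be relatively cheap. There are several pricing calculators out there, but here’s an example calculation: 100K executions per month using 128MB of RAM that each finish in 30 seconds add up to 375,000 GB-seconds. That fits inside the monthly free tier of 400,000 GB-seconds, and even without the free tier it comes to roughly $6.27 per month at us-east-1 on-demand rates ($6.25 for compute plus $0.02 for requests). From a pricing perspective, this was good enough for me to use for scheduled tasks that need to run as often as every minute, which is the finest granularity a CloudWatch Events schedule supports.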
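A quick way to sanity-check any such estimate is to do the arithmetic directly. The sketch below assumes the published us-east-1 on-demand x86 rates ($0.20 per million requests, $0.0000166667 per GB-second) and ignores the free tier; check the AWS Lambda pricing page for current rates in your region. The function name `monthly_lambda_cost` is hypothetical.

```python
# Assumed on-demand rates (us-east-1, x86); verify against current AWS pricing.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # USD per invocation
PRICE_PER_GB_SECOND = 0.0000166667     # USD per GB-second of compute


def monthly_lambda_cost(invocations, memory_mb, duration_seconds):
    """Return (gb_seconds, request_cost, compute_cost) before any free tier."""
    gb_seconds = invocations * (memory_mb / 1024) * duration_seconds
    request_cost = invocations * PRICE_PER_REQUEST
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    return gb_seconds, request_cost, compute_cost


# 100K runs per month, 128MB of memory, 30 seconds each.
gb_s, requests_usd, compute_usd = monthly_lambda_cost(100_000, 128, 30)
print('{:,.0f} GB-s, ${:.2f} requests, ${:.2f} compute'.format(
    gb_s, requests_usd, compute_usd))
# 375,000 GB-s, $0.02 requests, $6.25 compute
```

For jobs that run for tens of seconds, the duration term dominates the bill, so shortening the runtime or lowering the memory setting matters far more than the number of invocations.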

I decided to write a simple Python script that reads two environment variables defined on the Lambda function: the target URL and the User-Agent header to send. The script is versatile enough to work for different applications.

import os
from datetime import datetime
import urllib.request

# URL and user agent, stored in Lambda environment variables.
URL = os.environ['url']
USER_AGENT = os.environ['user_agent']

# Lambda handler that will be called during Lambda execution.
def lambda_handler(event, context):
    # Scheduled CloudWatch events carry their trigger time in event['time'].
    print('Checking {} at {}...'.format(URL, event['time']))
    try:
        req = urllib.request.Request(URL, data=None, headers={'User-Agent': USER_AGENT})
        # Assumed value: give up before the 30-second Lambda timeout so the failure is logged.
        with urllib.request.urlopen(req, timeout=25) as res:
            if res.status != 200:
                raise Exception('Call failed with status {}'.format(res.status))
    except Exception:
        print('Connection failed!')
        raise
    else:
        print('Connection succeeded!')
        return event['time']
    finally:
        print('Completed at {}'.format(str(datetime.now())))

CloudWatch Events

AWS CloudWatch Events is a relatively new addition to the suite of AWS tools. Its rules can schedule automated actions that trigger themselves at set times, using cron or rate expressions. The AWS documentation has more details: Schedule Expressions for Rules – Amazon CloudWatch Events
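CloudWatch schedule expressions differ from a classic crontab in a few ways that are easy to trip over: cron() takes six fields (the last one is the year), either day-of-month or day-of-week must be `?`, and rate() wants a singular unit when the value is 1. The helpers below are a hypothetical sketch of those rules (the names `rate_expression` and `crontab_to_cloudwatch` are mine). The conversion deliberately refuses day-of-week schedules, because crontab numbers Sunday as 0 or 7 while CloudWatch numbers it as 1.

```python
def rate_expression(value, unit):
    """Build a rate expression such as rate(1 minute) or rate(5 minutes)."""
    if value < 1:
        raise ValueError('rate value must be a positive integer')
    singular = {'minutes': 'minute', 'hours': 'hour', 'days': 'day'}
    plural = {v: k for k, v in singular.items()}
    unit = plural.get(unit, unit)  # normalise 'minute' -> 'minutes', etc.
    if unit not in singular:
        raise ValueError('unit must be minute(s), hour(s) or day(s)')
    # Singular unit for a value of 1, plural otherwise.
    return 'rate({} {})'.format(value, singular[unit] if value == 1 else unit)


def _star_step(field):
    # '*/N' -> '0/N', the step form used in the AWS examples.
    return '0' + field[1:] if field.startswith('*/') else field


def crontab_to_cloudwatch(expr):
    """Convert a 5-field crontab schedule into a 6-field CloudWatch cron().

    Sketch only: supports schedules whose day-of-week field is '*'.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError('expected 5 crontab fields')
    minute, hour, day_of_month, month, day_of_week = fields
    if day_of_week != '*':
        raise ValueError('day-of-week schedules are not handled by this sketch')
    # Day-of-week becomes '?', and '*' is appended for the year field.
    return 'cron({} {} {} {} ? *)'.format(
        _star_step(minute), _star_step(hour), day_of_month, month)


print(crontab_to_cloudwatch('*/15 * * * *'))  # cron(0/15 * * * ? *)
print(rate_expression(1, 'minutes'))          # rate(1 minute)
```

The first call turns the crontab entry used later in this post into the same expression the CloudFormation template uses for its rule.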

We use these events as the trigger that kicks off our Lambda function, in essence replacing a classic cron daemon. In regular AWS fashion, a Lambda function needs a policy statement that explicitly allows the event to invoke it. An example policy document looks like the following, where:

  • arn:aws:lambda:us-east-1:666666666:function:lambda-name is the ARN of our Lambda function
  • arn:aws:events:us-east-1:666666666:rule/cloudwatch-rule-name is the ARN of the CloudWatch rule
{
    "Version": "2012-10-17",
    "Id": "default",
    "Statement": [
        {
            "Sid": "some-random-sid",
            "Effect": "Allow",
            "Principal": {
                "Service": "events.amazonaws.com"
            },
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:us-east-1:666666666:function:lambda-name",
            "Condition": {
                "ArnLike": {
                    "AWS:SourceArn": "arn:aws:events:us-east-1:666666666:rule/cloudwatch-rule-name"
                }
            }
        }
    ]
}
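If you would rather script this step than paste JSON into the console, the same statement can be attached with Lambda's AddPermission call. A minimal Python sketch, assuming boto3 is installed and AWS credentials are configured; the helper names `events_invoke_permission` and `grant_events_invoke` are hypothetical, while `add_permission` and its keyword arguments are boto3's own.

```python
def events_invoke_permission(function_name, rule_arn, statement_id):
    """Keyword arguments for Lambda's AddPermission call that recreate the
    policy statement above: CloudWatch Events may invoke the function, but
    only when the call comes from the given rule."""
    return {
        'FunctionName': function_name,  # function name or ARN
        'StatementId': statement_id,    # becomes the statement's Sid
        'Action': 'lambda:InvokeFunction',
        'Principal': 'events.amazonaws.com',
        'SourceArn': rule_arn,          # becomes the AWS:SourceArn condition
    }


def grant_events_invoke(lambda_client, function_name, rule_arn, statement_id):
    """Attach the permission using a client such as boto3.client('lambda')."""
    return lambda_client.add_permission(
        **events_invoke_permission(function_name, rule_arn, statement_id))


# Example with the placeholder ARNs from the policy above (needs credentials):
# import boto3
# grant_events_invoke(boto3.client('lambda', region_name='us-east-1'),
#                     'arn:aws:lambda:us-east-1:666666666:function:lambda-name',
#                     'arn:aws:events:us-east-1:666666666:rule/cloudwatch-rule-name',
#                     'some-random-sid')
```

The AWS CLI's `aws lambda add-permission` command takes the same five values as flags, if a shell one-liner fits your workflow better.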

Extra Points: CloudFormation

You can, of course, create the Lambda, the CloudWatch Event rule and the policy document manually through the AWS Console, but it’s more fun with CloudFormation. We need three resources to replace a single cron job:

  1. The Lambda function (ApplicationLambda): instead of inlining the function's code in the template, we upload it to S3 as a zip file, and CloudFormation uses that object when it creates the Lambda for us
  2. The CloudWatch rule that kicks off the Lambda function (CloudWatchEvent)
  3. The Lambda’s policy document that allows our CloudWatch event to trigger the Lambda (LambdaPermission)
Resources:
  ApplicationLambda:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        S3Bucket: lambda-bucket
        S3Key: lambda_request.zip
        S3ObjectVersion: adsfhgadfshjgasdfhjgk.asdfasd
      Description: Run some cron job
      Environment:
        Variables:
          url: http://www.myapplication.com/cron_endpoint
          user_agent: Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us)
      Handler: lambda_request_user_agent.lambda_handler
      Role: arn:aws:iam::6666666666:role/lambda_basic_execution
      Runtime: python3.6
      Timeout: 30
      Tags:
        - Key: application
          Value: my-application
  CloudWatchEvent:
    Type: "AWS::Events::Rule"
    Properties:
      Description: Triggers application cron
      ScheduleExpression: "cron(0/15 * * * ? *)"
      State: ENABLED
      Targets:
        - Arn: !GetAtt ApplicationLambda.Arn
          Id: "TargetFunctionV1"
  LambdaPermission:
    Type: "AWS::Lambda::Permission"
    Properties:
      FunctionName: !Ref ApplicationLambda
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: !GetAtt CloudWatchEvent.Arn
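Since every cron job needs the same trio of resources, the template gets repetitive once an application has several schedules. One option is to generate it. The Python sketch below emits a JSON template (CloudFormation accepts JSON as well as YAML); the helper `cron_job_resources`, the job names and the second endpoint are hypothetical, and the other values mirror the YAML above. The `python3.6` default only mirrors that template: Lambda has since deprecated that runtime, so pass a current one for new functions.

```python
import json


def cron_job_resources(name, schedule, url, user_agent, code, role_arn,
                       runtime='python3.6'):
    """Return the three CloudFormation resources for one Lambda-based cron job.

    Logical IDs derive from `name`: 'Reports' -> 'ReportsLambda',
    'ReportsEvent' and 'ReportsPermission'.
    """
    fn, rule, perm = name + 'Lambda', name + 'Event', name + 'Permission'
    return {
        fn: {
            'Type': 'AWS::Lambda::Function',
            'Properties': {
                'Code': code,  # S3 location of the zipped handler
                'Description': 'Run {} cron job'.format(name),
                'Environment': {'Variables': {'url': url, 'user_agent': user_agent}},
                'Handler': 'lambda_request_user_agent.lambda_handler',
                'Role': role_arn,
                'Runtime': runtime,
                'Timeout': 30,
            },
        },
        rule: {
            'Type': 'AWS::Events::Rule',
            'Properties': {
                'Description': 'Triggers {} cron'.format(name),
                'ScheduleExpression': schedule,
                'State': 'ENABLED',
                'Targets': [{'Arn': {'Fn::GetAtt': [fn, 'Arn']}, 'Id': 'TargetFunctionV1'}],
            },
        },
        perm: {
            'Type': 'AWS::Lambda::Permission',
            'Properties': {
                'FunctionName': {'Ref': fn},
                'Action': 'lambda:InvokeFunction',
                'Principal': 'events.amazonaws.com',
                'SourceArn': {'Fn::GetAtt': [rule, 'Arn']},
            },
        },
    }


# Hypothetical jobs; bucket, key, role and user agent mirror the YAML above.
CODE = {'S3Bucket': 'lambda-bucket', 'S3Key': 'lambda_request.zip'}
ROLE = 'arn:aws:iam::6666666666:role/lambda_basic_execution'
UA = 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us)'
JOBS = [('Reports', 'cron(0/15 * * * ? *)', 'http://www.myapplication.com/cron_endpoint'),
        ('Cleanup', 'rate(1 day)', 'http://www.myapplication.com/cleanup_endpoint')]

template = {'AWSTemplateFormatVersion': '2010-09-09', 'Resources': {}}
for job_name, schedule, url in JOBS:
    template['Resources'].update(
        cron_job_resources(job_name, schedule, url, UA, CODE, ROLE))

print(json.dumps(template, indent=2))
```

Adding another cron job then means adding one tuple to `JOBS` rather than copying three resource blocks.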

Scheduled tasks that need to run once per container

These are scheduled tasks that should run on every container that runs the application. For example, you might pull in frequently updated external data that is stored as a text file your application reads, and want to refresh that file every 15 minutes.
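To make such a per-container job concrete, here is a hypothetical Python sketch of refreshing that file carefully: download first, then swap the new file in atomically so a request never reads a half-written file. The function `refresh_data_file` and the temp-file approach are my assumptions, not part of the setup described here (which runs curl inside each container), and it presumes Python is available in the container.

```python
import os
import tempfile
import urllib.request


def refresh_data_file(url, destination, timeout=30):
    """Download `url` and atomically replace `destination` with the result.

    The data goes to a temporary file in the same directory first and is then
    swapped in with os.replace(), so readers see either the old file or the
    new one, never a partial write.
    """
    directory = os.path.dirname(os.path.abspath(destination))
    # Raises on network errors and HTTP error statuses, leaving the old file.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.download-')
    try:
        with os.fdopen(fd, 'wb') as tmp:
            tmp.write(data)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the app user read it
        os.replace(tmp_path, destination)  # atomic within one filesystem on POSIX
    except Exception:
        os.unlink(tmp_path)  # clean up the temp file if anything failed
        raise


# Example (inside a container):
# refresh_data_file('http://otherwebsite.com/data_file.txt',
#                   '/var/www/myapplication/data_file.txt')
```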

Since all of our applications run on Docker hosts in ECS, I decided to use each host’s crontab to execute these commands in the containers. The crontab calls a bash script that finds all containers with a specific name and runs the command I want inside each of them. The crontab entry follows the standard syntax and could look like this:

*/15 * * * * /root/script/do_something_on_every_container.sh >> /var/log/cron 2>&1

do_something_on_every_container.sh contains a simple loop that finds all containers by name, then executes our target command in each one.

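#!/bin/bash
# Loop over all app containers and download a fresh copy of the data file.
# Note: --filter "name=app" matches any container whose name contains "app".
for i in $(docker ps --quiet --filter "name=app"); do
    docker exec "$i" curl --silent --show-error --fail http://otherwebsite.com/data_file.txt -o /var/www/myapplication/data_file.txt
done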