In this blog we will show you how to configure a systemd cfn-signal unit that signals the successful or failed completion of the cloud-init run.
We have been managing a dozen autoscaling groups for instances of ECS clusters using CloudFormation since 2017. To configure a virtual machine instance for a specific cluster, we use cloud-init. With cloud-init, we configure the ECS agent and direct logging to the correct log groups. This means that we have a cloud-init configuration embedded as user-data in our CloudFormation template. This works pretty well most of the time, but sometimes, when the cloud-init configuration had an error, a success signal was still sent, leaving us with an incorrectly configured virtual machine.
With systemd we can configure a cfn-signal unit to report the status of cloud-init after cloud-init has run. You can do this too! We will show you:
- how to determine the successful completion of cloud-init
- how to make cfn-signal dependent on cloud-init
- how to define the cfn-signal unit
- how to integrate this into the CloudFormation template
Determining successful completion of cloud-init
To determine whether cloud-init completed without errors, we use the following command:
cloud-init status --wait
It will print one of the following statuses:
status: not started
status: running
status: done
status: error - done
status: error - running
status: degraded done
status: degraded running
status: disabled
So the command to get cfn-signal to report the correct completion status looks like this:
cloud-init status --wait | grep -q '^status: done$'
/opt/aws/bin/cfn-signal \
    --stack "$STACK_NAME" \
    --resource "$LOGICAL_RESOURCE_ID" \
    --region "$AWS_REGION" \
    --exit-code $?
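The pipeline relies on the exit status of grep: only an exact "status: done" line yields 0, and every other status yields a non-zero code that cfn-signal reports as a failure. A minimal sketch of that mapping, which can be tried without an instance, is shown here; the function name status_to_exit_code is hypothetical and not part of cloud-init or cfn-signal.

```shell
#!/bin/sh
# Hypothetical helper: map one line of `cloud-init status` output to the
# exit code that would be handed to cfn-signal. Only an exact
# "status: done" line counts as success; anything else is a failure.
status_to_exit_code() {
    if printf '%s\n' "$1" | grep -q '^status: done$'; then
        echo 0
    else
        echo 1
    fi
}

status_to_exit_code 'status: done'            # prints 0
status_to_exit_code 'status: error - done'    # prints 1
status_to_exit_code 'status: degraded done'   # prints 1
```

Because the pattern is anchored at both ends, a degraded or errored run is never mistaken for a clean one.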
How to make cfn-signal dependent on cloud-init
All systemd units that together provide the cloud-init functionality are grouped by the cloud-init.target.
cloud-init.target
├─cloud-config.service
├─cloud-final.service
├─cloud-init-local.service
└─cloud-init.service
To run cfn-signal after cloud-init has completed, we make the service dependent on the cloud-init.target, using the Wants and After directives in the systemd service unit:
Description=Signal completion of cloud-init to CFN
Wants=cloud-init.target
After=cloud-init.target
We use Wants instead of Requires, because we want the unit to run even if cloud-init failed.
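A misspelled directive name is easy to miss, because systemd only logs a warning for an unknown key and otherwise ignores it. As a small sketch, a unit file could be checked for both directives before it is baked into an image; the helper name has_cloud_init_dependency and the temporary file are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical lint helper: succeed only if the unit file both pulls in
# cloud-init.target (Wants=) and orders itself after it (After=).
has_cloud_init_dependency() {
    unit_file="$1"
    grep -q '^Wants=.*cloud-init\.target' "$unit_file" &&
        grep -q '^After=.*cloud-init\.target' "$unit_file"
}

# Example: write a throwaway unit fragment and check it.
tmp_unit="$(mktemp)"
printf '%s\n' '[Unit]' 'Wants=cloud-init.target' 'After=cloud-init.target' > "$tmp_unit"
if has_cloud_init_dependency "$tmp_unit"; then
    echo "ok: unit depends on cloud-init.target"
else
    echo "missing Wants= or After= for cloud-init.target"
fi
rm -f "$tmp_unit"
```

On a running instance, systemd-analyze verify on the unit file should surface the same kind of mistake.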
The cfn-signal service unit
The entire cfn-signal service unit now looks like this:
#
# When cloud-init completed successfully, report this to CFN
# using cfn-signal
#
[Unit]
Description=Signal completion of cloud-init to CFN
Wants=cloud-init.target
After=cloud-init.target
[Service]
Type=oneshot
ExecStart=/bin/sh -xc '\
    cloud-init status --wait | grep -q "^status: done$"; \
    /opt/aws/bin/cfn-signal \
        --stack "$STACK_NAME" \
        --resource "$LOGICAL_RESOURCE_ID" \
        --region "$AWS_REGION" \
        --exit-code $? \
    '
[Install]
WantedBy=default.target
You can bake this service into your own VM base image, or include it in your CloudFormation template.
Integration into the CloudFormation template
To integrate the cfn-signal service into the CloudFormation template, we pass the stack name, region and logical resource ID as environment variables through the systemd service override configuration in /etc/systemd/system/cfn-signal.service.d/override.conf:
ClusterASGLaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: ClusterASG
    LaunchTemplateData:
      ImageId: !Ref AL2023
      InstanceType: t3.micro
      IamInstanceProfile:
        Arn: !GetAtt 'InstanceProfile.Arn'
      UserData: !Base64
        Fn::Sub: |
          #cloud-config
          write_files:
            - path: /etc/systemd/system/cfn-signal.service.d/override.conf
              permissions: '0644'
              content: |
                [Service]
                Environment="AWS_REGION=${AWS::Region}"
                Environment="STACK_NAME=${AWS::StackName}"
                Environment="LOGICAL_RESOURCE_ID=ClusterASG"
          runcmd:
            - sudo systemctl daemon-reload
            - sudo systemctl enable --now --no-block cfn-signal
If cloud-init fails to create the file, cfn-signal will not report a successful completion and CloudFormation will time out waiting for the signal. If cloud-init succeeds in configuring cfn-signal but there were errors in any other part of the cloud-init run, a failure signal will be sent immediately after cloud-init completes.
Demonstration
To see a full-fledged demonstration, deploy the sample stack:
curl -sS -o sample.yaml \
    https://gist.github.com/mvanholsteijn/a85dc4355985477a0aec2aeb3c27eab1
aws cloudformation deploy \
    --stack-name cfn-signal \
    --template-file ./sample.yaml \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides ExitCode=0
Now, mimic an update with an error in the runcmd scripts by setting ExitCode to 1:
aws cloudformation deploy \
    --stack-name cfn-signal \
    --template-file ./sample.yaml \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides ExitCode=1
This will report a failure signal to CloudFormation and abort the update.
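To see what CloudFormation recorded for the aborted update, the stack events can be queried with the AWS CLI. The following is a sketch only: the function name show_failed_events is hypothetical, and the exact status values and reasons you see depend on your stack.

```shell
#!/bin/sh
# Hypothetical helper: list the failed resources of a stack together with
# the reason CloudFormation recorded for each failure.
show_failed_events() {
    stack_name="$1"
    aws cloudformation describe-stack-events \
        --stack-name "$stack_name" \
        --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[Timestamp,LogicalResourceId,ResourceStatusReason]" \
        --output table
}

# Usage (needs AWS credentials and the stack from this demonstration):
#   show_failed_events cfn-signal
```

On the instance itself, journalctl -u cfn-signal and cloud-init status --long are the natural places to look for the underlying cause.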
Conclusion
By using systemd we can configure a cfn-signal unit to report the status of the cloud-init, after cloud-init has run. This allows us to notify CloudFormation to continue, only if cloud-init initialized the instance successfully. This reduces the chances of incorrectly configured instances in our environment.
Image by WikimediaImages from Pixabay
Written by

Mark van Holsteijn
Mark van Holsteijn is a senior software systems architect at Xebia Cloud-native solutions. He is passionate about removing waste in the software delivery process and keeping things clear and simple.