In part 1 of this post, I shared some lessons from the initial learning curve
toward more sophisticated CloudFormation capabilities. While it is easy to get
started by mimicking an existing design, it takes a more in-depth understanding
of bootstrapping to design for your specific target behavior and to
troubleshoot effectively.
Build Incrementally
It may be tempting to develop the full template and scripts all at once, and
to test the full feature set against the target design. If you are lucky,
everything works the first time. However, because many components are
involved, some troubleshooting is usually needed. At that point, running the
whole system every time you change a snippet is time-consuming and often
counterproductive to isolating the root cause.
In other words, break the solution down into logical components and build
incrementally. Start testing and troubleshooting early, at the component
level. Once the components have been tested, it is much easier to assemble
them into a complete working system.
Logically, an incremental build may flow like this:
- Develop and validate a basic template that creates target resources
- Verify that the template launches target instance(s) and/or auto-scaling groups, ELBs, etc.
- Instance installs specified software and packages successfully
- Instance can access external data store (such as S3) and create local file structure per design
- Instance can run the specified command/script/code
- The specified command/script/code performs the desired function
- CloudFormation receives signal upon completion
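The steps above can be sketched as a template that grows one stage at a time, so each stage can be launched and checked before the next is added. This is only an illustration in Python, assuming a single EC2 instance; the logical names (AppInstance, BootHandle, BootWait), the AmiId parameter, the instance type, and the stage functions are all hypothetical. The UserData that would actually call cfn-init and signal the handle is left out.

```python
import copy
import json


def base_template():
    """Stage 1: a template that only creates the target resource."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {"AmiId": {"Type": "AWS::EC2::Image::Id"}},
        "Resources": {
            "AppInstance": {  # hypothetical logical name
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": {"Ref": "AmiId"},
                    "InstanceType": "t2.micro",  # assumed size
                },
            }
        },
    }


def with_packages(template, packages):
    """Next stage: declare packages for cfn-init to install (yum assumed)."""
    grown = copy.deepcopy(template)  # leave the earlier stage untouched
    grown["Resources"]["AppInstance"]["Metadata"] = {
        "AWS::CloudFormation::Init": {
            "config": {"packages": {"yum": {name: [] for name in packages}}}
        }
    }
    return grown


def with_completion_signal(template, timeout_seconds=600):
    """Last stage: make CloudFormation wait for a completion signal."""
    grown = copy.deepcopy(template)
    grown["Resources"]["BootHandle"] = {
        "Type": "AWS::CloudFormation::WaitConditionHandle"
    }
    grown["Resources"]["BootWait"] = {
        "Type": "AWS::CloudFormation::WaitCondition",
        "DependsOn": "AppInstance",
        "Properties": {
            "Handle": {"Ref": "BootHandle"},
            "Timeout": str(timeout_seconds),
        },
    }
    return grown


# Build each stage from the previous one; launch and verify them in order.
stages = [base_template()]
stages.append(with_packages(stages[-1], ["httpd"]))
stages.append(with_completion_signal(stages[-1]))

for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {sorted(stage['Resources'])}")

final_template_body = json.dumps(stages[-1], indent=2)
```

Each entry in stages is a complete template on its own, so a failure at stage 3 points at the signalling step rather than at the instance or the package install.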
Think Modular
An incremental approach also encourages the development of reusable code. For
example, you may find it beneficial to capture a specific feature in a utility
template that has been tested and proven. In the future, a new app can call
this template as a nested template and pass values to it through parameters.
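As a rough sketch of what calling a utility template looks like, the parent template declares a resource of type AWS::CloudFormation::Stack and passes values through its Parameters property. The bucket URL, the logical name Bootstrap, and the parameter names below are hypothetical placeholders.

```python
import json


def nested_stack(template_url, parameters):
    """Return a resource that instantiates a tested utility template."""
    return {
        "Type": "AWS::CloudFormation::Stack",
        "Properties": {
            "TemplateURL": template_url,
            "Parameters": dict(parameters),
        },
    }


parent_template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        # Hypothetical utility template and parameter names.
        "Bootstrap": nested_stack(
            "https://s3.amazonaws.com/example-bucket/bootstrap-utility.template",
            {"PackageList": "httpd", "LogLevel": "debug"},
        )
    },
}

print(json.dumps(parent_template, indent=2))
```

A new app would reuse nested_stack with a different parameter set instead of copying the utility template's contents.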
Disable Rollback
By default, CloudFormation rolls back the stack if an error occurs during
stack creation. For troubleshooting, it is often not enough to look at the
CloudFormation event log; you also need to preserve the failed instances in
order to collect more detailed clues. Therefore, it is essential to set
DisableRollback to true (or, if creating the stack using the console, expand
“advanced option” and deselect the default option).
After you have examined the failed instances, you can manually delete the
stack, which cleans up the unwanted instances. You can then modify the code
and repeat the stack creation process.
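A minimal sketch of that create, inspect, delete, and retry loop, assuming the stack is driven from Python: client is expected to be a boto3 CloudFormation client (for example boto3.client("cloudformation")), which is an assumption, since the post does not say how stacks are launched. boto3 is deliberately not imported, so the helpers can be exercised without AWS access; the function names are hypothetical.

```python
def create_debug_stack(client, stack_name, template_body):
    """Create a stack whose failed resources are kept for inspection."""
    return client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        DisableRollback=True,  # keep failed instances instead of rolling back
    )


def recreate_after_fix(client, stack_name, template_body):
    """Delete the failed stack, wait until it is gone, then try again."""
    client.delete_stack(StackName=stack_name)
    # Block until deletion finishes; the stack name cannot be reused before.
    client.get_waiter("stack_delete_complete").wait(StackName=stack_name)
    return create_debug_stack(client, stack_name, template_body)
```

In use, call create_debug_stack once, log on to the failed instance, fix the template or script, and then call recreate_after_fix with the updated template body.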
Troubleshoot on the instance
If things don’t work as expected, the most specific and definitive
information is always on the instance itself. Using your credentials, log on
to the instance.
Check the instance logs, for example the cfn-init logs. On Linux:
/var/log/cfn-init.log. On Windows: C:\cfn\log\cfn-init.log
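Once logged on, a small helper can pull the last lines of the cfn-init log without paging through the whole file. The following is a sketch that uses only the two log paths mentioned above; the helper names are hypothetical, and it simply reports when the log is absent (for example when run somewhere other than a bootstrapped instance).

```python
import os
import platform
from collections import deque

# Log locations as given in the text above.
CFN_INIT_LOGS = {
    "Linux": "/var/log/cfn-init.log",
    "Windows": r"C:\cfn\log\cfn-init.log",
}


def cfn_init_log_path(system=None):
    """Return the cfn-init log path for this OS, or None if unknown."""
    return CFN_INIT_LOGS.get(system or platform.system())


def tail(path, lines=20):
    """Return the last `lines` lines of a text file."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


log_path = cfn_init_log_path()
if log_path and os.path.exists(log_path):
    for line in tail(log_path):
        print(line)
else:
    print("cfn-init log not found on this machine")
```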
Take out the guessing
While your final product should be concise and elegant, you should feel free
to generate additional information and output to help pinpoint the issue
during development and troubleshooting. Why not make it obvious and easy for
yourself?
You can apply any development technique here. For example, insert lines into
your script or code to print to a log file. I also find it more efficient to
test the script directly on the instance, which often reveals issues without
going through the lengthy steps of deleting and recreating stacks every time
you make a change. Because the instance is already in the target VPC, you can
use the command line directly to simulate the bootstrapping process.
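As one way to print to a log file, the sketch below wraps each bootstrap step so that its start, result, and any traceback land in a single file. It assumes the bootstrap script is written in Python; the logger name, the log file name, and the run_step helper are hypothetical.

```python
import logging
import os
import tempfile


def make_bootstrap_logger(log_path):
    """Return a logger that writes timestamped lines to log_path."""
    logger = logging.getLogger("bootstrap")
    logger.setLevel(logging.DEBUG)
    for old in list(logger.handlers):  # avoid duplicate lines on re-runs
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def run_step(logger, name, action):
    """Run one bootstrap step, logging its start, result, or traceback."""
    logger.info("START %s", name)
    try:
        result = action()
    except Exception:
        logger.exception("FAILED %s", name)
        raise
    logger.info("DONE %s -> %r", name, result)
    return result


# Example: log to a file in the temp directory (assumed location).
log_file = os.path.join(tempfile.gettempdir(), "bootstrap-debug.log")
logger = make_bootstrap_logger(log_file)
run_step(logger, "create local directories", lambda: "ok")
print(f"details written to {log_file}")
```

Running the same script by hand on the instance produces the same log, which makes it easy to compare a manual run with a bootstrapped one.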
Tune timeout
A WaitCondition is how CloudFormation receives a signal back. If you have
experienced a long delay before a WaitCondition reports failure, check its
timeout value. A typical bootstrapping operation takes no more than 5
minutes, so there is no point waiting much longer. By decreasing the timeout
to less than 10 minutes, you will save a lot of time and frustration.
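For reference, the WaitCondition Timeout property is expressed in seconds, so a 10-minute limit is written as "600". A small sketch that builds the resource from a value in minutes follows; the helper name and the handle's logical name are hypothetical.

```python
def wait_condition(handle_name, timeout_minutes=10, count=1):
    """Build a WaitCondition resource with its timeout given in minutes."""
    seconds = int(timeout_minutes * 60)
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return {
        "Type": "AWS::CloudFormation::WaitCondition",
        "Properties": {
            "Handle": {"Ref": handle_name},
            "Timeout": str(seconds),  # CloudFormation expects seconds
            "Count": count,  # number of success signals to wait for
        },
    }


resource = wait_condition("BootHandle", timeout_minutes=8)
print(resource["Properties"]["Timeout"])
```

Keeping the value in one place makes it easy to shorten the timeout during development and raise it again if a slower bootstrap genuinely needs it.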
Watch external dependencies
Often, a script that runs well locally does not work during bootstrapping.
Think of the external conditions that the instance relies on as necessary
conditions for bootstrapping to run successfully:
- Internet access from VPC
- Security groups and policies applied to the instance
- Instance role and access privilege
- DNS
- External data store access protection
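Some of the conditions listed above can be checked from the instance before the real bootstrap logic runs. The sketch below covers only DNS resolution and outbound TCP reachability; role and bucket permissions would need separate checks. The function names are hypothetical, and the S3 endpoint is just an example target.

```python
import socket


def can_resolve(hostname):
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(checks):
    """Run (label, check) pairs, print each result, return failed labels."""
    failures = []
    for label, check in checks:
        ok = check()
        print(f"{'OK  ' if ok else 'FAIL'} {label}")
        if not ok:
            failures.append(label)
    return failures


def default_checks():
    """Example checks; the endpoint is an assumption, adjust to your design."""
    return [
        ("DNS resolves s3.amazonaws.com",
         lambda: can_resolve("s3.amazonaws.com")),
        ("HTTPS reachable at s3.amazonaws.com",
         lambda: can_connect("s3.amazonaws.com", 443)),
    ]
```

Calling preflight(default_checks()) at the top of a bootstrap script, and logging the returned failures, separates "the script is wrong" from "the instance cannot reach what it needs".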
The more sophisticated automation capabilities become, the more components
are involved in a complete sequence of events. Later, one process may pass
variables to another. There will be more error handling, more nested
templates, parameters, more code, conditions, etc. But every journey starts
somewhere, and the lessons learned from bootstrapping provide a good first
step.