The Problem

Let’s follow the steps that lead to scaling up an application. First, the conditions set by one of your monitoring alarms must be met. Generally, this means that consumption of a core resource such as CPU or system memory exceeds a configured threshold. Then the scale-up policy kicks in and launches a new instance from the specified AMI. You have to craft that AMI so that it brings the new instance into a ready-to-serve state.

Presumably you have a nicely working preconfigured AMI, but you still have to bring your application code up to date. Otherwise… Well, there is no “otherwise”: things must be correct on a new instance, and you simply can’t do this manually. This is an ASG, and it should work automatically.
Chef Server to the rescue

This is where Chef Server helps. The solution described below also works nicely for projects without an ASG.

When using Chef Server, you’ll have the chef-client agent on each instance. According to the documentation, you have two options for running it: as a daemon or as a cron task… Nice! You’re done. Your AMI only needs the validation.pem key in order to register a new instance on the Server.

Wait a second, does this mean that chef-client will perform useless runs most of the time? Yes, it does. Chef’s architecture does spare you useless work when the state of a node is already up to date, but there’s the Deploy resource, which you will probably use actively. The provider for this resource runs its callbacks no matter what the current state of the deployed code on the node is.

Also keep in mind that you have to manage registering and unregistering nodes on the Server, otherwise you will end up with stale node and client records stored there.
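For comparison, the two stock scheduling modes mentioned above can be sketched as follows. This is only an illustration: the helper name periodic_chef_client, the 30-minute default and the /usr/bin/chef-client path are assumptions, while -d (daemonize) and -i (interval in seconds) are standard chef-client options.

```sh
#!/bin/sh
# Hypothetical helper: print the command line (or crontab entry) for each stock mode.
# $1 = mode (daemon|cron), $2 = interval in minutes (assumed default: 30).
periodic_chef_client() {
  minutes=${2:-30}
  case "$1" in
    daemon)
      # -d daemonizes, -i is the interval between runs in seconds
      echo "chef-client -d -i $((minutes * 60))"
      ;;
    cron)
      # crontab entry: run every N minutes (binary path is an assumption)
      echo "*/$minutes * * * * /usr/bin/chef-client"
      ;;
    *)
      echo "usage: periodic_chef_client daemon|cron [MINUTES]" >&2
      return 1
      ;;
  esac
}
```

Either mode converges the node on a timer whether or not anything changed, which is exactly the behaviour questioned above.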
Tweaking Chef-Client

So what is the best thing to do? First, we decided to create our own init script for chef-client. It is responsible for registering the node and converging it to the desired state during boot, and for unregistering the node on the Server during reboot/shutdown. Second, we opted to run chef-client only when we need it, so no daemons or cron tasks.

If you’ve installed Chef from a package, then before you proceed you have to stop the daemon and remove its links from the rc.d directories:
sudo /etc/init.d/chef-client stop
sudo mv /etc/init.d/chef-client /etc/init.d/chef-client.disabled # or remove it completely
sudo update-rc.d chef-client remove

Here are the snippets from the init script template used by our cookbook for its management (an overview of what the script does follows the snippets):
NODE_ENV="<%= node.chef_environment %>"
INSTANCE_ID=`curl -s http://169.254.169.254/latest/meta-data/instance-id`
# ... (the remaining variables used below are set here)
# create client.rb with the two vital attributes
echo "chef_server_url \"$CHEF_SERVER_URL\"" >> $CLIENT_CFG
echo "node_name \"$NODE_NAME\"" >> $CLIENT_CFG
# ...
register_node() {
  # aggregate options etc
  options="-E $NODE_ENV -N $NODE_NAME -o $RUN_LIST -S $CHEF_SERVER_URL"
  export SKIP_RESQUE_RESTART=1 # because we're at boot up stage
  chef-client $options > $LOG_FILE
}

unregister_node() {
  # delete the node and its API client on the Server, then the local key and config
  echo "`date +[%FT%T%:z]` `knife node delete $NODE_NAME -y -u $CHEF_CLIENT -k $CLIENT_KEY -c $CLIENT_CFG`" >> $LOG_FILE
  echo "`date +[%FT%T%:z]` `knife client delete $NODE_NAME -y -u $CHEF_CLIENT -k $CLIENT_KEY -c $CLIENT_CFG`" >> $LOG_FILE
  rm $CLIENT_KEY $CLIENT_CFG && echo "`date +[%FT%T%:z]` Removed client key at $CLIENT_KEY and config at $CLIENT_CFG" >> $LOG_FILE
}

case "$1" in
  start)
    log_daemon_msg "Starting $DESC"
    register_node || errcode=$?
    ;;
  stop)
    log_daemon_msg "Stopping $DESC"
    unregister_node || errcode=$?
    ;;
esac
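One detail of the unregister lines is worth a note: each knife call is wrapped in echo, so its exit status is lost and the `|| errcode=$?` check never sees a knife failure. A possible variation is sketched below. The run_logged helper is hypothetical and not part of the original script; it assumes LOG_FILE is already set and keeps the timestamp format used in the snippets.

```sh
#!/bin/sh
# Hypothetical helper: run a command, append its timestamped output to $LOG_FILE,
# and return the command's own exit status instead of echo's.
run_logged() {
  output=`"$@" 2>&1`
  rc=$?
  echo "`date '+[%FT%T%:z]'` $output" >> "$LOG_FILE"
  return $rc
}

# Example use inside unregister_node (variables as in the snippets above):
#   run_logged knife node delete "$NODE_NAME" -y -u "$CHEF_CLIENT" -k "$CLIENT_KEY" -c "$CLIENT_CFG"
```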
In short, the init script:
- sets the node name for the instance (we use the following naming convention: project-environment___instance_id)
- creates and configures the client.rb used by chef-client (sets two vital attributes: node_name and chef_server_url)
- performs the initial run of chef-client during boot
- unregisters the instance on the Server during reboot/shutdown
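The naming convention from the first bullet can be sketched as a small helper. This is a minimal illustration rather than the actual template code: the function name build_node_name and the PROJECT variable are hypothetical, while the metadata URL is the one used in the snippets.

```sh
#!/bin/sh
# Hypothetical helper: compose a node name as project-environment___instance_id.
# $1 = project, $2 = Chef environment, $3 = EC2 instance id.
build_node_name() {
  if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: build_node_name PROJECT ENVIRONMENT INSTANCE_ID" >&2
    return 1
  fi
  printf '%s-%s___%s\n' "$1" "$2" "$3"
}

# Example use inside an init script (PROJECT is an assumed variable):
#   INSTANCE_ID=`curl -s http://169.254.169.254/latest/meta-data/instance-id`
#   NODE_NAME=`build_node_name "$PROJECT" "$NODE_ENV" "$INSTANCE_ID"`
```

Because the instance id is part of the name, every instance launched by the ASG registers under a unique node name.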
Make the script executable and register it in the rc.d directories:

chmod +x /etc/init.d/chef-client
update-rc.d chef-client defaults 80 20

With the startup and shutdown cases covered by this script, we can safely achieve the second goal: triggering chef-client via the knife ssh command whenever we need it.
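An on-demand run via knife ssh might look like the sketch below. The converge_environment wrapper, the search query and the default SSH user are assumptions for illustration; knife ssh itself takes a search query and a command, and -x sets the SSH user. The KNIFE variable exists only so the command can be previewed instead of executed.

```sh
#!/bin/sh
# Hypothetical wrapper: run chef-client on every node of one Chef environment.
# $1 = Chef environment, $2 = SSH user (assumed default: ubuntu).
# Set KNIFE="echo knife" to print the command instead of running it.
converge_environment() {
  env_name=$1
  ssh_user=${2:-ubuntu}
  if [ -z "$env_name" ]; then
    echo "usage: converge_environment ENVIRONMENT [SSH_USER]" >&2
    return 1
  fi
  ${KNIFE:-knife} ssh "chef_environment:$env_name" "sudo chef-client" -x "$ssh_user"
}
```

A plain `sudo chef-client` should be enough on the remote side, since the client.rb written at boot already carries node_name and chef_server_url.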
Conclusions

To summarize, your AMI must have:
- the validation.pem key, in order to register an instance on the Server
- the init script for chef-client
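Before baking an AMI, the two prerequisites above could be verified with a check along these lines. It is a sketch under assumptions: the function name check_ami_prereqs is hypothetical, and /etc/chef/validation.pem is assumed as the key location (the conventional default, which the text does not state).

```sh
#!/bin/sh
# Hypothetical pre-bake check for the two AMI prerequisites.
# $1 = filesystem root to inspect (defaults to /), so an unpacked image can be checked too.
check_ami_prereqs() {
  root=${1:-/}
  status=0
  # assumed key location: /etc/chef/validation.pem
  if [ ! -f "$root/etc/chef/validation.pem" ]; then
    echo "missing: /etc/chef/validation.pem" >&2
    status=1
  fi
  if [ ! -x "$root/etc/init.d/chef-client" ]; then
    echo "missing or not executable: /etc/init.d/chef-client" >&2
    status=1
  fi
  return $status
}
```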