Blog by Railsware

Chef Server and Amazon Auto Scaling Groups

Amazon Auto Scaling Groups (ASG) impose a new requirement on applications that use them: when a new instance spins up, it has to know what to do.

The Problem

Let’s follow the steps that lead to scaling up an application. First, the conditions set by one of your monitoring alarms must be met. Generally, this means that consumption of a core resource such as CPU or system memory has exceeded a configured threshold. Then the scale-up policy kicks in and launches a new instance from the specified AMI. You have to craft that AMI so that it brings the new instance into a ready-to-serve state.

Presumably, you have a nicely working preconfigured AMI, but you still have to bring your application code up to date. Otherwise… Well, there is no “otherwise”: things must be correct on a new instance, and you simply can’t do this manually. This is ASG, and it should work automatically.

Chef Server to the rescue

This is where Chef Server helps. The solution described below also works nicely for projects without ASG.

When using Chef Server, you’ll have the chef-client agent on each instance. According to the documentation, there are two options for running it: as a daemon or as a cron task… Nice! You’re done. Your AMI only needs the validation.pem key in order to register a new instance on the Server.
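
For reference, the two stock options could look roughly like this. It is only a sketch: the 30-minute interval, the 5-minute splay, the /usr/bin/chef-client path and the helper names run_as_daemon and cron_entry are assumptions for illustration, not values from our setup.

```shell
#!/bin/sh
# Option 1 (daemon): chef-client stays resident and converges every
# 1800 seconds, with up to 300 seconds of random splay (assumed values).
run_as_daemon() {
    chef-client --daemonize --interval 1800 --splay 300
}

# Option 2 (cron): print a cron.d-style entry that runs chef-client every
# 30 minutes as root. The binary path is an assumption; pass your own.
cron_entry() {
    printf '*/30 * * * * root %s\n' "${1:-/usr/bin/chef-client}"
}

# Example: cron_entry | sudo tee /etc/cron.d/chef-client
```

Either way, chef-client runs on a schedule regardless of whether anything has changed.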

Wait a second, does this mean that chef-client will perform useless runs most of the time? Yes, it does. Chef’s architecture actually spares you useless work when the state of a node is up to date, but there’s the Deploy resource, which you’ll probably be using actively. The provider for this resource runs its callbacks no matter what the current state of the deployed code on the node is.

Also keep in mind that you have to manage registering and unregistering nodes on the Server, otherwise you will end up with stale node and client records stored on it.

Tweaking Chef-Client

So what is the best thing to do? First, we decided to create our own init script for chef-client. Its responsibility is to register the node and converge it to the desired state during boot, and to unregister the node on the Server during reboot or shutdown. Second, we opted to run chef-client only when we need it, so no daemons or cron tasks.

If you’ve installed Chef from a package, then before you proceed you have to stop the daemon and remove its links from the rc.d directories:

sudo /etc/init.d/chef-client stop
sudo mv /etc/init.d/chef-client /etc/init.d/chef-client.disabled # or remove it completely
sudo update-rc.d chef-client remove

Here are snippets from the init script template that our cookbook uses to manage it:

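# the environment is filled in when the cookbook renders this template
NODE_ENV="<%= node.chef_environment %>"
# the instance ID comes from the EC2 instance metadata service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
NODE_NAME="${NODE_ENV}___${INSTANCE_ID}"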

...
register_node() {
 update_node_config

 # aggregate options etc
 options="-E $NODE_ENV -N $NODE_NAME -o $RUN_LIST -S $CHEF_SERVER_URL"
 export SKIP_RESQUE_RESTART=1 # because we're at boot up stage

 # shoot!
 chef-client $options > $LOG_FILE
...
}

unregister_node() {
 # delete the node and its API client from the Server, logging knife's output
 echo "$(date '+[%FT%T%:z]') $(knife node delete "$NODE_NAME" -y -u "$CHEF_CLIENT" -k "$CLIENT_KEY" -c "$CLIENT_CFG")" >> "$LOG_FILE"
 echo "$(date '+[%FT%T%:z]') $(knife client delete "$NODE_NAME" -y -u "$CHEF_CLIENT" -k "$CLIENT_KEY" -c "$CLIENT_CFG")" >> "$LOG_FILE"
 # then remove the local client key and config
 rm "$CLIENT_KEY" "$CLIENT_CFG" && echo "$(date '+[%FT%T%:z]') Removed client key at $CLIENT_KEY and config at $CLIENT_CFG" >> "$LOG_FILE"
}

update_node_config() {
 > $CLIENT_CFG
 echo "chef_server_url \"$CHEF_SERVER_URL\"" >> $CLIENT_CFG
 echo "node_name \"$NODE_NAME\"" >> $CLIENT_CFG
}

case "$1" in
 start)
   log_daemon_msg "Starting $DESC"
   errcode=0
   register_node || errcode=$?
   log_end_msg $errcode
   ;;
 stop)
   log_daemon_msg "Stopping $DESC"
   errcode=0
   unregister_node || errcode=$?
   log_end_msg $errcode
   ;;
...
esac
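
The naming scheme at the top of the script can be tried in isolation. The sketch below is a variant under assumptions, not the script from our cookbook: build_node_name and fetch_instance_id are hypothetical helpers, and the curl timeout values are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: join environment and instance ID with the same
# triple-underscore separator the init script uses.
build_node_name() {
    printf '%s___%s\n' "$1" "$2"
}

# Hypothetical helper: ask the EC2 metadata service for the instance ID,
# failing fast instead of hanging the boot when the service is unreachable.
fetch_instance_id() {
    curl -sf --connect-timeout 2 --max-time 5 \
        http://169.254.169.254/latest/meta-data/instance-id
}

# Example (on an EC2 instance):
#   NODE_NAME=$(build_node_name "production" "$(fetch_instance_id)")
```

Embedding the instance ID keeps node names unique across instances launched from the same AMI, and the environment prefix makes them easy to tell apart in node listings.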

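Overview of what it does: on start, the script rewrites the client config with the Server URL and node name, then runs chef-client to register the node and converge it; on stop, it deletes the node and its client from the Server with knife and removes the local client key and config.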

Then this script must be linked into the rc.d directories so that it is executed on the appropriate runlevels:

chmod +x /etc/init.d/chef-client
update-rc.d chef-client defaults 80 20
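
To check the result, it can help to know which links to expect. With classic sysv-rc semantics, defaults 80 20 creates S80 start links in runlevels 2 to 5 and K20 kill links in runlevels 0, 1 and 6; newer Debian and Ubuntu releases may instead take the ordering from LSB headers in the script. The expected_rc_links helper below is hypothetical and only prints the paths.

```shell
#!/bin/sh
# Hypothetical helper: print the rc.d symlinks that
# "update-rc.d NAME defaults START STOP" creates under classic sysv-rc.
expected_rc_links() {
    name=$1
    start=$2
    stop=$3
    for level in 2 3 4 5; do
        printf '/etc/rc%s.d/S%s%s\n' "$level" "$start" "$name"
    done
    for level in 0 1 6; do
        printf '/etc/rc%s.d/K%s%s\n' "$level" "$stop" "$name"
    done
}

# Example: report any expected link that is missing.
#   expected_rc_links chef-client 80 20 | while read -r link; do
#       [ -L "$link" ] || echo "missing: $link"
#   done
```

The kill links in runlevels 0 and 6 are what call the script with stop on shutdown and reboot, which is where the node gets unregistered.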

With startup and shutdown covered by this script, we can safely achieve the second goal: triggering chef-client via the knife ssh command whenever we need to.
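
A run across all nodes of one environment might then be triggered like this. It is a sketch under assumptions: the ubuntu SSH user, the production environment and the converge_environment wrapper are illustrative, and DRY_RUN is a hypothetical switch that prints the command instead of executing it.

```shell
#!/bin/sh
# Hypothetical wrapper: run chef-client on every node that the Chef Server
# search index lists for the given environment.
converge_environment() {
    env_name=$1
    ssh_user=${2:-ubuntu}   # assumed login user; adjust for your AMI
    set -- knife ssh "chef_environment:${env_name}" "sudo chef-client" -x "$ssh_user"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"           # print the command (original quoting is lost)
    else
        "$@"
    fi
}

# Example: DRY_RUN=1 converge_environment production
```

Because nodes register themselves on boot and unregister on shutdown, the search query should only match instances that are actually alive.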

Conclusions

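To summarize, based on the steps above your AMI must have chef-client installed (with the packaged daemon disabled), the validation.pem key, and the init script linked into the rc.d directories.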

That’s it! After completing these steps, you’ll have a correctly working, predictable and easily maintainable ASG.

For general Chef tips, please refer to the “Chef: DOs and DON’Ts” article.
