

Update Lifecycle

During a deployment, all jobs (and their associated processes) on each VM go through several stages.

When start is issued

Persistent disks are mounted on the VM, if configured and not yet mounted
All jobs and their dependent packages are downloaded and placed onto the VM
pre-start scripts run for all jobs on the VM in parallel
- (waits for all pre-start scripts to finish)
- if bpm is used, bpm's pre-start runs first and has a timeout of 30 seconds
- a job's pre-start script does not time out
monit start is called for each process in no particular order
- each job can specify zero or more processes
- times out based on canary_watch_time/update_watch_time settings
post-start scripts run for all jobs on the VM in parallel
- (waits for all post-start scripts to finish)
- does not time out
post-deploy scripts run for all jobs on all VMs in parallel
- (waits for all post-deploy scripts to finish)
- does not time out
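
The "run in parallel, then wait for all" behavior described for the pre-start, post-start and post-deploy stages can be illustrated with a minimal Python sketch. This is not how the BOSH Agent is implemented; the function names `run_hooks_in_parallel` and `all_succeeded` and the stand-in commands are hypothetical and chosen only for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_hooks_in_parallel(commands):
    """Run every command concurrently and wait for all of them to finish.

    Returns the exit codes in the same order as `commands`.
    No timeout is applied, mirroring the "does not time out" notes above.
    """
    if not commands:
        return []
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        # pool.map preserves input order and blocks until every result is in.
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


def all_succeeded(exit_codes):
    """A stage passes only if every hook exited with status 0."""
    return all(code == 0 for code in exit_codes)


if __name__ == "__main__":
    # Stand-ins for two jobs' pre-start scripts (hypothetical commands).
    hooks = [
        [sys.executable, "-c", "print('job-a pre-start')"],
        [sys.executable, "-c", "print('job-b pre-start')"],
    ]
    codes = run_hooks_in_parallel(hooks)
    print("stage ok" if all_succeeded(codes) else "stage failed")
```

Because the hooks run concurrently, the order in which the two stand-in scripts print is not guaranteed, which is the same reason scripts should not rely on execution order.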

Note

Scripts should not rely on the order in which they are run. The Agent may decide to run them serially or in parallel.

When processes are running

Monit will automatically restart processes that failed their associated checks
- A common pattern used is a PID check: when no process ID can be found in any .pid file, or the process ID is not alive anymore, then the process is restarted.
- A common pitfall arises when the process ID is not properly written to the .pid file, in which case Monit loses track of the actual process state and its view diverges from reality. Using bpm is an effective way to avoid this trap.
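
The PID check described above can be sketched in Python as follows. This is only an illustration of the logic, not Monit's implementation; the function names `read_pid`, `is_alive` and `needs_restart` are hypothetical, and the liveness probe assumes a POSIX system.

```python
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_alive(pid):
    """Check whether a process with this PID exists (POSIX only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def needs_restart(pid_file):
    """Mirror the check: restart when no usable PID is found or it is not alive."""
    return not is_alive(read_pid(pid_file))
```

Note that a file containing a wrong but live PID makes `needs_restart` return `False` even though the real process may be gone, which is the pitfall described above.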

When stop is issued (or before update and subsequent start happens)

monit unmonitor is called for each process
pre-stop scripts run for all jobs on the VM in parallel
- (waits for all pre-stop scripts to finish)
- does not time out
- requires BOSH v269+ and minimum Xenial stemcell v315.x
drain scripts run for all jobs on the VM in parallel
- (waits for all drain scripts to finish)
- does not time out
monit stop is called for each process
- times out after 5 minutes as of BOSH v258+ on 3302+ stemcells
- if bpm is used, it sends a SIGTERM and waits 15 seconds for the process to stop gracefully; if necessary it then sends a SIGQUIT, waits 2 seconds, and finally sends a SIGKILL if the process is still alive
post-stop scripts run for all jobs on the VM in parallel
- (waits for all post-stop scripts to finish)
- does not time out
- requires BOSH v265+
Persistent disks are unmounted on the VM, if configured
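
The signal escalation described for bpm during monit stop (SIGTERM, wait 15 seconds, SIGQUIT, wait 2 seconds, SIGKILL) can be sketched in Python. This is a simplified model of that sequence under the stated timings, not bpm's actual code; the function name `stop_with_escalation` and its parameters are hypothetical.

```python
import signal
import subprocess


def stop_with_escalation(proc, term_wait=15.0, quit_wait=2.0):
    """Stop `proc` (a subprocess.Popen), escalating signals as needed.

    Returns the name of the last signal that was sent.
    """
    for sig, wait in ((signal.SIGTERM, term_wait), (signal.SIGQUIT, quit_wait)):
        proc.send_signal(sig)
        try:
            proc.wait(timeout=wait)
            return sig.name  # the process exited within the grace period
        except subprocess.TimeoutExpired:
            continue  # still running, escalate to the next signal
    proc.kill()  # SIGKILL cannot be caught or ignored
    proc.wait()
    return signal.SIGKILL.name
```

A process that handles SIGTERM promptly exits after the first step; only a process that ignores or blocks both SIGTERM and SIGQUIT reaches the final SIGKILL.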

Non-BOSH VM Operations

Any deployed VM may be rebooted due to infrastructure disruptions or other operations. In general, the deployment lifecycle hooks are not executed. Only local monitoring is invoked to restart jobs.

The VM reboots and boots successfully. OS processes and services start.
monit starts running
monit begins starting the registered processes. Each job's start program is executed as specified in its monitrc file.

Note

pre-start, post-start, and post-deploy scripts are not executed, since the BOSH lifecycle is not invoked. It is recommended that a job's monitrc start program perform all operations required to start the job without depending on pre-start having run.
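
One way to follow this recommendation is to make the start program repeat any cheap, idempotent preparation itself before launching the process. The Python sketch below illustrates the idea only; real start programs are commonly shell scripts, and the names `ensure_runtime_dirs` and `start`, as well as the notion that the job needs certain directories, are assumptions made for the example.

```python
import os


def ensure_runtime_dirs(paths, mode=0o750):
    """Create each directory if it is missing; safe to call on every start."""
    for path in paths:
        os.makedirs(path, mode=mode, exist_ok=True)


def start(run_dirs, argv, launch=None):
    """Prepare the environment, then launch the job's process.

    `launch` is a hypothetical hook for substituting the launcher; by default
    the current process is replaced with the job's binary.
    """
    ensure_runtime_dirs(run_dirs)
    if launch is None:
        os.execv(argv[0], argv)  # replaces this process and does not return
    return launch(argv)
```

Because the preparation is idempotent, it behaves the same whether or not pre-start ran earlier, so a plain reboot followed by monit starting the process still succeeds.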