The Resurrector is a plugin to the Health Monitor. It’s responsible for automatically recreating VMs that become inaccessible.

The Resurrector continuously cross-references VMs expected to be running against the VMs that are sending heartbeats. When resurrector does not receive heartbeats for a VM for a certain period of time, it will kick off a task on the Director (scan and fix task) to try to “resurrect” that VM. The Director may do one of two things:

  • create a new VM if the old VM is missing from the IaaS
  • replace a VM if the Agent on that VM is not responding to commands

Under certain conditions the Resurrector will consider the system in the “meltdown” and will stop sending requests to the Director. It will resume submitting scan and fix tasks to the Director once the conditions change.

Resurrection can be turned off per specific deployment job instance or for all VMs managed by the Director via bosh vm resurrection CLI command.

Note: The Health Monitor deploys with the Resurrector plugin disabled by default. To use it, you must enable the Resurrector plugin in your BOSH deployment manifest.


Enabling and Configuring the Resurrector

To enable the Resurrector:

  1. Change deployment manifest for the Health Monitor (typically colocated with the Director):

    properties:
      hm:
        resurrector_enabled: true
    
  2. Optionally change configuration values:

    • minimum_down_jobs [Integer, optional]: If the total number of instances that are down in a deployment (within time interval T) is below this number, the Resurrector will always request to fix instances. This decision takes precedence to the percent_threshold check. Default is 5.
    • percent_threshold [Float, optional]: If the percentage of instances that are down in a deployment (within time interval T) is greater than the threshold percentage, the Resurrector will not request to fix any instance. Going over this threshold is called “meltdown”. Default is 0.2 (20%).
    • time_threshold [Integer, optional]: Time interval (in seconds) used in the above calculations. Default is 600.
    properties:
      hm:
        resurrector_enabled: true
        resurrector:
          minimum_down_jobs: 5
          percent_threshold: 0.2
          time_threshold: 600
    
  3. Depending on how you configured Director user management, credentials are specified in the user and password properties or using a custom client to authenticate with the UAA.

    Option a) Using UAA User Management

    Define an additional client in the uaa.clients section of your manifest:

    properties:
      uaa:
        clients:
          hm:
            override: true
            authorized-grant-types: client_credentials
            scope: ""
            authorities: bosh.admin
            secret: "hm-password"
    

    Configure the Health Monitor to use the client with the defined secret to authenticate with the Director:

    properties:
      hm:
        director_account:
          client_id: hm
          client_secret: "hm-password"
    

    Option b) Using Preconfigured Users

    Create new Director user so that the Resurrector plugin can communicate with the Director and query/submit information about deployments.

    properties:
      director:
        user_management:
          provider: local
          local:
            users:
            - {name: admin, password: admin-password}
            - {name: hm, password: hm-password}
    

    Configure the Health Monitor to use the HM user and password to authenticate with the Director:

    properties:
      hm:
        director_account:
          user: hm
          password: hm-password
    
  4. Deploy.

Customizing for Your Deployment

For most deployments, you can use the default configuration values. In very small or very large deployments, the default values may need customization, as discussed in the following examples.

Small Deployment

If your deployment consists of only five VMs, you may not want the Resurrector to attempt to recreate your entire deployment in the event of a catastrophic failure. In this scenario, we recommend that you set minimum_down_jobs to 1 or 2.

Large Deployment

If your deployment consists of 1000 VMs, and you use the defaults, the Resurrector notifies the Director to recreate at least five VMs and up to 200 VMs. Depending on your deployment, you may consider even 100 down instances a catastrophic failure. In this scenario, set percent_threshold to 5% so that the Director resurrects 50 instances or fewer.


Disabling the Resurrector

To disable the Resurrector:

  1. Change deployment manifest for the Health Monitor (typically colocated with the Director):

    properties:
      hm:
        resurrector_enabled: false
    
  2. Redeploy.

  3. Optionally remove Director user created for the Health Monitor.


Viewing the Resurrector’s Activity

Since scan and fix tasks on the Director are regular tasks, you can use bosh tasks --no-filter command to view currently running/queued Resurrector’s activity and bosh tasks recent --no-filter to view finished tasks.


Next: Persistent disk snapshotting

Previous: Manual repair with Cloud Check


Contribute changes to this page