Scheduled VM Shutdown/Startup with Azure Automation and Tags

I wanted to look at how to save some money on IaaS in Azure by shutting down certain VMs outside business hours.

So let's start with some requirements:

  • I shouldn't need to maintain a list of VMs and how to handle them. The VM metadata should inform the desired shutdown/startup behaviour.
  • The scheduler shouldn't depend on IaaS itself to run and should leverage an Azure platform offering.

Seems pretty simple, right?

To meet my first requirement, I simply added an "operation" tag to my IaaS VMs, with one of two values:

  • "all", which indicates a 24/7 VM, and
  • "busonly", which indicates the VM should only run to support business-hours operations.
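A quick way to sanity-check the tag convention is to run the same kind of filter the runbooks below use against some mock data. This is a minimal local sketch that needs no Azure subscription: the objects are hypothetical stand-ins for Find-AzureRmResource output and model only the Name, ResourceGroupName, ResourceType and Tags fields (the real shape of the Tags property varies between AzureRM versions).

```powershell
# Hypothetical mock resources standing in for Find-AzureRmResource output.
# Only the fields the runbooks actually read are modelled.
$resources = @(
    [pscustomobject]@{ Name = 'web01';     ResourceGroupName = 'rg-busonly'; ResourceType = 'Microsoft.Compute/virtualMachines';   Tags = @{ operation = 'busonly' } }
    [pscustomobject]@{ Name = 'dc01';      ResourceGroupName = 'rg-core';    ResourceType = 'Microsoft.Compute/virtualMachines';   Tags = @{ operation = 'all' } }
    [pscustomobject]@{ Name = 'web01-nic'; ResourceGroupName = 'rg-busonly'; ResourceType = 'Microsoft.Network/networkInterfaces'; Tags = @{ operation = 'busonly' } }
)

# Business-hours-only VMs: the tag must say "busonly" AND the resource must be a VM,
# because other resource types (NICs, disks) can carry the same tag.
$busOnlyVMs = $resources | Where-Object {
    $_.ResourceType -eq 'Microsoft.Compute/virtualMachines' -and $_.Tags['operation'] -eq 'busonly'
}

$busOnlyVMs | ForEach-Object { "$($_.ResourceGroupName)/$($_.Name)" }   # rg-busonly/web01
```

Filtering on the resource type as well as the tag is what stops a tagged NIC from being handed to the VM cmdlets.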

To meet my second requirement, I decided to use Azure Automation runbooks for both the startups/shutdowns and the scheduling.

Azure Automation is pretty easy to work with; it's basically a PowerShell Workflow execution engine, so I quickly bashed together some assets and a basic workflow script to shut down some VMs.

Workflows can read stored variables and credentials (assets) that are shared across runbooks. In this case I'm storing the credentials for the service account I use to execute the jobs, and the subscription ID of my target subscription.

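workflow Shutdown-BusOnlyVMs
{
    # Shared Automation assets: target subscription ID and service account credential
    $subscriptionID = Get-AutomationVariable -Name 'mysubid'
    $azureCred = Get-AutomationPSCredential -Name 'myazureautomationserviceaccount'

    Login-AzureRMAccount -Credential $azureCred
    Select-AzureRmSubscription -SubscriptionId $subscriptionID

    # All resources tagged operation=busonly, narrowed down to virtual machines
    $migrationVMs = Find-AzureRmResource -TagName "operation" -TagValue "busonly" | Where-Object -FilterScript {$_.ResourceType -eq "Microsoft.Compute/virtualMachines"}

    foreach($VM in $migrationVMs)
    {
        $vmName = $VM.Name
        $vmRG = $VM.ResourceGroupName

        # In Azure Automation the Statuses objects come back empty, so parse the JSON in StatusesText instead
        $VMDetail = Get-AzureRmVM -ResourceGroupName $vmRG -Name $vmName -Status | Select-Object -ExpandProperty StatusesText | ConvertFrom-Json

        # Assumes the power state is the second status entry, after the provisioning state
        $vmPowerstate = $VMDetail[1].Code
        if($vmPowerstate -eq "PowerState/running")
        {
            Write-Output "$vmName powerstate is $vmPowerstate, stopping VM"
            # -Force skips the confirmation prompt; Stop-AzureRmVM deallocates the VM by default
            Stop-AzureRmVM -Name $vmName -ResourceGroupName $vmRG -Force
        }
    }
}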

You may notice that the output of Get-AzureRmVM is handled in a slightly unusual way. When you run the command from a local PowerShell install, you get back objects (built from the JSON that ARM returns) that can be traversed as normal PowerShell objects.

Get-AzureRmVM -ResourceGroupName $vmRG -Name $vmName -Status | Select-Object -ExpandProperty Statuses

This should return an array of objects that describe the VM's provisioning and power state. Weirdly, I discovered that Azure Automation's PowerShell context returns them as objects with no values, alongside a separate matching property named [Property]Text that holds the JSON returned by ARM.

So you end up with

  • returnedvariable.Statuses as an array of empty objects
  • returnedvariable.StatusesText as the JSON output from ARM.

(I'm hoping a PowerShell grand wizard will explain this behaviour to me one day.)

Luckily, PowerShell has a built-in JSON parser, so we pipe the contents of StatusesText to ConvertFrom-Json and we're in business again.
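Here's that conversion step in isolation. The JSON below is a hand-written, illustrative sample of what StatusesText might hold for a stopped VM (a provisioning status followed by a power status), not captured output. Property access in PowerShell is case-insensitive, so .Code works whether the JSON says "code" or "Code".

```powershell
# Illustrative StatusesText payload (hand-written sample, not real ARM output).
$statusesText = @'
[
  { "Code": "ProvisioningState/succeeded", "Level": "Info", "DisplayStatus": "Provisioning succeeded" },
  { "Code": "PowerState/deallocated",      "Level": "Info", "DisplayStatus": "VM deallocated" }
]
'@

# ConvertFrom-Json turns the JSON array into an array of PSCustomObjects.
$statuses = $statusesText | ConvertFrom-Json

# Each entry can now be traversed like any other PowerShell object.
$statuses | ForEach-Object { "{0} -> {1}" -f $_.Code, $_.DisplayStatus }
```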

From there it's simply a matter of finding the status entry that describes the current power state and shutting down any VMs that report "PowerState/running".
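Indexing straight to $VMDetail[1] works as long as ARM keeps returning the provisioning status before the power status. A slightly more defensive option, sketched here as a standalone function with a hypothetical name (Get-PowerStateCode) and illustrative JSON, searches for the entry whose code starts with PowerState/ instead of relying on its position.

```powershell
function Get-PowerStateCode {
    # Hypothetical helper: returns the "PowerState/..." code from a StatusesText
    # JSON string, or $null if there is no power status entry.
    param(
        [Parameter(Mandatory)]
        [string] $StatusesText
    )

    # Assign first, then pipe the variable, so entries are enumerated one by one
    # (Windows PowerShell 5.1's ConvertFrom-Json emits a JSON array as a single object).
    $statuses = $StatusesText | ConvertFrom-Json

    # Match on the code prefix rather than the array index.
    $power = $statuses | Where-Object { $_.Code -like 'PowerState/*' } | Select-Object -First 1
    if ($power) { return $power.Code }
    return $null
}

# Usage: the power status is deliberately listed first here,
# so indexing [1] would return the provisioning state instead.
$sample = '[{"Code":"PowerState/running"},{"Code":"ProvisioningState/succeeded"}]'
Get-PowerStateCode -StatusesText $sample   # PowerState/running
```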

In Azure Automation I have this attached to a schedule that executes at 8PM every day.

The startup script, which brings everything back up at the start of each day, turns out to have some additional complications.

workflow StartUp-BusOnlyVMs
{
    # Workflows don't allow .NET method calls directly, so the time zone
    # conversion runs as ordinary PowerShell inside an InlineScript block
    $day = InlineScript {
        Function Get-LocalTime($UTCTime)
        {
            $strCurrentTimeZone = "AUS Eastern Standard Time"
            $TZ = [System.TimeZoneInfo]::FindSystemTimeZoneById($strCurrentTimeZone)
            $LocalTime = [System.TimeZoneInfo]::ConvertTimeFromUtc($UTCTime, $TZ)
            Return $LocalTime
        }

        # Always start from UTC, convert to Australian Eastern time, return the weekday name
        $localTime = Get-LocalTime (Get-Date).ToUniversalTime()
        Get-Date -Date $localTime -UFormat "%A"
    }

    if(($day -eq "Saturday") -or ($day -eq "Sunday"))
    {
        Write-Output (Get-Date -UFormat "%Y / %m / %d / %A / %Z / %T")
        Write-Output "It's $day in Australia, doing nothing"
    }
    else
    {
        Write-Output (Get-Date -UFormat "%Y / %m / %d / %A / %Z / %T")
        Write-Output "It's $day in Australia, starting VMs"

        $subscriptionID = Get-AutomationVariable -Name 'prd1subscsriptionid'
        $azureCred = Get-AutomationPSCredential -Name 'Azure Automation Service Account'
        Login-AzureRMAccount -Credential $azureCred
        Select-AzureRmSubscription -SubscriptionId $subscriptionID

        $migrationVMs = Find-AzureRmResource -TagName "operation" -TagValue "busonly" | Where-Object -FilterScript {$_.ResourceType -eq "Microsoft.Compute/virtualMachines"}

        foreach($VM in $migrationVMs)
        {
            $vmName = $VM.Name
            $vmRG = $VM.ResourceGroupName

            # Same StatusesText workaround as the shutdown runbook
            $VMDetail = Get-AzureRmVM -ResourceGroupName $vmRG -Name $vmName -Status | Select-Object -ExpandProperty StatusesText | ConvertFrom-Json
            $vmPowerstate = $VMDetail[1].Code

            # Only start VMs that are deallocated, i.e. stopped by the shutdown runbook (or by hand)
            if($vmPowerstate -eq "PowerState/deallocated")
            {
                Write-Output "$vmName powerstate is $vmPowerstate, starting VM"
                Start-AzureRmVM -Name $vmName -ResourceGroupName $vmRG
            }
        }
    }
}

The first thing you'll notice is that there's a time conversion function. This is needed for two reasons:

  1. Azure Automation's scheduler doesn't support weekday-only scheduling, only "every day". This means you need to handle weekday-only processing inside the workflow script.
  2. Azure Automation offers scheduling in localized time zones, but the jobs themselves still execute in UTC. If, like me, you don't live in UTC, you'll need to compensate for the UTC offset when working out whether it's a business day at the moment your script runs.
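To put numbers on the second point: Sydney is UTC+10 outside daylight saving, so an 8AM Monday start-up actually runs at 10PM on Sunday UTC, and a weekday check done in UTC would decide it's still the weekend. The sketch below checks this with .NET's TimeZoneInfo for an arbitrarily chosen example date (14 August 2016). Get-AusEasternTimeZone is a hypothetical helper that tries the Windows zone ID used in the runbook first and falls back to the IANA ID Australia/Sydney on systems that only know IANA names.

```powershell
# Resolve Australian Eastern time: try the Windows ID used in the runbook,
# then the IANA ID for systems that only know IANA names.
function Get-AusEasternTimeZone {
    foreach ($id in 'AUS Eastern Standard Time', 'Australia/Sydney') {
        try { return [System.TimeZoneInfo]::FindSystemTimeZoneById($id) } catch { }
    }
    throw 'No Australian Eastern time zone found on this system.'
}

# Example instant: Sunday 14 August 2016, 22:00 UTC (Sydney is on AEST, UTC+10, in August).
$utc   = [datetime]::SpecifyKind([datetime]'2016-08-14 22:00', [System.DateTimeKind]::Utc)
$local = [System.TimeZoneInfo]::ConvertTimeFromUtc($utc, (Get-AusEasternTimeZone))

"UTC:    $($utc.DayOfWeek) $($utc.Hour):00"     # UTC:    Sunday 22:00
"Sydney: $($local.DayOfWeek) $($local.Hour):00" # Sydney: Monday 8:00
```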

I slightly modified Tao Yang's excellent time zone conversion script so that it always converts the current time to UTC first (in case Azure ever changes how it executes the script) and then converts it to my time zone.
From there, I check whether it's a weekend, and if it is, I do nothing. Otherwise, I run through the same logic as the shutdown script in reverse: if the VM power state is "PowerState/deallocated", I start the VM.
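Pulling the decision logic out of the workflow makes it easy to test without an Azure subscription. This is a minimal sketch, assuming the local weekday has already been worked out as described above; Get-BusOnlyAction and its parameters are hypothetical names, not anything Azure Automation provides, and public holidays are ignored, just as they are in the runbook.

```powershell
function Get-BusOnlyAction {
    # Hypothetical helper mirroring the two runbooks' rules:
    #   Startup  - weekdays only, and only if the VM is deallocated.
    #   Shutdown - any day, but only if the VM is running.
    param(
        [Parameter(Mandatory)] [ValidateSet('Startup', 'Shutdown')] [string] $Mode,
        [Parameter(Mandatory)] [System.DayOfWeek] $LocalDay,
        [Parameter(Mandatory)] [string] $PowerState
    )

    $isWeekend = $LocalDay -in @([System.DayOfWeek]::Saturday, [System.DayOfWeek]::Sunday)

    if ($Mode -eq 'Startup') {
        if (-not $isWeekend -and $PowerState -eq 'PowerState/deallocated') { return 'Start' }
        return 'None'
    }

    if ($PowerState -eq 'PowerState/running') { return 'Stop' }
    return 'None'
}

Get-BusOnlyAction -Mode Startup  -LocalDay Monday   -PowerState 'PowerState/deallocated'  # Start
Get-BusOnlyAction -Mode Startup  -LocalDay Saturday -PowerState 'PowerState/deallocated'  # None
Get-BusOnlyAction -Mode Shutdown -LocalDay Saturday -PowerState 'PowerState/running'      # Stop
```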

This workflow then executes at 8AM every day, giving my business-hours VMs roughly 12 hours of coverage each business day (8AM to 8PM), which should hopefully cater for all the early starters and late leavers.

This covers the basics of my requirements. As for continuous improvement, I think the first revisit will be asynchronous shutdowns and startups. Right now the commands are executed serially, which means shutting down 20 VMs can take some time, as each shutdown waits for the previous one to complete.
I think building a single shutdown/startup runbook that takes the action and the VM as parameters, and calling it asynchronously for each VM, will be the best way to solve this, although I wonder whether this will cost me more or fewer minutes of Azure Automation usage (I'm pretty sure it's more).
Another improvement would be passing the credentials and subscription as parameters to the runbooks so they are more modular.

As an aside, to minimise the possible blast radius if the script goes wrong, the service account I use to execute the scripts only has Virtual Machine Contributor rights on the resource groups that hold the non-24/7 VMs. Any production-critical VMs are held in other resource groups, which the account can neither see nor write to.
