Zero Downtime Deployments In An IIS World

This article was originally published on OfferZen.

Anne Gonschorek was a huge help in making this article come together. She’s looking for more contributors to the OfferZen blog. Take a look at Source for more info.


We live in a digital world where service downtime is not tolerated by customers, but at the same time, not everyone has the luxury of cold standby environments or Kubernetes clusters to enable this. I recently had the task of finding a way to make zero downtime deployments work using good old Microsoft IIS, as well as automating the deployments behind a single click. I’m sure there are others out there with similar constraints. If you’re one of them, then this is for you.

Project Codename: “Avoid weekend deployments”

Until recently, our team has been doing deployments outside of regular business hours (i.e. just before 9am or after 5pm). This has worked really well for us, but the honeymoon is coming to an end. We’re now building systems that need to be up all the time, as we never know when a customer may need to access them. The traditional deployment time in other parts of the company is Sunday morning at 5am, but ain't nobody got time for that.

Now, I know what you’re thinking: just use Docker with Kubernetes and set up a cluster in “The Cloud” so you can do rolling hot deployments! Easy. I really wish that was an option for us, but unfortunately, it just isn’t. Here’s why:

  • We may not host in the cloud because we work at a big financial institution with strict data requirements.
  • We may soon be able to use Docker for our deployments, but only Linux-based, and our stack is .NET running in IIS. We’re moving slowly to .NET Core, but aren’t quite there yet.
  • Even if we used containers, we don’t yet have any high availability tools in place to do hot deployments.
  • We do run multiple instances of our production systems behind a load balancer, but unfortunately we don’t have any control over the load balancer as it is managed by the company’s IT infrastructure team.

Our saviour: Blue/Green Deployments

My research led me to a concept called Blue/Green deployments as well as an article on how to implement them with IIS on a single server.

The general concept of a blue/green deployment is the following:
  1. We set up two instances of the application (blue and green)
  2. We only expose one instance to customers at a time (live)
    The other instance is a staging instance that is running but inaccessible from the outside
  3. We deploy to the staging instance
  4. We warm up the staging instance so that all IIS boot up processes are complete
  5. We bring the staging instance online and for a brief period of time both instances are available to customers
  6. We drain the old live instance by allowing all existing connections to finish while not allowing new connections
  7. We take the old instance offline and the other instance becomes the new default live

Here’s how you do this for yourself:

Blue/green deployment with IIS on a single server
IIS Prerequisites

First of all, we will need to install two modules into IIS:

Application Request Routing (aka ARR)

ARR enables IIS to work as a reverse proxy in front of a farm of IIS instances. Normally the other instances would be on other servers, like a true server farm. That is a perfectly valid option for blue/green deployments, but in our case we wanted to keep things within our existing infrastructure, so the instances will all be on the same machine.

URL Rewrite

The server farm is not exposed on a port to the outside world by IIS. We need to use the URL Rewrite module to route traffic to the server farm instead of directly to the regular web applications. This is what gives us the power to take an instance offline without affecting users.
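Both modules are available through the Web Platform Installer. If you’d rather script the installation, something like the following should work from an elevated PowerShell prompt. The product IDs are from memory, so verify them first with WebpiCmd /List /ListOption:Available:

# Install URL Rewrite and ARR via the Web Platform Installer CLI
& "$env:ProgramFiles\Microsoft\Web Platform Installer\WebpiCmd.exe" /Install /Products:"UrlRewrite2,ARRv3_0" /AcceptEula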

Once both modules are installed, we restart the IIS management console.

Configuring IIS

Before we get started with the new IIS configuration, we need to decide a few things:

  • What do we call our server farm? There isn’t really a science to choosing this name, but it will be used in a few places, so keep it simple and descriptive (in this example I used “my-website”).
  • What ports will each of the instances run on? Make sure they’re available on your server. (e.g. 8888,9999)
  • Where will each instance be deployed? IIS uses “C:\inetpub\wwwroot” by default, but your company may have different standards. Use whatever works for your team. Just remember that you need two folders, and I would recommend differentiating them with a blue/green suffix (e.g. C:\inetpub\wwwroot\mywebsite.blue)

Now let’s set our blue and green instance up. ARR requires that the different instances each have their own host names but we want to run these on the same machine. That’s why we trick the machine by editing the hosts file:

Open “C:\Windows\System32\drivers\etc\hosts” in a text editor and add the following:

127.0.0.1    my-website
127.0.0.1    my-website-blue
127.0.0.1    my-website-green
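If you’re scripting your server setup, the same edit can be made from an elevated PowerShell prompt:

# Append the loopback entries to the hosts file (requires admin rights)
Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Value @"
127.0.0.1    my-website
127.0.0.1    my-website-blue
127.0.0.1    my-website-green
"@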

Now, back to IIS. You’ll see a new entry on the left titled “Server Farms”. Right-clicking allows you to add a new farm:

[Image: Create Server Farm]

Give the new farm a name as decided earlier. Then we need to define the servers in the farm. These names must match the hostnames added to the hosts file earlier:

[Image: Add Server]

Click the “Add” button, and then capture similar details for the green instance (with the green hostname and port). Then click “Finish”. Once done, expand the server farm name and click on “Servers”. You should see something like this:

[Image: Servers]
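If you want to script this step as well, the farm can be created with the same Microsoft.Web.Administration API that the deployment scripts later in this article use. This is a sketch only; the element and attribute names (webFarm, server, applicationRequestRouting/httpPort) follow the ARR configuration schema and are worth verifying against your server’s applicationHost.config:

[System.Reflection.Assembly]::LoadFrom("$env:systemroot\system32\inetsrv\Microsoft.Web.Administration.dll")
$mgr = New-Object Microsoft.Web.Administration.ServerManager
$conf = $mgr.GetApplicationHostConfiguration()
$webFarms = $conf.GetSection("webFarms").GetCollection()

# Create the farm element itself
$farm = $webFarms.CreateElement("webFarm")
$farm.SetAttributeValue("name", "my-website")

# Add the blue and green servers with their internal ports
foreach ($entry in @(@{ Address = "my-website-blue"; Port = 8888 },
                     @{ Address = "my-website-green"; Port = 9999 })) {
    $server = $farm.GetCollection().CreateElement("server")
    $server.SetAttributeValue("address", $entry.Address)
    $server.GetChildElement("applicationRequestRouting").SetAttributeValue("httpPort", $entry.Port)
    $farm.GetCollection().Add($server)
}

$webFarms.Add($farm)
$mgr.CommitChanges()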

Configuring websites

Next, we have to configure two websites, each pointing to their own file systems and listening on their own ports. In our case, we configured as follows (using the locations and ports decided on earlier):

MyWebsite.Blue

  • C:\inetpub\wwwroot\mywebsite.blue
  • Port 8888

MyWebsite.Green

  • C:\inetpub\wwwroot\mywebsite.green
  • Port 9999

[Image: Sites]
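If you prefer PowerShell over clicking through IIS Manager, the WebAdministration module that ships with IIS can create both sites (the -HostHeader values match the hosts file entries from earlier):

# Requires the WebAdministration module that ships with IIS
Import-Module WebAdministration
New-Website -Name "MyWebsite.Blue" -Port 8888 -HostHeader "my-website-blue" -PhysicalPath "C:\inetpub\wwwroot\mywebsite.blue"
New-Website -Name "MyWebsite.Green" -Port 9999 -HostHeader "my-website-green" -PhysicalPath "C:\inetpub\wwwroot\mywebsite.green"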

You should now be able to open a browser on the server and navigate to:

  • http://my-website-blue:8888
  • http://my-website-green:9999

You’ll notice that the ARR module is not yet routing to these servers if you hit IIS on port 80. To enable this, we’ll need to add a URL Rewrite Rule.

Adding a URL Rewrite Rule

Click on the root node in IIS, locate the icon named “URL Rewrite” and open it. Click “Add Rule” in the top right and configure your rule as follows:

Edit the inbound rule:

[Image: Edit Inbound Rule]

Set the conditions:

[Image: Conditions]

Set the server variables:

[Image: Server Variables]

Any traffic directed at http://my-website on port 80 will now be routed to the server farm called “my-website”. ARR will then take over and route to our instances. If you now navigate to http://my-website you should see your site.

If you deploy something slightly different to each instance, you’ll notice that ARR round-robins between the two sites. This behaviour is nice for load balancing, but not for our use case, because we want to have one instance offline so we can deploy to it without affecting users.
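A quick way to see this for yourself is to drop a marker file into each deployment folder (say, a hypothetical instance.txt containing “blue” in one and “green” in the other) and hit the farm a few times:

# Responses should alternate between the blue and green instances
1..6 | ForEach-Object {
    (Invoke-WebRequest "http://my-website/instance.txt" -UseBasicParsing).Content
}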

Right click on an instance (it doesn’t really matter which one) and select “Take Server Offline”.

Now we're all set. If you happen to read the article I mentioned, you’ll see they suggest taking the staging server out of the farm using ARR’s health monitoring. I found this didn’t work for us because:

  • The health check runs every second, which caused our request logs to fill up really quickly on both the staging and live instances
  • More importantly, IIS would still occasionally route traffic to the unhealthy node. This would mean the client getting the old version of the app or, if the app pool had been stopped to save resources, an error page.

To get around these issues, I made the following changes:

  • I kept the health check in place, but changed it to only run every 30 seconds
  • I take the staging server offline completely. It is still accessible internally, but ARR stops routing traffic to it.
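If you’d rather script the health check change than click through the ARR health test UI, something along these lines should work. It uses the same Microsoft.Web.Administration API as the scripts below, but note that the healthCheck attribute names are my assumption from the ARR schema, so verify them on your server:

[System.Reflection.Assembly]::LoadFrom("$env:systemroot\system32\inetsrv\Microsoft.Web.Administration.dll")
$mgr = New-Object Microsoft.Web.Administration.ServerManager
$conf = $mgr.GetApplicationHostConfiguration()
$farm = $conf.GetSection("webFarms").GetCollection() | Where-Object {
    $_.GetAttributeValue("name") -eq "my-website"
}
$health = $farm.GetChildElement("applicationRequestRouting").GetChildElement("healthCheck")
# "/health" is a hypothetical test page; use whatever URL your app exposes
$health.SetAttributeValue("url", "http://my-website/health")
$health.SetAttributeValue("interval", [TimeSpan]::FromSeconds(30))
$mgr.CommitChanges()

With that all set up, it's time to automate the deployment.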
Automating all the things - Level up on PowerShell

We’ve been on a drive to make sure all of our deployments are automated and triggered with a single click. This ensures deployments are reproducible, predictable and can be performed by someone without detailed knowledge of the system. We’ve achieved that using our on-premise instance of Microsoft’s Team Foundation Server (the on-premise counterpart of Visual Studio Team Services). TFS builds and releases are largely driven by PowerShell scripts, so I’ll focus on the scripts I wrote instead of getting too much into TFS itself. Hopefully, you will be able to use the scripts on other build servers as well. I’ve picked out parts of the script to highlight how they work.

PowerShell Modules and Utilities

There are a few functions that I ended up using all over the place. These have been pulled out into their own modules. There are also a couple of functions used to assist in running the scripts remotely on the target machine. I won’t cover those here but they are in the repo.

IIS Management Module

We need to make use of the PowerShell cmdlets that allow us to query and control IIS.

For this, we first need to load the assembly into our context:

[System.Reflection.Assembly]::LoadFrom("$env:systemroot\system32\inetsrv\Microsoft.Web.Administration.dll")

We can then get an instance of the ServerManager:

$mgr = new-object Microsoft.Web.Administration.ServerManager

Using this, we can get a handle on a specific server farm using its name:

function Get-ServerFarm([string]$serverFarmName) {
    # Get the ServerManager
    $mgr = New-Object Microsoft.Web.Administration.ServerManager
    # Get the server configuration and find the section managing the farms
    $conf = $mgr.GetApplicationHostConfiguration()
    $section = $conf.GetSection("webFarms")
    $webFarms = $section.GetCollection()
    $webFarm = $webFarms | Where-Object {
        $_.GetAttributeValue("name") -eq $serverFarmName
    }
    # Return the farm matching the name we’re looking for
    $webFarm
}

Once we’ve found the farm, we need to locate the individual servers within the farm:

function Get-Server($serverFarmName, $instanceName) {
    $webFarm = Get-ServerFarm $serverFarmName
    # Each farm element holds a collection of server elements
    $servers = $webFarm.GetCollection()

    # Match on the "address" attribute, which holds the hostname
    $server = $servers | Where-Object {
        $_.GetAttributeValue("address") -eq $instanceName
    }
    return $server
}
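For example, using the farm and hostnames we configured earlier:

$farm = Get-ServerFarm "my-website"
$blue = Get-Server "my-website" "my-website-blue"
$blue.GetAttributeValue("address")    # returns "my-website-blue"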

Using the above two functions, we can then query and control IIS to coordinate the deployment.

Have a look at the full module for the rest of the code.
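To give you a taste of what’s in there, here is roughly what the Get-ServerOnline helper used below looks like. Treat it as a sketch rather than the exact module code: the applicationRequestRouting/counters element and its state attribute follow ARR’s runtime API and are worth double-checking against the repo:

function Get-ServerOnline($serverFarmName, $instanceName) {
    $server = Get-Server $serverFarmName $instanceName
    # ARR exposes runtime information on a child element of each server
    $arr = $server.GetChildElement("applicationRequestRouting")
    $counters = $arr.GetChildElement("counters")
    # State 0 means "Available"; drain and unavailable states are non-zero
    return ($counters.GetAttributeValue("state") -eq 0)
}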

Preparation Script

Before we deploy anything, we still need to determine which instance is which. For this, we use a preparation script that queries IIS and returns the current state:

# Use a module function to check if the Blue instance is online
if (Get-ServerOnline $serverFarmName "$serverFarmName-blue") {  
    $result["LiveBlueGreen"] = "Blue"
    $result["LiveDeployPath"] = $bluePath
    $result["LiveServer"] = "$serverFarmName-blue"

    $result["StagingBlueGreen"] = "Green"
    $result["StagingDeployPath"] = $greenPath
    $result["StagingServer"] = "$serverFarmName-green"
}

If blue is not online, it means that green must be the live instance, so we have a similar block getting the state of the green instance.
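That mirror-image block looks something like this (the repo may also explicitly check the green instance’s state):

else {
    # Blue is offline, so green must be live and blue becomes our staging target
    $result["LiveBlueGreen"] = "Green"
    $result["LiveDeployPath"] = $greenPath
    $result["LiveServer"] = "$serverFarmName-green"

    $result["StagingBlueGreen"] = "Blue"
    $result["StagingDeployPath"] = $bluePath
    $result["StagingServer"] = "$serverFarmName-blue"
}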

Deploy

The preparation script helped us discover where we need to deploy our new code. We do this using our deployment tool’s normal deployment steps, targeting $result.StagingDeployPath.
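In our case that’s a TFS release task, but conceptually it’s just a file sync of the build output into whichever folder is currently staging (the drop path below is hypothetical):

# Mirror the build output into the staging folder
robocopy ".\drop\MyWebsite" $result.StagingDeployPath /MIR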

Swap

Now that staging contains the latest version, we can do the actual work of the zero-downtime deployment.

  1. First we need to warm up the staging instance. The application pool has likely been recycled, and we don’t want users to experience the start-up delay. Warming it up also helps us check whether any configuration errors are preventing the instance from starting. This has saved us before, when we had forgotten to make some changes to the server.
  2. # Loop until we have a satisfactory response time
    Do {
        $time = Measure-Command {
            # Query the site on its internal port and measure the response time
            $res = Invoke-WebRequest $stagingSite
        }
        $ms = $time.TotalMilliseconds
        If ($ms -ge $minTime) {
            Write-Host "$($res.StatusCode) from $stagingSite in $($ms)ms"
        }
    } While ($ms -ge $minTime)
    
  3. Now that it’s warmed up, we can bring the staging instance into the server farm to start processing requests. This is a pretty simple command that just tells IIS to bring it online.
  4. # Make sure it is set to state=Available
    Set-InstanceState $serverFarmName $stagingInstance 0  
    # Bring it online
    Set-ServerOnline $serverFarmName $stagingInstance  
    
  5. Next, we drain the connections to the old live instance: we set its state to “Drain” and monitor its counters until it is no longer serving requests, then take it offline.
  6. # Keep checking until the live instance is no longer processing requests
    $startDate = Get-Date
    Write-Host "Checking requests per second"
    Do {
        $currentConnections = Get-RequestsPerSecond $serverFarmName $liveInstance
        If ($currentConnections -gt 0) {
            Write-Host "Still $currentConnections requests per second"
        }
        else {
            Write-Host "0 requests per second"
        }
    # Stop once the counter hits zero, or give up after a 10 second timeout
    } While ($currentConnections -gt 0 -and $startDate.AddSeconds(10) -gt (Get-Date))

    # Now that it is finished servicing its requests, we can take it out of the farm
    Set-ServerOffline $serverFarmName $liveInstance
    
  7. Now we just double check that everything is ok by querying the IIS status in a similar way to the beginning. We also do some additional checks to make sure each instance is in the state we expect (i.e. Live is online and Staging is not).
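That final check can be as simple as the following sketch (remember that after the swap, what was staging should be live and vice versa; the variable names carry over from the scripts above):

# The new live instance (previously staging) must be serving traffic
If (-not (Get-ServerOnline $serverFarmName $stagingInstance)) {
    throw "Swap failed: $stagingInstance is not online"
}
# The old live instance must no longer be in the farm rotation
If (Get-ServerOnline $serverFarmName $liveInstance) {
    throw "Swap failed: $liveInstance is still online"
}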
Architecture Considerations

Deploying in this way did mean making some slight changes to the way we build our applications. During the switch-over, there may be cases where an older frontend tries to access a newer API, or an older API accesses a newer version of the database.

For this reason, we need to ensure that the public API surface of each component is always backwards compatible:

  • We never rename or delete a database column until all consumers are updated. We then remove the old columns in subsequent releases.
  • We never remove or break an existing API endpoint; instead, we add new ones and deprecate the old ones until they can be safely removed.
Future Enhancements

We have this automated process running for a few of our deployments and are slowly rolling it out to the rest. I’ve also had some time to think about some enhancements I’d like to explore:

  • We’d like to find some way to pause the deployment while the staging instance has been updated but not yet live. We could then access it internally and verify that everything is ok before doing the final switch.
  • I’d like to share this with other people using TFS by publishing the scripts as a TFS extension. I’d love to hear your feedback and suggestions: would this be useful to you or your company? How would you improve it?
Resources

Required IIS Modules

PowerShell Resources

My PowerShell skills were pretty basic when starting this project. It took a lot of Googling and experimenting to get things working the way I needed. These resources were particularly helpful:

Bonus TFS Integration Resources

When running PowerShell scripts on TFS/VSTS, specially formatted log output can be used to communicate with the build agent.
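These are the ##vso logging commands that the agent parses out of standard output. For example:

# Set a variable that later steps in the release can read
Write-Host "##vso[task.setvariable variable=StagingServer]$($result.StagingServer)"
# Surface a warning in the build summary
Write-Host "##vso[task.logissue type=warning]Staging warm-up was slow"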

There is also a PowerShell API that is quite useful and can be used instead of the log formats.