

How to use Orchestrator to automatically recover Website failures by checking the HTTP status code (EN)

Introduction

  

When a Website runs in HA (High Availability) mode, its HTTP status code may differ from one server to another if one or more servers are experiencing problems. These failures may go unnoticed if the front-end Load Balancer has detected them and failed the traffic over by excluding the faulty servers from the Load Balancing pool. This “hides” the problems so that customers do not notice the failure, but the Middleware administrator should still be able to recover the faulty servers quickly to avoid running in degraded mode.

 

This article shares a way to use Orchestrator to get the HTTP status code for a Website running on a specific Web server and initiate recovery actions based on the status code it gets. This is explained through a scenario for CONTOSO Company.

 

 

 

Scenario

 

CONTOSO is a company hosting a critical business Web application that runs on Adobe ColdFusion 10. Adobe ColdFusion runs on Microsoft IIS and the URL is http://app1.contoso.com/.

 

When the Website is working properly, ColdFusion redirects traffic from http://app1.contoso.com/ to http://app1.contoso.com/home/.

 

CONTOSO uses Hyper-V as its virtualization solution and SCVMM (System Center Virtual Machine Manager) to manage its VMs.

 

CONTOSO noticed some failures on the Web servers hosting http://app1.contoso.com/ and has identified the required workarounds. A problem ticket was created internally and CONTOSO engineers are investigating to identify the root cause. In the meantime, CONTOSO would like to apply the workarounds automatically to hide the known problem from its customers.

 

Below are the observations of CONTOSO engineers about the HTTP codes received when querying the servers hosting http://app1.contoso.com/.

 

| HTTP Code | Comment | Workaround |
|-----------|---------|------------|
| 302 | The Website is working properly and the redirection is done as expected. | No action is required. |
| 200 | The Website redirection is not working. Instead, IIS is responding but with a blank page. | ColdFusion service "ColdFusion 10 Application Server Main Instance" should be restarted. |
| 500 | The Website is facing an internal server error. | The server should be rebooted. |
| No answer | The Website is not responding. | The server status should be checked on SCVMM. If it is not running then it should be started. |
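
To make the mapping between status codes and workarounds explicit, here is a minimal PowerShell sketch. The function name Get-RecoveryAction and the action labels it returns are hypothetical and are not part of Orchestrator; $null stands for the "No answer" case, and any code CONTOSO did not observe is reported as Unknown.

```powershell
# Hypothetical helper: maps an observed HTTP status code to the workaround
# that CONTOSO engineers identified. $null stands for "no answer".
function Get-RecoveryAction {
    param($StatusCode)

    if ($null -eq $StatusCode) { return 'CheckVmInScvmm' }

    switch ($StatusCode) {
        302     { return 'None' }                      # redirection works
        200     { return 'RestartColdFusionService' }  # blank page from IIS
        500     { return 'RebootServer' }              # internal server error
        default { return 'Unknown' }                   # code not observed by CONTOSO
    }
}

Get-RecoveryAction 200
```

In the Runbook itself this decision is not made by a script but by the include conditions set on the links between activities.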

  

CONTOSO then requested the assistance of a Microsoft partner to study, propose and implement a solution that automates the required workarounds.

 

 

 

Solution

 

Orchestrator could be used to support the requirements of CONTOSO and the automation. The implementation would need to include:

  • The detection of failures by checking the HTTP status code per server
  • The automatic recovery when a failure is detected  

In a real-world implementation, a mail notification that informs administrators about the status change and the automatic actions that were executed would be very helpful to keep track of events. This is not covered in the implementation but is easy to add.

 

 

 

Implementation

 

For the implementation, all you need is one Runbook per server hosting the Web application. Each Runbook checks the health status of its own server and executes the workaround when a failure is detected.

 

To create a Runbook for a Web server, you need to use nine (9) activities:

  • Monitor Date / Time: It allows you to specify when the Runbook needs to start running and how often the checks and workarounds are executed.

 

 

  • Check Schedule: It allows you to specify the days and hours during which the Runbook can start running. By combining the settings of this activity with the ones from the previous activity, you can define precisely the dates and times when your Runbook can start running.

 

  

  • Run .Net Script: It allows you to specify the variables to use:
    • Servername: This variable is used to specify the server name
    • Hostname: This variable is used to specify the Website DNS name
    • Port: This variable is used to specify the port on which the server is listening for the Website
    • Path: This variable is used to specify the path that will be checked. If you specify “/” then the main page will be checked
    • Servicename: This variable is used to specify the name of the ColdFusion service to restart (mentioned previously in one of the workarounds)
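
As an illustration, the body of this first Run .Net Script activity could look like the sketch below. The server name WEB01 and port 80 are assumptions made for the example; only the host name and the service name come from the CONTOSO scenario. The two checks at the end are an optional addition so that a typo fails here rather than in a later activity.

```powershell
# Sample values: WEB01 and port 80 are assumptions for illustration only.
$Servername  = 'WEB01'                 # hypothetical Web server name
$Hostname    = 'app1.contoso.com'      # Website DNS name from the scenario
$Port        = 80                      # assumed: plain HTTP on the default port
$Path        = '/'                     # "/" checks the main page
$Servicename = 'ColdFusion 10 Application Server Main Instance'

# Optional sanity checks (not part of the original Runbook).
if ($Port -lt 1 -or $Port -gt 65535) { throw "Invalid port: $Port" }
if (-not $Path.StartsWith('/'))      { throw "Path must start with '/': $Path" }
```

For the following activities to be able to use these values, each variable would also need to be declared as Published Data of the activity.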

 

 

 

  • Run .Net Script: This second Run .Net Script activity will allow getting the HTTP status code for the Website on the queried server by running the following script:

 

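#################variables#################
# Replace each placeholder with the Published Data from the previous activity.
$servername = "Servername from the previous activity"
$hostname = "Hostname from the previous activity"
$port = port from the previous activity
$path = "Path from the previous activity"
###########################################

#################Main######################
# Build a minimal HTTP/1.1 GET request for the Website.
$http = @"
GET $path HTTP/1.1
Host: $hostname
`n`n
"@

# Send the request directly to the server being checked.
$httpanswer = $http | C:\SendTcp\Send-TcpRequest.ps1 $servername $port

# The status line looks like "HTTP/1.1 302 Found": characters 9 to 11 hold the code.
$httpanswer = [int]$httpanswer.Substring(9,3)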

  

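The activity stores the HTTP status code it receives in the httpanswer variable. If the server does not answer, the script fails and the activity returns a warning or failed status, which is what Action 3 relies on.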
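
The script reads the status code from a fixed position in the response. As a possible alternative, the sketch below extracts it with a regular expression. Get-HttpStatusCode is a hypothetical helper name, not something provided by Orchestrator or by Send-TcpRequest.ps1.

```powershell
# Hypothetical helper: extracts the status code from a raw HTTP response.
# Returns $null when the response is empty or does not start with a status line.
function Get-HttpStatusCode {
    param([string]$RawResponse)

    # A status line looks like "HTTP/1.1 302 Found".
    if ($RawResponse -match '^HTTP/\d\.\d\s+(\d{3})') {
        return [int]$Matches[1]
    }
    return $null
}

Get-HttpStatusCode "HTTP/1.1 302 Found`r`nLocation: /home/"
```

Unlike Substring(9,3), this helper does not raise an error on an empty answer. If you use it in the Runbook, the script should throw explicitly when the result is $null so that the activity still returns a failed status when the server does not respond.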

 

 

 

Remark: The Orchestrator activity uses the Send-TcpRequest.ps1 PowerShell script (http://www.leeholmes.com/blog/2009/10/28/scripting-network-tcp-connections-in-powershell/). In our case, we downloaded it and placed it in the C:\SendTcp\ folder. If you would like to use a different path, do not forget to update it in the script.

 

The previous activities allow us to get the HTTP status code that is used to decide which workaround to execute. The following activities execute the recovery actions when a failure is detected.

 

Action 1: Restart of the ColdFusion service "ColdFusion 10 Application Server Main Instance" when an HTTP code 200 is received

  • Start/Stop Service: It will restart the ColdFusion service when the received HTTP status code is equal to 200

 

 

The link between the second Run .Net Script activity and the Start/Stop Service activity needs to be set with the following include condition: httpanswer from Run .Net Script equals 200. This means that the ColdFusion service is restarted only when the redirection is no longer done.

 

 

Action 2: Reboot of the server when an HTTP code 500 is received

  • Restart System: It will restart the server when the received HTTP status code is equal to 500

 

 

The link between the second Run .Net Script activity and the Restart System activity needs to be set with the following include condition: httpanswer from Run .Net Script equals 500. This means that the server is rebooted only when the Website returns an internal server error.

  

 

Action 3: The server status will be checked on SCVMM. If it is not running then it should be started

  • Get Computer/IP Status: It checks the server status. If it is not responding then the next activity will be launched.

 

  

The link between the second Run .Net Script activity and the Get Computer/IP Status activity needs to be set with the following include condition: Run .Net Script returns warning or failed. This means that the connectivity check is done only if no answer was received from the server for the Website.

 

  

  • Get VM: It will allow searching the VM in SCVMM. This is used to get the VM status

 

 

The link between the Get Computer/IP Status and Get VM activities needs to be set with the following include condition: Get Computer/IP Status returns warning or failed. This means that Orchestrator checks the VM status in SCVMM only when the server is not reachable.

 

 

  • Start VM: It will start the VM in SCVMM if it is not running.

 

 

The link between the Get VM and Start VM activities needs to be set with the following include condition: Status from Get VM equals Stopped. This means that Orchestrator attempts to start the VM in SCVMM only when it is stopped.
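
The chain of three link conditions used by Action 3 can be summarized with the following sketch. Get-NoAnswerAction and its parameters are hypothetical names used only to show the logic; in the Runbook the same decisions are made by the include conditions on the links.

```powershell
# Hypothetical helper mirroring the three link conditions of Action 3.
#   $WebsiteAnswered : the HTTP check returned a status code
#   $ServerReachable : Get Computer/IP Status succeeded
#   $VmStatus        : status reported by Get VM (only 'Stopped' triggers a start)
function Get-NoAnswerAction {
    param(
        [bool]$WebsiteAnswered,
        [bool]$ServerReachable,
        [string]$VmStatus
    )

    if ($WebsiteAnswered)        { return 'None' }     # Action 3 is not triggered
    if ($ServerReachable)        { return 'None' }     # server is up: no VM lookup
    if ($VmStatus -eq 'Stopped') { return 'StartVM' }  # VM is stopped in SCVMM
    return 'None'                                      # any other VM status
}

Get-NoAnswerAction -WebsiteAnswered $false -ServerReachable $false -VmStatus 'Stopped'
```

Note that a server that is reachable but whose Website does not answer is left untouched by this Runbook; handling that case would require an additional link and activity.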

 

  

Below is a screen capture of all the activities included in the Runbook:

 

  

 

 

Conclusion

 

This article shared how Orchestrator can be used to apply workarounds when a Website fails on a specific server. With the way we organized the activities, one Runbook is required per server, so the checks and workaround executions run in parallel across servers. The described implementation can be improved by adding mail notifications to administrators and more advanced checks and recovery actions. The goal of this article is to provide easy steps that demonstrate how Orchestrator can be used for automatic recovery of problems occurring on Web servers.