[recipe] Downtime percentage over 1h windows : Circonus Support

In the following example, we will compute the downtime of a service over the last hour.

To do so, we will probe our service API with an external check every minute to see if the service is running.

We can use an HTTP check and its duration metric for that purpose.
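To show what such a probe measures, here is a minimal Python sketch of a one-shot HTTP duration probe. It is not how Circonus implements its HTTP check. The function name `probe_duration_ms` is hypothetical, and the sketch assumes, as the graphs in this recipe suggest, that a failed request produces no duration sample. That missing sample is modeled here as `None`.

```python
import http.client
import time
import urllib.request
from typing import Optional


def probe_duration_ms(url: str, timeout: float = 10.0) -> Optional[float]:
    """Fetch `url` once and return the request duration in milliseconds.

    Hypothetical helper: returns None when the request fails (connection
    refused, timeout, HTTP error status, invalid URL), standing in for a
    missing duration sample.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except (OSError, ValueError, http.client.HTTPException):
        # urllib.error.URLError and HTTPError are subclasses of OSError.
        return None
    return (time.monotonic() - start) * 1000.0
```

Running this once per minute yields the kind of series shown in the graph: a number while the service answers, `None` while it does not. A real check may record a duration and a separate status code for error responses; treating every error as missing is a simplification of this sketch.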

As you can see, there are several times during the viewed period when the service was not reachable.

We can isolate those times using the `is_missing()` CAQL function. It returns 1 if the value was missing and 0 if it was available.
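To make the semantics concrete, the following Python sketch mimics what `is_missing()` computes on a series of per-minute samples. It is an illustration, not CAQL's implementation; the helper name and the use of `None` to represent a missing sample are assumptions.

```python
from typing import List, Optional


def is_missing(samples: List[Optional[float]]) -> List[int]:
    """Map each per-minute sample to 1 if it is missing (None), else 0."""
    return [1 if s is None else 0 for s in samples]
```

For example, `is_missing([120.5, None, 98.0, None])` returns `[0, 1, 0, 1]`.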

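Now, the downtime of a service during a reporting period is the number of times the service was not available divided by the length of the reporting period. Since we probe once per minute, this is the number of missing samples divided by the number of samples in the period.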
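Written as a formula, assuming one probe per minute so that a period of N minutes contains N samples x_1, ..., x_N, and letting m_i be the `is_missing()` value of sample x_i (1 if missing, 0 otherwise):

```latex
\text{downtime} \;=\; \frac{\#\{\, i : x_i \text{ missing} \,\}}{N} \;=\; \frac{1}{N}\sum_{i=1}^{N} m_i
```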

This is precisely the average of the values of the `is_missing()` function. Hence, we can calculate the downtime over a reporting period as a rolling mean of those values over a one-hour window.
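A rolling mean of those 0/1 values over the last 60 samples (one hour at one probe per minute) can be sketched in Python as follows. The function name is hypothetical and this is not CAQL's implementation. In particular, the sketch averages over however many samples exist before the first full hour has elapsed, which is an assumption about warm-up behavior.

```python
from collections import deque
from typing import Deque, List


def rolling_downtime(missing: List[int], window: int = 60) -> List[float]:
    """Rolling mean of is_missing values over the last `window` samples.

    With one sample per minute, window=60 covers one hour. Until `window`
    samples have been seen, the mean is taken over the samples so far.
    """
    buf: Deque[int] = deque()
    total = 0
    out: List[float] = []
    for m in missing:
        buf.append(m)
        total += m
        if len(buf) > window:
            # Drop the oldest sample so the window stays `window` long.
            total -= buf.popleft()
        out.append(total / len(buf))
    return out
```

Keeping a running total means each new sample costs constant time instead of re-summing the whole hour. As a usage example, a window holding 15 missing and 45 present samples gives `rolling_downtime([1] * 15 + [0] * 45)[-1] == 0.25`.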

The peak value of the green line, 0.25, tells you that over the hour from 20:07 to 21:07 your service was down 0.25 = 25% of the time.
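As a worked check of that reading, assuming one-minute probes so that the one-hour window holds 60 samples:

```latex
0.25 \times 60\ \text{samples} = 15\ \text{missing samples}
\;\Rightarrow\; 15 \times 1\,\text{min} = 15\ \text{min of downtime out of } 60\ \text{min}
```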

To clean up the graph, we remove the second line and multiply the downtime ratio by 100 to get a percentage. The result looks as follows, and can be inspected live here.
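The whole recipe, from raw samples to a percentage, can be sketched end to end in Python. `downtime_percent` is a hypothetical helper, and it relies on the same assumptions used throughout: one sample per minute, `None` for a missing sample, and a trailing window of 60 samples that is shorter at the start of the series. CAQL's own handling of that warm-up period may differ.

```python
from typing import List, Optional


def downtime_percent(samples: List[Optional[float]], window: int = 60) -> List[float]:
    """Rolling downtime percentage from per-minute samples (None = missing)."""
    # Step 1: mark each sample as missing (1) or present (0).
    missing = [1 if s is None else 0 for s in samples]
    out: List[float] = []
    for i in range(len(missing)):
        # Step 2: average the markers over the trailing window.
        recent = missing[max(0, i - window + 1): i + 1]
        # Step 3: scale the ratio by 100 to get a percentage.
        out.append(100.0 * sum(recent) / len(recent))
    return out
```

For example, an hour with 15 missing samples ends at `25.0`, matching the 25% read off the graph. Re-slicing each window keeps the sketch short at the cost of O(N × window) work; a running total avoids that for long series.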

Not using Circonus yet? Get a free account here: circonus.com/free-account.
