Histograms

Histogram visualizations in Circonus use a technique called "heat maps." A typical histogram takes many values, places them into bins, and represents those bins as a bar chart or pictogram.

You can also see a video describing histograms.

What is a histogram?

Let's assume we have a set of numbers: {11, 17, 21, 21, 23, 26, 28, 29, 31, 31, 33, 34, 36, 36, 36, 41, 42, 43, 44, 47, 49, 49, 49, 50, 51, 51, 56, 57, 63, 64, 68, 73, 74, 78, 82, 85}. Suppose we observed these numbers over a period of one minute. They might be, for example, the latencies of database queries in milliseconds, the signal strengths of connected wireless clients, or abandonment times for a website.

Knowing that the average (arithmetic mean) of these numbers is 45.25 (their sum, 1629, divided by the number of elements in the set, 36) and that the sample standard deviation is approximately 19.06 tells us something about the data set. However, these summary statistics can be quite misleading. The standard deviation is most informative when the data follow a normal distribution, and real data in complex systems rarely do.
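The summary statistics above can be checked with Python's standard `statistics` module. This is a minimal sketch; note that the quoted 19.06 corresponds to the sample standard deviation (`statistics.stdev`, dividing by n - 1), while the population form (`statistics.pstdev`, dividing by n) gives about 18.80.

```python
import statistics

# The 36 samples from the example above (e.g., query latencies in ms).
samples = [11, 17, 21, 21, 23, 26, 28, 29, 31, 31, 33, 34, 36, 36, 36,
           41, 42, 43, 44, 47, 49, 49, 49, 50, 51, 51, 56, 57, 63, 64,
           68, 73, 74, 78, 82, 85]

total = sum(samples)                 # 1629
mean = statistics.mean(samples)      # 1629 / 36 = 45.25
stdev = statistics.stdev(samples)    # sample standard deviation (divides by n - 1)
pstdev = statistics.pstdev(samples)  # population standard deviation (divides by n)

print(f"n={len(samples)} sum={total} mean={mean} stdev={stdev:.2f} pstdev={pstdev:.2f}")
```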

To better understand the distribution, we can use a histogram. Placing the values above into bins of width 5 ([10-15), [15-20), [20-25), and so on) and plotting the count of samples in each bin gives a traditional histogram, illustrated to the right.
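The binning step can be sketched in a few lines of Python. This is an illustration, not Circonus code: it assumes half-open bins [lower, lower + 5), matching the [0-5) notation used below, and the `bin_counts` helper is hypothetical. Each bin's count is printed as a row of `#` characters in place of a bar.

```python
from collections import Counter

# The 36 samples from the example above.
samples = [11, 17, 21, 21, 23, 26, 28, 29, 31, 31, 33, 34, 36, 36, 36,
           41, 42, 43, 44, 47, 49, 49, 49, 50, 51, 51, 56, 57, 63, 64,
           68, 73, 74, 78, 82, 85]

BIN_WIDTH = 5

def bin_counts(values, width):
    """Count samples per half-open bin [k*width, (k+1)*width), keyed by the bin's lower edge."""
    return Counter((v // width) * width for v in values)

counts = bin_counts(samples, BIN_WIDTH)

# One text "bar" per bin from [0-5) through [90-95); empty bins print no marks.
for lower in range(0, 95, BIN_WIDTH):
    n = counts.get(lower, 0)
    print(f"[{lower:2d}-{lower + BIN_WIDTH:2d}) {'#' * n}")
```

With this data, the fullest bins, [30-35), [40-45), and [45-50), each hold four samples.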

This histogram shows the population distribution of the data in question. Within Circonus, we wish to show this richness of data with time as an added dimension. This requires using a visual dimension other than height or width, so we choose color (or more specifically, saturation).

By coloring the histogram, we can provide a visual indicator based on something other than height: each bin is assigned a color saturation level according to the number of samples it contains. In the sample illustration to the left, bins with no data, such as [0-5) and [90-95), are white, and the color (blue in this case) becomes more saturated as the number of samples in a bin increases. The darkest blues mark the three bins containing four samples each ([30-35), [40-45), and [45-50)).
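One way to turn a count into a color is to scale it linearly against the fullest bin and use the result as the HSV saturation of a fixed blue hue. The linear scale, the hue, and the `saturation_color` helper are assumptions for illustration; the exact color mapping Circonus uses is not described here.

```python
import colorsys
from collections import Counter

# The 36 samples from the example above.
samples = [11, 17, 21, 21, 23, 26, 28, 29, 31, 31, 33, 34, 36, 36, 36,
           41, 42, 43, 44, 47, 49, 49, 49, 50, 51, 51, 56, 57, 63, 64,
           68, 73, 74, 78, 82, 85]

def saturation_color(count, max_count, hue=2 / 3):
    """Map a bin count to a hex RGB color: white for 0, fully saturated blue at max_count.

    hue=2/3 is blue on colorsys's 0..1 hue scale. Value (brightness) is held at 1.0,
    so zero saturation yields white.
    """
    s = count / max_count if max_count else 0.0
    r, g, b = colorsys.hsv_to_rgb(hue, s, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

# Half-open bins of width 5, keyed by lower edge.
counts = Counter((v // 5) * 5 for v in samples)
peak = max(counts.values())
colors = {lower: saturation_color(counts.get(lower, 0), peak) for lower in range(0, 95, 5)}

for lower, color in colors.items():
    print(f"[{lower:2d}-{lower + 5:2d}) {color}")
```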

For the next step, we eliminate the use of height, freeing up a graph dimension for visualizing time. If all of the bars in the histogram are given the same height, the distribution of the data is still visible through color saturation alone.

Darker bars contain more samples. Reading this sort of output takes some practice, but Circonus provides a heads-up display that helps translate the colors into sample counts.

In this histogram, we can no longer easily tell the exact frequency values, because humans judge spatial differences more accurately than color differences. Since height now carries no meaning, we can eliminate it completely.

This is often called a "sparkline" in the visualization world. This line of colored points can be considered a terse visual representation of the population density of a given set of samples.
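A text-only approximation of this sparkline can be produced by replacing color saturation with Unicode shade characters, from blank (no samples) to a full block (the fullest bin). The character ramp and the `sparkline` helper are hypothetical stand-ins for colored points, chosen only so the result prints in a terminal.

```python
from collections import Counter

# The 36 samples from the example above.
samples = [11, 17, 21, 21, 23, 26, 28, 29, 31, 31, 33, 34, 36, 36, 36,
           41, 42, 43, 44, 47, 49, 49, 49, 50, 51, 51, 56, 57, 63, 64,
           68, 73, 74, 78, 82, 85]

SHADES = " ░▒▓█"  # hypothetical text ramp: empty bin first, fullest bin last

def sparkline(values, width, lo, hi):
    """Render one character per half-open bin from lo to hi, darker for fuller bins."""
    counts = Counter((v // width) * width for v in values)
    peak = max(counts.values())
    chars = []
    for lower in range(lo, hi, width):
        n = counts.get(lower, 0)
        # Scale 0..peak linearly onto the indices of SHADES.
        chars.append(SHADES[round(n * (len(SHADES) - 1) / peak)])
    return "".join(chars)

line = sparkline(samples, width=5, lo=0, hi=95)
print(f"|{line}|")  # the bars mark the ends so blank (empty) bins stay visible
```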

Adding in time

In every Circonus view, time progresses along the horizontal axis from left ("before") to right ("after"), just like a timeline. To show histograms over time, we rotate each histogram sparkline to a vertical orientation. Each vertical color strip represents the set of samples collected during one period of time; as with typical numeric graphs, the length of that period depends on your zoom level.

If you are zoomed out to a year view, each strip may cover a one-day period; if you are zoomed in to a window covering only a few hours, each strip could cover a one-minute period.
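The rotation can be sketched by grouping timestamped samples into fixed periods, computing one histogram per period, and printing each histogram as a vertical column with the highest value bin on top, so that time runs left to right. The helper names, the 60-second period, and the toy `(timestamp, value)` input are assumptions for illustration, and shade characters again stand in for color saturation.

```python
from collections import Counter, defaultdict

SHADES = " ░▒▓█"  # hypothetical stand-ins for increasing color saturation

def heatmap_columns(samples, period, bin_width):
    """Group (timestamp_seconds, value) pairs into one histogram per time period.

    Returns {period_start: Counter({bin_lower_edge: count})}.
    """
    columns = defaultdict(Counter)
    for t, v in samples:
        columns[(t // period) * period][(v // bin_width) * bin_width] += 1
    return columns

def render(columns, bin_width, lo, hi):
    """Draw one vertical strip per period: time runs left to right, value bins bottom to top."""
    times = sorted(columns)
    # Saturation is scaled against the fullest bin across all periods.
    peak = max(max(c.values()) for c in columns.values())
    rows = []
    for lower in range(hi - bin_width, lo - 1, -bin_width):  # top row is the highest bin
        cells = "".join(
            SHADES[round(columns[t].get(lower, 0) * (len(SHADES) - 1) / peak)]
            for t in times
        )
        rows.append(f"[{lower:2d}-{lower + bin_width:2d}) {cells}")
    return "\n".join(rows)

# Toy input (made up for illustration): (seconds since start, value) pairs.
samples = [(0, 3), (10, 7), (30, 12), (45, 13), (70, 18)]
columns = heatmap_columns(samples, period=60, bin_width=5)
grid = render(columns, bin_width=5, lo=0, hi=20)
print(grid)
```

Changing `period` (for example, to 86400 for day-long strips) changes how many samples each strip summarizes, which mirrors how the zoom level sets the period covered by each strip.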

Further details on reading these strips in Circonus graphs are best demonstrated in the context of actual histogram graphs.