Saturday, 1 October 2016

Calculate Real Time Percentiles On Streamed Data

When monitoring the operational performance of a sub system or component one can use the PerformanceCounter classes to define a custom counter. The PerformanceCounter classes can create counters that can measure rates, averages, percentages, differences or instantaneous values.These counter types cover a broad range of measurement use cases.

One of the common features of these performance counter types is that they are single values computed at an instance of time. Without recording successive value and aggregating the results they cannot provide any indication of the distribution of the values measured.

Often when monitoring the operational performance of a subsystem it is extremely useful to determine the values of the high order percentiles e.g. 90th, 95th, 99th. One of the difficulties in determining the values for these percentiles on a operational system that produces a continuous stream of metric values, for example response time, is the storage requirements for the sample data.

There are a number of algorithms that have been proposed that can compute percentile values on a stream of data values utilising a limited of fixed amount of storage space. After some investigation I decided to implement the P2 algorithm (P squared algorithm).

It is beyond the scope of this post to discuss the P2 algorithm in detail. It is however worth outlining some of the pros and cons of P2 algorithm.

Pros

  • The amount of memory per percentile value calculated is fixed
  • The execution time for calculating a given percentile when a new observation is added is linear

Cons

  • The algorithm does not store the observations there for it performs an estimation of a given percentile. The estimate will deviate from the actual value. The P2 does not have an upper bound for this error. In practical terms the deviation is not that significant as will be shown later in this post

P2 Implementation

To use the PSquared  type to calculate a number of percentiles an instance is constructed passing a list of percentiles expressed as values in the range of 0 – 1.

var pSquared = new PSquared(new List<double> {0d, 0.1d, 0.2d, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1});

As samples of data are generated or measured they are added to the instance

pSquared.AddSample(d);

To calculate a given percentile value call the Result method passing the percentile to calculate

var percentile = pSquared.Result(0.9);

P2 Performance

Using test data generated with R using the following functions x <-runif(200), quantile(x). This produced test data and expected quantiles values to compare against those generated by the P2 implementation.

Expected Actual % Error
0.002780183 0.00471922292652855 -37.7260207657616
0.250419949 0.247412034395431 0.803984453455285
0.515937582 0.508948571191173 0.90717849799471
0.757101159 0.747366391997607 0.860886247617353
0.996617917 0.992925258643177 0.2473180945208

As can be seen from the results that the error in all percentile is less than 1%. When the execution time is measured per call to the AddSample() method the average execution time is 1 us.

The code is available here: GitHub Statistics