Saturday, 1 October 2016

Calculate Real Time Percentiles On Streamed Data

When monitoring the operational performance of a sub system or component one can use the PerformanceCounter classes to define a custom counter. The PerformanceCounter classes can create counters that can measure rates, averages, percentages, differences or instantaneous values.These counter types cover a broad range of measurement use cases.

One of the common features of these performance counter types is that they are single values computed at an instance of time. Without recording successive value and aggregating the results they cannot provide any indication of the distribution of the values measured.

Often when monitoring the operational performance of a subsystem it is extremely useful to determine the values of the high order percentiles e.g. 90th, 95th, 99th. One of the difficulties in determining the values for these percentiles on a operational system that produces a continuous stream of metric values, for example response time, is the storage requirements for the sample data.

There are a number of algorithms that have been proposed that can compute percentile values on a stream of data values utilising a limited of fixed amount of storage space. After some investigation I decided to implement the P2 algorithm (P squared algorithm).

It is beyond the scope of this post to discuss the P2 algorithm in detail. It is however worth outlining some of the pros and cons of P2 algorithm.


  • The amount of memory per percentile value calculated is fixed
  • The execution time for calculating a given percentile when a new observation is added is linear


  • The algorithm does not store the observations there for it performs an estimation of a given percentile. The estimate will deviate from the actual value. The P2 does not have an upper bound for this error. In practical terms the deviation is not that significant as will be shown later in this post

P2 Implementation

To use the PSquared  type to calculate a number of percentiles an instance is constructed passing a list of percentiles expressed as values in the range of 0 – 1.

var pSquared = new PSquared(new List<double> {0d, 0.1d, 0.2d, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1});

As samples of data are generated or measured they are added to the instance


To calculate a given percentile value call the Result method passing the percentile to calculate

var percentile = pSquared.Result(0.9);

P2 Performance

Using test data generated with R using the following functions x <-runif(200), quantile(x). This produced test data and expected quantiles values to compare against those generated by the P2 implementation.

Expected Actual % Error
0.002780183 0.00471922292652855 -37.7260207657616
0.250419949 0.247412034395431 0.803984453455285
0.515937582 0.508948571191173 0.90717849799471
0.757101159 0.747366391997607 0.860886247617353
0.996617917 0.992925258643177 0.2473180945208

As can be seen from the results that the error in all percentile is less than 1%. When the execution time is measured per call to the AddSample() method the average execution time is 1 us.

The code is available here: GitHub Statistics

Wednesday, 28 September 2016

Interoperability between Nanopb and googles Protocol Buffers

As part of a project to build an AVR controlled toaster oven reflow controller I decided to use Googles Protocol Buffers and Nanopb to serialise the command and control data between the .net pc software and the atmega32u2 based control board. The details of the toaster oven reflow controller will be covered in a separate blog entry.
There are where a number of compatibility issues between the two libraries that I found workarounds for. I didn't find a huge amount of information about this topic on line so I decided to share what I found.

Version Compatibility

The first issue is one of Protocol Buffers and Nanopb protocol version compatibility. Nanopb only supports version 2, proto2 definitions, and the current .net Google.Protobuf only supports version 3, proto3 definitions.
For nanopb a message must be defined as follows:
syntax = "proto2";

message Request {

    enum RequestType {
        REQUESTONE = 0;
        REQUESTTWO = 1;

    required RequestType Command = 1;
For Google Protocol Buffers the corresponding message must be defined as follows:
syntax = "proto3";

message Request {

    enum RequestType {
       REQUESTONE = 0;
       REQUESTTWO = 1;

    RequestType Command = 1;
The version 3.0.0 release of Google Protocol Buffers proto2 is not supported, the required keyword has been deprecated as it not supported in proto3.

Stack space and error strings

Nanopb is designed to run on memory constrained devices. The atmega32u2 device has 1024 bytes of SRAM available for stack and heap storage. I discovered during the course of testing the controller that the code on the atmega32u2 device could not decode messages that contained more than 3 or 4 fields. Using the debugger I determined that this was due to stack overrun. Luckily the only side effect as there was not a huge amount of free SRAM memory available on the device. Looking at the memory usage of the Nanopb code I could see that there is some overhead in the definition of the fields for a message. Each field defined consumes 10 bytes of SRAM for the number of fields defined in the messages that was 240 bytes.

When the atmega32u2 code is complied the static analysis of the compiler indicates the following:

Program Memory Usage     :    17732 bytes   54.1 % Full
Data Memory Usage         :    747 bytes   72.9 % Full

Looking through the code for Nanopb I noticed that the error strings in the stream methods where not loaded from flash memory. Which is default behaviour in AVR code unless steps are taken to instruct the compiler, using the PROGMEM macro to locate the literal string constants in flash.

PB_RETURN_ERROR(stream, "end-of-stream");

Launching the debugger and browsing through the IRAM of the device shows the strings taking up memory.

Nanopb does support turning off error messages by using a compiler flag PB_NO_ERRMSG. Defining this flag for the build results in the following:

Program Memory Usage     :    16248 bytes   49.6 % Full
Data Memory Usage         :    377 bytes   36.8 % Full

Basically a small, ~5% reduction in flash memory but a 50% reduction in SRAM usage.