Standard Deviation Filter for ESPHome

The standard deviation of a set of sensor readings measures how much those readings vary. A standard deviation close to zero means the readings cluster near their mean: the measurements are consistent. A standard deviation that is large relative to the scale of the readings means the opposite: the observations are dispersed and inconsistent. By computing the standard deviation over a sensor’s recent measurements, we can judge how consistent those measurements are.

The following code for ESPHome uses a custom lambda filter to compute the standard deviation of measurements from a sensor. It applies the filter to a sensor from the copy integration, which takes its readings from another defined sensor whose id is given in source_id. We could apply the lambda filter directly to the sensor itself, but then ESPHome would send only the standard deviation of the measurements to Home Assistant and not the actual measurement value. The parameters window_size_ and send_every_ play the same role as the configuration values window_size and send_every of the built-in ESPHome sliding-window sensor filters, such as median or sliding_window_moving_average. Because both are declared as uint8_t, neither may exceed 255.

sensor:
  - platform: copy
    name: "Standard Deviation of measurements"
    source_id: another_sensor_id
    filters:
      - lambda: |-
          // max measurements to store for computing standard deviation
          const uint8_t window_size_ = 60;
          // compute and send the standard deviation after this many measurements
          const uint8_t send_every_ = 15; 

          static std::deque<float> queue_;
          static uint8_t send_at_ = 0;
          
          // Drop the oldest readings so that queue_ holds at most
          // window_size_ entries once the newest reading is added
          while (queue_.size() >= window_size_) {
            queue_.pop_front();
          }

          // add the newest reading to queue_
          queue_.push_back(x);

          if (++send_at_ >= send_every_) {
            send_at_ = 0;

            float Ex = 0.0;
            float Ex2 = 0.0;
            size_t count = 0;

            // Shift value K: the oldest valid (non-NaN) reading in the window
            float K = NAN;

            for (auto v: queue_) {
              if (!std::isnan(v)) {
                // Shifted-data algorithm to avoid catastrophic cancellation
                //  - The variance does not change when the same constant K
                //    is subtracted from every measurement. Without the
                //    shift, the sum of the squared measurements and the
                //    square of the summed measurements may both be large
                //    and nearly equal, so their difference loses precision
                if (count == 0) {
                  K = v;
                }

                // counts valid measurements
                count += 1;
                // sums the measurement minus K
                Ex += v - K;
                // sums the square of the measurement minus K
                Ex2 += (v - K) * (v - K);
              }
            }

            float standard_deviation = NAN;
            // The sample variance divides by count - 1, so we need at least
            // two valid readings; otherwise the result remains NAN
            if (count > 1) {
              float variance = (Ex2 - Ex * Ex / count) / (count - 1);
              // rounding can leave a tiny negative variance; clamp it to zero
              if (variance < 0.0f) {
                variance = 0.0f;
              }
              // standard deviation is the square root of the variance
              standard_deviation = sqrt(variance);
            }

            return standard_deviation;
          }
          return {};

We compute the standard deviation by taking the square root of the sample variance, which divides by the number of valid measurements minus one. If we naïvely compute the variance, then we may encounter catastrophic cancellation. The naïve approach accumulates the sum of the measurements and the sum of the squared measurements, then subtracts the squared sum divided by the number of measurements from the sum of the squares. If the measurements are large relative to their spread, these two quantities are nearly equal, and their difference loses most of its significant digits in floating-point arithmetic. The code avoids this with the shifted-data technique: it subtracts a constant, taken from a reading in the window, from every measurement before accumulating. The shift leaves the variance unchanged but keeps both sums small, and the computation still needs only a single pass. (Welford’s algorithm, which is often mentioned in this context, is a different single-pass method that updates a running mean and a running sum of squared differences.)
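
The difference between the two approaches can be tried outside ESPHome. The following is a minimal standalone C++ sketch, not part of the ESPHome configuration; the function names naive_variance, shifted_variance and demo, and the sample readings, are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive one-pass sample variance: (sum(x^2) - sum(x)^2 / n) / (n - 1).
// Prone to catastrophic cancellation when the readings are large
// relative to their spread.
float naive_variance(const std::vector<float>& xs) {
  float sum = 0.0f;
  float sum_sq = 0.0f;
  std::size_t n = 0;
  for (float x : xs) {
    if (std::isnan(x)) continue;  // skip invalid readings
    ++n;
    sum += x;
    sum_sq += x * x;
  }
  if (n < 2) return NAN;  // sample variance needs at least two readings
  return (sum_sq - sum * sum / n) / (n - 1);
}

// Shifted-data sample variance: subtract a constant K (here the first
// valid reading) from every reading before accumulating the sums.
float shifted_variance(const std::vector<float>& xs) {
  float ex = 0.0f;
  float ex2 = 0.0f;
  float k = NAN;
  std::size_t n = 0;
  for (float x : xs) {
    if (std::isnan(x)) continue;  // skip invalid readings
    if (n == 0) k = x;            // first valid reading becomes the shift
    ++n;
    ex += x - k;
    ex2 += (x - k) * (x - k);
  }
  if (n < 2) return NAN;
  return (ex2 - ex * ex / n) / (n - 1);
}

// Four readings whose exact sample variance is 5/3.
void demo() {
  const std::vector<float> readings = {10000.0f, 10001.0f, 10002.0f, 10003.0f};
  const float shifted = shifted_variance(readings);
  assert(std::fabs(shifted - 5.0f / 3.0f) < 1e-4f);
}
```

With these readings the shifted sums are only 6 and 14, so the result is exact. The naive sums are close to 4 × 10^8, where single-precision values are spaced 32 apart, so on typical platforms naive_variance cannot resolve the true difference of 5 between them.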