Suppose we have a random variable \(x\). Observations arrive in a stream, where \(x_t\) denotes the observation at time \(t\). If we have access to all the historical observations, the sample mean is \(\bar{x}_t = \frac{1}{t}\sum_{i=1}^t x_i\).

If we do not have access to all the historical data, the mean can still be estimated online, recursively. Using \(\bar{x}_{t-1} = \frac{1}{t-1}\sum_{i=1}^{t-1} x_i\), we get \(\bar{x}_t = \frac{1}{t}\sum_{i=1}^t x_i = \frac{1}{t} \left(\sum_{i=1}^{t-1} x_i + x_t \right) = \frac{t-1}{t}\bar{x}_{t-1}+ \frac{1}{t} x_t.\) We need to keep in memory only the previous estimate \(\bar{x}_{t-1}\) and the counter \(t\).
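A minimal Python sketch of this recursion (the function and variable names are illustrative), checked against the batch mean:

```python
import numpy as np

def update_mean(mean_prev, x_t, t):
    # Recursive mean update; t is the 1-based index of the new observation.
    return (t - 1) / t * mean_prev + x_t / t

rng = np.random.default_rng(0)
xs = rng.normal(loc=2.0, scale=1.0, size=1000)

mean = 0.0  # for t = 1 the previous estimate is multiplied by 0, so any init works
for t, x in enumerate(xs, start=1):
    mean = update_mean(mean, x, t)

print(mean, xs.mean())  # the two estimates coincide
```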

This recursive estimate has no forgetting mechanism: as \(t\) grows, the contribution of each new \(x_t\) to the estimate becomes very small. Moreover, the distribution of the data may change over time. We may therefore want an adaptive mean estimate \(\bar{x}_t^\star\), which can be computed recursively using simple exponential smoothing, \(\bar{x}_t^\star = (1-\alpha)\bar{x}_{t-1}^\star + \alpha x_t,\) where \(\alpha \in (0,1)\) is the smoothing weight. The larger \(\alpha\), the faster the forgetting.
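A sketch of the smoothing update on a stream whose distribution changes (the data and the value of \(\alpha\) are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# The mean of the stream shifts from 0 to 5 halfway through.
xs = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])

alpha = 0.05          # smoothing weight; larger alpha forgets faster
m_star = xs[0]        # initialize with the first observation
for x in xs[1:]:
    m_star = (1 - alpha) * m_star + alpha * x

print(m_star)         # close to 5, the mean after the change
# The plain recursive mean would land near 2.5, the average over the whole stream.
```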

Similarly, we can estimate the variance \(\mathit{Var}(x)\). If we have access to all the historical data, the sample variance is \(\mathit{Var}(x)_t = \frac{1}{t-1}\sum_{i=1}^t (x_i - \bar{x})^2\), where \(\bar{x}\) is the mean.

Online recursive estimation is given by \(\mathit{Var}(x)_t = \frac{1}{t-1}\sum_{i=1}^t (x_i - \bar{x})^2 = \frac{1}{t-1} \left( \sum_{i=1}^{t-1} (x_i - \bar{x})^2 + (x_t - \bar{x})^2 \right) = \frac{t-2}{t-1} \mathit{Var}(x)_{t-1} + \frac{1}{t-1} (x_t - \bar{x})^2,\) where the last step uses \(\sum_{i=1}^{t-1} (x_i - \bar{x})^2 = (t-2)\,\mathit{Var}(x)_{t-1}\). Note that this derivation treats the mean \(\bar{x}\) as fixed; if the running estimate \(\bar{x}_t\) is plugged in instead, the recursion becomes an approximation (an exact online update that accounts for the changing mean is Welford's algorithm).
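A sketch of the variance recursion, assuming \(\bar{x}\) is known and fixed (here it is computed from the full sample, so the result matches the batch estimate). The recursion starts from \(t = 2\), since \(\mathit{Var}(x)_1\) is undefined:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(loc=2.0, scale=3.0, size=1000)
xbar = xs.mean()  # assumed fixed and known throughout the recursion

# Base case t = 2: sum of the first two squared deviations, divided by t - 1 = 1.
var = (xs[0] - xbar) ** 2 + (xs[1] - xbar) ** 2
for t, x in enumerate(xs[2:], start=3):
    var = (t - 2) / (t - 1) * var + (x - xbar) ** 2 / (t - 1)

print(var, xs.var(ddof=1))  # the two estimates coincide
```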

Adaptive online estimation is given by \(\mathit{Var}(x)_t^\star = (1-\alpha)\mathit{Var}(x)_{t-1}^\star + \alpha (x_t - \bar{x})^2,\) where \(\alpha \in (0,1)\) is the smoothing weight. In practice, the adaptive mean estimate \(\bar{x}_t^\star\) is typically plugged in for \(\bar{x}\).
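A sketch combining both adaptive updates, with the adaptive mean \(\bar{x}_t^\star\) substituted for \(\bar{x}\) (the initial values and \(\alpha\) are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# The standard deviation of the stream jumps from 1 to 3 halfway through.
xs = np.concatenate([rng.normal(0.0, 1.0, 2000), rng.normal(0.0, 3.0, 2000)])

alpha = 0.05
m_star, v_star = xs[0], 1.0  # illustrative initialization
for x in xs[1:]:
    m_star = (1 - alpha) * m_star + alpha * x                   # adaptive mean
    v_star = (1 - alpha) * v_star + alpha * (x - m_star) ** 2   # adaptive variance

print(v_star)  # close to 9, the variance after the change
```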